Remove duplicate elements

Status
Not open for further replies.

InvaderM

Programmer
Nov 25, 2009
3
US
I'm trying to remove duplicate elements from my XML. I've included a sample of it below.

The element I'm after is the target element, at "main > child > date > record > target". I want to remove every target whose content has already appeared earlier in the document, keeping only the first occurrence of each value.

For example, remove all the target elements containing "A" except the first one.


I'm open to using XSL or PHP. Can anyone recommend how I should go about this?

<?xml version="1.0" encoding="UTF-8"?>
<main>
<name>John Doe</name>
<child year="2008">
<date date="Sat Jul 18">
<record>
<target>A</target>
</record>
</date>
<date date="Sun Jul 19">
<record>
<target>B</target>
</record>
<record>
<target>A</target>
</record>
</date>
</child>
<child year="2009">
<date date="Sat Jan 18">
<record>
<target>C</target>
</record>
</date>
<date date="Sun Jan 19">
<record>
<text>Some text</text>
<target>A</target>
</record>
</date>
</child>
</main>
 
Maybe you have not noticed, but what exactly should be removed is still unclear: you might mean the target tag itself, its parent record tag, its grandparent date tag, or something else.
 
It's just the target tags I want removed, e.g. "<target>A</target>".

The parents should remain.
 
[1] The procedure using XSLT is as follows.
[1.1] Use xsl:key to index the target elements by their content.
[1.2] In a template matching target, check whether the current element is the first one in document order for its key value. If it is, copy it; otherwise output nothing.
[1.3] Combine the above with an identity transformation, which copies everything else unchanged. For target elements, the template in [1.2] automatically overrides the identity template because its more specific match pattern has a higher default priority.

[2] Concretely, it would look like this.
[tt]
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes" />

  <!-- index every target element by its string value -->
  <xsl:key name="gettargets" match="target" use="." />

  <!-- basically an identity transformation here -->
  <xsl:template match="/">
    <xsl:apply-templates select="*" />
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- this is the functional template for the purpose -->
  <xsl:template match="target">
    <!-- the union has exactly one node only when the current target
         is the first node returned by the key for its value -->
    <xsl:if test="count(.|key('gettargets',.)[1])=1">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()" />
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
[/tt]
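
For comparison, the same "keep only the first occurrence" rule can probably be written without xsl:key. The following is only a sketch, not part of the stylesheet above, and it assumes that target elements are never nested inside one another (the preceding axis does not include ancestors). It pairs the identity transformation with an empty template for any target whose string value equals that of an earlier target in document order.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes" />

  <!-- identity transformation: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- empty template: output nothing for a target whose content
       already appeared in an earlier target (document order) -->
  <xsl:template match="target[. = preceding::target]" />
</xsl:stylesheet>
```

This version is shorter, but it rescans all preceding nodes for every target, so the key-based approach should scale better on large documents.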
 
You're a genius. That worked beautifully. Thank you, tsuji!

I spent days trying to figure this out with no success. Let me know if there's anything I can do to return the favor.

Avaz
Web Developer
 
Don't mention it. Glad you've got it working.
 