Remove duplicates from XML

Status
Not open for further replies.

sallcock

Technical User
Sep 21, 2007
5
GB
I have an XML file that has duplicate records in it. In SQL Server I would do a SELECT DISTINCT. How can I achieve this with XML?

For example

<Pupils>
<Pupil pupilid="20">
<Name>Test test</Name>
</Pupil>
<Pupil pupilid="19">
<Name>Test Pupil</Name>
</Pupil>
<Pupil pupilid="20">
<Name>Test test</Name>
</Pupil>
<Pupil pupilid="20">
<Name>Test test</Name>
</Pupil>
</Pupils>

How would I return distinct <Pupil> records using the pupilid?
 
This is a form of grouping. For more information about grouping see faq426-6585. If your goal is to produce a set of distinct pupils based upon pupilid, then the inner xsl:for-each loop in the example may be eliminated.
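As a minimal sketch of the key-based approach (not the code from the FAQ itself), the stylesheet below assumes XSLT 1.0 and the sample document from the question. The key name pupil-by-id is just a label chosen for this illustration. The key indexes on the attribute by writing use="@pupilid", and the identity template copies everything else through unchanged, so the output should be the original XML minus the repeated Pupil elements.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <!-- Drop whitespace-only text nodes so removed duplicates leave no blank lines -->
  <xsl:strip-space elements="*"/>

  <!-- Index every Pupil element by its pupilid attribute
       ("pupil-by-id" is a hypothetical name picked for this sketch) -->
  <xsl:key name="pupil-by-id" match="Pupil" use="@pupilid"/>

  <!-- Identity template: copy every node and attribute as-is by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: suppress any Pupil that is not the first one
       in document order for its pupilid -->
  <xsl:template match="Pupil[generate-id() != generate-id(key('pupil-by-id', @pupilid)[1])]"/>
</xsl:stylesheet>
```

Because the empty template is more specific than the identity template, it wins for the duplicate Pupil elements and nothing is written for them. No inner xsl:for-each is needed, since only the first member of each group is kept.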

Give it a try...

Tom Morrison
 
Thank you for your reply. I have looked into this method and I have a few questions. The example matches on an element, whereas I want to match on an attribute (pupilid). How would I do that? Do I just need the @ in front?

Also, I want to return XML that is exactly the same as the original minus the duplicates. How can this be achieved?

Many Thanks
 
If you do not mind the penalty of an O(n^2) operation (n being the number of Pupil elements involved), it can be done like this.
[tt]
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes" />
  <xsl:template match="/">
    <xsl:apply-templates select="*" />
  </xsl:template>
  <xsl:template match="Pupils">
    <xsl:copy>
      <xsl:apply-templates select="*|@*" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:copy-of select="." />
  </xsl:template>
  <!-- Copy a Pupil only if no earlier sibling has the same pupilid -->
  <xsl:template match="Pupil">
    <xsl:if test="count(preceding-sibling::Pupil[@pupilid=current()/@pupilid]) = 0">
      <xsl:copy-of select="." />
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
[/tt]
In return for that penalty, this approach is less memory-intensive than grouping with a key. I will leave Tom to follow up on the method he suggested.
 
Thanks for your reply. Both methods work fine, but what I'm having trouble with is getting the output to include all my XML tags. So rather than

<Pupils>
<Pupil pupilid="20">
<Name>Test test</Name> ....

I'm just getting 20testtest

Am I missing something obvious?
 
Is that what appeared in the text file, or what you see in a browser? In the text file, it shouldn't look like that.
 
That is how it appears in a browser, but if I do View Source I see the original XML. How can I get the text version that has the duplicates removed?
 
Browsers use their default stylesheet to display .xml files. Use IE or Firefox to view it and you should be fine.
 
I am viewing it in IE and it is just coming back all run together, without the tags.
 
Then it cannot be answered unless you specify what the application does, how the transformation output is produced, and where it is piped to. It is not related to the transformation itself.
 