Different type of xml 1

Status
Not open for further replies.

elucid

Programmer
Jan 6, 2009
RO
Hello everyone

I have a pretty hard question. I have a slightly malformed XML file and I need to either query it or correct it. I can't do it manually, so I could use some help with either option.

It looks like this:
<root>
<rot>
<literal>eat
<sense>3</sense>
</literal>
</rot>
........//multiple rots like this
</root>

What I want to do is either correct the XML into something like:
<literal l="eat"><sense>3</sense></literal>

or just query the "bad" XML and extract words like "eat" from it. I tried using Contains with LINQ in C#, but it extracts every word containing "eat" from the XML, and there are a lot of them.
I should mention that the XML has 9000 entries ("rot" elements) now, and I want to add more to it; it will have about 40k entries.

Can you help?
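
For the lookup side of the question, here is a minimal sketch of an exact match as opposed to a substring match. It is in Python with only the standard library, not the C#/LINQ code mentioned above, and the sample data plus the names `literal_word` and `find_exact` are made up for illustration. The idea is to compare only the text that comes before `<sense>` inside each `<literal>`, after trimming whitespace, so that "great" no longer matches "eat".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the file in the question: the word is the
# text that precedes <sense> inside <literal>.
SAMPLE = """<root>
<rot>
<literal>eat
<sense>3</sense>
</literal>
</rot>
<rot>
<literal>great
<sense>1</sense>
</literal>
</rot>
</root>"""


def literal_word(literal):
    """Return the word stored as the leading text of a <literal> element."""
    return (literal.text or "").strip()


def find_exact(root, word):
    """Return the <literal> elements whose leading text equals `word` exactly."""
    return [lit for lit in root.iter("literal") if literal_word(lit) == word]


root = ET.fromstring(SAMPLE)
matches = find_exact(root, "eat")
senses = [lit.findtext("sense") for lit in matches]
print(senses)  # ['3']
```

A substring test would have returned both entries here; the equality test returns only the one whose word is exactly "eat".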
 
You can run an XSL stylesheet such as this to put it in the desired structure.
[tt]
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="utf-8" />

<xsl:template match="/">
<xsl:apply-templates select="*" />
</xsl:template>
<xsl:template match="*|@*|comment()|processing-instruction()|text()">
<xsl:copy>
<xsl:apply-templates />
</xsl:copy>
</xsl:template>

<xsl:template match="literal">
<xsl:copy>
<xsl:attribute name="l">
<xsl:value-of select="normalize-space(text()[1])" />
</xsl:attribute>
<xsl:apply-templates select="*|@*|comment()|processing-instruction()|text()[position() &gt; 1]" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>[/tt]
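
To make the intent of the stylesheet easier to follow, here is a rough Python stand-in for the same restructuring. It is not XSLT and not a way to run the stylesheet; it only mirrors the idea of the `literal` template (take the first text node, normalize its whitespace, store it in attribute `l`, keep the child elements). The function name `lift_leading_text` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET


def lift_leading_text(root, tags=("literal",), attr="l"):
    """Move each matching element's leading text into an attribute, in place."""
    for elem in root.iter():
        if elem.tag in tags:
            # " ".join(s.split()) trims and collapses whitespace, which is
            # roughly what normalize-space() does in the stylesheet.
            elem.set(attr, " ".join((elem.text or "").split()))
            elem.text = None
    return root


root = ET.fromstring(
    "<root><rot><literal>eat\n<sense>3</sense>\n</literal></rot></root>"
)
lift_leading_text(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Passing a different `tags` tuple is the Python counterpart of listing more element names in the template's `match` pattern.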
 
I last worked with XSL about 2 years ago, and I ran it directly on the web. I don't know how to run it now. I am writing a C# program that uses the new XML. Can I get a hint on how to do that?
 
I think I didn't make myself clear. I want to know how to use the XSL you wrote for me. I created a C# program that applies the XSL transformation to my XML, but the result is the same as the input. And I don't understand the expressions you put in select and match.
 
And here is another update, because I don't see the edit button anywhere.
<root>
<SYNSET><ID>ENG20-00004609-n</ID><POS>n</POS><SYNONYM><LITERAL>viață<SENSE>1</SENSE></LITERAL></SYNONYM><DEF>forme de viață, văzute în mod global; "Nu există viață pe Marte"</DEF><STAMP>Dan Cristea</STAMP><BCS>1</BCS><ILR>ENG20-00003009-n<TYPE>hypernym</TYPE></ILR><DOMAIN>biology</DOMAIN><SUMO>Organism<TYPE>=</TYPE></SUMO></SYNSET>
<SYNSET><ID>ENG20-00004824-n</ID><POS>n</POS><SYNONYM><LITERAL>celulă<SENSE>1</SENSE></LITERAL></SYNONYM><DEF>Element constitutiv fundamental al organismelor vii, alcătuit din membrană, citoplasmă și nucleu, reprezentând cea mai simplă unitate anatomică.</DEF><STAMP>Dan Cristea</STAMP><BCS>1</BCS><ILR>ENG20-00003009-n<TYPE>hypernym</TYPE></ILR><ILR>ENG20-00003226-n<TYPE>holo_part</TYPE></ILR><ILR>ENG20-05681603-n<TYPE>category_domain</TYPE></ILR><DOMAIN>biology</DOMAIN><SUMO>Cell<TYPE>=</TYPE></SUMO></SYNSET>
</root>

These are 2 lines of my XML. What do I have to modify in your XSL? I saw that it works for that little part of the sample XML :) . Many thanks! You saved me lots of trouble.
 
I did the transformation of the small sample, but I can't find any explanation of what ++++ match="*|@*|comment()|processing-instruction()|text()" ++++ means. I searched w3schools and found nothing. I only need to make some minor changes to it, but I don't know what it means. Beyond that, I have to extract the value of each literal, compare it to the literals in another XML file, and then extract the definitions and lemmatize them. I already did that last part, but with my old method using Contains in C#, which found not just the 1 entry it should have but every entry containing that word (about 25, if I remember correctly).
Without all the other parts, the XML looks like this. I would appreciate it even if you just told me where I can find out what you meant by *|@*|....|text()|..

So:
<root>
<SYNSET>
<SYNONYM>
<LITERAL>eat
<SENSE>3
</SENSE>
</LITERAL>
</SYNONYM>
</SYNSET>
<SYNSET>
....
</SYNSET>
</root>
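
As a rough guide to the terms in that match pattern, each alternative separated by `|` names one kind of node: `*` is any element, `@*` any attribute, `comment()` a comment, `processing-instruction()` a processing instruction, and `text()` a text node. The sketch below is in Python (standard library `xml.dom.minidom`), not XSLT, and simply labels the nodes of a made-up `<literal>` element with the pattern term that would match them. The sample document and the `classify` helper are hypothetical.

```python
from xml.dom import Node, minidom

# DOM node type -> the XSLT pattern term that matches that kind of node.
KINDS = {
    Node.ELEMENT_NODE: "*",
    Node.COMMENT_NODE: "comment()",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction()",
    Node.TEXT_NODE: "text()",
}


def classify(element):
    """List (pattern term, node name) for an element's attributes and children."""
    found = [("@*", name) for name in sorted(element.attributes.keys())]
    for child in element.childNodes:
        found.append((KINDS.get(child.nodeType, "other"), child.nodeName))
    return found


doc = minidom.parseString(
    '<literal id="x"><!--note--><?pi data?>eat<sense>3</sense></literal>'
)
for term, name in classify(doc.documentElement):
    print(term, name)
```

In the sample, the word "eat" is a text node and `<sense>` is an element, which is why the stylesheet can treat the leading text separately from the child elements.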
 
If you want to change all the elements with a mixed content model in the same way (the leading text becoming attribute l), then change this line and that's all.
>[tt] <xsl:template match="literal">[/tt]
[tt] <xsl:template match="[blue]LITERAL|ILR|SUMO[/blue]">[/tt]
 
I didn't mean that. I still want to repair the XML file that way. The point is that "literal" now sits on a different path, and I guess this is why it doesn't work: the path is root/SYNSET/SYNONYM/LITERAL. If I put simple LITERAL in match, nothing happens.
 
>if I put simple LITERAL in match nothing happens.
No. That's due to something else, such as a namespace. But you keep out-smarting the forum.
 
I got it. I see the XSL isn't working if it is written with capital letters, "LITERAL"; it only works for lower-case "literal", and I wonder why. And don't be mean to me; I asked nicely and I explained my problem and other details.
 
But you keep saying things that make no sense. Why on earth wouldn't it work if LITERAL is in upper case? If it is in upper case in the XML document, it should be in upper case in the pattern. It is a matter of fact, not speculation.
 
I am speechless. You form an opinion on anything and speak it aloud. I would learn before doing that.
 
Come on, I need some help. Besides the fact that XSL can't see diacritics, it doesn't even see upper-case letters. I need more assistance here. Thank you.
 
>[tt] <xsl:template match="literal">[/tt]
[tt] <xsl:template match="literal[blue]|LITERAL[/blue]">[/tt]
 
>I see the XSL isn't working if it is written with capital letters, "LITERAL"; it only works for lower-case "literal", and I wonder why.
Then this should be a good lesson for you (and everyone else). XML is CASE SENSITIVE. The tags <literal> and <LITERAL> signify different elements.

If you kludge your code to try to make them the same, you'll have your work cut out for you forever. Work with the technology, not against it.
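
A quick way to see the case sensitivity for yourself, sketched in Python with the standard library and a cut-down, made-up sample of the SYNSET structure: a lookup for `literal` finds nothing, while a lookup for `LITERAL` finds the element. The same rule applies to the names used in an XSLT match pattern.

```python
import xml.etree.ElementTree as ET

# Cut-down sample using the upper-case element names from the thread.
root = ET.fromstring(
    "<root><SYNSET><SYNONYM><LITERAL>eat<SENSE>3</SENSE></LITERAL>"
    "</SYNONYM></SYNSET></root>"
)

lower = list(root.iter("literal"))  # wrong case: no match
upper = list(root.iter("LITERAL"))  # same case as the document: one match
print(len(lower), len(upper))  # 0 1
```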
 
>Besides the fact that XSL can't see diacritics, it doesn't even see upper-case letters.
Sure XSL can handle diacritics. Just do a web search for "XSL unicode diacritics" and you will find lots of information.

The XML Recommendation explicitly says XML supports Unicode/ISO 10646. All XML processors must accept the UTF-8 and UTF-16 encodings of ISO 10646.

ISO 10646 is an international standard which covers encodings for most known languages and includes diacritics.
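
As a small illustration that diacritics survive when the encoding is handled consistently, here is a sketch in Python (standard library only) with a made-up one-element sample. It parses UTF-8 bytes, reads the word back, and serializes again as UTF-8. The last lines show one common way characters can degrade to `?`, namely forcing text into an encoding that lacks them; that is offered only as a possibility, not as a diagnosis of any particular file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a Romanian word with diacritics, encoded as UTF-8.
xml_bytes = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<LITERAL>viață<SENSE>1</SENSE></LITERAL>"
).encode("utf-8")

root = ET.fromstring(xml_bytes)
word = root.text  # diacritics are intact after parsing
out = ET.tostring(root, encoding="utf-8")  # and after writing back as UTF-8

# Latin-1 has no 'ț' or 'ă', so each is replaced by '?' when forced into it.
lossy = "viață".encode("latin-1", errors="replace")
print(word, lossy)
```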
 