Tek-Tips is the largest IT community on the Internet today!

Learning XSL, trouble with attributes

Status
Not open for further replies.

Alexfz (Programmer, US), Aug 2, 2007
Hi everyone,

I am just starting to learn XSL, but I am having a problem handling attributes in the example I am trying to work with. I have searched for tutorials that might help with this problem, but have had no luck so far.

I am using the XML file output by a data mining program called Lingpipe and trying to style that data. Here is a sample of the data that should give you a good idea of what I'm working with:
Code:
<output>
 <s i="0">
  <ENAMEX TYPE="ORGANIZATION">Ford Motor Company</ENAMEX> is an American multinational corporation and the world's third largest automaker based on worldwide vehicle sales.
 </s> 
 <s i="1">
  In 2006, <ENAMEX TYPE="ORGANIZATION">Ford</ENAMEX> was the second-ranked automaker in the <ENAMEX TYPE="LOCATION">US</ENAMEX> with a 17.5% market share, behind <ENAMEX TYPE="ORGANIZATION">General Motors</ENAMEX> (24.6%) but ahead of <ENAMEX TYPE="ORGANIZATION">Toyota</ENAMEX> (15.4%) and <ENAMEX TYPE="ORGANIZATION">DaimlerChrysler</ENAMEX> (14.4%)[3].
 </s> 
 <s i="2">
  <ENAMEX TYPE="ORGANIZATION">Ford</ENAMEX> was also the seventh-ranked American-based company in the 2007 Fortune 500 list, based on global revenues of $160.1 billion <ENAMEX TYPE="LOCATION">[4].</ENAMEX>
 </s>
</output>

This output came from entering part of the Wikipedia article on Ford into the program. It separates the facts into bullet points (the "s" elements), sifts out names, organizations, locations, etc., and marks each of those by wrapping it in an ENAMEX element with a corresponding TYPE attribute.

So far, I have been able to separate out the bullet point numbers (the "i" attribute of the "s" element) and then put each fact below its number.

However, I am trying to find a way to make all words in the ENAMEX elements a different color, depending on the TYPE attribute. For example, all locations could be brown, and all organizations could be blue.

I tried a test to see if I could make all of the organization names bold. I doubted it would work, and I was right. Here is what I have:
Code:
<?xml version="1.0" encoding="windows-1252"?>

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
 <html>
 <body>

<xsl:for-each select="output/s">

<p>
<b><xsl:value-of select="@i"/></b>
</p>
<p>
<xsl:value-of select="."/>
<xsl:for-each select="ENAMEX">
<xsl:if test="@TYPE='ORGANIZATION'">
 <b><xsl:value-of select="."/></b>
</xsl:if>
</xsl:for-each>
</p>

</xsl:for-each>

 </body>
 </html>
</xsl:template>
</xsl:stylesheet>

All this does is list all of the words tagged as organizations at the end of each bit of information, although it does at least bold them.

I cannot figure out how to make this work in the middle of the sentence. Any help would be greatly appreciated.

Thank you.
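For comparison, here is a sketch that is not from the original thread. The attempt above only appends the names because `<xsl:value-of select="."/>` writes the string value of the whole `<s>` element (all of its text, with the ENAMEX markup thrown away), and the `xsl:for-each` then adds the matching ENAMEX text after it. A simpler alternative to the recursive technique suggested in the replies is "push" processing with `xsl:apply-templates`: it visits every child node of `<s>` in document order, the built-in template rule copies text nodes through unchanged, and one template per TYPE styles the ENAMEX elements. This minimal XSLT 1.0 version assumes the Lingpipe output shown above; the colours just mirror the ones used later in the thread.

```xml
<?xml version="1.0" encoding="windows-1252"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Wrap the whole result in an HTML page -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="output/s"/>
      </body>
    </html>
  </xsl:template>

  <!-- One numbered paragraph per <s> element -->
  <xsl:template match="s">
    <p><b><xsl:value-of select="@i"/></b></p>
    <!-- Process every child node (text and ENAMEX) in document order;
         text nodes fall through to the built-in rule, which copies them -->
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Colour each ENAMEX according to its TYPE attribute -->
  <xsl:template match="ENAMEX[@TYPE='ORGANIZATION']">
    <span style="color: blue"><xsl:value-of select="."/></span>
  </xsl:template>

  <xsl:template match="ENAMEX[@TYPE='LOCATION']">
    <span style="color: brown"><xsl:value-of select="."/></span>
  </xsl:template>

  <!-- Fallback for any other TYPE -->
  <xsl:template match="ENAMEX">
    <span style="color: red"><xsl:value-of select="."/></span>
  </xsl:template>

</xsl:stylesheet>
```

The names stay in place in the sentence because nothing flattens the `<s>` element before its children are processed. Supporting another TYPE only takes one more template; the templates with a predicate have a default priority of 0.5, so they win over the plain `ENAMEX` fallback, whose default priority is 0.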
 
Have a look at thread426-1391362. My last post in that thread shows a recursive mechanism that deals with the interspersed text() and element nodes in a 'linear' fashion. The job in that thread was slightly different but the technique can be applied in this case as well. Have a look at that XSLT and see if it helps. I'll be happy to answer questions.

Tom Morrison
 
Thanks for the link.

I read through your posts, and did some searches for some parts in your code that I had never seen before. I would say that unless there is an easier solution, I'm over my head with this one. I should probably learn more about XSL and XML before trying to do this.

If you could give me some links to some tutorials that go beyond the basics (the sort of things that I had in my first post), I would really appreciate it. I have found tons of XSL tutorials on Google, but they all only cover what I have already learned.

Thanks again.
 
First of all, a stylesheet that does something close to what you requested:
Code:
<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
 <html>
 <body>
 <xsl:apply-templates/>
 </body>
 </html>
</xsl:template>

<xsl:template match="output">
<xsl:apply-templates select="s"/>
</xsl:template>

<xsl:template match="s">

<p>
<b><xsl:value-of select="@i"/></b>
</p>
<p>
<xsl:call-template name="formatEmbedded">
    <xsl:with-param name="nodeList" select="child::node()"/>
</xsl:call-template>
</p>
</xsl:template>

<xsl:template name="formatEmbedded">
    <xsl:param name="nodeList"/>

    <xsl:variable name="thisNode" select="$nodeList[1]"/>
    <xsl:variable name="remainingNodes" select="$nodeList[position() != 1]"/>

    <xsl:choose>
    <xsl:when test="$thisNode">
        <xsl:choose>
        <xsl:when test="local-name($thisNode) = 'ENAMEX'">
        <!-- process ENAMEX element -->
            <xsl:choose>
            <xsl:when test="$thisNode/@TYPE = 'ORGANIZATION'">
                <span style="color: blue"><xsl:value-of select="$thisNode/text()"/></span>
            </xsl:when>
            <xsl:when test="$thisNode/@TYPE = 'LOCATION'">
                <span style="color: brown"><xsl:value-of select="$thisNode/text()"/></span>
            </xsl:when>
            <xsl:otherwise>
                <span style="color: red"><xsl:value-of select="$thisNode/text()"/></span>
            </xsl:otherwise>
            </xsl:choose>
            <xsl:call-template name="formatEmbedded">
                <xsl:with-param name="nodeList" select="$remainingNodes"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
        <!-- process node that is not ENAMEX element -->
            <xsl:value-of select="$thisNode"/>
            <xsl:call-template name="formatEmbedded">
                <xsl:with-param name="nodeList" select="$remainingNodes"/>
            </xsl:call-template>
        </xsl:otherwise>
        </xsl:choose>
    </xsl:when>
    </xsl:choose>
</xsl:template>
</xsl:stylesheet>
Output:
<html><body><p><b>0</b></p><p>
<span style="color: blue">Ford Motor Company</span>is an American multinational corporation and the world's third largest automaker based on worldwide vehicle sales.</p><p><b>1</b></p><p>In 2006, <span style="color: blue">Ford</span> was the second-ranked automaker in the <span style="color: brown">US</span> with a 17.5% market share, behind <span style="color: blue">General Motors</span> (24.6%) but ahead of <span style="color: blue">Toyota</span> (15.4%) and <span style="color: blue">DaimlerChrysler</span> (14.4%)[3].</p><p><b>2</b></p><p>
<span style="color: blue">Ford</span>was also the seventh-ranked American-based company in the 2007 Fortune 500 list, based on global revenues of $160.1 billion <span style="color: brown">[4].</span></p></body></html>

You can find nine practical examples of recursion here.



Tom Morrison
 
Oops. Flubbed the code tag on the output. Sorry.

Further, regarding your "searches for some parts in your code": it really helps to use an XML tool to look at your input document, since it is not the 'normal' neat tree structure most examples use. Each of your <s> elements has several child nodes, namely the <ENAMEX> elements and a text() node for each of the text fragments between (but not inside) the <ENAMEX> elements. The text inside an <ENAMEX> element is in a text node that is a child of that <ENAMEX> element.

If you haven't used a tool such as Stylus Studio (my preference) or XML Spy, or one of the others, you should do so.  (Those two have free eval periods.)
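As a rough stand-in for the tree view of those tools, a small throwaway stylesheet can list the child nodes of each <s>. This is a sketch, not from the thread; it assumes the sample input above, writes a plain-text report, and uses `normalize-space()` only to keep each line short.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text report instead of HTML -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="output/s">
      <xsl:text>s i=</xsl:text>
      <xsl:value-of select="@i"/>
      <xsl:text>&#10;</xsl:text>
      <!-- node() is short for child::node(): text nodes and elements, in document order -->
      <xsl:for-each select="node()">
        <xsl:text>  </xsl:text>
        <xsl:value-of select="position()"/>
        <xsl:text>: </xsl:text>
        <xsl:choose>
          <xsl:when test="self::text()">
            <!-- Text node: show its content with whitespace collapsed -->
            <xsl:text>#text [</xsl:text>
            <xsl:value-of select="normalize-space(.)"/>
            <xsl:text>]</xsl:text>
          </xsl:when>
          <xsl:otherwise>
            <!-- Element node: show its name and TYPE attribute -->
            <xsl:value-of select="name()"/>
            <xsl:text> TYPE=</xsl:text>
            <xsl:value-of select="@TYPE"/>
          </xsl:otherwise>
        </xsl:choose>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample, the entry for `<s i="0">` should show three nodes: a whitespace-only text node (the line break and indentation before the first `<ENAMEX>`), the ENAMEX element, and the text that follows it. Those are the nodes that `child::node()` hands to the recursive template.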

Tom Morrison
www.liant.com
 
Wow, thanks a lot for your help, I really appreciate it.

If you wouldn't mind, I would like to know exactly what these lines do:
Code:
<xsl:variable name="thisNode" select="$nodeList[1]"/>
<xsl:variable name="remainingNodes" select="$nodeList[position() != 1]"/>

I can guess by the variable names that $nodeList[1] refers to the current node, and $nodeList[position() != 1] refers to all nodes but the current. What I cannot understand is how the complete list of nodes was grabbed and stored in the $nodeList variable. I see that you gave the nodeList param a value of some child node, but I cannot figure out which node would be used, or how this ended up leading to the storage of all nodes in the nodeList param.

If you can help me with this, I should be able to understand what you wrote much better. I'm trying to translate each line of what you wrote into English and make sense of it that way.

I tried searching around on the net again before I posted this, but all of the tutorials I find still never seem to go beyond the absolute basics. They are either too beginner-level or too advanced. The link you posted seems to be having some server problems right now, so I can't check it out yet.

Maybe a book would be what I need to keep at this? I'm really interested in continuing.

I'm also downloading Stylus Studio right now, I'll give it a try.

Thanks again, you've been really helpful.
 
Flattery will get you somewhere!

Let's start with this:
Code:
<xsl:call-template name="formatEmbedded">
    <xsl:with-param name="nodeList" select="child::node()"/>
</xsl:call-template>
This is in a template that matches the <s> element, so the context node for the XPath expression is an <s> element. So we are invoking a template by name (somewhat like calling a subprogram in a procedural language) and passing it a parameter named nodeList (referenced inside the template as $nodeList), whose value is a node set containing all the child nodes of the <s> context element, in document order.
Code:
<xsl:variable name="thisNode" select="$nodeList[1]"/>
<xsl:variable name="remainingNodes" select="$nodeList[position() != 1]"/>
Remember that the template is called with a node set in $nodeList, so these two lines create two local variables for ease of reference and use. The first takes the node in position 1 ([1] is shorthand for [position()=1]) and places it in $thisNode. The second takes all but the first node and places that node set in $remainingNodes for use on the recursive call to this template (classic tail recursion).
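To make the head/tail split concrete, here is a hand trace (not from the thread) of the calls made for the first sentence, `<s i="0">`. It assumes the whitespace is exactly as in the sample, so that this <s> has three child nodes; long text is abbreviated with "...".

```text
call 1: $nodeList       = (#text line-break+indent, ENAMEX, #text " is an American ... sales.")
        $thisNode       = #text line-break+indent   -> copied as-is
        $remainingNodes = (ENAMEX, #text " is an American ... sales.")
call 2: $thisNode       = ENAMEX (TYPE="ORGANIZATION")
                          -> <span style="color: blue">Ford Motor Company</span>
        $remainingNodes = (#text " is an American ... sales.")
call 3: $thisNode       = #text " is an American ... sales."   -> copied as-is
        $remainingNodes = () (empty node set)
call 4: $nodeList is empty, so $thisNode is empty and nothing more is called
```

Each call shortens the list by one node, which is why the recursion is guaranteed to reach the empty node set and stop.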
Code:
<xsl:when test="$thisNode">
This test will become false when the template is called with an empty node set in $nodeList, thereby stopping the recursive calls.
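Because both branches of the outer `xsl:choose` end with the same recursive call, the template can be written a little more compactly: test the node set once with `xsl:if`, handle the head, then make a single tail call. The following is a restructuring of the stylesheet posted above, not something from the thread; for this input it should produce the same spans and colours. It also builds the `style` attribute with `xsl:attribute`, so the `<span>` is written only once, and it outputs the ENAMEX string value instead of its first `text()` child, which is the same thing for these single-text ENAMEX elements.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="output/s"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="s">
    <p><b><xsl:value-of select="@i"/></b></p>
    <p>
      <!-- Start the recursion with all child nodes of <s> -->
      <xsl:call-template name="formatEmbedded">
        <xsl:with-param name="nodeList" select="child::node()"/>
      </xsl:call-template>
    </p>
  </xsl:template>

  <xsl:template name="formatEmbedded">
    <xsl:param name="nodeList"/>
    <!-- An empty node set tests false, which ends the recursion -->
    <xsl:if test="$nodeList">
      <xsl:variable name="thisNode" select="$nodeList[1]"/>
      <!-- Handle the head of the list -->
      <xsl:choose>
        <xsl:when test="local-name($thisNode) = 'ENAMEX'">
          <span>
            <xsl:attribute name="style">
              <xsl:choose>
                <xsl:when test="$thisNode/@TYPE = 'ORGANIZATION'">color: blue</xsl:when>
                <xsl:when test="$thisNode/@TYPE = 'LOCATION'">color: brown</xsl:when>
                <xsl:otherwise>color: red</xsl:otherwise>
              </xsl:choose>
            </xsl:attribute>
            <xsl:value-of select="$thisNode"/>
          </span>
        </xsl:when>
        <xsl:otherwise>
          <!-- Text (or any other) node: copy its string value -->
          <xsl:value-of select="$thisNode"/>
        </xsl:otherwise>
      </xsl:choose>
      <!-- One tail call on everything except the head -->
      <xsl:call-template name="formatEmbedded">
        <xsl:with-param name="nodeList" select="$nodeList[position() != 1]"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```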
"I'm also downloading Stylus Studio right now, I'll give it a try."
When you have it downloaded, load the ENAMEX XML document, then click on the 'Tree' tab at the bottom. Expand the <s> elements and you will see all the nodes named 'ENAMEX' and '#text' immediately subordinate to <s>. That is the node set passed to the original call-template.

Tom Morrison
 
Haha, thanks much again.

I got Stylus Studio going, and the tree view actually helped me understand a lot of it. Much better than sifting through a bunch of stuff in Notepad.

I got everything working now, and I've got a decent understanding of how it all works and comes together.

I guess my last question is whether you know a good way or place to keep reading up on this; I've still had no luck finding good resources :(



 
A good reference (not as good a tutorial, though it does have tutorial material) is XSLT, published by O'Reilly. I also use Michael Kay's XSLT as a reference (it is the reference). I have not found a really good tutorial book. The examples that come with Stylus Studio can also be helpful.

...and Tek-Tips Forums!

Tom Morrison
 
Just wanted to stop by and say a final thanks for the help. I'll definitely check out the books and the examples in Stylus.

Time to get back to work...
 