Adding content to elements from different parts of XML stream

Status
Not open for further replies.

darrellblackhawk

Programmer
Aug 30, 2002
846
US
Hello,

I have some XML documents that are organized in a unique way.

The parts that I need to parse out into elements are <p> paragraph elements and <media> image elements.

The problem is that the two different element types are in different parts of the XML.

The <p> paragraph elements come first; there might be up to 20 of them, and then the <img> elements come after.

The need is to step through the document, store the <p> paragraph elements first and then match them to the <img> image elements as the XSL continues stepping through the document while creating new elements which will be rendered by the browser.

Restating the above steps:
1- Test whether the element is a <p> paragraph or a <media> reference.
2- Since the <p> paragraphs will be located first, store them for later processing.
3- When elements of type <media> are discovered, create output elements, such as <div>s with embedded tables, to be placed in the output result tree.

You'll also notice that there are two initial <p> paragraph elements in the example below, but they're not a problem to deal with: either test position()>2, or match on body.content and select only the <p> elements with position()>1, which also avoids the first <p> element inside the <abstract> element.
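A minimal sketch of the second option, assuming XSLT 1.0 and well-formed input with the structure of the example in this post. Restricting the match to body.content means the <p> inside <abstract> is never visited, and the predicate skips the first child <p> (the date line in the sample). The plain-text output is only for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Visit only body.content, so the <p> inside <abstract> is never seen. -->
  <xsl:template match="/">
    <xsl:apply-templates select="nitf/body/body.content"/>
  </xsl:template>

  <xsl:template match="body.content">
    <!-- The predicate drops the first child <p>; the rest are story text. -->
    <xsl:for-each select="p[position() &gt; 1]">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```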

The XML structure is NITF, which comes from the news industry, although I'm unsure whether this particular file follows the standard. Any identifying information has been scrubbed from the structure. As stated, I don't need to deal with any other elements, so <head>, <docdata>, etc. will be excluded by the element test(s).

The only issue I have is with saving the initial <p> paragraph elements and then including them in output elements when the <media> elements are processed.

Here's an example of the XML structure:

<nitf>
<head>
<docdata>
<doc-id id-string="xxy.DTL"/>
<date.issue norm="issue-code"/>
<date.release norm="release-code"/>
<doc.copyright year="2007" holder="Copyright holder"/>
</docdata>
<pubdata type="web" date.publication="publication-date" name="Publisher" ex-ref="xxx.DTL"/>
</head>
<body>
<body.head>
<headline>
<hl1>Heading</hl1>
</headline>
<byline>
<person>Wire service</person>
</byline>
<abstract>
<p>
First paragraph...
</p>
</abstract>
</body.head>
<body.content>
<p>
Wednesday, May 30, 2007
</p>
<p>
First paragraph...
</p>
<p>
Second paragraph...
</p>
<p>
Third paragraph...
</p>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="image1.jpg" alternate-text="Image1 Alternate text...">
</media-reference>
<media-caption>
<B>Image1 caption head</B> Image1 caption text
</media-caption>
</media>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="image2.jpg" alternate-text="Image2 Alternate text...">
</media-reference>
<media-caption>
<B>Image2 caption head</B> Image2 caption text
</media-caption>
</media>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="image3.jpg" alternate-text="Image3 Alternate text...">
</media-reference>
<media-caption>
<B>Image3 caption head</B> Image3 caption text
</media-caption>
</media>
</body.content>
<body.end>
<tagline>
Copyright 2007
<a href=" . All rights reserved.
</tagline>
</body.end>
</body>
</nitf>


Any assistance is appreciated.

Darrell
 
Darrell said:
store the <p> paragraph elements first and then match them to the <img> image elements

First of all, I will presume that you mean <media> image elements.

I infer from your example document that the <media> element has the same parent as the <p> element for each <p> element of interest. Is this inference correct? If so, then you can easily find the <p> elements if you know the parent of the <media> element, and vice versa. If you need to retain this information (rather than discover it on the fly), the generate-id() function is a convenient way to retain a unique identity of a node, or to assert the unique identity of a node in an XPath predicate expression.
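As a rough illustration of the generate-id() idea, here is a minimal XSLT 1.0 sketch. It assumes well-formed input shaped like the posted example; the variable name parent-id is made up for this sketch. Each <media> template retains its parent's identity as a string, then uses that string in a predicate to count the <p> elements sharing that parent.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//media"/>
  </xsl:template>

  <xsl:template match="media">
    <!-- Retain the identity of this <media> element's parent as a string.
         "parent-id" is a hypothetical name chosen for this sketch. -->
    <xsl:variable name="parent-id" select="generate-id(..)"/>
    <!-- Assert that identity in a predicate: only <p> elements whose parent
         is the same node qualify, so the <p> inside <abstract> is excluded. -->
    <xsl:value-of select="count(//p[generate-id(..) = $parent-id])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because the siblings are reachable directly from the <media> context, the same set could be written as ../p; the generate-id() form is mainly useful when the identity has to be carried to a place where that relative path no longer applies.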

Give it a try, and come back to us for additional assistance.

Tom Morrison
 
Thanks for the quick response.

Unfortunately, I'm new to XSLT, so I need a more specific code example.

I'm fairly astute, but I'm still getting my head around this paradigm.

Thanks.

p.s. time to buy a book :)
 
I guess what I need is the method to store the paragraph elements and then mesh them with the image elements when they're encountered further down in the document.

Yes, the <p> and <media> (image) elements have the same parent.

Darrell
 
I figured it out. May not be the best way, but it works.
I'll post the resolution when I'm finished getting this out the door.

Darrell
 
Darrell,

Glad you have something working.

If you are processing a <media> element, then, from the context of the <media> element, the XPath expression to the text content of the first <p> element (in document order) having the same parent would be
Code:
../p[1]/text()
The paragraph elements are already 'stored' so all you need to do is address them using XPath.
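Putting that together, a minimal XSLT 1.0 sketch might look like the following. It is not Darrell's unposted solution. It rests on assumptions the thread does not confirm: that the n-th <media> should be paired with the (n+1)-th sibling <p> (the first <p> in the sample being the date line), that the input is well-formed (the scrubbed tagline link in the sample would need repair first), and that a plain <div> is acceptable output. The class name story-block is made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Select only the <media> children of body.content;
       head, docdata and the rest are never visited. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="nitf/body/body.content/media"/>
      </body>
    </html>
  </xsl:template>

  <!-- One <div> per <media>. ASSUMPTION: media number n is paired with
       sibling <p> number n + 1, since the first <p> is the date line. -->
  <xsl:template match="media">
    <!-- Capture position() here; inside a predicate it would refer to
         the <p> being tested, not to this <media>. -->
    <xsl:variable name="n" select="position()"/>
    <!-- "story-block" is a hypothetical class name. -->
    <div class="story-block">
      <img src="{media-reference/@source}"
           alt="{media-reference/@alternate-text}"/>
      <!-- No storage step is needed: the paragraph is addressed by XPath. -->
      <p><xsl:value-of select="normalize-space(../p[$n + 1])"/></p>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

With the sample document, this would pair image1.jpg with "First paragraph...", image2.jpg with "Second paragraph...", and image3.jpg with "Third paragraph...". If the real pairing rule is different, only the predicate on ../p needs to change.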

Tom Morrison
 