Adding content to elements from different parts of XML stream

darrellblackhawk · May 30, 2007

Hello,

I have some XML documents that are organized in an unusual way.

The parts I need to parse out into new elements are the paragraph (<p>) elements and the <media> image elements.

The problem is that the two different element types are in different parts of the XML.

The paragraph elements come first; there might be up to 20 of them, and then the <img> elements come after.

What I need to do is step through the document, store the paragraph elements first, and then match them to the <img> image elements as the XSL continues through the document, creating new elements that the browser will render.

Restating the above steps:
1- Test whether the element is a paragraph or a <media> reference.
2- Since the paragraphs will be located first, store them for later processing.
3- When elements of type <media> are found, create output elements, such as <div>s with embedded tables, to be placed in the result tree.
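For readers who want to see steps 1-3 spelled out, here is a rough sketch of the same stream-and-buffer idea in Python with the standard library's xml.etree.ElementTree (not XSLT). It assumes the paragraphs are wrapped in <p> elements, as NITF normally does, and it uses a hypothetical pairing rule: the nth image gets the nth stored paragraph after skipping the dateline. The function name build_divs and the <div>/<table> layout are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed <body.content> from the post (assumes <p> wrappers around paragraphs).
SAMPLE = """<body.content>
<p>Wednesday, May 30, 2007</p>
<p>First paragraph...</p>
<p>Second paragraph...</p>
<p>Third paragraph...</p>
<media media-type="image">
  <media-reference mime-type="image/jpeg" source="image1.jpg" alternate-text="Image1 Alternate text..."/>
  <media-caption>Image1 caption head Image1 caption text</media-caption>
</media>
<media media-type="image">
  <media-reference mime-type="image/jpeg" source="image2.jpg" alternate-text="Image2 Alternate text..."/>
  <media-caption>Image2 caption head Image2 caption text</media-caption>
</media>
</body.content>"""


def build_divs(content, skip=1):
    """Walk children in document order, buffering <p> text and emitting
    one <div> per <media> element.

    Hypothetical pairing rule: the nth image gets the nth stored paragraph,
    after skipping the first `skip` paragraphs (the dateline).
    """
    paragraphs = []  # step 2: paragraphs seen so far
    divs = []
    for child in content:
        if child.tag == "p":  # step 1: classify the element
            paragraphs.append((child.text or "").strip())
        elif child.tag == "media":  # step 3: build output on each <media>
            ref = child.find("media-reference")
            caption = (child.findtext("media-caption") or "").strip()
            idx = skip + len(divs)
            text = paragraphs[idx] if idx < len(paragraphs) else ""
            div = ET.Element("div")
            row = ET.SubElement(ET.SubElement(div, "table"), "tr")
            img_cell = ET.SubElement(row, "td")
            ET.SubElement(img_cell, "img",
                          src=ref.get("source"), alt=ref.get("alternate-text"))
            ET.SubElement(row, "td").text = text
            ET.SubElement(div, "p").text = caption
            divs.append(div)
    return divs


for div in build_divs(ET.fromstring(SAMPLE)):
    print(ET.tostring(div, encoding="unicode"))
```

Because every <p> precedes the first <media>, a single pass in document order is enough; if the two kinds of element were interleaved, the pairing rule would need rethinking.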

You'll also notice that there are two initial paragraph elements in the example below, but they're not a problem to deal with (e.g. position()>2, or matching on body.content, which leaves out the paragraph in the <abstract> element, and then selecting elements with position()>1 to skip the dateline).

The XML structure is NITF, which comes from the news industry, although I'm not sure whether this particular file follows the standard. Any identifying information has been scrubbed from it. As stated, I don't need to deal with any other elements, so <head>, <docdata>, etc. will be excluded by the element test(s).

The only issue I have is saving the initial paragraph elements and then including them in the output elements when the <media> elements are processed.

Here's an example of the XML structure:

<nitf>
<head>
<docdata>
<doc-id id-string="xxy.DTL"/>
<date.issue norm="issue-code"/>
<date.release norm="release-code"/>
<doc.copyright year="2007" holder="Copyright holder"/>
</docdata>
<pubdata type="web" date.publication="publication-date" name="Publisher" ex-ref="xxx.DTL"/>
</head>
<body>
<body.head>
<headline>
<hl1>Heading</hl1>
</headline>
<byline>
<person>Wire service</person>
</byline>
<abstract>
<p>First paragraph...</p>
</abstract>
</body.head>
<body.content>
<p>Wednesday, May 30, 2007</p>
<p>First paragraph...</p>
<p>Second paragraph...</p>
<p>Third paragraph...</p>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="image1.jpg" alternate-text="Image1 Alternate text...">
</media-reference>
<media-caption>
Image1 caption head Image1 caption text
</media-caption>
</media>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="image2.jpg" alternate-text="Image2 Alternate text...">
</media-reference>
<media-caption>
Image2 caption head Image2 caption text
</media-caption>
</media>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="image3.jpg" alternate-text="Image3 Alternate text...">
</media-reference>
<media-caption>
Image3 caption head Image3 caption text
</media-caption>
</media>
</body.content>
<body.end>
<tagline>
Copyright 2007
<a href="http://xyz/copyright/">Copyright</a>. All rights reserved.
</tagline>
</body.end>
</body>
</nitf>

Any assistance is appreciated.

Darrell

k5tm · May 30, 2007

Darrell said:
store the paragraph elements first and then match them to the <img> image elements

First of all, I will presume that you mean <media> image elements.

I infer from your example document that the <media> element has the same parent as the <p> element for each element of interest. Is this inference correct? If so, you can easily find the <p> elements if you know the parent of the <media> element, and vice versa. If you need to retain this information (rather than discover it on the fly), the generate-id() function is a convenient way to retain the unique identity of a node, or to assert the unique identity of a node in an XPath predicate expression.
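The same-parent idea can be sketched in Python/ElementTree as well (again, not XSLT). Instead of buffering paragraphs, each <media> element looks up its parent and picks a sibling <p> by position. ElementTree elements carry no parent pointer, so the sketch builds a child-to-parent map first. The pairing rule (skip the dateline, then match by position) and the helper name sibling_paragraph are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<body><body.content>
<p>Wednesday, May 30, 2007</p>
<p>First paragraph...</p>
<p>Second paragraph...</p>
<media media-type="image"><media-reference source="image1.jpg"/></media>
<media media-type="image"><media-reference source="image2.jpg"/></media>
</body.content></body>"""

root = ET.fromstring(SAMPLE)

# ElementTree has no ".." axis from an arbitrary node, so map child -> parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def sibling_paragraph(media, skip=1):
    """Return the text of the <p> sharing this <media>'s parent, chosen by the
    media's position among its <media> siblings (hypothetical rule)."""
    parent = parent_of[media]
    medias = parent.findall("media")
    paras = parent.findall("p")
    # Identity comparison ("is") plays roughly the role generate-id() plays
    # in XSLT when asking "is this the same node?".
    n = next(i for i, m in enumerate(medias) if m is media)
    idx = skip + n
    return paras[idx].text if idx < len(paras) else None


for media in root.iter("media"):
    print(media.find("media-reference").get("source"), "->", sibling_paragraph(media))
```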

Give it a try, and come back to us for additional assistance.

Tom Morrison

http://www.liant.com

darrellblackhawk · May 30, 2007

Thanks for the quick response.

Unfortunately, I'm new to XSLT, so I need a more specific code example.

I'm fairly astute, but I'm still getting my head around this paradigm.

Thanks.

P.S. Time to buy a book.

darrellblackhawk · May 30, 2007

I guess what I need is the method to store the paragraph elements and then mesh them with the image elements when they're encountered further down in the document.

Yes, the <p> and <media> (image) elements have the same parent.

Darrell

darrellblackhawk · May 30, 2007

I figured it out. May not be the best way, but it works.
I'll post the resolution when I'm finished getting this out the door.

Darrell

k5tm · May 31, 2007

Darrell,

Glad you have something working.

If you are processing a <media> element, then, from the context of that <media> element, the XPath expression for the text content of the first <p> element (in document order) having the same parent would be

Code:

../p[1]/text()

The paragraph elements are already 'stored' so all you need to do is address them using XPath.
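For comparison, the ../p[1]/text() expression translates almost directly into Python's xml.etree.ElementTree, assuming the same <p>-wrapped paragraphs. The parent map stands in for the ".." step, since ElementTree elements do not know their parent; ElementTree's [1] predicate is 1-based like XPath. Note that in the sample document p[1] is the dateline, so the first real paragraph is p[2].

```python
import xml.etree.ElementTree as ET

SAMPLE = """<body.content>
<p>Wednesday, May 30, 2007</p>
<p>First paragraph...</p>
<media media-type="image"><media-reference source="image1.jpg"/></media>
</body.content>"""

root = ET.fromstring(SAMPLE)
parent_of = {child: parent for parent in root.iter() for child in parent}

media = root.find("media")
# "../p[1]/text()": step up to the parent, take its first <p> child,
# then read that element's text.
first_p_text = parent_of[media].find("p[1]").text
print(first_p_text)
```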

Tom Morrison

http://www.liant.com
