darrellblackhawk
Programmer
Hello,
I have some XML documents that are organized in a unique way.
The parts that I need to parse out into elements are <p> paragraph elements and <media> image elements.
The problem is that the two different element types are in different parts of the XML.
The <p> paragraph elements come first (there might be up to 20 of them), and the <media> elements come after.
I need to step through the document, store the <p> paragraph elements first, and then match them to the <media> image elements as the XSL continues through the document, creating new elements that will be rendered by the browser.
Restating the above steps:
1- Test whether the element is a <p> paragraph or a <media> reference.
2- Since the <p> paragraphs will be located first, store them for later processing.
3- When elements of type <media> are discovered, create output elements, such as <div>s with embedded tables, to be placed in the output result tree.
You'll also notice that there are two initial <p> paragraph elements in the example below, but they're not a problem to deal with (e.g. position()>2, or just matching on body.content and selecting <p> elements with position()>1, which avoids the first <p> element in the <abstract> element).
The XML structure is NITF, which comes from the news industry, although I'm unsure whether this particular file follows the standard. Any identifying information has been scrubbed from the structure. As stated, I don't need to deal with any other elements, so <head>, <docdata>, etc. will be removed from consideration by the element test(s).
The only issue I have is with saving the initial <p> paragraph elements and then including them in output elements when the <media> elements are processed.
Here's an example of the XML structure:
<nitf>
<head>
<docdata>
<doc-id id-string="xxy.DTL"/>
<date.issue norm="issue-code"/>
<date.release norm="release-code"/>
<doc.copyright year="2007" holder="Copyright holder"/>
</docdata>
<pubdata type="web" date.publication="publication-date" name="Publisher" ex-ref="xxx.DTL"/>
</head>
<body>
<body.head>
<headline>
<hl1>Heading</hl1>
</headline>
<byline>
<person>Wire service</person>
</byline>
<abstract>
<p>
First paragraph...
</p>
</abstract>
</body.head>
<body.content>
<p>
Wednesday, May 30, 2007
</p>
<p>
First paragraph...
</p>
<p>
Second paragraph...
</p>
<p>
Third paragraph...
</p>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="image1.jpg" alternate-text="Image1 Alternate text...">
</media-reference>
<media-caption>
<B>Image1 caption head</B> Image1 caption text
</media-caption>
</media>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="image2.jpg" alternate-text="Image2 Alternate text...">
</media-reference>
<media-caption>
<B>Image2 caption head</B> Image2 caption text
</media-caption>
</media>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="image3.jpg" alternate-text="Image3 Alternate text...">
</media-reference>
<media-caption>
<B>Image3 caption head</B> Image3 caption text
</media-caption>
</media>
</body.content>
<body.end>
<tagline>
Copyright 2007
<a href=" . All rights reserved.
</tagline>
</body.end>
</body>
</nitf>
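For reference, here is a minimal XSLT 1.0 sketch of the pairing described above. XSLT variables cannot be updated while stepping through the document, but the paragraphs do not have to be stored: the <media> template can reach back to its sibling <p> elements with XPath. The pairing rule (the Nth <media> goes with the Nth paragraph after the date paragraph), the table layout, and the class names story-block and caption are assumptions made for illustration, not something the sample XML specifies.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <!-- Visit only the media elements in body.content;
       head, docdata, body.head and body.end are never processed. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="nitf/body/body.content/media"/>
      </body>
    </html>
  </xsl:template>

  <!-- One output block per media element. -->
  <xsl:template match="media">
    <!-- 1-based index of this media element among its media siblings. -->
    <xsl:variable name="n" select="count(preceding-sibling::media) + 1"/>
    <div class="story-block">
      <table>
        <tr>
          <td>
            <img src="{media-reference/@source}"
                 alt="{media-reference/@alternate-text}"/>
            <div class="caption">
              <xsl:value-of select="normalize-space(media-caption)"/>
            </div>
          </td>
          <td>
            <!-- ASSUMPTION: skip the first p (the date line), then pair
                 the Nth remaining p with the Nth media element. -->
            <p>
              <xsl:value-of
                  select="normalize-space(../p[position() &gt; 1][position() = $n])"/>
            </p>
          </td>
        </tr>
      </table>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch, any paragraphs beyond the number of <media> elements are not output, and value-of flattens the caption's <B> markup to plain text; both would need separate handling depending on how the paragraphs and images are really meant to be matched.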
Any assistance is appreciated.
Darrell