Extracting text from a node

klunde · Jun 16, 2005

Hi

I need to send some information enclosed within the "wrong" node of an XML file, because the parser system that receives these files can't handle additional metadata. How can I extract the data from that node again in the file I get back from the parser?

I can configure how to encapsulate the data I add, but I thought I would do this:

<ARTICLE_DATA>
<MAINTITLE>Her er overskriften</MAINTITLE>
<ARTICLETEXT>

[PubTime/10:00]
[ArticleType/message]
Her kommer selve artikkelteksten...

</ARTICLETEXT>
</ARTICLE_DATA>

The two metadata items are PubTime and ArticleType. These labels will always be constant, while the value to the right of the slash will vary. When I get this file back I need to extract (and remove) both [..] parts from the ARTICLETEXT node and put them into variables so that I can insert them elsewhere in my resulting XML file, but how?

Regards
</Morten>


JontyMC · Jun 16, 2005

What technologies do you want to use? XSL, .NET, Java, ASP, JavaScript?

Jon

"Asteroids do not concern me, Admiral. I want that ship, not excuses.

CubeE101 · Jun 16, 2005

Is there a reason why you don't use something like:

Code:

<ARTICLE_DATA>
     <MAINTITLE>Her er overskriften</MAINTITLE>
     <ARTICLETEXT>
          <P PubTime="10:00" ArticleType="message">
               Her kommer selve artikkelteksten...
          </P>
     </ARTICLETEXT>
</ARTICLE_DATA>

Or

Code:

<ARTICLE_DATA>
     <MAINTITLE>Her er overskriften</MAINTITLE>
     <PubTime>10:00</PubTime>
     <ArticleType>message</ArticleType>
     <ARTICLETEXT>
          <P>
               Her kommer selve artikkelteksten...
          </P>
     </ARTICLETEXT>
</ARTICLE_DATA>

Otherwise, you could look into Regular Expressions with scripting...

http://www.google.com/search?hl=en&q=Regular+Expressions

http://msdn.microsoft.com/library/d...tml/reconintroductiontoregularexpressions.asp
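
To make the regular expression idea concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself does not settle on a language). It assumes the markers look exactly like the ones in the first post, [PubTime/...] and [ArticleType/...], each on its own line. The names MARKER and extract_markers are made up for this example.

```python
import re

# Assumption: each marker sits on its own line and looks like [Name/value].
# Only the two names from the original post are recognised.
MARKER = re.compile(
    r"^[ \t]*\[(PubTime|ArticleType)/([^\]\r\n]*)\][ \t]*\r?\n?",
    re.MULTILINE,
)


def extract_markers(text):
    """Return (metadata dict, text with the marker lines removed)."""
    meta = {name: value for name, value in MARKER.findall(text)}
    cleaned = MARKER.sub("", text)
    return meta, cleaned


sample = (
    "\n[PubTime/10:00]\n"
    "[ArticleType/message]\n"
    "Her kommer selve artikkelteksten...\n"
)
meta, body = extract_markers(sample)
print(meta)
print(body.strip())
```

The value part stops at the first "]", so a value may contain ":" or "/" but not "]".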

PROGRAMMER:

Red-eyed, mumbling mammal capable of conversing with inanimate objects.

klunde · Jun 16, 2005

I'm going to use XSL to transform it back. The reason for not doing that is that I send one XML file and get it back with the ARTICLETEXT part intact, but not the rest.

</Morten>


CubeE101 · Jun 17, 2005

If you want to use VBScript, here is one solution:

Code:

'// Create XML DOM Document and load the XML File "test.xml"
Set xmlDoc = CreateObject("msxml.domdocument")
xmlDoc.async = False
xmlDoc.Load "test.xml"

'// Set a reference to the <P> Node in the XML
Set PNode = xmlDoc.selectSingleNode("//P")

'// Get a copy of the Text in the <P> Node
PText = PNode.Text

'// Separate the Text into an array, using the line breaks... chr(10)
MyData = Split(PText, Chr(10))

'// Loop through the Array and search for the meta data...
For i = 0 To UBound(MyData)

  '// Trim the blank spaces from around the text... "   test   " becomes "test"
  Temp = Trim(MyData(i))
  If InStr(1, Temp, "[PubTime") Or InStr(1, Temp, "[ArticleType") Then

    '// Show the data it finds
    MsgBox "Found " & Temp

    '// Remove the Meta Data from the copy of <P> Node's Text
    PText = Replace(PText, MyData(i) & Chr(10), "")

    '// Remove the "[" and "]" characters, then split with the "/" character
    tData = Split(Replace(Replace(Temp, "[", ""), "]", ""), "/")

    '// Create a new element using the meta data, and append to the xml Documents Root
    xmlDoc.documentElement.appendChild(xmlDoc.createElement(tData(0))).Text = tData(1)
  End If
Next

'// Set New Text Back to the <P> Node
PNode.Text = PText

'// Indent New XML
xmldoc.loadxml replace(xmlDoc.xml,"><", ">" & chr(10) & "<")

'// Show New XML
MsgBox xmlDoc.xml

The result should look like this:

Code:

<ARTICLE_DATA>
	<MAINTITLE>Her er overskriften</MAINTITLE>
	<ARTICLETEXT>
		<P>               Her kommer selve artikkelteksten...</P>
	</ARTICLETEXT>
	<PubTime>10:00</PubTime>
	<ArticleType>message</ArticleType>
</ARTICLE_DATA>
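
For readers who are not on Windows scripting, roughly the same steps can be sketched in Python with the standard library's xml.etree.ElementTree. This is an illustration under assumptions, not a port of the script above: the sample document, the function name move_markers and its names parameter are made up here, and unlike the VBScript it also strips the leading whitespace from the remaining text.

```python
import xml.etree.ElementTree as ET

# Assumed input: the markers sit inside a <P> node, one per line.
SOURCE = """<ARTICLE_DATA>
<MAINTITLE>Her er overskriften</MAINTITLE>
<ARTICLETEXT>
<P>
[PubTime/10:00]
[ArticleType/message]
Her kommer selve artikkelteksten...
</P>
</ARTICLETEXT>
</ARTICLE_DATA>"""


def move_markers(xml_text, names=("PubTime", "ArticleType")):
    """Move [Name/value] lines out of <P> into child elements of the root."""
    root = ET.fromstring(xml_text)
    p_node = root.find(".//P")  # assumes the document contains a <P> node
    kept = []
    for line in (p_node.text or "").split("\n"):
        stripped = line.strip()
        # Split "Name/value" at the first slash only.
        name, sep, value = stripped[1:-1].partition("/")
        is_marker = stripped.startswith("[") and stripped.endswith("]")
        if is_marker and sep and name in names:
            # Append <Name>value</Name> to the document root.
            ET.SubElement(root, name).text = value
        else:
            kept.append(line)
    # Put the remaining text back, without the marker lines.
    p_node.text = "\n".join(kept).strip()
    return root


root = move_markers(SOURCE)
print(ET.tostring(root, encoding="unicode"))
```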

Without comments and message boxes (with an XSL transform):

Code:

Set xmlDoc = CreateObject("msxml.domdocument")
xmlDoc.async = False
xmlDoc.Load "test.xml"
Set PNode = xmlDoc.selectSingleNode("//P")
PText = PNode.Text
MyData = Split(PText, Chr(10))
For i = 0 To UBound(MyData)
  Temp = Trim(MyData(i))
  If InStr(1, Temp, "[PubTime") Or InStr(1, Temp, "[ArticleType") Then
    PText = Replace(PText, MyData(i) & Chr(10), "")
    tData = Split(Replace(Replace(Temp, "[", ""), "]", ""), "/")
    xmlDoc.documentElement.appendChild(xmlDoc.createElement(tData(0))).Text = tData(1)
  End If
Next
PNode.Text = PText

'// Transform with an XSL Doc... "test.xsl"
Set xslDoc = CreateObject("msxml.domdocument")
xslDoc.async = False
xslDoc.Load "test.xsl"

OutputText = xmlDoc.transformNode(xslDoc)

If this is placed in an HTML document's <script> tag, you can display the XSL-transformed, modified XML with:
document.write xmlDoc.transformNode(xslDoc)


klunde · Jun 19, 2005

I worked on this during the weekend and I've come up with this:

Code:

  <xsl:variable name="rtvHeadline" select="ARTICLETEXT/P" />
  <xsl:variable name="_pubTime" select="substring-after($rtvHeadline,'[pubTime:')" />
  <xsl:variable name="pubTime" select="substring-before($_pubTime,']')" />
  <xsl:variable name="_artType" select="substring-after($rtvHeadline,'[artType:')" />
  <xsl:variable name="artType" select="substring-before($_artType,']')" />
  <xsl:variable name="rtvText" select="substring-after($_artType,']')" />
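
To trace what these variables end up holding, the chain can be mimicked outside XSLT. The Python sketch below is only an illustration: substring_after and substring_before are hand-written stand-ins for the XPath 1.0 functions (empty string when the separator is not found, non-empty separator assumed), and the sample string is an assumed example of the <P> text.

```python
def substring_after(s, sep):
    """Mimic XPath 1.0 substring-after: '' when sep does not occur in s."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before(s, sep):
    """Mimic XPath 1.0 substring-before: '' when sep does not occur in s."""
    head, found, _ = s.partition(sep)
    return head if found else ""


# Assumed content of the <P> node, using the ':' separator.
rtv_headline = "[pubTime:10:00][artType:message]Her kommer selve artikkelteksten..."

_pub_time = substring_after(rtv_headline, "[pubTime:")
pub_time = substring_before(_pub_time, "]")
_art_type = substring_after(rtv_headline, "[artType:")
art_type = substring_before(_art_type, "]")
rtv_text = substring_after(_art_type, "]")

print(pub_time, art_type, rtv_text, sep=" | ")
```

Because rtvText is taken from whatever follows the artType marker, this layout relies on [artType:...] being the last marker before the article text. If a marker is missing, the corresponding variables come out empty rather than raising an error.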

</Morten>

klunde · Jun 19, 2005

A small note: I changed the / separator to :

</Morten>
