Unusual XML file

Status
Not open for further replies.

pva1958 (Programmer, US), Dec 22, 2011
Hello,

I'm an ASP.NET developer. I don't have much experience with XML, but I know how to read XML files and display them in a GridView.

The file I'm dealing with now is quite unusual, and I'm not sure how to read it. It has only 3 tags: <hashtable>, <entry> and <string>. It doesn't even have a real root. Below is the beginning of the file:

<hashtable>
  <entry>
    <string>6</string>
    <hashtable>
      <entry>
        <string>creationdt</string>
        <string>1257162838</string>
      </entry>
      <entry>
        <string>resellerid</string>
        <string>44453</string>
      </entry>
      <entry>
        <string>endtime</string>
        <string>1333238399</string>
      </entry>

This doesn't look like a valid XML file, but it's definitely readable, since the API provider's whole system is based on these files.

Below is the message I got from the provider:

"Our XML file consists of multiple parameters each of which cannot be converted to a tag.

Hence we have divided the API response section into 3 tags mentioned.

The parsing has to be done in the following manner:

1: The first response <string> tag within an <Entry> is your variable Key.
It can be anything from creationdt, resellerid, etc.

2: The second response <string> tag within an <Entry> is the value.
This needs to be used for displaying your response to the client.

You will need to deploy your code accordingly."

Any help on how to parse this file is much appreciated.

Thank you.
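
The provider's rule (the first <string> in an <entry> is the key, the second is the value) could be sketched in Python roughly as follows. This is only an illustration, not the provider's code: it assumes the response is well-formed once every closing tag is present, and both the sample text and the helper name entry_pairs are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample in the provider's key/value style.
SAMPLE = """
<hashtable>
  <entry><string>creationdt</string><string>1257162838</string></entry>
  <entry><string>resellerid</string><string>44453</string></entry>
  <entry><string>endtime</string><string>1333238399</string></entry>
</hashtable>
"""

def entry_pairs(xml_text):
    """Return {key: value} for every <entry> that has exactly two <string> children."""
    root = ET.fromstring(xml_text)
    pairs = {}
    for entry in root.iter("entry"):
        strings = entry.findall("string")  # direct children only
        if len(strings) == 2:
            # First <string> is the key, second <string> is the value.
            pairs[strings[0].text] = strings[1].text
    return pairs

print(entry_pairs(SAMPLE))
```

Entries that do not hold exactly two <string> children are skipped here rather than guessed at.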

 
That does look like a strange one, and I'm no expert, but here is my attempt to make sense of it (assuming the pattern is the same all the way through):

Code:
<hashtable>
  <entry>
    <string>6</string>
    <hashtable>

Is this telling us that there are 6 strings to follow, which would let you identify the length of the block to parse for each client response?

Code:
<entry>
  <string>creationdt</string>
  <string>1257162838</string>
</entry>
So the first 2 strings of the 6 are the creation date key and its date value. Is the date value, 1257162838, what you return here?

Code:
<entry>
  <string>resellerid</string>
  <string>44453</string>
</entry>
Strings 3 and 4 are the reseller ID key and its value, so again you return the value, 44453.

Code:
<entry>
  <string>endtime</string>
  <string>1333238399</string>
</entry>

OK, so now we are at strings 5 and 6: the end time key, and you return the time string 1333238399.


This is all guesswork, but that's about the best I can make of it based on what you have provided.

Since we have now processed the 6 strings from the hashtable, we are at the end of the block. Do we then get a new hashtable block with the next number of values to parse?


If so, maybe you read the <string> value at the start of the hashtable, then read each <entry> block until you have consumed that many strings, returning the second string value from each entry block.


I hope that makes sense ...

Laurie.
 
I would first preprocess the file to normalize it.
First, delete all \n (line ends), so that all lines are joined into one string like this:
Code:
<hashtable><entry><string>6</string><hashtable><entry><string>creationdt</string><string>1257162838</string></entry><entry><string>resellerid</string><string>44453</string></entry><entry><string>endtime</string><string>1333238399</string></entry>
Then you could transform the file with regular expressions, for example with VBScript or any other language that supports regex.
Here, for example, I use the sed utility:
Code:
$ sed -e 's/^<hashtable>\s*<entry>\s*<string>/<numstrings>/; s/<\/string>\s*<hashtable>/<\/numstrings>/; s/^\s*/<hashtable_root>/; s/\s*$/<\/hashtable_root>/' example_file.xml
<hashtable_root><numstrings>6</numstrings><entry><string>creationdt</string><string>1257162838</string></entry><entry><string>resellerid</string><string>44453</string></entry><entry><string>endtime</string><string>1333238399</string></entry></hashtable_root>
Reformatted for readability, the result looks like this:
Code:
<hashtable_root>
 <numstrings>6</numstrings>
 <entry>
  <string>creationdt</string>
  <string>1257162838</string>
 </entry>
 <entry>
  <string>resellerid</string>
  <string>44453</string>
 </entry>
 <entry>
  <string>endtime</string>
  <string>1333238399</string>
 </entry>
</hashtable_root>
This is well-formed XML with a root node, which you can parse.
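
The same preprocessing could be sketched in Python along these lines. It is a rough equivalent of the sed substitutions above, not a general repair tool: it assumes the input is the truncated sample from this thread (no closing </hashtable> tags at the end), and the names RAW and normalize are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The truncated sample from the thread, one tag per line.
RAW = """<hashtable>
<entry>
<string>6</string>
<hashtable>
<entry>
<string>creationdt</string>
<string>1257162838</string>
</entry>
<entry>
<string>resellerid</string>
<string>44453</string>
</entry>
<entry>
<string>endtime</string>
<string>1333238399</string>
</entry>
"""

def normalize(raw):
    """Join the lines, rename the leading count tag, and wrap everything in a root."""
    text = raw.replace("\r", "").replace("\n", "")
    # Opening part of the count: <hashtable><entry><string> becomes <numstrings>
    text = re.sub(r"^<hashtable>\s*<entry>\s*<string>", "<numstrings>", text, count=1)
    # Closing part of the count: </string><hashtable> becomes </numstrings>
    text = re.sub(r"</string>\s*<hashtable>", "</numstrings>", text, count=1)
    # Wrap the result in a single root node
    return "<hashtable_root>" + text.strip() + "</hashtable_root>"

root = ET.fromstring(normalize(RAW))
print(root.findtext("numstrings"), len(root.findall("entry")))
```

If the real file does close its <hashtable> and <entry> tags, these substitutions would leave dangling closing tags, so the patterns would need adjusting.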
 
I played with it a little bit, and here is a working example in VBScript:

parse_nostandard_xml.vbs
Code:
'get XML into string
xml_string = file2str("example.xml")
out_line = "* Original xml_string = '" & xml_string & "'"
wscript.echo out_line
wscript.echo

'transform string into normal XML
xml_string = normalize_XML(xml_string)
out_line = "* Normalized xml_string = '" & xml_string & "'"
wscript.echo out_line
wscript.echo

'parse normal XML
out_line = "* Now parsing XML:"
wscript.echo out_line

set xml_doc = CreateObject("Microsoft.XMLDOM")

'load XML from string
xml_doc.loadXML(xml_string)

'create list of <entry> elements
set node_list = xml_doc.getElementsByTagName("entry")

if node_list.length > 0 then
  out_line = "Number of entries found: " & node_list.length
  wscript.echo out_line
  for each entry in node_list
    'parse the child nodes of each entry
    string_num = 0
    for each child in entry.ChildNodes
      if child.NodeName = "string" then
        string_num = string_num + 1
        select case string_num
          case 1
            '1st <string> holds the variable name
            var_name = child.Text
          case 2
            '2nd <string> holds the variable value
            var_value = child.Text
        end select
      end if
    next
    'write the variable and value
    out_line = var_name & " = " & var_value
    wscript.echo out_line
  next
  wscript.echo "...Done."
else
  err_msg = chr(34) & "entry" & chr(34) & " tag not found !"
  wscript.echo(err_msg)
end if

'at end release objects from memory
set xml_doc = nothing
set node_list = nothing


'----------------------- functions ------------------------
function file2str(fname)
  set oFSO = CreateObject("Scripting.FileSystemObject")
  'open the input file
  set oInFile = oFSO.OpenTextFile(fname)
  file2str = ""
  'for each line in the input file
  do while not oInFile.AtEndOfStream
    'read the line and concatenate it with others
    file2str = file2str & oInFile.ReadLine()
  loop
  'close the input file
  oInFile.close
  'at end release object from memory
  set oFSO = nothing
end function

function normalize_XML(xml_str)
  set re = createobject("vbscript.regexp")

  '1. replacement: create beginning of tag <numstrings>
  re.pattern = "^<hashtable>\s*<entry>\s*<string>"
  replace_with = "<numstrings>"
  xml_str = re.Replace(xml_str, replace_with)

  '2. replacement: create end of tag </numstrings>
  re.pattern = "<\/string>\s*<hashtable>"
  replace_with = "</numstrings>"
  xml_str = re.Replace(xml_str, replace_with)

  '3. replacement: create beginning of root node <hashtable_root>
  re.pattern = "^\s*"
  replace_with = "<hashtable_root>"
  xml_str = re.Replace(xml_str, replace_with)

  '4. replacement: create end of root node </hashtable_root>
  re.pattern = "\s*$"
  replace_with = "</hashtable_root>"
  xml_str = re.Replace(xml_str, replace_with)

  'return modified string
  normalize_XML = xml_str
  'at end release object from memory
  set re = nothing
end function
Now, for the given input file
example.xml
Code:
<hashtable>
<entry>
<string>6</string>
<hashtable>
<entry>
<string>creationdt</string>
<string>1257162838</string>
</entry>
<entry>
<string>resellerid</string>
<string>44453</string>
</entry>
<entry>
<string>endtime</string>
<string>1333238399</string>
</entry>
it produces this result:
Code:
c:\_mikrom\Work\xml>cscript /NoLogo parse_nostandard_xml.vbs
* Original xml_string = '<hashtable><entry><string>6</string><hashtable><entry><
string>creationdt</string><string>1257162838</string></entry><entry><string>rese
llerid</string><string>44453</string></entry><entry><string>endtime</string><str
ing>1333238399</string></entry>'

* Normalized xml_string = '<hashtable_root><numstrings>6</numstrings><entry><str
ing>creationdt</string><string>1257162838</string></entry><entry><string>reselle
rid</string><string>44453</string></entry><entry><string>endtime</string><string
>1333238399</string></entry></hashtable_root>'

* Now parsing XML:
Number of entries found: 3
creationdt = 1257162838
resellerid = 44453
endtime = 1333238399
...Done.
 
Not sure if it's a good guess, or even if your reply needs to know, but are creationdt and endtime epoch (Unix) timestamps? You could add a function to convert them (but only if the client wants them sent as human-readable values) ;)


Nice work otherwise ;)

Laurie.
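
A conversion function of that kind might look like this in Python. It is a sketch that assumes the two values really are seconds since the Unix epoch, interpreted in UTC, which nothing in the thread confirms; the function name epoch_to_utc is made up for the example.

```python
from datetime import datetime, timezone

def epoch_to_utc(value):
    """Convert a string of Unix epoch seconds into an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc).isoformat()

print(epoch_to_utc("1257162838"))  # 2009-11-02T11:53:58+00:00
print(epoch_to_utc("1333238399"))  # 2012-03-31T23:59:59+00:00
```

Read this way, endtime falls one second before midnight at the end of a month, which is at least consistent with the epoch-seconds guess.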
 
tarn said:
.. you could add a function to convert them
Hi tarn,
I think that's a topic for the original poster.

I only tried to show what to do with a non-valid XML file.
Btw, the whole form of the XML above seems problematic. I would never place the variable name and the variable value in the same tag <string>..</string>, because then the parsing depends on the order of the tags. For example, if the parser above gets the data in the order
Code:
<string>1257162838</string>
<string>creationdt</string>
then it parses them as
Code:
1257162838 = creationdt
which is IMHO wrong.
 
Indeed microm,

Sometimes you have to make exceptions for some of these strange application vendors ;)

Laurie.
 
Thank you everyone.

This worked for me:

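Code:
using System;
using System.Linq;
using System.Xml.Linq;

namespace A
{
    class Program
    {
        static void Main(string[] args)
        {
            // Take the <entry> elements of every <hashtable> below the root;
            // the first <string> is the key, the second is the value.
            var result = from e in XDocument.Load("abc.xml").Root.Descendants("hashtable").Elements("entry")
                         let array = e.Elements("string").ToArray()
                         where array.Length >= 2  // skip entries without a key/value pair
                         select new
                         {
                             Name = array[0].Value,
                             Id = array[1].Value
                         };

            foreach (var item in result)
            {
                // Prints value<===>key
                Console.WriteLine(item.Id + "<===>" + item.Name);
            }
        }
    }
}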
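
For comparison, a rough Python counterpart of the LINQ query above that also keeps the nesting. It is only a sketch: it assumes the full file closes every tag and that each <entry> holds a key followed by either a <string> value or a nested <hashtable>, which the thread suggests but does not confirm. The sample text and the name hashtable_to_dict are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical completed version of the sample (closing tags added).
SAMPLE = """
<hashtable>
  <entry>
    <string>6</string>
    <hashtable>
      <entry><string>creationdt</string><string>1257162838</string></entry>
      <entry><string>resellerid</string><string>44453</string></entry>
      <entry><string>endtime</string><string>1333238399</string></entry>
    </hashtable>
  </entry>
</hashtable>
"""

def hashtable_to_dict(table):
    """Turn a <hashtable> element into a dict, recursing into nested tables."""
    result = {}
    for entry in table.findall("entry"):
        children = list(entry)
        if len(children) != 2:
            continue  # skip entries that do not follow the key/value layout
        key_node, value_node = children
        if value_node.tag == "hashtable":
            # The value is itself a table: convert it recursively.
            result[key_node.text] = hashtable_to_dict(value_node)
        else:
            result[key_node.text] = value_node.text
    return result

data = hashtable_to_dict(ET.fromstring(SAMPLE))
print(data["6"]["endtime"])  # 1333238399
```

Keeping the outer key ("6" in the sample) means several such blocks in one response would stay separate instead of overwriting each other.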
 
The problem with the OP's provider is that they don't know the difference between tags and values. That is definitely not well-formed XML; it's not even close. It looks like what they should be doing is something like this:
Code:
<root>
  <hashtable>
    <unknown>6</unknown>
  </hashtable>
  <hashtable>
    <creationdt>1257162838</creationdt>
    <resellerid>44453</resellerid>
    <endtime>1333238399</endtime>
  </hashtable>
</root>
 
Yes, I agree; the OP's provider should learn a little bit more about XML.
 