
Parse XML Loaded into String/Varchar SQL Fields?

Status: Not open for further replies.

kjv1611 (New member, joined Jul 9, 2003, 10,758 posts, US)
I am attempting to query some history records from a SQL table. In this table, one field needs to be split into multiple fields for my purposes.

It looks like the data is stored in an XML format, but that's just a guess from me, as I've never really delved into XML other than to glance at some once or twice.

So in this field, all the records are something like this:
Code:
<Updates><Field Name="TopLevelStatus" OldValue="Dirty" NewValue="Clean" /><Field Name="LowLevelStatus" OldValue="Stained" NewValue="NotStained" /></Updates>

So my question is... is there any way to easily parse this information without some long drawn-out functions? I do not currently have permission to create user-defined functions, so that's not an option.

If the total RecordSource were small enough, I could just pull it into an Access database, and work with it in VBA, but I think it's too big now, and KNOW it will be later.

So my initial guess is to use a combination of SUBSTRING and CHARINDEX, and possibly one or two other functions. But I am hoping there's an easier way to go about this.

What I want to get out of it is 4 columns:
OldTopStatus, OldBottomStatus, NewTopStatus, NewBottomStatus
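For illustration, here is a rough sketch of what that string-function approach might look like for just the first of those columns. The table and column names (HistoryTable, XmlData) are invented stand-ins, and it assumes the attribute always appears exactly as OldValue="...":

```sql
-- Find the first OldValue="..." in the string (which belongs to
-- TopLevelStatus, since it appears first in the XML) and slice out
-- the text between the quotes. 10 = Len('OldValue="').
Select Substring(XmlData,
                 CharIndex('OldValue="', XmlData) + 10,
                 CharIndex('"', XmlData, CharIndex('OldValue="', XmlData) + 10)
                   - (CharIndex('OldValue="', XmlData) + 10)) As OldTopStatus
From   HistoryTable
```

Repeating this for all four columns gets very verbose, and it breaks if the attribute order or spacing ever changes, which is part of why the string-function route tends to be a last resort.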

Thanks for any suggestions

"But thanks be to God, which giveth us the victory through our Lord Jesus Christ." 1 Corinthians 15:57
 
What is the actual data type for the column where you are storing this information?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
You also didn't mention the version of SQL Server you are using. If you are using SQL2005 or newer, and you are storing this XML data in a varchar column, then you can convert the data to XML using the convert function and then extract the value from it. Like this...

Code:
-- Sample data: one row holding the XML as a plain varchar value
Declare @Temp Table(Data VarChar(max))

Insert Into @Temp Values('<Updates><Field Name="TopLevelStatus" OldValue="Dirty" NewValue="Clean" /><Field Name="LowLevelStatus" OldValue="Stained" NewValue="NotStained" /></Updates>')

-- Convert the varchar to xml on the fly, then pull each attribute
-- out with .value() and an XQuery path
Select *, 
       Convert(xml, Data).value('(//Updates/Field[@Name="TopLevelStatus"]/@OldValue)[1]','varchar(100)') As TopLevelOldValue,
       Convert(xml, Data).value('(//Updates/Field[@Name="TopLevelStatus"]/@NewValue)[1]','varchar(100)') As TopLevelNewValue,
       Convert(xml, Data).value('(//Updates/Field[@Name="LowLevelStatus"]/@OldValue)[1]','varchar(100)') As LowLevelOldValue,
       Convert(xml, Data).value('(//Updates/Field[@Name="LowLevelStatus"]/@NewValue)[1]','varchar(100)') As LowLevelNewValue
From   @Temp

Notice how ugly the code is? Yeah, me too. In my opinion, it's best to split XML data into normal columns prior to storing it in the database. XML takes more database space and is slower to process when you are looking for the data values it contains.

I do understand that there are situations beyond our control, I mention this simply because if you ever find yourself in a position to make design decisions regarding storing XML data, it's better to store the values instead of storing the entire XML.

-George
 
I forgot to mention....

[tt]
Convert(xml, Data).value('[!](//Updates/Field[@Name="TopLevelStatus"]/@OldValue)[1][/!]','varchar(100)') As TopLevelOldValue,
[/tt]

The part highlighted in [!]Red[/!] is called XQuery. There's a ton of free information on XQuery that can help you with this. I encourage you to spend 20 minutes learning a little bit about it.
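As a companion to the .value() approach above, XQuery can also shred the XML into one row per Field element, which scales better if the list of field names isn't fixed. This is a sketch against the same sample data; the nodes() / CROSS APPLY pattern is standard T-SQL, but the @Temp setup is just the example from earlier in the thread:

```sql
Declare @Temp Table(Data VarChar(max))
Insert Into @Temp Values('<Updates><Field Name="TopLevelStatus" OldValue="Dirty" NewValue="Clean" /><Field Name="LowLevelStatus" OldValue="Stained" NewValue="NotStained" /></Updates>')

-- First APPLY converts the varchar to typed xml once per row;
-- second APPLY uses nodes() to return one row per /Updates/Field element
Select F.Node.value('@Name',     'varchar(50)')  As FieldName,
       F.Node.value('@OldValue', 'varchar(100)') As OldValue,
       F.Node.value('@NewValue', 'varchar(100)') As NewValue
From   @Temp
Cross Apply (Select Convert(xml, Data) As X) As C
Cross Apply C.X.nodes('/Updates/Field') As F(Node)
```

With the data in rows like this, a PIVOT (or conditional aggregation) can turn it back into the four-column shape if needed.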

-George
 
Thanks for all the information so far. As for XQuery, I was about to ask whether that was the right direction to look in when I found this:

As for the design, I really thought the same thing. I am not sure why the decision was made to go with the XML field as opposed to just separate fields.

Well, I do understand part of the reason. They are storing all sorts of different events in this table. So if it's an event of this type or that type, then they can simply store it in the same field.

However, I am a little concerned about the longevity of it. This one has been in place for a couple of weeks now, maybe three weeks tops, and it's already logged 17 million records. Probably 60% of them came from the initial data load, and ongoing testing has produced the remainder, so at least about 1 million records are being added per week. At that rate, my guess is we'd have over 70 million records by the time it goes from development/testing into production a year from now.

I've been considering mentioning at least my concern on that design setup for a couple of reasons, but I'm not so sure it'll make much difference at this point in development. Who knows, though, maybe it will.

For one, I think that part of this large table should be in a separate table altogether. Right now, the only way you can differentiate the different types is by looking in another varchar field.

As for the data type, the field is unfortunately varchar. [cry]

 
Look at it this way...

Your XML (from the original post) is 156 bytes. If this is the average length of the data, and you multiply it by 70,000,000 rows, you get 10,920,000,000 bytes. In storage space, that's roughly 11 gigabytes.

If you were to change the storage so that it was NOT xml, but the 4 separate values, you would have: Dirty, Clean, Stained, NotStained. That's roughly 30 bytes per row. Multiply this by 70 million, and you get 2,100,000,000 or roughly 2 gigabytes.

Normalizing this further, so that you don't store the values themselves but instead have a small lookup table and store an int for each value, you would get each row down to 4 ints at 4 bytes per int, so 16 bytes per row. Multiply this by 70 million, and you get 1,120,000,000 bytes, roughly 1 gigabyte.
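To make that concrete, a minimal sketch of what the lookup-table layout could look like. All table and column names here are invented for illustration, not taken from the actual system:

```sql
-- Small lookup table: each distinct status string stored once
Create Table StatusLookup (
    StatusId   int Identity(1,1) Primary Key,
    StatusName varchar(50) Not Null Unique
)

-- History rows store only 4-byte int keys, not the status strings
Create Table StatusHistory (
    HistoryId    int Identity(1,1) Primary Key,
    OldTopStatus int Not Null References StatusLookup(StatusId),
    NewTopStatus int Not Null References StatusLookup(StatusId),
    OldLowStatus int Not Null References StatusLookup(StatusId),
    NewLowStatus int Not Null References StatusLookup(StatusId)
)
```

Queries then join back to StatusLookup to display the names, and the int columns are also cheap to index.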

Net result... one tenth the storage space. Your backups will be quicker, and your database will perform better.

Of course, I took a lot of liberties with the calculations. You can see that even with a simple XML, it is a lot more efficient to properly normalize the data and store it that way. Also, consider that the longer your data structure stays like this, the harder it will be to change it in the future because there will be more and more code built up around this data structure.

Please understand that I am not telling you what to do. I'm simply explaining the repercussions of your current storage strategy.

-George
 
Yeah, I agree totally that it makes more sense - everything I've ever seen or read on the subject seems to point to it - to store the data in separate columns, and in fields as small as possible.

I'm guessing that they are using that format with part of the application, and I don't think they'll consider any changes in that area unless it's determined to be an absolute necessity. Even though it'd be more difficult to change later, they may still say it's not a necessity. But I don't know. I did briefly speak with one person who oversees quite a bit of it all, but I don't think it'll go anywhere. Still, I'll try to at least discuss it briefly to make sure.

 
Sorry it's taken so long for me to get back to this one. It's something I'm going to have to dig into more and learn more about, because I think our dev team here is going more in that direction of using XML anyway. I could be wrong, but it seems like it.

I wish they weren't for this one, but I'll just use it as a learning opportunity - usually I won't take time to learn something unless I have to. [blush]

Unfortunately I've got several other things going on at the same time, so can't focus in on this one as much as I would like - right now, at least. May be able to a little more in the future.

And I do indeed intend to read up more on the XQuery stuff for sure!

 
Oh, somehow I didn't mention: I was able to use the given example in a test to pull out a good sampling of the values, and they matched what I'd expect.

What I've got to figure out now is how to find the "top level" status for rows where it's not listed in the results. It gives a NULL, which is correct, but I somehow need to dig back through the history to see what the actual status would have been at that time.

For the type of industry my company is in, history is very important, so we've got to be able to search the history, not just the current state of any account or group of accounts at any given time.
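One way that "look back for the last known status" problem is often handled is with OUTER APPLY: for each row where the parsed status is NULL, fetch the most recent earlier row for the same account that did set it. This is only a sketch - ParsedHistory, AccountId, EventDate, and TopLevelNewValue are all invented names standing in for whatever the real parsed history looks like:

```sql
-- For each history row, fall back to the latest prior non-NULL
-- top-level status for the same account when this row has none
Select H.AccountId,
       H.EventDate,
       Coalesce(H.TopLevelNewValue, Prior.TopLevelNewValue) As EffectiveTopStatus
From   ParsedHistory As H
Outer Apply (Select Top 1 P.TopLevelNewValue
             From   ParsedHistory As P
             Where  P.AccountId = H.AccountId
               And  P.EventDate < H.EventDate
               And  P.TopLevelNewValue Is Not Null
             Order By P.EventDate Desc) As Prior
```

On a table this size, the correlated lookup would want an index on (AccountId, EventDate) to be workable.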

Anyway, thanks again for the help..

 