
xmlreader - performance

Status
Not open for further replies.

simonchristieis

Programmer
Jan 10, 2002
1,144
GB
Has anyone looked at the performance improvement of using .NET's XmlReader to parse XML files with XSLT, versus command-line parsing?

Is there any performance improvement when parsing larger XML files - 100-200 meg?

Thanks in advance.

Simon
 
1. run some tests and see.
2. don't worry about performance if you don't even have working code. make it work, make it reusable, make it fast. in that order.
3. XmlReader is designed specifically to read xml. So why use another tool?
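To sketch what that looks like - the file name and the "record" element here are placeholders for illustration, not anything from this thread:

```csharp
using System;
using System.Xml;

class ReaderSketch
{
    static void Main()
    {
        // Skip insignificant whitespace so the loop only sees real nodes.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;

        // XmlReader pulls one node at a time off the stream; the whole
        // document is never held in memory at once.
        using (XmlReader reader = XmlReader.Create("large.xml", settings))
        {
            while (reader.Read())
            {
                // "record" is a hypothetical repeating element name.
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
                    Console.WriteLine(reader.ReadElementContentAsString());
            }
        }
    }
}
```

Because it's forward-only and pull-based, peak memory stays roughly flat even on 100-200 meg files, which is the main argument for it over DOM-style loading.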

Jason Meckley
Programmer
Specialty Bakers, Inc.
 
Thanks for your non-reply.

1. I could run some tests and see, but seeing as I offer advice on this site, I feel entitled to glean some information back.

2. Interesting that you don't include research and planning in your development cycle.

3. I currently use java and the xalan parser, I was wondering what other developers experiences have told them.

Regards

Simon

 
1. I could run some tests and see, but seeing as I offer advice on this site, I feel entitled to glean some information back.
Assuming you have a reasonable size file on hand it would only take a simple console app to determine if there is a significant difference in execution.
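A minimal console harness along those lines might look like this - the file names are hypothetical stand-ins, and it assumes .NET 2.0 or later for System.Diagnostics.Stopwatch and XslCompiledTransform:

```csharp
using System;
using System.Diagnostics;
using System.Xml;
using System.Xml.Xsl;

class TransformTimer
{
    static void Main()
    {
        // "transform.xslt" and "input.xml" are placeholder names;
        // substitute your real stylesheet and a 100-200 meg input file.
        Stopwatch timer = Stopwatch.StartNew();

        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load("transform.xslt");
        Console.WriteLine("Stylesheet compile: {0} ms", timer.ElapsedMilliseconds);

        timer.Reset();
        timer.Start();
        using (XmlReader input = XmlReader.Create("input.xml"))
        using (XmlWriter output = XmlWriter.Create("output.xml"))
        {
            xslt.Transform(input, output);
        }
        Console.WriteLine("Transform: {0} ms", timer.ElapsedMilliseconds);
    }
}
```

Timing the stylesheet compile separately matters: XslCompiledTransform compiles the XSLT to IL up front, so a fair comparison against a command-line run should either include that cost or reuse the loaded transform across files.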

2. Interesting that you don't include research and planning in your development cycle.
I never said I don't. I focus on the clients' (coworkers') requirements and design a solution to meet those needs. I find premature optimization leads to maintainability problems. Using SOLID design principles, refactoring has minimal impact on the code base as a whole.

3. I currently use java and the xalan parser, I was wondering what other developers experiences have told them.

If you are familiar with Rhino Tools or Ayende Rahien, check out Rhino.ETL. Its purpose is to Extract, Transform and Load large quantities of data. I only learned of this tool recently and have no experience with it beyond Ayende's blog and the Rhino Tools forum. However, Rhino Tools has proven itself with the other components it provides. I would definitely consider this component for future data migration.


Jason Meckley
Programmer
Specialty Bakers, Inc.
 
We are using Informatica as the ETL tool, but mainly as a scheduler for the processing.

Our technical architecture revolves around transforming xml files using xslt files and these transforms are driven from batch files.

For me to introduce a new technology would be tricky, unless I could justify it.

Currently we use Java to drive the transform, C++ to drive the XML extracts, and DOS batch files to shift files back and forth, so it's already a little messy.

I don't have Visual Studio here; I'll have to try to run some tests when I get home and see if I can find some way to optimise the transforms.

Thanks anyway.

Simon
 
I agree. Introducing another set of tools to an already complex architecture wouldn't be prudent.

I'm curious why you're introducing C# if you're using languages that can already handle this.

Jason Meckley
Programmer
Specialty Bakers, Inc.
 
Curiosity, mainly. I was wondering if I could use some kind of XML stream reader / parser / writer to parse the file partitionally (new word?) rather than load the file into memory first and then parse it.

At the moment I can easily test the Saxon and Xalan parsers using Java, and I wanted to know if the MSXML parser would be any better. I thought the best way to test that would be to use something under the .NET framework, and I've used a bit of C# previously.

So here I am.

8^)
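For what it's worth, XmlReader can do that kind of piecewise parsing via ReadSubtree, which hands you a reader scoped to one element at a time. A sketch, where "item" is a made-up repeating element name and "large.xml" a placeholder file:

```csharp
using System;
using System.Xml;

class PartialParse
{
    static void Main()
    {
        using (XmlReader reader = XmlReader.Create("large.xml"))
        {
            // Jump from one "item" element to the next without ever
            // materialising the rest of the document.
            while (reader.ReadToFollowing("item"))
            {
                using (XmlReader subtree = reader.ReadSubtree())
                {
                    // The subtree reader starts in the Initial state;
                    // MoveToContent positions it on the element itself.
                    subtree.MoveToContent();
                    Console.WriteLine(subtree.ReadOuterXml());
                }
            }
        }
    }
}
```

Each subtree could just as well be fed into a per-fragment transform or loaded into a small in-memory document, so only one fragment is ever resident at a time.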
 
With .NET you can use IEnumerable<> to return parts of a collection at a time.
Code:
IEnumerable<string> GetData()
{
   yield return "a";
   yield return "b";
   yield return "c";
}
What this does is return "a"; then you process "a". Then you get "b" and process "b", and so on.
If you did
Code:
IEnumerable<string> GetData()
{
   return new string[] {"a","b","c"};
}
then it returns all 3 strings at once before any processing starts. This could be of use in your situation.
Code:
IEnumerable<string> GetData()
{
   using(StreamReader reader = File.OpenText("file.xml"))
   {
      while(!reader.EndOfStream)
         yield return reader.ReadLine();
   }
}
At a later point you could break out of the loop. This way you're only loading what you need, not the entire file.
Code:
void DoSomethingWithStrings()
{
   foreach(string s in GetData())
   {
      if (!IsValid(s)) break;
      //do something;
   }
}

Jason Meckley
Programmer
Specialty Bakers, Inc.
 
Thanks - I'll have a look at that over the weekend.

It might be heading away from what I'm trying to do, though, simply because I'm trying to transform the XML using XSLT rather than parse the XML file as a text string.

If I wanted to simply extract the data I needed it would be easier, as you mentioned earlier, to use one of the existing technologies within the project.

I think I'm going to try to explore the WSE .NET 2.0 advances regarding MTOM web services. As I understand it, this will allow a large XML file to be sent in pieces to a web service and then reassembled. I wonder if this would allow the file segments to be parsed as they arrive?

Anyway, thanks for your help - I'll let you know how I get on.

Regards

Simon
 
