Problem with .XSLT transformation

Status
Not open for further replies.

marcello62

Technical User
Jun 19, 2006
42
NL
Hi there,

I have to convert about 20,000 large .XML files to .txt format. For each file I use 5 .XSLT stylesheets to produce 5 .txt files, which I want to bring together using C#. However, I have problems looping through all the .XML files: the program crashes. I noticed the same thing when I tried to develop this program with Delphi/MSXSL and Delphi/Saxon, so the problem is probably not related to C# as a platform. I had hoped that C# would perform better than Delphi, but unfortunately it does not. Does anyone have an idea how to improve the performance of XslTransform in C# on large batches of .XML files?

Any help will be greatly appreciated.
 
Could we see a sample of the code that is crashing?

Also, when you say the program is "crashing" are you getting some kind of error message or is the thing just freezing up?
 
Hi Aptitude,

The source code follows at the end.
What basically happens:
In the click handler of the button btnGetFiles, the user is prompted to select a folder. The names of all files in that folder are stored in an ArrayList. Then, for each .XML file in the ArrayList (normally about 20,000 of them), a method of a class I wrote myself is called (ConvertXmlPatient is the class, convertXmlBestanden is the method), in which I perform 5 transformations. It works for 6,000, at most 7,000 files, and then it crashes. No single file causes the problem. The error message is: 'An unhandled exception ('System.Runtime.InteropServices.COMException') occurred in WindowsApplicationXML.exe [3792].'
I would very much appreciate if you took a look at the code.

private void btnGetFiles_Click(object sender, EventArgs e)
{

String strBestand;
String strXmlBestand;
String strExtensie;
String strMap;
String strDateTime;
String strDuur;
int intLen;
int intStart;
long lngDuur;
long lngAantal;
long lngMinuten;
long lngUren;
ConvertXmlPatient convert;
DateTime dtDateTimeBegin;
DateTime dtDateTimeEind;
DateTime dtDateTimeVerschil;

lngAantal = 0;
txtAantalXMLBestanden.Text = lngAantal.ToString();
dlgSelecteerMap.ShowDialog();
strMap = dlgSelecteerMap.SelectedPath;
//if strMap = ""
//{
// MessageBox.Show("Select a folder first!");
//}

dtDateTimeBegin = DateTime.Now;
strDateTime = dtDateTimeBegin.ToString();
txtDateTimeBegin.Text = strDateTime;
txtDateTimeBegin.Refresh();

convert = new ConvertXmlPatient();

ArrayList bestandslijst = new ArrayList(Directory.GetFiles(strMap));

foreach (String item in bestandslijst)
{
strBestand = item.ToString();
intLen = strBestand.Length;
intStart = intLen - 4;
strExtensie = strBestand.Substring(intStart, 4);

if (strExtensie == ".xml")
{
lngAantal = lngAantal + 1;
}
}
txtTeConverterenBestanden.Text = lngAantal.ToString();
txtTeConverterenBestanden.Refresh();
lngAantal = 0;

foreach (String item in bestandslijst)
{
strBestand = item.ToString();
strDateTime = DateTime.Now.ToString();
intLen = strBestand.Length;
intStart = intLen - 4;
strExtensie = strBestand.Substring(intStart, 4);

if ( strExtensie == ".xml")
{
lngAantal = lngAantal + 1;
strXmlBestand = strBestand;
convert.convertXmlBestanden(strMap, strXmlBestand, lngAantal);
txtAantalXMLBestanden.Text = lngAantal.ToString();
txtAantalXMLBestanden.Refresh();
}
}
dtDateTimeEind = DateTime.Now;
strDateTime = dtDateTimeEind.ToString();
txtDateTimeEind.Text = strDateTime;
lngDuur = (long)(dtDateTimeEind - dtDateTimeBegin).TotalSeconds;
strDuur = lngDuur.ToString();
txtDuurSeconden.Text = strDuur;

MessageBox.Show("All XML files in the folder " + strMap + " have been converted!");
}
}

public class ConvertXmlPatient
{
public ConvertXmlPatient()
{
long lngCounter;
lngCounter = 0;
}

public void convertXmlBestanden(string strMap, string strXmlBestand, long lngAantal)
{
String strBestand;
String strWerkmap;
String b;
String t1;
String t2;
String t3;
String t4;
String t5;
long lngCount;

strBestand = strXmlBestand;
strWerkmap = strMap;

lngCount = lngAantal;
//Load the XML document
XmlDataDocument xmlDossier = new XmlDataDocument();
xmlDossier.Load(strBestand);

//PATIENT
//Load the patient record transformation
System.Xml.Xsl.XslTransform transformPatient = new
System.Xml.Xsl.XslTransform();
t1 = strWerkmap + "\\cPatient.xsl";
transformPatient.Load(t1);
//Define the output stream
System.IO.StreamWriter txtDossier1 = new
System.IO.StreamWriter(strWerkmap + "\\out1_" + lngCount.ToString() + ".txt");
//create the output file
transformPatient.Transform(xmlDossier, null, txtDossier1);

//PROBLEEM
//Load the patient record transformation
System.Xml.Xsl.XslTransform transformProbleem = new
System.Xml.Xsl.XslTransform();
t2 = strWerkmap + "\\cProbleem.xsl";
transformProbleem.Load(t2);
//Define the output stream
System.IO.StreamWriter txtDossier2 = new
System.IO.StreamWriter(strWerkmap + "\\out2_" + lngCount.ToString() + ".txt");
//create the output file
transformProbleem.Transform(xmlDossier, null, txtDossier2);

//RUITER
//Load the patient record transformation
System.Xml.Xsl.XslTransform transformRuiter = new
System.Xml.Xsl.XslTransform();
t3 = strWerkmap + "\\cRuiter.xsl";
transformRuiter.Load(t3);
//Define the output stream
System.IO.StreamWriter txtDossier3 = new
System.IO.StreamWriter(strWerkmap + "\\out3_" + lngCount.ToString() + ".txt");
//create the output file
transformRuiter.Transform(xmlDossier, null, txtDossier3);

//RECEPT
//Load the patient record transformation
System.Xml.Xsl.XslTransform transformRecept = new
System.Xml.Xsl.XslTransform();
t4 = strWerkmap + "\\cRecept.xsl";
transformRecept.Load(t4);
//Define the output stream
System.IO.StreamWriter txtDossier4 = new
System.IO.StreamWriter(strWerkmap + "\\out4_" + lngCount.ToString() + ".txt");
//create the output file
transformRecept.Transform(xmlDossier, null, txtDossier4);

//RECEPTREGEL
//Load the patient record transformation
System.Xml.Xsl.XslTransform transformReceptregel = new
System.Xml.Xsl.XslTransform();
t5 = strWerkmap + "\\cReceptregel.xsl";
transformReceptregel.Load(t5);
//Define the output stream
System.IO.StreamWriter txtDossier5 = new
System.IO.StreamWriter(strWerkmap + "\\out5_" + lngCount.ToString() + ".txt");
//create the output file
transformReceptregel.Transform(xmlDossier, null, txtDossier5);

}
}
 
like Aptitude said, code will be required to assist.

My first guess is system resources. 20K files with a "big" file size could kill a system if the resources are not managed carefully.

make sure you close, flush, and/or dispose of any objects which manipulate the xml or txt files. this can be done with a using() { } block or try/catch/finally. example
Code:
using (StreamWriter w = new StreamWriter("output.txt"))
{
   //use w
} // w disposed
//w is out of scope here

-- or --

StreamWriter w = new StreamWriter("output.txt");
try
{
   //use w
}
catch (IOException)
{
   //handle or log the failure
}
finally
{
   w.Dispose();
}
just so we don't overlook the obvious: you're changing the content layout, not just the extension, correct? because a plain rename can be done using [tt]System.IO.File.Move("filename", "new filename");[/tt]

Jason Meckley
Programmer
Specialty Bakers, Inc.
 
OTTOMH:

two possibilities:

1. there is an internal leak of some resource, causing the process to belly up after a certain number of operations.

whittle the process down to its absolute essentials, probably just a call to the relevant XslTransform method, and create a console app which is, as far as possible, essentially just this call. pass it the necessary input and output filenames as parameters. then spawn this from a second console process a number of times and see if you still run out of resources. if you do, you may have problem number

2. there is an external leak of system resources.

this is pretty damned serious and not one you'll solve lightly. contact your vendor to see if they'll fix it; look into virtual machines; find a different technology.
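
The whittled-down console app from point 1 might look something like the sketch below. It is only an illustration under a few assumptions: the names TransformOnce and Driver are made up, and it uses XslCompiledTransform (the .NET 2.0 replacement for XslTransform) rather than the class from the posted code. The first class does exactly one transformation per process; the second would live in its own console app and spawns the first once per input file, counting failures.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml.Xsl;

// Hypothetical tool: TransformOnce.exe <input.xml> <stylesheet.xsl> <output.txt>
public static class TransformOnce
{
    // Performs a single transformation; returns 0 on success, 1 on failure.
    public static int Run(string xmlPath, string xslPath, string outPath)
    {
        try
        {
            XslCompiledTransform transform = new XslCompiledTransform();
            transform.Load(xslPath);
            // This overload opens and closes the input and output files itself.
            transform.Transform(xmlPath, outPath);
            return 0;
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex.Message);
            return 1;
        }
    }

    public static int Main(string[] args)
    {
        if (args.Length != 3)
        {
            Console.Error.WriteLine("usage: TransformOnce <input.xml> <stylesheet.xsl> <output.txt>");
            return 2;
        }
        return Run(args[0], args[1], args[2]);
    }
}

// Hypothetical driver: starts the tool in a fresh process for every input file.
public static class Driver
{
    // Returns the number of child processes that exited with a non-zero code.
    public static int SpawnForEach(string toolPath, string xslPath, string[] xmlFiles)
    {
        int failures = 0;
        foreach (string xml in xmlFiles)
        {
            string output = Path.ChangeExtension(xml, ".txt");
            ProcessStartInfo info = new ProcessStartInfo();
            info.FileName = toolPath;
            info.Arguments = "\"" + xml + "\" \"" + xslPath + "\" \"" + output + "\"";
            info.UseShellExecute = false;
            using (Process child = Process.Start(info))
            {
                child.WaitForExit();
                if (child.ExitCode != 0) failures++;
            }
        }
        return failures;
    }
}
```

if the per-process version gets through all the files while the in-process loop does not, that points at the internal leak rather than the external one.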

good luck.

mr s. <;)

 
First off, in your ConvertXmlPatient.convertXmlBestanden you are doing the same thing 5 times by copying and pasting the code. This is a bad idea, as you get 5 bugs instead of 1. Second, you are not closing the file streams you create. The best way to rectify this is to use a using block, as jmeckley said.

There is another issue in that you are running this process from the main UI thread. This will cause your application to stop processing Windows messages for a long time, which is bad. You should try running this in a separate worker thread in the background.

Code:
public class ConvertXmlPatient
    {
        public ConvertXmlPatient()
        {
            long lngCounter;
            lngCounter = 0;
        }

        public void convertXmlBestanden(string strMap, string strXmlBestand, long lngAantal)
        {
            String strBestand;
            String strWerkmap;
            String b;            
            long lngCount;

            strBestand = strXmlBestand;
            strWerkmap = strMap;

            lngCount = lngAantal;
            //Load the XML document
            XmlDataDocument xmlDossier = new XmlDataDocument();
            xmlDossier.Load(strBestand);

            //PATIENT
            //Run the patient transformation
            ProcessTransform(strWerkmap, lngCount, xmlDossier, "\\cPatient.xsl", "\\out1_");

            //PROBLEEM
            //Run the problem transformation
            ProcessTransform(strWerkmap, lngCount, xmlDossier, "\\cProbleem.xsl", "\\out2_");

            //RUITER
            //Run the ruiter transformation
            ProcessTransform(strWerkmap, lngCount, xmlDossier, "\\cRuiter.xsl", "\\out3_");

            //RECEPT
            //Run the prescription transformation
            ProcessTransform(strWerkmap, lngCount, xmlDossier, "\\cRecept.xsl", "\\out4_");

            //RECEPTREGEL
            //Run the prescription line transformation
            ProcessTransform(strWerkmap, lngCount, xmlDossier, "\\cReceptregel.xsl", "\\out5_");
        }

        private static void ProcessTransform(String strWerkmap, long lngCount, XmlDataDocument xmlDossier, string input, string output)
        {
            System.Xml.Xsl.XslTransform transform = new
                System.Xml.Xsl.XslTransform();

            string t1 = strWerkmap + input;
            transform.Load(t1);

            //Define the output stream; the using block closes the file afterwards
            using (System.IO.StreamWriter txtDossier = new
                System.IO.StreamWriter(strWerkmap + output + lngCount.ToString() + ".txt"))
            {
                //create the output file
                transform.Transform(xmlDossier, null, txtDossier);             
            }
        }
    }
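
For the worker thread suggestion, here is a minimal sketch using BackgroundWorker from System.ComponentModel. It makes some assumptions: BatchConverter, FileConverted and Finished are made-up names, and the convertOne callback stands in for the call to convert.convertXmlBestanden. The point is only that Start returns immediately, so the thread that called it stays free to process Windows messages while the files are converted in the background.

```csharp
using System;
using System.ComponentModel;

// Sketch: runs a per-file conversion on a background thread.
public sealed class BatchConverter
{
    private readonly BackgroundWorker worker = new BackgroundWorker();
    private readonly Action<string> convertOne;

    // Raised after each file; the argument is the number of files done so far.
    public event Action<int> FileConverted;
    // Raised once at the end with the total count, or with the exception that stopped the batch.
    public event Action<int, Exception> Finished;

    // convertOne is a hypothetical callback that converts a single file.
    public BatchConverter(Action<string> convertOne)
    {
        this.convertOne = convertOne;
        worker.WorkerReportsProgress = true;
        worker.DoWork += OnDoWork;
        worker.ProgressChanged += OnProgressChanged;
        worker.RunWorkerCompleted += OnCompleted;
    }

    // Queues the batch and returns immediately.
    public void Start(string[] files)
    {
        worker.RunWorkerAsync(files);
    }

    // Runs on a thread pool thread, not on the caller's thread.
    private void OnDoWork(object sender, DoWorkEventArgs e)
    {
        string[] files = (string[])e.Argument;
        int count = 0;
        foreach (string file in files)
        {
            convertOne(file);
            count++;
            worker.ReportProgress(count * 100 / files.Length, count);
        }
        e.Result = count;
    }

    private void OnProgressChanged(object sender, ProgressChangedEventArgs e)
    {
        Action<int> handler = FileConverted;
        if (handler != null) handler((int)e.UserState);
    }

    private void OnCompleted(object sender, RunWorkerCompletedEventArgs e)
    {
        Action<int, Exception> handler = Finished;
        if (handler == null) return;
        // e.Result must not be read when the worker failed.
        if (e.Error != null) handler(0, e.Error);
        else handler((int)e.Result, null);
    }
}
```

In the form, btnGetFiles_Click would create a BatchConverter, subscribe FileConverted to update txtAantalXMLBestanden.Text, and call Start with the list of .xml files. When Start is called from the UI thread of a Windows Forms application, BackgroundWorker raises the progress and completion events back on that thread, so the text boxes can be updated from the handlers directly.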
 
Thanks Aptitude, JMeckley and misterstick. I can see the great value of your comments, I definitely have a path forward now. Next week I'm going to use your tips to adapt my program, I'll certainly post the results!
Thanks again.
 
Hi there,

I tried disposing the .txt files in the finally block, but I still noticed no difference in performance. I now get the error message:

ContextSwitchDeadlock was detected
The CLR has been unable to transition from COM context 0x1a3008 to COM context 0x1a3178 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping…

Apparently some objects (the XML file? the .txt files?) are considered to be COM objects. Maybe I should handle them differently in my code?
 