I am working on updating a Java application that was originally meant to load XLS files into a database table - MySQL, if you're curious. It utilizes the Apache POI libraries, in part, to make this happen. It's not fast, it uses a lot of memory, but it works.
Now that our organization has upgraded to Office 2010, I am having to change automation processes to work with XLSX (ugh). This leads us to my current issue. The file I am currently working with contains, uncompressed, approximately 350 MiB of data. If you are familiar with the new file formats, you'll know the XLSX files is simply a container for several other files; some of which hold the actual data in XML. That being said, I'm running into a java.lang.OutOfMemoryError when simply trying to instantiate the workbook object. I haven't even been able to READ it, at this point.
Now, before you go suggesting that I increase the heap space with -Xmx150T, I've done that. The max the JVM will allow me to go on my machine is 1250 MiB. Every time I run the application, I watch Java in the Task Manager suck up more, and more memory until it hits this limit, then it pukes with the aforementioned error.
This may be a good time to tell you about my environment:
JDK 7/JRE 1.7.0_05 (Oracle)
Windows7 SP1 x86 with 3.5 GB (usable) RAM
Need more info? Please ask.
Does anyone have any other suggestions for reading XLSX files with Java? Are there some better ways I can do this? I'd really rather not, but will if I have to, resort to saving each worksheet as a CSV first. Maybe I can step into the XLSX container, grab the XML file for the worksheet I wish to load, and use an XML parser? <grumble>
I haven't been on here in a while, are we still saying that?