Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Memory and working with large files 1

Status
Not open for further replies.

GregTheGeek

Programmer
Aug 11, 2004
46
US
Hello.

I am working on updating a Java application that was originally meant to load XLS files into a database table - MySQL, if you're curious. It utilizes the Apache POI libraries, in part, to make this happen. It's not fast, it uses a lot of memory, but it works.

Now that our organization has upgraded to Office 2010, I am having to change automation processes to work with XLSX (ugh). This leads us to my current issue. The file I am currently working with contains, uncompressed, approximately 350 MiB of data. If you are familiar with the new file formats, you'll know the XLSX files is simply a container for several other files; some of which hold the actual data in XML. That being said, I'm running into a java.lang.OutOfMemoryError when simply trying to instantiate the workbook object. I haven't even been able to READ it, at this point.

Now, before you go suggesting that I increase the heap space with -Xmx150T, I've done that. The max the JVM will allow me to go on my machine is 1250 MiB. Every time I run the application, I watch Java in the Task Manager suck up more, and more memory until it hits this limit, then it pukes with the aforementioned error.

This may be a good time to tell you about my environment:
JDK 7/JRE 1.7.0_05 (Oracle)
NetBeans 7.1.2
Windows7 SP1 x86 with 3.5 GB (usable) RAM
Need more info? Please ask.

Does anyone have any other suggestions for reading XLSX files with Java? Are there some better ways I can do this? I'd really rather not, but will if I have to, resort to saving each worksheet as a CSV first. Maybe I can step into the XLSX container, grab the XML file for the worksheet I wish to load, and use an XML parser? <grumble>

TIA!
I haven't been on here in a while, are we still saying that?

gtg.jpg

GTG
 
How big is that file? Did you consider loading the files directly to the database with MySQL tools?

Cheers,
Dian
 
Diancecht said:
How big is that file? Did you consider loading the files directly to the database with MySQL tools?

Dian,

Thank you for your response. The data, unzipped, is nearly 400MiB - my previous estimate was a little low. Why simply opening this file with POI is causing the heap is bloating to over a gigabyte is beyond me.

The main reason I'm using Java (aside from legacy support) is because most of these processes are automated; little to no human interaction. Using [tt]LOAD DATA INFILE...[/tt] doesn't even support XLS, let alone XLSX. Did you have any other MySQL tools in mind? Truth be told, I don't have much experience loading data any other way; never had the need.

Thank you again for your input.

gtg.jpg

GTG
 
You must be using the POI DOM-like model for reading the XLS(x) file, as that tries to load the entire file into memory before returning any data.
You should in fact be using the streaming model (SXSSF, forward-read-only, SAX based) as described on this page: and that obviously requires you to update to the latest POI release, 3.8
That should solve any memory issues you have.
 
TonHu,

According to the page you linked to, SXSSF does NOT support reading files. However, it seems that instead of using the usermodel of XSSF, using the eventmodel may be what I'm looking for. It also allows for streaming of the file.

Thank you again for your input!

gtg.jpg

GTG
 
GregTheGeek,

Sorry I mixed up the data from the table on the SXSSF vs the XSSD usermodel, but I'm glad you found the correct info!
 
Thanks again for the input, TonHu. I finally have the ability to load large XLSX files using the XSSF eventmodel.

I started with this code, and did what we all do: tweaked it! [thumbsup2]

gtg.jpg

GTG
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top