Smart questions
Smart answers
Smart people
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Member Login

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips now!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

Join Tek-Tips
*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

LINK TO THIS FORUM!

Add Stickiness To Your Site By Linking To This Professionally Managed Technical Forum.
Just copy and paste the
code below into your site.

Partner With Us!

"Best Of Breed" Forums Add Stickiness To Your Site
Partner Button
(Download This Button Today!)

Feedback

"...I always seem to find what I need in previous threads without starting my own. I have started a couple but that's it..."

Geography

Where in the world do Tek-Tips members come from?

Memory and working with large filesHelpful Member! 

GregTheGeek (Programmer)
2 Aug 12 18:38
Hello.

I am working on updating a Java application that was originally meant to load XLS files into a database table - MySQL, if you're curious. It utilizes the Apache POI libraries, in part, to make this happen. It's not fast, it uses a lot of memory, but it works.

Now that our organization has upgraded to Office 2010, I am having to change automation processes to work with XLSX (ugh). This leads us to my current issue. The file I am currently working with contains, uncompressed, approximately 350 MiB of data. If you are familiar with the new file formats, you'll know the XLSX files is simply a container for several other files; some of which hold the actual data in XML. That being said, I'm running into a java.lang.OutOfMemoryError when simply trying to instantiate the workbook object. I haven't even been able to READ it, at this point.

Now, before you go suggesting that I increase the heap space with -Xmx150T, I've done that. The max the JVM will allow me to go on my machine is 1250 MiB. Every time I run the application, I watch Java in the Task Manager suck up more, and more memory until it hits this limit, then it pukes with the aforementioned error.

This may be a good time to tell you about my environment:
JDK 7/JRE 1.7.0_05 (Oracle)
NetBeans 7.1.2
Windows7 SP1 x86 with 3.5 GB (usable) RAM
Need more info? Please ask.

Does anyone have any other suggestions for reading XLSX files with Java? Are there some better ways I can do this? I'd really rather not, but will if I have to, resort to saving each worksheet as a CSV first. Maybe I can step into the XLSX container, grab the XML file for the worksheet I wish to load, and use an XML parser? <grumble>

TIA!
I haven't been on here in a while, are we still saying that?


GTG

Diancecht (Programmer)
3 Aug 12 4:38
How big is that file? Did you consider loading the files directly to the database with MySQL tools?

Cheers,
Dian

GregTheGeek (Programmer)
3 Aug 12 14:19

Quote (Diancecht)

How big is that file? Did you consider loading the files directly to the database with MySQL tools?

Dian,

Thank you for your response. The data, unzipped, is nearly 400MiB - my previous estimate was a little low. Why simply opening this file with POI is causing the heap is bloating to over a gigabyte is beyond me.

The main reason I'm using Java (aside from legacy support) is because most of these processes are automated; little to no human interaction. Using LOAD DATA INFILE... doesn't even support XLS, let alone XLSX. Did you have any other MySQL tools in mind? Truth be told, I don't have much experience loading data any other way; never had the need.

Thank you again for your input.


GTG

Helpful Member!  TonHu (Programmer)
6 Aug 12 6:35
You must be using the POI DOM-like model for reading the XLS(x) file, as that tries to load the entire file into memory before returning any data.
You should in fact be using the streaming model (SXSSF, forward-read-only, SAX based) as described on this page: http://poi.apache.org/spreadsheet/index.html and that obviously requires you to update to the latest POI release, 3.8
That should solve any memory issues you have.
GregTheGeek (Programmer)
6 Aug 12 10:31
TonHu,

I will give it a try. Thank you!


GTG

GregTheGeek (Programmer)
6 Aug 12 17:32
TonHu,

According to the page you linked to, SXSSF does NOT support reading files. However, it seems that instead of using the usermodel of XSSF, using the eventmodel may be what I'm looking for. It also allows for streaming of the file.

Thank you again for your input!


GTG

TonHu (Programmer)
20 Aug 12 16:28
GregTheGeek,

Sorry I mixed up the data from the table on the SXSSF vs the XSSD usermodel, but I'm glad you found the correct info!
GregTheGeek (Programmer)
18 Sep 12 17:23
Thanks again for the input, TonHu. I finally have the ability to load large XLSX files using the XSSF eventmodel.

I started with this code, and did what we all do: tweaked it! thumbsup2


GTG

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Back To Forum

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close