How to Split Large Strings?

Status
Not open for further replies.

sjohri214

Programmer
Jul 19, 2002
24
GB
Hi,

I have been trying to figure out how to split a large String into smaller Strings.

So, if I have a String such as the following:

ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGH
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGH
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGH
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGH
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGH

How is it possible to break up such a String (which may be up to 4 million characters in length) into a series of smaller Strings of length 30,000, for example?

I have tried to do this using the substring() method; however, this is not practical for larger Strings, since I encounter StringIndexOutOfBoundsExceptions when the second argument to substring() is greater than the length of each of the individual lines in the String (60 in the example above).
Any suggestions?

Many thanks
 
Where do you get these Strings from?
From a file, a Database?
Are they generated?
Perhaps you can split them while getting them.

Did you look at StringBuffer? (I didn't.)

What is a line?
Are those lines separated by \n, \r, or \r\n?
Can this vary?
 
Hi,

Sorry, I should have been a little clearer.
The String has been read directly from a File (located on my System - read in using BufferedReader br = new BufferedReader(new FileReader(fileName))).

And the lines are always separated with the \n newline character.

Thanks again
 
I'm sorry, but Java is just not designed to do this kind of job (i.e. splitting 4-million-character strings). You'll run out of memory quicker than you can say "Doh!". Use a language such as Perl, which is designed for character and text manipulation.

If you really have to do it in Java, try to stay away from String objects, and use primitive data types such as char arrays instead. There is a method in String that creates a char[] array from a String (toCharArray()) - you may consider using this.
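
For readers who want to try the char[] route, here is a minimal sketch. The class name CharArrayChunks and the helper chunk() are made up for illustration; only String.toCharArray() and Arrays.copyOfRange() are standard library calls. Note that toCharArray() copies the characters, so this does not save memory by itself.

```java
import java.util.Arrays;

class CharArrayChunks {
    /**
     * Hypothetical helper: copies up to 'size' characters starting at 'start',
     * clamping the end index to the length of the array.
     */
    static char[] chunk(char[] chars, int start, int size) {
        int end = Math.min(start + size, chars.length);
        return Arrays.copyOfRange(chars, start, end);
    }

    public static void main(String[] args) {
        // toCharArray() returns a fresh copy of the String's characters.
        char[] chars = "ABCDEFGHIJ".toCharArray();
        for (int start = 0; start < chars.length; start += 4) {
            System.out.println(new String(chunk(chars, start, 4)));
        }
    }
}
```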
 
If you're reading the data from an input stream, you can process as many or as few bytes/chars as you wish. You don't have to "read a line".
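
A rough sketch of that idea, assuming a character stream (Reader) rather than raw bytes. The class and method names are invented for this example, and a StringReader stands in for the FileReader mentioned earlier in the thread. Newline characters are kept as ordinary characters here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReaderChunker {
    /** Reads the whole stream as pieces of at most chunkSize characters. */
    static List<String> readChunks(Reader in, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        char[] buffer = new char[chunkSize];
        int filled = 0;
        int n;
        // read() may return fewer chars than requested, so keep filling
        // the buffer until it is full or the stream ends (-1).
        while ((n = in.read(buffer, filled, chunkSize - filled)) != -1) {
            filled += n;
            if (filled == chunkSize) {
                chunks.add(new String(buffer, 0, filled));
                filled = 0;
            }
        }
        if (filled > 0) {
            chunks.add(new String(buffer, 0, filled)); // final, shorter piece
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in for new FileReader(fileName).
        System.out.println(readChunks(new StringReader("ABCDEFGHIJ"), 4));
        // prints [ABCD, EFGH, IJ]
    }
}
```

In a real program you would probably process each piece as it is filled instead of collecting them all in a list.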
 
sedj:
You're wrong.
4 million characters are 4 million bytes in Java (depending on the encoding it may be 8 or 16 MB).
That's nothing for today's computers.

Even if it were, sjohri only needs 30,000 at a time.
It only depends on the algorithm you use.
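
One possible algorithm, as a sketch only: walk the String in steps of the chunk size and clamp the end index with Math.min, so that substring() never receives an index past the end (which is what causes the StringIndexOutOfBoundsException described in the question). SubstringChunker and split() are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SubstringChunker {
    /** Splits text into pieces of at most chunkSize characters. */
    static List<String> split(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += chunkSize) {
            // Clamp the end index so the last, shorter piece stays in bounds.
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("ABCDEFGHIJ", 4)); // prints [ABCD, EFGH, IJ]
    }
}
```

For the case in the question, the call would be split(wholeText, 30000) on the complete text, not on each 60-character line.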

 
stefanwagner:
You try parsing a 4M-character String object, which is 8/16 MB in memory (i.e. the stack, not the heap). The JVM would barf.
 
BufferedReader will not read the whole file into memory. It only reads what its buffer can hold.

I think your problem should be how to split the file rather than how to split the string. In that case you don't have a limitation on the size of the file (except disk space).
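
A sketch of splitting while reading, under two assumptions that are not stated in the thread: the \n line breaks should not appear in the output pieces, and each piece is handed to a callback as soon as it is complete. LineJoiningChunker, forEachChunk() and the sink parameter are hypothetical names; a StringReader again stands in for new FileReader(fileName).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class LineJoiningChunker {
    /** Joins the lines of the stream and passes fixed-size pieces to sink. */
    static void forEachChunk(BufferedReader br, int chunkSize, Consumer<String> sink)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        StringBuilder pending = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) { // readLine() strips the line terminator
            pending.append(line);
            // Emit every complete piece; only the remainder stays in memory.
            while (pending.length() >= chunkSize) {
                sink.accept(pending.substring(0, chunkSize));
                pending.delete(0, chunkSize);
            }
        }
        if (pending.length() > 0) {
            sink.accept(pending.toString()); // final, shorter piece
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> out = new ArrayList<>();
        BufferedReader br = new BufferedReader(new StringReader("ABCDEF\nGHIJKL\nMN"));
        forEachChunk(br, 5, out::add);
        System.out.println(out); // prints [ABCDE, FGHIJ, KLMN]
    }
}
```

With this approach only one line plus less than one piece is buffered at a time, so the size of the file does not matter much.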

 