
Any utility/command for splitting a big file?

Status
Not open for further replies.

bansalhimanshu

Programmer
Sep 27, 2004
US
Please tell me a Unix command or utility for splitting a very big text file into small files of, say, 2 lines each. For now I have written my own script to do the job, but I am looking for a standard command/utility.
 
I was using split for this, but it has a limit of 676 files. I could not understand how to split a file based on line numbers using csplit. Can you elaborate? I need a large file to be split into a large number of files, each containing 2 lines of the large file. The number of split files will well exceed 1000.
 
man awk
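
The awk route hinted at above might look like the following. This is only a sketch, not necessarily what PH had in mind: the sample input, the part_ prefix and the six-digit numbering are assumptions made for the demo. It starts a new output file on every odd-numbered line and closes the previous one, so the number of pieces is not tied to any suffix limit.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 3000 numbered lines, i.e. 1500 two-line pieces.
awk 'BEGIN { for (i = 1; i <= 3000; i++) print i }' > bigfile

# On every odd-numbered line, close the previous piece and pick a new name.
# Closing keeps awk from holding more than one output file open at a time.
awk 'NR % 2 == 1 { if (out) close(out); out = sprintf("part_%06d", ++n) }
     { print > out }' bigfile

awk_count=$(ls part_* | wc -l | tr -d ' ')
echo "pieces written by awk: $awk_count"
```

For a different piece size, replace the 2 in NR % 2 with the wanted line count.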

Hope This Helps, PH.
 
bansalhimanshu said:
But does it work when the number of files is greater than 676?

only one way to find out, ain't it?

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Hi

With a suffix length of 2 characters, each having 26 possible values, of course you can have only 26^2=676 suffixes. But with a suffix length of 3, there are 26^3=17576 suffixes. And so on.

I tried it with a suffix length of 3 before I posted earlier. Now I have also tried a suffix length of 4 (26^4=456976) and it works. Please try it yourself too.
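
For anyone who wants to repeat the test, here is a small sketch using split with -l and -a. The sample file and the piece_ prefix are made up for the demo; 3000 lines at 2 lines per file gives 1500 pieces, which is more than the 676 that two-letter suffixes allow.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 3000 lines, so 1500 pieces of 2 lines each.
awk 'BEGIN { for (i = 1; i <= 3000; i++) print "line " i }' > bigfile

# -l 2: two lines per output file; -a 3: three-letter suffixes (aaa, aab, ...).
split -l 2 -a 3 bigfile piece_

piece_count=$(ls piece_* | wc -l | tr -d ' ')
first_piece=$(ls piece_* | head -n 1)
last_piece=$(ls piece_* | tail -n 1)
echo "$piece_count pieces, from $first_piece to $last_piece"
```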

Feherke.
 
Thanks Feherke, it worked fine.

mrn, as per your suggestion I tried to work out the suffix length needed. I could not understand your solution but came up with this (maybe this is what you were referring to):
Suffix length = log(number, base), rounded up to a whole number
where
number = (total number of lines in the big file / number of lines required in each split file)
base = 26

This gave me a value of 2 when a big file of (676 * 2 =) 1352 lines needs to be divided into files of two lines each.
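
The same calculation can be done in the shell without a log function by multiplying by 26 until the capacity is large enough. This is only a sketch; the function name suffix_length and the variable names are made up for illustration.

```shell
# suffix_length N: smallest length such that 26^length >= N (hypothetical helper).
suffix_length() {
    wanted=$1
    length=1
    capacity=26
    while [ "$capacity" -lt "$wanted" ]; do
        capacity=$((capacity * 26))
        length=$((length + 1))
    done
    echo "$length"
}

total_lines=1352      # the example above: 676 * 2 lines
lines_per_file=2
# Round up so that a trailing partial piece still counts as a file.
file_count=$(( (total_lines + lines_per_file - 1) / lines_per_file ))
echo "files=$file_count suffix_length=$(suffix_length "$file_count")"
```

The result can be passed straight to split, for example as -a "$(suffix_length "$file_count")".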
 
That's basically what I meant but I was thinking along the lines of

Bigfile = 2300 MB

2300/676 = 3.4 MB (ish)

split -b 4m filename

would split the file into 575 chunks of 4 MB each

or, going by lines: if wc -l bigfile reports 22000000, then 22000000/676 = 32545 (ish)

split -l 32545 -a 3 bigfile smallfile

this would create files of 32545 lines each, called smallfileaaa, smallfileaab, and so on

Using -l with -a 3 allows up to a maximum of 17,576 files (on AIX 5.2)

-l = lines
-a = number of chars to use as suffix
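
The arithmetic above can be scripted so the chunk size is worked out from the file itself. A rough sketch, with a made-up 10000-line sample file standing in for the real one: it picks the smallest number of lines per file that keeps the output within the 676 files allowed by the default two-letter suffix.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input of 10000 lines.
awk 'BEGIN { for (i = 1; i <= 10000; i++) print i }' > bigfile

max_files=676                          # limit with the default 2-letter suffix
total_lines=$(wc -l < bigfile | tr -d ' ')
# Ceiling division: smallest chunk size that needs at most max_files files.
lines_per_file=$(( (total_lines + max_files - 1) / max_files ))

split -l "$lines_per_file" bigfile smallfile
chunk_count=$(ls smallfile* | wc -l | tr -d ' ')
echo "$lines_per_file lines per file, $chunk_count files"
```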

Mike

"A foolproof method for sculpting an elephant: first, get a huge block of marble, then you chip away everything that doesn't look like an elephant."

 