I am trying to split an ASCII file with over 5 million records (total file size is 1 GB) into equal-sized pieces. However, the UNIX split command is too slow. Has anybody come across a different/faster way of splitting a large file?
I don't know how fast this would be, but it should work:
dd if=bigfile of=smallfile1 ibs=200000000 count=1
dd if=bigfile of=smallfile2 ibs=200000000 count=1 skip=1
dd if=bigfile of=smallfile3 ibs=200000000 count=1 skip=2
dd if=bigfile of=smallfile4 ibs=200000000 count=1 skip=3
dd if=bigfile of=smallfile5 ibs=200000000 count=1 skip=4
This would create 200-million-character files; you could use any other number you wanted. Note that if your records are not all the same length, you will wind up splitting records across files.
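If the records are newline-delimited and variable length, splitting by record count rather than byte count avoids cutting records in half; the standard split -l option does exactly that. A minimal sketch (using a small stand-in file here instead of the real 5-million-record one; the names bigfile and part_ are just placeholders):

```shell
# Build a tiny sample file standing in for bigfile
# (assumption: one record per line).
printf 'rec%d\n' $(seq 1 10) > bigfile

# Split into chunks of 3 records each; output files are
# named part_aa, part_ab, part_ac, ...
split -l 3 bigfile part_

# Every chunk contains only whole records, so nothing is
# split mid-record.
wc -l part_*
```

With the real file you would use something like split -l 1000000 to get five roughly equal pieces, at the cost of the chunks differing slightly in byte size when record lengths vary.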
What do you mean by too slow? (5 minutes, 5 hours, 5 days ...)
On which platform (*nix flavor, CPU speed, ...) are you working?
What options are you passing to the split command?
Are the records fixed-length or variable-length?
If you want a good answer, ask a good question.