
Split a file / Lines must stay together

Status
Not open for further replies.

btinder

Programmer
Mar 12, 2004
27
US
Hi all,

I want to split a large file (10,000 lines) into two files of roughly 5,000 lines each, but with a twist:

1. Each record contains an identifier string that the main input file is sorted on (it is not a fixed string; it changes from record to record).

2. When splitting the file, records with exactly the same identifier must stay together.

So for example in the input file I have this:

Record1
(Line 4999) 11110 Someone
(Line 5000) 11111 BLAH BLAH
(Line 5001) 11111 BLAH JR.
(Line 5002) 11111 BLAH Three
(Line 5003) 11112 Someone Else

So the file can't be split between BLAH BLAH and BLAH JR.; it has to be split after BLAH Three instead.

Does that make sense? Any help would be greatly appreciated.
 
What do you mean by splitting?
A different file for each identifier?
Or just some kind of coalescing of the records?

Hope this helps, PH.
 
By splitting I mean making new files:

1 large input file (10,000 lines) becomes 2 smaller files (5,000 lines each).



 

Code:
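BEGIN{ outfile[0]="foo"; outfile[1]="bar"   # names of the two output files
  which=0; id=""; splitpoint=5000           # start by writing to the first file
}

# Past the split point, switch to the second file once the identifier ($1) changes.
!which && (NR>splitpoint)     { if ($1!=id) which=1 }
# Write every line to the current file and remember its identifier.
{print >outfile[which]; id=$1 }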

Let me know whether or not this helps.
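
One way to check the logic before running it on the full file is to try it on a tiny sample with a lower split point. This is only a sketch: the file name sample.txt, its contents, and passing splitpoint with -v instead of setting it in BEGIN are assumptions made for the demo.

```shell
# Build a small, already sorted sample (hypothetical data); the identifier is field 1.
cat > sample.txt <<'EOF'
11109 First
11110 Someone
11111 BLAH BLAH
11111 BLAH JR.
11111 BLAH Three
11112 Someone Else
EOF

# Start from clean output files.
rm -f foo bar

# Same logic as the script above, with the split point lowered to 3 for the demo.
awk -v splitpoint=3 '
BEGIN { outfile[0] = "foo"; outfile[1] = "bar"; which = 0; id = "" }
!which && NR > splitpoint { if ($1 != id) which = 1 }
{ print > outfile[which]; id = $1 }
' sample.txt

# Show how many lines ended up in each piece.
wc -l < foo
wc -l < bar
```

Because the 11111 group straddles line 3, the first file keeps growing until the identifier changes, so foo ends up with five lines and bar with one.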
 
The code looks good so far. I will test it with a full input file and report back on whether it worked.

Thanks!
 
You can try adapting this script (it does a little more than you need):

Code:
#!/usr/bin/awk -f
# splitfile

function NewSplitFile() {
   if (SplitFile != "") close(SplitFile);
   SplitFile  = OUT;
   sub(/%/,sprintf("%02d",++SplitCount),SplitFile);
   SplitLines = 0
   print "Writing to",SplitFile,"..."
}

NR==1 {
   if (FIELD == "") FIELD = 1;
   if (SPLIT == "") SPLIT = 1000;
   if (OUT   == "") OUT   = FILENAME
   if (OUT !~ /%/ ) OUT   = OUT "_%"
   NewSplitFile();
}

SplitLines >= SPLIT {
   if ($FIELD != PrvField) NewSplitFile();
}
{
   print >> SplitFile;
   SplitLines += 1;
   PrvField = $FIELD;
}

Usage: splitfile [FIELD=n] [SPLIT=n] [OUT=xxx] input_file(s)
FIELD=n Identifier field number (Def=1)
SPLIT=n Output file size (Def=1000 lines)
OUT=xxx Output file name, % is replaced by split file number (Def=First_inputfile_%)

Example: splitfile SPLIT=5000 OUT=result_%.dat result.dat

Jean Pierre.
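
To see how the OUT template in the script above turns into numbered file names, the sub/sprintf step can be tried on its own. This is a small sketch; the helper name split_name is hypothetical and is not part of the script.

```shell
# split_name TEMPLATE N
# Replace the first % in TEMPLATE with N as a two-digit number,
# the same way NewSplitFile() builds SplitFile from OUT.
split_name() {
    awk -v out="$1" -v n="$2" 'BEGIN {
        sub(/%/, sprintf("%02d", n), out)
        print out
    }'
}

split_name 'result_%.dat' 1
split_name 'result_%.dat' 12
```

With OUT=result_%.dat the pieces are therefore named result_01.dat, result_02.dat, and so on. Note that SPLIT acts as a minimum: a piece can run past that many lines when an identifier group straddles the boundary.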
 
Hi Jean Pierre,

I tested the script, and it worked great! Thank you so much!

 