Splitting a large text file!

Status
Not open for further replies.

SirHammer1

Technical User
Sep 4, 2001
5
US
Hello Gurus!
I have a large text file comprised of multiple biographies that I need to split into separate files, one per biography. The only consistent pattern is a label like ABCD: on the last line of each bio, e.g.

John Doe
Forestry Service
Somewhere, Nebraska
ABCD: 321

Jane Hoe:
Research Dept.
ABCD: 456

There are blank lines mingled with some records, so I have to key on the ABCD. Any help would be greatly appreciated; my AWK skills are really old!

Thanks,
SirHammer
 
This should get you started. It creates files f1.dat, f2.dat, etc., one for each biography. Put a getline inside the if statement if you want to remove the blank line at the beginning of the second and subsequent files.

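BEGIN { fn = "f1.dat"; finx = 1 }    # start with the first output file
{
    print > fn                        # write the current line to the current file
    if (substr($0, 1, 4) == "ABCD") {
        close(fn)                     # a line starting with ABCD ends this biography
        finx++
        fn = "f" finx ".dat"          # name of the next output file
    }
}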
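The getline variant mentioned above could look something like the sketch below. It is only an illustration under a few assumptions: plain awk stands in for nawk, the input file name bios.txt and its contents are made up from the sample in the question, and everything runs in a scratch directory.

```shell
# Work in a scratch directory so the generated files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input modeled on the biographies in the question.
cat > bios.txt <<'EOF'
John Doe
Forestry Service
Somewhere, Nebraska
ABCD: 321

Jane Hoe:
Research Dept.
ABCD: 456
EOF

awk 'BEGIN { fn = "f1.dat"; finx = 1 }
{
    print > fn
    if (substr($0, 1, 4) == "ABCD") {
        close(fn)
        finx++
        fn = "f" finx ".dat"
        getline    # read the line after the ABCD line and discard it
    }
}' bios.txt

# Show each output file under a simple header.
for f in f*.dat; do
    echo "== $f"
    cat "$f"
done
```

Note that getline discards whatever line follows the ABCD line, blank or not, so this only suits input where a blank separator always follows each biography.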

Hope this helps.

CaKiwi
 
I have a similar situation and ran the above script, but only one file was created.

nawk '{ fn = "f1.dat"; finx = 1}
{
print > fn
if (substr($0,1,4) == "ABCD") {
close (fn)
finx++
fn = "f" finx ".dat"
}
}' filename

 
This script will split the file whenever it finds a line starting with ABCD. If no lines start with ABCD, you will get only one file. Also, you left out the BEGIN in the first line of your script. It should read

nawk 'BEGIN{ fn = "f1.dat"; finx = 1}
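To see why the missing BEGIN matters: without it, the first block is an ordinary action that runs for every input line, so fn is reset to "f1.dat" on each line and the script never gets to write to f2.dat. The comparison below is a rough sketch, assuming plain awk in place of nawk and a made-up input file bios.txt; the directory names broken and fixed are only for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir broken fixed

# Hypothetical sample input modeled on the biographies in the question.
cat > bios.txt <<'EOF'
John Doe
Forestry Service
Somewhere, Nebraska
ABCD: 321

Jane Hoe:
Research Dept.
ABCD: 456
EOF

# Without BEGIN: fn and finx are reset on every input line.
(cd broken && awk '{ fn = "f1.dat"; finx = 1 }
{
    print > fn
    if (substr($0, 1, 4) == "ABCD") {
        close(fn)
        finx++
        fn = "f" finx ".dat"
    }
}' ../bios.txt)

# With BEGIN: fn and finx are set once, before any input is read.
(cd fixed && awk 'BEGIN { fn = "f1.dat"; finx = 1 }
{
    print > fn
    if (substr($0, 1, 4) == "ABCD") {
        close(fn)
        finx++
        fn = "f" finx ".dat"
    }
}' ../bios.txt)

ls broken    # lists only f1.dat
ls fixed     # lists f1.dat and f2.dat
```

In the version without BEGIN, f1.dat is also reopened with > after each close, which truncates it, so earlier biographies are overwritten rather than kept.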

Post your input data and I am sure someone will help you.

CaKiwi
 
Just a confirmation, this worked like a CHAMP!

Thanks so much!
 
Thanks it did work...I needed the BEGIN in the script.
 
Can you explain how this script works...
I understand the counter (finx++) and substring part but the rest
is not clear...such as the lines with:

BEGIN { fn = "f1.dat"; finx = 1}
fn = "f" finx ".dat"


 
The line

BEGIN { fn = "f1.dat"; finx = 1}

is executed once, before the script starts processing any data. It sets the variable fn to the string "f1.dat", which is used as the name of the first file in the print > fn statement, and sets the counter finx to 1.
The first time the line

fn = "f" finx ".dat"

is executed, finx has just been incremented to 2, so it sets the variable fn to "f2.dat" and the print > fn statement will now write to file f2.dat. The next time it is executed fn will be "f3.dat" and so on.
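The way the file name is built can be tried on its own. In awk, writing values side by side concatenates them, so "f" finx ".dat" joins the letter f, the current counter value, and the extension. The snippet below is just a small sketch of that step, assuming plain awk; the loop and the shell variable names are only there for the demonstration.

```shell
# Build a few file names the same way the script does, without reading any input.
names=$(awk 'BEGIN {
    fn = "f1.dat"; finx = 1
    for (i = 1; i <= 3; i++) {
        finx++                  # same counter step as in the script
        fn = "f" finx ".dat"    # concatenation: "f" + counter + ".dat"
        print fn
    }
}')

echo "$names"
```

This prints f2.dat, f3.dat and f4.dat, one per line.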

Hope this helps.

CaKiwi

 