Splitting a large text file!

SirHammer1 · Sep 4, 2001

Hello Gurus!
I have a large text file made up of multiple biographies that I need to split into separate files, one per biography. The only consistent pattern is a label such as ABCD: on the last line of each bio, e.g.:

John Doe
Forestry Service
Somewhere, Nebraska
ABCD: 321

Jane Hoe:
Research Dept.
ABCD: 456

There are blank lines mingled with some records, so I have to key on the ABCD. Any help would be greatly appreciated; my AWK skills are really old!

Thanks,
SirHammer

CaKiwi · Sep 4, 2001

This should get you started. It will create files f1.dat, f2.dat, etc., one for each biography. Put a getline in the if statement if you want to remove the blank line at the beginning of the second and subsequent files.

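BEGIN { fn = "f1.dat"; finx = 1 }    # first output file is f1.dat
{
    print > fn                       # copy every line to the current file
    if (substr($0, 1, 4) == "ABCD")  # this line ends a biography
    {
        close(fn)                    # finish the current file
        finx++
        fn = "f" finx ".dat"         # next file: f2.dat, f3.dat, ...
    }
}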
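To check the script end to end, here is a small sketch that feeds it the two sample bios from the question. It assumes a POSIX-compatible `awk` (the thread uses `nawk`, which should behave the same here) and a hypothetical input file name, `bios.txt`, created in a scratch directory so existing f*.dat files are not overwritten.

```shell
# Work in a scratch directory so f1.dat, f2.dat, ... don't clobber real files.
cd "$(mktemp -d)" || exit 1

# Sample input copied from the question; bios.txt is a hypothetical name.
cat > bios.txt <<'EOF'
John Doe
Forestry Service
Somewhere, Nebraska
ABCD: 321

Jane Hoe:
Research Dept.
ABCD: 456
EOF

# Same logic as the script above, run with plain awk.
awk 'BEGIN { fn = "f1.dat"; finx = 1 }
{
    print > fn
    if (substr($0, 1, 4) == "ABCD") {
        close(fn)
        finx++
        fn = "f" finx ".dat"
    }
}' bios.txt

ls f*.dat    # lists f1.dat and f2.dat
```

No empty f3.dat is left behind: after the last ABCD line fn is set to "f3.dat", but awk only creates a file when something is printed to it. f2.dat does start with the blank line that separated the two bios, which is what the getline tip is about.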

Hope this helps.
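The getline variant is not spelled out in the post, so the following is only one way to write it, not CaKiwi's code. A bare `getline` in the if block would throw away the next line even when it is not blank, so this sketch reads it into a hypothetical variable `nextline` and drops it only when it is empty. As before, plain `awk` and a hypothetical `bios.txt` are assumed.

```shell
# Scratch directory, same reasons as before.
cd "$(mktemp -d)" || exit 1

# The question's sample data; bios.txt is a hypothetical name.
printf '%s\n' 'John Doe' 'Forestry Service' 'Somewhere, Nebraska' 'ABCD: 321' \
    '' 'Jane Hoe:' 'Research Dept.' 'ABCD: 456' > bios.txt

awk 'BEGIN { fn = "f1.dat"; finx = 1 }
{
    print > fn
    if (substr($0, 1, 4) == "ABCD") {
        close(fn)
        finx++
        fn = "f" finx ".dat"
        # Read ahead one line (nextline is a hypothetical name). Drop it only
        # if it is blank, so a bio that starts right after the ABCD line
        # is not lost. At end of input getline returns 0 and nothing happens.
        if ((getline nextline) > 0 && nextline != "")
            print nextline > fn
    }
}' bios.txt
```

One limitation of this sketch: the read-ahead line bypasses the ABCD check, so a "bio" consisting of nothing but an ABCD: line directly after another ABCD: line would not get a file of its own.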

CaKiwi

SirHammer1 · Sep 4, 2001

Thanks much, this will help!

teser · Sep 5, 2001

I have a similar situation and ran the above script, but only one file was created.

nawk '{ fn = "f1.dat"; finx = 1}
{
print > fn
if (substr($0,1,4) == "ABCD")
{
close (fn)
finx++
fn = "f" finx ".dat"
}
}' filename

CaKiwi · Sep 5, 2001

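This script will split the file whenever it finds a line starting with ABCD. If no lines start with ABCD, you will get only one file. Also, you left out the BEGIN in the first line of your script. Without it, that first block runs for every input line and keeps resetting fn to "f1.dat", so every line is written to f1.dat and no other file is ever created. It should read: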

nawk 'BEGIN{ fn = "f1.dat"; finx = 1}
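A quick way to see the effect of the missing BEGIN is to run teser's version on a small sample. This sketch assumes plain `awk` in place of `nawk` and a hypothetical input file `bios.txt` in a scratch directory.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'John Doe' 'ABCD: 321' '' 'Jane Hoe:' 'ABCD: 456' > bios.txt

# teser's version: no BEGIN, so the first block runs for EVERY input line
# and sets fn back to "f1.dat" before each print.
awk '{ fn = "f1.dat"; finx = 1 }
{
    print > fn
    if (substr($0, 1, 4) == "ABCD") {
        close(fn)
        finx++
        fn = "f" finx ".dat"
    }
}' bios.txt

ls f*.dat    # lists only f1.dat
```

The switch to "f2.dat" does happen after each ABCD line, but it is undone on the very next line before anything is printed to f2.dat. Putting BEGIN in front of the first block makes it run only once, and the same input then produces f1.dat and f2.dat.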

Post your input data and I am sure someone will help you.

CaKiwi

SirHammer1 · Sep 5, 2001

Just a confirmation, this worked like a CHAMP!

Thanks so much!

teser · Sep 5, 2001

Thanks, it did work... I needed the BEGIN in the script.

teser · Sep 5, 2001

Can you explain how this script works? I understand the counter (finx++) and the substring part, but the rest is not clear, such as these lines:

BEGIN { fn = "f1.dat"; finx = 1}
fn = "f" finx ".dat"

CaKiwi · Sep 5, 2001

The line

BEGIN { fn = "f1.dat"; finx = 1}

is executed before the script starts processing any data. It sets the variable fn to the string "f1.dat", which is used as the name of the first file in the print > fn statement.
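A small sketch of that ordering, assuming any POSIX awk: the BEGIN action runs once before the first line is read, and fn keeps the value BEGIN gave it for every line processed afterwards (until the script changes it).

```shell
# BEGIN runs once, before awk reads any input; the bare { } block runs
# once per input line and sees the fn value that BEGIN set.
out=$(printf 'first\nsecond\n' | awk '
BEGIN { fn = "f1.dat"; print "BEGIN: fn is " fn }
      { print NR ": " $0 " (fn is " fn ")" }')
printf '%s\n' "$out"
```

This prints "BEGIN: fn is f1.dat" first, followed by one line per input line, each showing fn still set to f1.dat.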
The first time the line

fn = "f" finx ".dat"

is executed, it sets the variable fn to "f2.dat", so the print > fn statement will now write to file f2.dat. The next time it is executed, fn will be "f3.dat", and so on.
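The right-hand side works because awk concatenates values written next to each other, converting the number in finx to a string along the way. This sketch repeats just the increment and the concatenation; the loop is only for illustration.

```shell
# "f" finx ".dat" is three values side by side, which awk concatenates.
names=$(awk 'BEGIN {
    finx = 1
    for (i = 0; i < 3; i++) {
        finx++                  # same increment as in the script
        fn = "f" finx ".dat"    # "f" + 2 + ".dat" -> "f2.dat", then f3.dat, f4.dat
        print fn
    }
}')
printf '%s\n' "$names"    # f2.dat, f3.dat, f4.dat, one per line
```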

Hope this helps.

CaKiwi

teser · Sep 5, 2001

Thanks
