
Write out parts of a file based on header strings

Status
Not open for further replies.

mrr

Technical User
May 3, 2001
67
US
Hi,
I have a file with numerous lines separated by a unique header, and I want to parse it out into individual files based on that header. Here's the example file:
HDR1 SDFSDF SDFSDFSDFSDFSDFSDF
HDR2 SDFSDFSDFSDFSDFSDFSDF
DATA
DATA
DATA
HDR1 SDFSDF SDFSDFSDFSDFSDF
HDR2 SDFSDFSDFSDFSDFSDFSDFF
DATA
DATA
DATA

Here's what I am trying to do: when the script finds an index substring of HDR1, it writes each block of records to a sequential file named 1.dat. When it finds the next occurrence of HDR1, it writes that group to a file named 2.dat, and so on to the bottom of the file.

I also need to adapt this script so that the second field of the HDR1 record becomes the filename.dat for each group of data. I can't use that on this particular data file because the description in field 2 has embedded spaces - so if you can help on the first method, I would be very grateful.
Thanks for the help.


 
mrr,

How about

{
    if ($1 == "HDR1") {
        if (i > 0) close(i ".dat")   # close the previous group's file
        ++i
    }
    if ($1 !~ /HDR/) {
        print > (i ".dat")
    }
}


Am I to assume the xx.dat files should not contain any of the header information? If that assumption is wrong, remove the conditional around the print statement.

Your second requirement can be handled by substituting $2 ".dat" for i ".dat"; you'll probably want a variable to hold the file name so you can close each file as you move on to the next.
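That substitution could be sketched like this (a sketch, not tested against mrr's real data; the `alpha`/`beta` sample lines are made-up stand-ins, and it assumes the second HDR1 field has no embedded spaces):

```shell
cd "$(mktemp -d)"
# Made-up two-group sample; the second field of each HDR1 names the output file.
printf 'HDR1 alpha x\nHDR2 y\nDATA1\nHDR1 beta z\nHDR2 w\nDATA2\n' > sample.txt
awk '
$1 == "HDR1" {
    if (fname != "") close(fname)   # close the previous group s file
    fname = $2 ".dat"               # second field names the output file
}
$1 !~ /^HDR/ { print > fname }      # data lines go to the current file
' sample.txt
cat alpha.dat   # prints DATA1
cat beta.dat    # prints DATA2
```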

Good luck,
ND [smile]
 
ND,
I tried the script and it works great.
Thanks
 
ND,
After trying this on a big file, it doesn't seem to be closing files properly.
I'm running awk and it aborts after 10 files, or 174,068 records.
The data file I'm running the script against should produce
91 output files. Can you get around this in awk/nawk?
 
I tried out your description and everything worked fine (HP-UX 10.20). The problem sounds like resource-limit violations on your OS (try the limit command), such as file size, memory and CPU usage, or disk space for your username. Or maybe it is your version of (n)awk? Sounds like time for the awk gurus to jump in...

Good luck,
ND [smile]
 
ND,
I found the problem. The 9th group of data records had
headers but no data. Since I was writing out only the data records, without headers, it aborted because there was nothing to write to the output file.

Thanks for the help on this.
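One way to guard against such header-only groups is to track whether anything was actually written before calling close() (a sketch; `groups.txt` and its contents are made up, and the second group deliberately has no DATA lines):

```shell
cd "$(mktemp -d)"
# Made-up sample; the second group has headers but no DATA lines.
printf 'HDR1 a\nHDR2 a\nDATA one\nHDR1 b\nHDR2 b\nHDR1 c\nDATA three\n' > groups.txt
awk '
$1 == "HDR1" {
    if (wrote) close(i ".dat")   # only close a file we actually opened
    wrote = 0
    ++i
}
$1 !~ /HDR/ { print > (i ".dat"); wrote = 1 }
' groups.txt
ls   # 1.dat and 3.dat exist; 2.dat was never created
```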
 
Some awks have a limit on the number of concurrently opened files, set to 10 (or 9, as the case may be). Make sure that, given the implemented logic, you're not hitting this condition.

I'd debug the logic with a printf on every iteration, outputting i and the current record number (NR or FNR).
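A minimal version of that debugging approach might look like this (the four-record `datafile` is a made-up sample; `/dev/stderr` keeps the trace out of the .dat files and is supported by nawk/gawk, though possibly not by older awks):

```shell
cd "$(mktemp -d)"
# Made-up four-record sample, just to show the trace.
printf 'HDR1 x\nDATA a\nHDR1 y\nDATA b\n' > datafile
awk '
$1 == "HDR1" { ++i }
{ printf "i=%d NR=%d\n", i, NR > "/dev/stderr" }   # trace every record
$1 !~ /HDR/  { print > (i ".dat") }
' datafile 2> trace.txt
cat trace.txt   # one trace line per input record, e.g. "i=2 NR=3"
```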

As far as I remember, there are no limits on the number of records. The other possible limit is the number of fields per record.

vlad
 
