
Write out parts of a file based on header strings

Status
Not open for further replies.

mrr

Technical User
May 3, 2001
67
US
Hi,
I have a file with numerous lines separated by a unique header, and I want to parse it out into individual files based on that header. Here's the example file:
HDR1 SDFSDF SDFSDFSDFSDFSDFSDF
HDR2 SDFSDFSDFSDFSDFSDFSDF
DATA
DATA
DATA
HDR1 SDFSDF SDFSDFSDFSDFSDF
HDR2 SDFSDFSDFSDFSDFSDFSDFF
DATA
DATA
DATA

Here's what I am trying to do: when the script finds an index substring of HDR1, it writes each block of records to a sequential file named 1.dat. When it finds the next occurrence of HDR1, it writes that group to a file named 2.dat, and so on to the bottom of the file.

I also need to adapt this script so that the second field of the HDR1 record becomes the filename.dat for each group of data. I can't use that on this particular data file because the description in field 2 has embedded spaces - so if you can help on the first method, I would be very grateful.
Thanks for the help.


 
mrr,

How about

{
    if ($1 == "HDR1") {
        if (i > 0) close(i ".dat")   # close the previous group's file
        ++i
    }
    if ($1 !~ /HDR/) {
        print > (i ".dat")
    }
}


Am I to assume the xx.dat files should not contain any of the header information? If that assumption is wrong, remove the conditional around the print statement.

Your second requirement can be handled by substituting $2 ".dat" for i ".dat"; you'll probably want a variable to hold the file name so you can close each file as you move on to the next.
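That substitution could be sketched like this (a sketch, not tested against mrr's real data; the `alpha`/`beta` sample lines are made-up stand-ins, and it assumes the second HDR1 field has no embedded spaces):

```shell
cd "$(mktemp -d)"
# Made-up two-group sample; the second field of each HDR1 names the output file.
printf 'HDR1 alpha x\nHDR2 y\nDATA1\nHDR1 beta z\nHDR2 w\nDATA2\n' > sample.txt
awk '
$1 == "HDR1" {
    if (fname != "") close(fname)   # close the previous group s file
    fname = $2 ".dat"               # second field names the output file
}
$1 !~ /^HDR/ { print > fname }      # data lines go to the current file
' sample.txt
cat alpha.dat   # prints DATA1
cat beta.dat    # prints DATA2
```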

Good luck,
ND [smile]
 
ND,
I tried the script and it works great.
Thanks
 
ND,
After trying this on a big file, it doesn't seem to be closing files properly.
I'm running awk and it aborts after 10 files, or 174,068 records.
The data file I'm running the script against should produce
91 output files. Can you get around this in awk/nawk?
 
I tried out your description and everything worked fine (HP-UX 10.20). The problem sounds like resource-limit violations on your OS (try the limit command), such as file size, memory and CPU usage, or disk space for your username. Or maybe it is your version of (n)awk? Sounds like time for the awk gurus to jump in...

Good luck,
ND [smile]
 
ND,
I found the problem. The 9th group of data records had
headers but no data. Since I was writing out only the data records, without headers, it aborted because there was nothing to write to the output file.

Thanks for the help on this.
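One way to guard against such header-only groups is to track whether anything was actually written before calling close() (a sketch; `groups.txt` and its contents are made up, and the second group deliberately has no DATA lines):

```shell
cd "$(mktemp -d)"
# Made-up sample; the second group has headers but no DATA lines.
printf 'HDR1 a\nHDR2 a\nDATA one\nHDR1 b\nHDR2 b\nHDR1 c\nDATA three\n' > groups.txt
awk '
$1 == "HDR1" {
    if (wrote) close(i ".dat")   # only close a file we actually opened
    wrote = 0
    ++i
}
$1 !~ /HDR/ { print > (i ".dat"); wrote = 1 }
' groups.txt
ls   # 1.dat and 3.dat exist; 2.dat was never created
```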
 
Some awks have a limit on the number of concurrently opened files, set to 10 (or 9, as the case may be). Make sure that, given the implemented logic, you're not hitting this condition.

I'd debug the logic with a printf on every iteration, outputting i and the current record number (NR or FNR).
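A minimal version of that debugging approach might look like this (the four-record `datafile` is a made-up sample; `/dev/stderr` keeps the trace out of the .dat files and is supported by nawk/gawk, though possibly not by older awks):

```shell
cd "$(mktemp -d)"
# Made-up four-record sample, just to show the trace.
printf 'HDR1 x\nDATA a\nHDR1 y\nDATA b\n' > datafile
awk '
$1 == "HDR1" { ++i }
{ printf "i=%d NR=%d\n", i, NR > "/dev/stderr" }   # trace every record
$1 !~ /HDR/  { print > (i ".dat") }
' datafile 2> trace.txt
cat trace.txt   # one trace line per input record, e.g. "i=2 NR=3"
```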

As far as I remember, there are no limits on the number of records. The other possible limit is the number of fields per record.

vlad
 
