Problem splitting data file into several data files

Status
Not open for further replies.

fabien

Technical User
Sep 25, 2001
299
AU
Hi!

I have the following file:
Well Data Export
Project : CA_ATHAB_SURMONT_NAM
Cartographic System : UTM 12N - NAD 1927 Canada
Depth Mode : MD
Depth Unit : International meters
Distance Unit : meters
Format File : /soft/lmk/2003/dat/wlx/VDP_devi.wlx
Date : Mon Mar 15 15:21:13 2004


1 PAD101-NP01-P01-INF 0.000 501492.530 6227862.940 0.000
0.500 501492.513 6227862.919 0.499
1.000 501492.496 6227862.897 0.998
1.500 501492.479 6227862.876 1.497
2.000 501492.463 6227862.854 1.996
2.500 501492.446 6227862.833 2.495

....

1 TOTO-NP01-P01-INF 0.000 501492.530 6227862.940 0.000
0.500 501492.513 6227862.919 0.499



...

The goal is to split this file based on the name in column two, i.e. "PAD201-NP01-P01-INF".dat. Then for each file I would like to write the header
" Well Data Export
Project : CA_ATHAB_SURMONT_NAM
Cartographic System : UTM 12N - NAD 1927 Canada
Depth Mode : MD
Depth Unit : International meters
Distance Unit : meters
Format File : /soft/lmk/2003/dat/wlx/VDP_devi.wlx
Date : Mon Mar 15 15:21:13 2004
"

+ all the lines until the next 1 XXXXXX item

Thanks!

 
I have started writing the following but it does not work

BEGIN {
    #grab header
    for ( i = 1; i < 11; i++ ) {
        header[i] = $0
    }
}
{
    print $0
    if ($1 == "1") {
        filename = $2"_dev.dat"
        for ( i = 1; i < 10; i++ ) {
            printf "%s",header[i] > filename
        }
        print $0 >> filename
    } else {
        print $0 >> filename
    }
}
 
The main problem is that you need a getline in the BEGIN section. Here's my version of your program:

BEGIN {
    #grab header
    for ( i = 1; i < 11; i++ ) {
        getline
        header[i] = $0
    }
}
{
    if ($1 == "1") {
        filename = $2"_dev.dat"
        for ( i = 1; i < 11; i++ ) {
            print header[i] > filename
        }
    }
    print $0 > filename
}

CaKiwi
 
Try this
[tt]
FNR==1 {
    hdr_count = 0;
    while ($1 != "1") {
        header[++hdr_count] = $0;
        getline;
    }
}
$1 == "1" {
    if (filename != "") close(filename);
    filename = $2 "_dev.dat";
    for (h=1; h<=hdr_count; h++)
        print header[h] > filename;
}
{
    print $0 > filename;
}
[/tt]

Jean Pierre.
 
Or

$1 != "1" && !i {
    header[++n] = $0
    next
}
$1 == "1" {
    if (filename) close(filename)
    filename = $2"_dev.dat"
    for ( i = 1; i <= n; i++ ) print header[i] > filename
}
{
    print > filename
}

aigles reminded me to put in the

if (filename) close(filename)


CaKiwi
 
As some awk versions don't like getline without redirection in the BEGIN section, I would suggest something like this:
$1==1{
++found1;if(found1>1)close(filename)
filename=$2"_dev.dat"
for(i=1;i<=h;++i)print header[i]>filename
}
!found1{header[++h]=$0;next}
{print>filename}


Hope This Help, PH.
 
I use the 'FNR==1' pattern for reading the header section to allow multiple input files.
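
To illustrate the difference, here is a small sketch: NR counts records across all input files, while FNR restarts at 1 for each file, so an FNR==1 rule fires once per file. The file names a.txt and b.txt and their contents are made up for the demonstration.

```shell
# Two tiny made-up input files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'h1\nd1\n' > a.txt
printf 'h2\nd2\n' > b.txt

# Print both counters each time awk starts reading a new file.
counts=$(awk 'FNR == 1 { print FILENAME ": NR=" NR " FNR=" FNR }' a.txt b.txt)
echo "$counts"
```

With an NR==1 test the header would only be collected for the first file named on the command line.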

Jean Pierre.
 
Thanks for all your answers, but this does not work because some of the data points are 1.0 (and we are testing on "1"), so it picks the next column as the filename. I think all the filenames start with PAD, so we could test on this, but I don't know how to extract it. Maybe with substr?
 
Based on aigles's suggestion I have done the following and it seems to work. Thanks again.

FNR==1 {
    hdr_count = 0;
    while (substr($2,1,1) != "P") {
        header[++hdr_count] = $0;
        getline;
    }
}
substr($2,1,1) == "P" {
    if (filename != "") close(filename);
    filename = $2 "_dev.dat";
    for (h=1; h<=hdr_count; h++)
        print header[h] > filename;
}
{
    print $0 > filename;
}
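
For anyone who wants to try this on a small sample, here is a minimal sketch of a variant. The sample in the first post also contains a well named TOTO, which a test on "P" would miss, so this version assumes instead that a well line has the literal marker "1" in column one and a name starting with a letter in column two. The input file name wells.txt, the shortened header and the second well's coordinates are made up for the demonstration.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened, made-up sample in the same layout as the export file.
cat > wells.txt <<'EOF'
Well Data Export
Depth Mode : MD

1 PAD101-NP01-P01-INF 0.000 501492.530 6227862.940 0.000
1.000 501492.496 6227862.897 0.998
1 TOTO-NP01-P01-INF 0.000 501500.000 6227870.000 0.000
0.500 501499.983 6227869.979 0.499
EOF

awk '
# Well line: string test on the marker, letter test on the name.
$1 == "1" && $2 ~ /^[A-Za-z]/ {
    if (filename != "") close(filename)
    filename = $2 "_dev.dat"
    for (h = 1; h <= hdr_count; h++) print header[h] > filename
}
# Everything before the first well line is treated as header.
filename == "" { header[++hdr_count] = $0; next }
# All other lines, including the well line itself, go to the current file.
{ print > filename }
' wells.txt

ls
```

This avoids calling getline in a loop, so a file without any well line simply produces no output instead of looping. Unlike the FNR==1 version it does not reset the header for a second input file.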
 
[tt]
NF && 1==NR,/^ *$/ { header = header $0 "\n" }

"P"==substr($2,1,1) {if (filename) close(filename);
filename=$2 "_dev.dat"; print header >filename }

filename { print >filename }
[/tt]


If you have nawk, use it instead of awk because
on some systems awk is very old and lacks many useful features.

Let me know whether or not this helps.

 
Slightly better:
[tt]
# Skips blank lines at beginning.
NF && !header,/^ *$/ { header = header $0 "\n" }

"P"==substr($2,1,1) {if (filename) close(filename);
filename=$2 "_dev.dat"; print header >filename }

filename { print >filename }
[/tt]
 
fabien, just a clarification on my post.
Does the test $1==1 fail with 1.0 ?
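
A quick way to check this, as a sketch: in POSIX awk a field that looks like a number is compared numerically against a numeric constant but as text against a string constant, so $1 == 1 should match a depth of 1.000 while $1 == "1" should not. The three sample lines below are made up for illustration.

```shell
# Three sample lines: a well marker line and two depth readings.
result=$(printf '1 PAD101\n1.000 501492.496\n1.500 501492.479\n' |
awk '
$1 == 1   { num = num " " $1 }   # numeric comparison: 1.000 equals 1
$1 == "1" { str = str " " $1 }   # string comparison: only the literal "1"
END { print "numeric:" num; print "string:" str }
')
echo "$result"
```

If a depth could ever be written as a bare 1, neither test is safe on its own, and checking the shape of column two as well is the more robust choice.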
 