awk I think!!

Status
Not open for further replies.

unixguy303

IS-IT--Management
Feb 27, 2006
14
US
I have a large file with multiple header, detail, and trailer records.

I need to split the file into smaller files, say 4 parts.

Each file must start with a header and end with a trailer.

Each header/trailer pair can enclose 1 or more detail records.

For example:

h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321

Any ideas?

Thanks
Brandt



 
What have you tried so far?

Hope This Helps, PH.
 
Use the split command to break your file into the required number of files:

man split

For example, if bigfile is 8 lines long:

Code:
split -l 2 bigfile

creates four files: xaa, xab, xac and xad.

Then, for each file, write a header to a temp file, append the file in question to the temp file, and finally append the trailer to the temp file.



 
split won't work, because it would not know which record to split on. Each record can be of variable length (multiple detail records). A split must occur between a trailer record and a header record.

h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321
h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321
h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321
h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321
h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321
h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321
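
A quick way to see the problem is to cut a small file at a fixed line count. This is only a sketch: sample.txt, the chunk. prefix and the 6-line chunk size are made up for illustration.

```shell
# Illustrative only: file names and the 6-line chunk size are assumptions.
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT

# Three records with 1, 2 and 3 detail lines (12 lines in total).
printf '%s\n' h1 d1 t1 h2 d2 d2 t2 h3 d3 d3 d3 t3 > "$work/sample.txt"

# Cut every 6 lines, ignoring record boundaries.
split -l 6 "$work/sample.txt" "$work/chunk."

# Show where each chunk begins and ends.
for f in "$work"/chunk.*; do
    echo "${f##*/}: first=$(head -n 1 "$f") last=$(tail -n 1 "$f")"
done
```

The first chunk ends on a detail line and the second begins with a trailer, so neither piece is a valid header-to-trailer file.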
 
Hi

It is not clear to me how you know which line is the trailer. Here I assumed that it begins with the letter "t". Give the number of parts as the value of s. In the code below, parts, the pattern ^t and /input/file are the pieces to adapt. You must test it before use; I have only done some basic tests.
Code:
awk -v s=parts 'BEGIN{f=0}NR==1{c="wc -l<"FILENAME;c|getline l;close(c)}{b=int(NR/(l/s));print $0>FILENAME"."f;if(f!=b){while($0!~/^t/){getline;print $0>FILENAME"."f};close(FILENAME"."f);f=b}}' /input/file
Tested with gawk.

Note that all of that is one line.

Feherke.
 
Feherke,

I don't have a clue what you are attempting to show me here.
Could you please explain, so I might be able to get something working? Do you have any questions? I really need to get this!!

Thanks in advance
Brandt
 
All you need to do is match your header records to a pattern, e.g. assuming all header records begin with "h"...
Code:
$ awk '/^h/{close(f); f=sprintf("outfile.%03d",++n)}{print $0 > f}' infile
$ head outfile.*
==> outfile.001 <==
h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321

==> outfile.002 <==
h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321
:
etc.
... or use csplit instead.
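
The one-liner above writes one file per header record. If the goal is a fixed number of parts that each still begin with a header and end with a trailer, one possible approach is to read the file twice: first to count header records, then to spread them over the parts. This is a sketch tried only on the small sample it builds. The names parts, total, seen, cur and the .partN suffix are made up here, and it assumes the first line is a header and that there are at least as many records as parts.

```shell
# Sketch under assumptions: headers start with "h", the file starts with a header.
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT

# Six records with 1, 2, 3, 1, 2 and 3 detail lines.
printf '%s\n' h1 d t1  h2 d d t2  h3 d d d t3 \
              h4 d t4  h5 d d t5  h6 d d d t6 > "$work/infile"

# The input file is named twice so awk reads it in two passes.
awk -v parts=4 '
  NR == FNR { if (/^h/) total++; next }   # first pass: count header records
  FNR == 1  { cur = -1 }                  # second pass starts here
  /^h/ {
      p = int(seen * parts / total)       # 0-based part for this record
      seen++
      if (p != cur) {                     # first record of a new part
          if (out != "") close(out)
          cur = p
          out = sprintf("%s.part%d", FILENAME, p + 1)
      }
  }
  { print > out }                         # every line goes to the current part
' "$work/infile" "$work/infile"

for f in "$work"/infile.part*; do
    echo "${f##*/}: $(head -n 1 "$f") .. $(tail -n 1 "$f")"
done
```

Because a new part can only begin on a header line, a record is never cut in two; the parts simply differ in how many records they hold.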
 
Syntax for csplit is...
Code:
$ csplit -f 'outfile.' -n 3 infile '/^h/' '{*}'
0
102
136
170
102
136
170
$ head outfile.*
==> outfile.000 <==

==> outfile.001 <==
h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321

==> outfile.002 <==
h1234567890 1234567890 1234567890
dqwertyuiop qwertyuiop qwertyuiop
dqwertyuiop qwertyuiop qwertyuiop
t0987654321 0987654321 0987654321
:
etc
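
The empty outfile.000 appears because the first line already matches /^h/, so the piece before it is empty. With GNU csplit (an assumption; other implementations may not offer these options), -z removes empty output files and -s suppresses the byte counts. A sketch on made-up data:

```shell
# Assumes GNU csplit; -z and -s may be missing elsewhere.
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT

# Three records with 1, 2 and 3 detail lines.
printf '%s\n' h1 d t1  h2 d d t2  h3 d d d t3 > "$work/infile"

# Split before every header line, dropping empty pieces and the size report.
csplit -s -z -f "$work/outfile." -n 3 "$work/infile" '/^h/' '{*}'

for f in "$work"/outfile.*; do
    echo "${f##*/}: $(head -n 1 "$f") .. $(tail -n 1 "$f")"
done
```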
 
Hi

Brandt said:
Could you please explain so I might be able to get something working.
Code:
awk -v s=parts '            # set the number of desired parts

BEGIN {                     # do it before input processing
  f=0                       # initialize the part number
}

NR==1 {                     # when processing the first line
  c="wc -l<"FILENAME        # compose shell command to get line count
  c|getline l               # execute the command and store the result
  close(c)                  # close the pipe to the command
}

{
  b=int(NR/(l/s))           # calculate to which part belongs this line
  print $0>FILENAME"."f     # write the line to the fth part file
  if (f!=b) {               # if the calculated part is not the real one
    while ($0!~/^t/) {      # repeat while the line is not a trailer
      getline               # read the next input line
      print $0>FILENAME"."f # write the line to the fth part file
    }
    close(FILENAME"."f)     # close the fth part file
    f=b                     # step forward to the calculated part
  }
}

' /input/file

### variables :
# b - calculated part number of the line
# c - command to get the number of input lines
# f - currently writing part file's number
# l - total number of lines in the input file
# s - desired number of resulted parts

Feherke.
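
The threshold arithmetic in b=int(NR/(l/s)) can be checked on its own. The sketch below assumes l=24 (the line count of the sample posted earlier in the thread) and s=4, and only prints the line numbers where the calculated part changes. In the full script, reaching such a line is what triggers reading on to the next trailer before the part file is closed.

```shell
# Illustrative values: l=24 lines and s=4 parts are assumptions.
changes=$(awk -v l=24 -v s=4 'BEGIN {
    prev = 0
    for (nr = 1; nr <= l; nr++) {
        b = int(nr / (l / s))     # same formula as the script, with nr standing in for NR
        if (b != prev)
            print "line " nr ": calculated part becomes " b
        prev = b
    }
}')
printf '%s\n' "$changes"
```

With these values the calculated part changes at lines 6, 12, 18 and 24. When such a line falls inside a record, the script carries on to the record's trailer, so the actual cut can land a few lines later.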
 
Feherke,

Thanks. This looks great, and I believe your logic is what I need.

My question now is about your use of "FILENAME": is this supposed to be a variable?

How does this line work if FILENAME is not a variable? Or are you looking for the literal file name of my file?

c="wc -l<"FILENAME # compose shell command to get line count


Thanks again!!!!

Brandt

 
Hi

The FILENAME built-in variable holds the name of the current input file. It is maintained by awk, just like the other built-in variables. I think this will explain it better:
Code:
master # cat letter.txt
a
b
c

master # cat number.txt
1
2
3

master # awk '{print "line "FNR" of file "FILENAME" : "$0}' letter.txt number.txt
line 1 of file letter.txt : a
line 2 of file letter.txt : b
line 3 of file letter.txt : c
line 1 of file number.txt : 1
line 2 of file number.txt : 2
line 3 of file number.txt : 3

Feherke.
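
The line asked about, c="wc -l<"FILENAME, is plain string concatenation: awk joins the literal text with the current value of FILENAME to form a shell command. A small sketch of that step in isolation follows; the file letter.txt and the names cmd and total are made up for the example. FILENAME has no value yet inside BEGIN, which is presumably why the script does this on the first input line instead.

```shell
# Sketch only; assumes the file path contains no spaces or shell metacharacters.
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT
printf '%s\n' a b c > "$work/letter.txt"

report=$(awk 'NR == 1 {
    cmd = "wc -l < " FILENAME     # plain string concatenation builds the command text
    cmd | getline total           # run it through the shell, read its first output line
    close(cmd)
    print "command: " cmd
    print "lines: " (total + 0)   # + 0 forces a number and drops any padding
}' "$work/letter.txt")
printf '%s\n' "$report"
```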
 
Here is what I get when I run it.

Again Thanks!!

# ./foo
awk: Syntax error
at line 15 of program << # se ... >>
context is
print >>> $0>FILENAME"." <<< f # write the line to the fth part f
ile
awk: illegal statement
at line 15 of program << # se ... >>
#
# pg foo
### variables :
# b - calculated part number of the line
# c - command to get the number of input lines
# f - currently writing part file's number
# l - total number of lines in the input file
# s - desired number of resulted parts


awk -v s=4 ' # set the number of desired parts

BEGIN { # do it before input processing
f=0 # initialize the part number
}

NR==1 { # when processing the first line
c="wc -l<"FILENAME # compose shell command to get line count
c|getline l # execute the command and store the result
close(c) # close the pipe to the command
}

{
b=int(NR/(l/s)) # calculate to which part belongs this line
print $0>FILENAME"."f # write the line to the fth part file
if (f!=b) { # if the calculated part is not the real one
while ($0!~/^t/) { # repeat while the line is not a trailer
getline # read the next input line
print $0>FILENAME"."f # write the line to the fth part file
}
close(FILENAME"."f) # close the fth part file
f=b # step forward to the calculated part
}
}

' ./mis07026_pad.txt
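
The message points at the redirection in print $0>FILENAME"."f. One possible explanation, which is an assumption since the thread does not say which awk is installed: an unparenthesized concatenation after > is not portable, and some awk implementations reject it while gawk accepts it. Wrapping the target in parentheses, as in this sketch on made-up data, is the portable spelling.

```shell
# Sketch only: in.txt and the part number f=0 are illustrative assumptions.
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT
printf '%s\n' h1 d1 t1 > "$work/in.txt"

awk -v f=0 '{
    print $0 > (FILENAME "." f)   # parentheses make the redirection target unambiguous
}' "$work/in.txt"

cat "$work/in.txt.0"
```

If that is the cause here, the same change would be needed on both print lines of the script; close(FILENAME"."f) is an ordinary function argument and should not be affected.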

 