complex modification file - awk ?

vlad9999 · Apr 19, 2010

I have a file like this:

ident1 0ZZZ name1
ident1 1635 miscellaneous informations
ident1 1635 miscellaneous informations
ident1 9ZZZ
ident2 0ZZZ name2
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 9ZZZ
...

Each block consists of several lines sharing the same identifier in the first 7 characters. Each block begins with a line such as ident1 0ZZZ name1 and ends with a line such as ident1 9ZZZ.

What I want? Just to add, after the code 1635 (always the same), the namek value given in the first line of each block (name1, name2, ...).

This should be possible with awk I suppose ... but how? I await your help, thank you ...

feherke · Apr 19, 2010

Hi

vlad9999 said:
What I want?

It would be helpful to post a sample of the expected output too.

As far as I understand, you want something like this :

Code:

ident1 0ZZZ name1
ident1 1635 name1 miscellaneous informations
ident1 1635 name1 miscellaneous informations
ident1 9ZZZ
ident2 0ZZZ name2
ident2 1635 name2 miscellaneous informations
ident2 1635 name2 miscellaneous informations
ident2 1635 name2 miscellaneous informations
ident2 9ZZZ

In that case, the following code would be enough :

Code:

awk '$2=="0ZZZ"{n=$3}$2=="1635"{$2=$2" "n}1' /input/file

Tested with gawk and mawk.
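
Spelled out with comments, the one-liner can be read roughly as follows. This is only a sketch: the shell variables input and out are illustrative names, and the data is a shortened copy of the sample from the question.

```shell
# Assumed sample input, shortened from the question.
input='ident1 0ZZZ name1
ident1 1635 miscellaneous informations
ident1 9ZZZ
ident2 0ZZZ name2
ident2 1635 miscellaneous informations
ident2 9ZZZ'

out=$(printf '%s\n' "$input" | awk '
  $2 == "0ZZZ" { n = $3 }          # block header: remember the name
  $2 == "1635" { $2 = $2 " " n }   # data line: append the name to field 2
  1                                # always-true pattern: print every line
')
printf '%s\n' "$out"
```

The name stays in n until the next 0ZZZ line overwrites it, which is why no explicit loop over the block is needed.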

Feherke.

http://free.rootshell.be/~feherke/

vlad9999 · Apr 19, 2010

Thank you, thank you, thank you!
awk really is super! It can do things that we cannot do otherwise, and in a very simple way! And of course without while read loops, which are very expensive.

feherke · Apr 19, 2010

Hi

[off-topic]
AWK is indeed super. But solving an extremely simple problem like this one is not really a relevant measure. Other languages can also solve it without explicit looping :

Code:

perl -pae '$n=$F[2]if$F[1]eq"0ZZZ";s/1635/1635 $n/' /input/file

# or

ruby -pae 'n=$F[2]if$F[1]=="0ZZZ";$_.sub!(/1635/,"1635 "+n)' /input/file

# or

sed '/ 0ZZZ /{h;s/.* //;x};/ 1635 /{G;s/ 1635 \(.*\)\n\(.*\)/ 1635 \2 \1/}' /input/file

[/off-topic]

vlad9999 · Apr 19, 2010

Bravo! It makes me think!
Indeed, it is good to know at least one of these languages very well in order to write good UNIX (or Linux) scripts. Thank you again.

vlad9999 · Apr 19, 2010

In the end I used perl, because awk converts several successive spaces into one.

On the other hand, how do I pass variables to the perl program (the equivalent of -v var=$var for awk)?

feherke · Apr 19, 2010

Hi

vlad9999 said:
In the end I used perl, because awk converts several successive spaces into one.

Where are those multiple spaces ? There are none in your sample data.

vlad9999 said:
On the other hand, how do I pass variables to the perl program (the equivalent of -v var=$var for awk)?

Exported environment variables are accessible through %ENV :

Code:

export foo=bar; perl -e 'print $ENV{"foo"}'

Or enclose the Perl code in double quotes, to let the shell expand the variables before executing the command.

vlad9999 · Apr 20, 2010

Sorry, you are right. I had not noticed it at first, but the miscellaneous informations part contains successive spaces.

feherke · Apr 20, 2010

Hi

That can be solved quite easily :

Code:

awk '$2=="0ZZZ"{n=$3}$2=="1635"{sub($2,$2" "n)}1' /input/file

# or

awk -F'[ ]' '$2=="0ZZZ"{n=$3}$2=="1635"{$2=$2" "n}1' /input/file
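
The difference between the three variants shows up on a single line with repeated spaces. A small sketch, where the sample line and the marker X are made up for the demonstration:

```shell
line='ident1 1635 misc   info'   # three spaces between "misc" and "info"

# Assigning to a field makes awk rebuild $0 with single-space separators.
rebuilt=$(printf '%s\n' "$line" | awk '{ $2 = $2 " X" } 1')

# sub() edits $0 in place, so the original spacing is kept.
kept_sub=$(printf '%s\n' "$line" | awk '{ sub($2, $2 " X") } 1')

# With -F"[ ]" every single space is a separator: runs of spaces become
# empty fields, and the rebuild puts one space back for each of them.
kept_fs=$(printf '%s\n' "$line" | awk -F'[ ]' '{ $2 = $2 " X" } 1')

printf '%s\n' "$rebuilt" "$kept_sub" "$kept_fs"
```

Note that sub() treats its first argument as a regular expression and replaces the first match anywhere in the line, so this form relies on the code not occurring earlier in the line, for example inside the identifier.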

vlad9999 · Apr 20, 2010

Good to know! Both work great!

vlad9999 · Apr 20, 2010

Now my whole script is finished. My awk question was part of a script whose job is to extract dossiers from a file (the list of dossiers to extract is in another file). Each dossier in the file is identified by an identifier (ident1, ident2, ...) and can span multiple lines.

To do this, I myself inserted the lines identk 0ZZZ namek and identk 9ZZZ into the original file. After sorting, the lines to be extracted are framed by these two lines and are easily extracted with awk '/ 0ZZZ /,/ 9ZZZ/'.

The problem was that we also had to number the extracted dossiers, so I inserted the line identk 0ZZZ namek, with namek = the number of the extracted dossier.
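
The framing trick described above can be sketched like this. The data is an assumed miniature work file, and LC_ALL=C is added here only to make the sort order predictable.

```shell
# Assumed work file: original lines plus the two helper lines for ident1.
work='ident3 1635 miscellaneous informations
ident1 1635 miscellaneous informations
ident1 9ZZZ
ident1 0ZZZ 00000001'

# After sorting, 0ZZZ comes before 1635 and 9ZZZ after it, so the helper
# lines frame the dossier; the range pattern prints from one to the other.
out=$(printf '%s\n' "$work" | LC_ALL=C sort | awk '/ 0ZZZ /,/ 9ZZZ/')
printf '%s\n' "$out"
```

Lines of ident3 fall outside any 0ZZZ ... 9ZZZ range and are dropped.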

feherke · Apr 20, 2010

Hi

Wait a moment. So you are able to deduce the name* programmatically from ident* ? Then all this could probably be done more easily.

vlad9999 · Apr 20, 2010

No, the name is obtained with cat -n, which numbers the dossiers in the file listing the dossiers to extract. This is how I insert the two lines "start" and "end" into the original file:

Code:

cat $list_file | sed "s|$| 0ZZZ|g" | cat -n | awk '{printf "%7s %4s %08d\n",$2,$3,$1}'  \
      >> $work_file
cat $list_file | sed "s|$| 9ZZZ|g" >> $work_file

with $list_file:
ident1
ident2
...
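
The same helper lines could probably be produced by awk alone, since NR already numbers the lines of the list. This is a sketch under assumptions, not the script actually used: the 7-character identifiers ident01 and ident02 and the variable names are invented for the example.

```shell
list='ident01
ident02'     # assumed 7-character identifiers, one per line

# "start" lines: identifier, code 0ZZZ, zero-padded line number of the list.
start=$(printf '%s\n' "$list" | awk '{ printf "%7s %4s %08d\n", $1, "0ZZZ", NR }')

# "end" lines: identifier followed by code 9ZZZ.
end=$(printf '%s\n' "$list" | awk '{ print $1, "9ZZZ" }')

printf '%s\n' "$start" "$end"
```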

vlad9999 · Apr 20, 2010

So then (CH1="0ZZZ", CH2="9ZZZ", CODE0="1635"):

Code:

cat $work_file | sort | awk "/ $CH1/,/ $CH2/" |   \
   awk -v CODE0=$CODE0 -v CH1=$CH1 '$2==CH1{n=$3} {sub(CODE0,CODE0""n)}1'   \
   | egrep -v " $CH1| $CH2"

feherke · Apr 20, 2010

Hi

No ? Well, I would say, there you are generating the name* programmatically. Ok, not based on the ident* value, but based on the ident*'s position in a file. Quite the same thing.

As I understand, you have two input files like these :

Code:

ident1
ident2

Code:

ident1 1635 miscellaneous informations
ident1 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations

Then this code :

Code:

awk 'FNR==NR{l[$1]=NR;next}{sub($2,sprintf("%s %08d",$2,l[$1]))}1' list_file whatever_file

Will generate this output :

Code:

ident1 1635 00000001 miscellaneous informations
ident1 1635 00000001 miscellaneous informations
ident2 1635 00000002 miscellaneous informations
ident2 1635 00000002 miscellaneous informations
ident2 1635 00000002 miscellaneous informations

As I understand, the 0ZZZ and 9ZZZ lines are just helpers and would be removed at the end.
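
For reference, the two-file command reads as follows when spread over several lines. A sketch with assumed miniature files created in a scratch directory; the contents aaa, bbb and ccc are placeholders.

```shell
tmp=$(mktemp -d)                      # scratch directory for the sample files
printf '%s\n' ident1 ident2 > "$tmp/list_file"
printf '%s\n' 'ident1 1635 aaa' 'ident2 1635 bbb' 'ident2 1635 ccc' > "$tmp/whatever_file"

out=$(awk '
  FNR == NR { l[$1] = NR; next }      # first file: map ident to its line number
  { sub($2, sprintf("%s %08d", $2, l[$1])) }   # second file: insert the number
  1                                   # print the (modified) line
' "$tmp/list_file" "$tmp/whatever_file")
rm -r "$tmp"
printf '%s\n' "$out"
```

FNR restarts at 1 for each input file while NR keeps counting, so FNR == NR holds only while the first file is being read (provided that file is not empty).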

vlad9999 · Apr 20, 2010

whatever_file also contains other idents that we do not want:

Code:

ident3 1635 miscellaneous informations
ident3 1635 miscellaneous informations
ident1 1635 miscellaneous informations
ident1 1635 miscellaneous informations
ident4 1635 miscellaneous informations
ident4 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident5 1635 miscellaneous informations
ident5 1635 miscellaneous informations

I am not able to test it right now, but if your program gives the same output with that whatever_file, it is OK.

vlad9999 · Apr 21, 2010

I tested and the result is almost good.

Code:

ident3 1635 00000000 miscellaneous informations
ident3 1635 00000000 miscellaneous informations
ident1 1635 00000001 miscellaneous informations
ident1 1635 00000001 miscellaneous informations
ident4 1635 00000000 miscellaneous informations
ident4 1635 00000000 miscellaneous informations
ident2 1635 00000002 miscellaneous informations
ident2 1635 00000002 miscellaneous informations
ident2 1635 00000002 miscellaneous informations
ident5 1635 00000000 miscellaneous informations
ident5 1635 00000000 miscellaneous informations

All that remains is to eliminate the lines that are not numbered (numbered 0).
In any case, awk is more powerful than I thought: here it works with two files, with no need for helper lines.
On the other hand, with this syntax I do not know whether whatever_file can be passed through a pipe, but it is true that I do not need that in this case.

feherke · Apr 21, 2010

Hi

vlad9999 said:
whatever_file also contains other idents that we do not want:

That is easy :

Code:

awk 'FNR==NR{l[$1]=NR;next}$1 in l{sub($2,sprintf("%s %08d",$2,l[$1]));print}' list_file whatever_file

# or

awk 'FNR==NR{l[$1]=NR;next}{sub($2,sprintf("%s %08d",$2,l[$1]))}l[$1]' list_file whatever_file
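
Both variants filter the same lines, but they treat the array differently: the in operator only tests for a key, while referencing l[$1] creates an empty element for every unknown ident. A small sketch of that behaviour, with a made-up array content:

```shell
out=$(awk 'BEGIN {
  l["ident1"] = 1
  if ("ident3" in l) print "in: present"; else print "in: absent"   # no side effect
  n = 0; for (k in l) n++; print n                                  # still one key
  if (l["ident3"]) print "ref: true"; else print "ref: false"       # creates the key
  n = 0; for (k in l) n++; print n                                  # now two keys
}')
printf '%s\n' "$out"
```

For a large whatever_file with many unwanted idents, the first variant therefore keeps the array at the size of list_file.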

vlad9999 said:
On the other hand, with this syntax I do not know whether whatever_file can be passed through a pipe, but it is true that I do not need that in this case.

One of the input files can be read from the standard input :

Code:

cat whatever_file | awk 'FNR==NR{l[$1]=NR;next}$1 in l{sub($2,sprintf("%s %08d",$2,l[$1]));print}' list_file -

# or

cat list_file | awk 'FNR==NR{l[$1]=NR;next}$1 in l{sub($2,sprintf("%s %08d",$2,l[$1]));print}' - whatever_file

Only one file can come from the standard input. If you need to feed more inputs without naming them as arguments, you have to get one of them through explicit reading with getline :

Code:

cat whatever_file | awk 'BEGIN{while("cat list_file"|getline)l[$1]=++n}$1 in l{sub($2,sprintf("%s %08d",$2,l[$1]));print}'

# or

cat list_file | awk '{l[$1]=NR}END{while("cat whatever_file"|getline)if($1 in l){sub($2,sprintf("%s %08d",$2,l[$1]));print}}'
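
As a possible alternative to spawning cat from inside awk, getline can also read a file directly. The following is a sketch only; the variable names list and id and the sample data are assumptions.

```shell
tmp=$(mktemp -d)                      # scratch directory for the sample list
printf '%s\n' ident1 ident2 > "$tmp/list_file"

out=$(printf '%s\n' 'ident3 1635 x' 'ident2 1635 y' |
  awk -v list="$tmp/list_file" '
    BEGIN {
      # Read the list directly; "> 0" stops at end of file (0) and on errors (-1).
      while ((getline id < list) > 0) l[id] = ++n
      close(list)
    }
    $1 in l { sub($2, sprintf("%s %08d", $2, l[$1])); print }
  ')
rm -r "$tmp"
printf '%s\n' "$out"
```

Comparing the result of getline with > 0 avoids an endless loop when the file cannot be read, because getline returns -1 in that case and -1 counts as true in a bare while condition.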

vlad9999 · Apr 21, 2010

Wow! With all these examples, I see awk differently now. THANK YOU

vlad9999 · Apr 22, 2010

This is not on the original topic, but just for accuracy: in my tests, the "-" caused only the first word of the file to be read!