complex modification file - awk ?

Status
Not open for further replies.

vlad9999

Programmer
Apr 19, 2010
23
FR
I have a file like this:

ident1 0ZZZ name1
ident1 1635 miscellaneous informations
ident1 1635 miscellaneous informations
ident1 9ZZZ

ident2 0ZZZ name2
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 9ZZZ

...

Each block consists of several lines that share an identifier in the first 7 characters. A block begins with a line such as ident1 0ZZZ name1 and ends with a line such as ident1 9ZZZ.

What I want? Just to add, after the code 1635 (always the same), the name given in the first line of each block (name1, name2, ...).

This should be possible with awk, I suppose, but how? Thanks in advance for your help.
 
Hi

vlad9999 said:
What I want?
It would be helpful to post a sample of the expected output too.

As far as I understand, you want something like this :
Code:
ident1 0ZZZ name1
ident1 1635 [red]name1[/red] miscellaneous informations
ident1 1635 [red]name1[/red] miscellaneous informations
ident1 9ZZZ
ident2 0ZZZ name2
ident2 1635 [red]name2[/red] miscellaneous informations
ident2 1635 [red]name2[/red] miscellaneous informations
ident2 1635 [red]name2[/red] miscellaneous informations
ident2 9ZZZ
In that case, the following code would be enough :
Code:
awk '$2=="0ZZZ"{n=$3}$2=="1635"{$2=$2" "n}1' /input/file
Tested with [tt]gawk[/tt] and [tt]mawk[/tt].
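
For readers who want the one-liner spelled out, here is a minimal sketch of the same program with one rule per line. The function name add_name and the three sample lines are only illustrative assumptions; the awk logic is unchanged.

```shell
# add_name is a hypothetical wrapper name; the awk program is the one-liner above.
add_name() {
  awk '
    $2 == "0ZZZ" { n = $3 }          # block header line: remember the name
    $2 == "1635" { $2 = $2 " " n }   # data line: append the name to field 2
    { print }                        # print every line (same as the trailing 1)
  ' "$@"
}

# Illustrative input on stdin; a file name can be passed as an argument instead.
printf '%s\n' 'ident1 0ZZZ name1' \
              'ident1 1635 miscellaneous informations' \
              'ident1 9ZZZ' | add_name
```

The variable n keeps its value from one line to the next, so every 1635 line picks up the name of the most recent 0ZZZ line.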


Feherke.
 
Thank you, thank you, thank you!
awk really is super! It can do things that are hard to do otherwise, and in a very simple way. And of course without while read loops, which are very expensive.
 
Hi

[tt][blue][small][ignore][off-topic][/ignore][/small][/blue][/tt]
AWK is indeed super. But an extremely simple problem like this one is not really a relevant measure. Other languages can also solve it without explicit looping :
Code:
perl -pae '$n=$F[2]if$F[1]eq"0ZZZ";s/1635/1635 $n/' /input/file

[gray]# or[/gray]

ruby -pae 'n=$F[2]if$F[1]=="0ZZZ";sub!(/1635/,"1635 "+n)' /input/file

[gray]# or[/gray]

sed '/ 0ZZZ /{h;s/.* //;x};/ 1635 /{G;s/ 1635 \(.*\)\n\(.*\)/ 1635 \2 \1/}' /input/file
[tt][blue][small][ignore][/off-topic][/ignore][/small][/blue][/tt]

Feherke.
 
Bravo! It makes me think!
Indeed, it is good to know at least one of these languages very well in order to write good UNIX (or Linux) scripts. Thank you again.
 
In the end I am using perl, because awk converts several successive spaces into one.

On the other hand, how do I pass variables to the perl program (the equivalent of -v var=$var for awk) ?
 
Hi

vlad9999 said:
In the end I am using perl, because awk converts several successive spaces into one.
Where are those multiple spaces ? There are none in your sample data.
vlad9999 said:
On the other hand, how do I pass variables to the perl program (the equivalent of -v var=$var for awk) ?
Exported environment variables are accessible through [tt]%ENV[/tt] :
Code:
export foo=bar; perl -e 'print $ENV{"foo"}'
Or enclose the Perl code in double quotes, to let the shell expand the variables before executing the command.


Feherke.
 
Sorry, you are right. I had not noticed it at first, but the miscellaneous informations part does contain successive spaces.
 
Hi

That can be solved quite easily :
Code:
awk '$2=="0ZZZ"{n=$3}$2=="1635"{[highlight]sub([/highlight]$2[highlight],[/highlight]$2" "n[highlight])[/highlight]}1' /input/file

[gray]# or[/gray]

awk [highlight]-F'[ ]'[/highlight] '$2=="0ZZZ"{n=$3}$2=="1635"{$2=$2" "n}1' /input/file
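
Why the spaces disappear in the first place: assigning to a field makes awk rebuild the whole record, joining the fields with the output field separator (a single space by default), while sub() edits the record in place. A minimal sketch on an illustrative line (the sample text and the X marker are assumptions, not data from the thread):

```shell
line='ident1 1635 some   spaced    text'

# Assigning to $2 rebuilds $0 with single spaces between fields.
collapsed=$(printf '%s\n' "$line" | awk '{ $2 = $2 " X" } 1')

# sub() changes only the matched text in $0, so the original spacing survives.
kept=$(printf '%s\n' "$line" | awk '{ sub($2, $2 " X") } 1')

printf '%s\n%s\n' "$collapsed" "$kept"
```

Note that sub() treats its first argument as a regular expression and replaces the first match in the line, so this relies on the code not appearing earlier in the line, for example inside the identifier.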

Feherke.
 
Now my whole script is finished. My question about awk was part of a script whose job is to extract dossiers from a file (a second file holds the list of dossiers to extract). Each dossier in the file is identified by an identifier (ident1, ident2, ...) and can span multiple lines.

To do this, I inserted the lines identk 0ZZZ name1 and identk 9ZZZ into the original file myself. After sorting, the lines to extract are framed by these two lines and are easily extracted with awk '/ 0ZZZ/,/ 9ZZZ/'.

The problem was that we also had to number the extracted dossiers, so I inserted the line identk 0ZZZ name1, where name1 is the number of the extracted dossier.
 
No, the name is obtained with cat -n, which numbers the dossiers in the list of dossiers to extract. This is how I insert the two lines "start" and "end" into the original file:
Code:
cat $list_file | sed "s|$| 0ZZZ|g" | [COLOR=blue]cat -n[/color] | awk '{printf "%7s %4s %08d\n",$2,$3,$1}'  \
      >> $work_file
cat $list_file | sed "s|$| 9ZZZ|g" >> $work_file
with $list_file:
ident1
ident2
...
 
So then (CH1="0ZZZ", CH2="9ZZZ", CODE0="1635"):
Code:
cat $work_file | sort | awk "/ $CH1/,/ $CH2/" |   \
   awk -v CODE0=$CODE0 -v CH1=$CH1 '$2==CH1{n=$3} {sub(CODE0,CODE0""n)}1'   \
   | egrep -v " $CH1| $CH2"
 
Hi

No ? Well, I would say, there you are generating the name* programmatically. Ok, not based on the ident* value, but based on the ident*'s position in a file. Quite the same thing.

As I understand, you have two input files like these :
Code:
ident1
ident2
Code:
ident1 1635 miscellaneous informations
ident1 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
Then this code :
Code:
awk 'FNR==NR{l[$1]=NR;next}{sub($2,sprintf("%s %08d",$2,l[$1]))}1' list_file whatever_file
Will generate this output :
Code:
ident1 1635 00000001 miscellaneous informations
ident1 1635 00000001 miscellaneous informations
ident2 1635 00000002 miscellaneous informations
ident2 1635 00000002 miscellaneous informations
ident2 1635 00000002 miscellaneous informations
As I understand, the 0ZZZ and 9ZZZ lines are just helpers and would be removed at the end.
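
For anyone new to the two-file idiom, here is a minimal sketch of the same program with comments. The temporary directory and the shortened sample lines are assumptions for illustration only.

```shell
tmp=$(mktemp -d)
printf '%s\n' ident1 ident2 > "$tmp/list_file"
printf '%s\n' 'ident1 1635 aaa' 'ident2 1635 bbb' > "$tmp/whatever_file"

numbered=$(awk '
  # First file only: FNR (line number in the file) equals NR (overall line number).
  FNR == NR { l[$1] = NR; next }
  # Second file: insert the zero-padded list position after the code.
  { sub($2, sprintf("%s %08d", $2, l[$1])) }
  1
' "$tmp/list_file" "$tmp/whatever_file")

printf '%s\n' "$numbered"
rm -r "$tmp"
```

One caveat: the FNR==NR test assumes the first file is not empty. With an empty list_file, the lines of the second file would be treated as list entries.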

Feherke.
 
Whatever_file file also contains other ident-s we do not want:
Code:
ident3 1635 miscellaneous informations
ident3 1635 miscellaneous informations
ident1 1635 miscellaneous informations
ident1 1635 miscellaneous informations
ident4 1635 miscellaneous informations
ident4 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident2 1635 miscellaneous informations
ident5 1635 miscellaneous informations
ident5 1635 miscellaneous informations
I am not able to test it right now, but if your program produces the same output file with that whatever_file, it is OK.
 
I tested and the result is almost good.
Code:
[COLOR=red]ident3 1635 00000000 miscellaneous informations
ident3 1635 00000000 miscellaneous informations[/color]
ident1 1635 00000001 miscellaneous informations
ident1 1635 00000001 miscellaneous informations[COLOR=red]
ident4 1635 00000000 miscellaneous informations
ident4 1635 00000000 miscellaneous informations[/color]
ident2 1635 00000002 miscellaneous informations
ident2 1635 00000002 miscellaneous informations
ident2 1635 00000002 miscellaneous informations[COLOR=red]
ident5 1635 00000000 miscellaneous informations
ident5 1635 00000000 miscellaneous informations[/color]
It only remains to eliminate the lines that are not numbered (numbered 0).
In any case, awk is more powerful than I thought: here it works with two files, with no need for helper lines.
On the other hand, with this syntax I do not know whether whatever_file can be passed through a pipe, but admittedly I do not need that in this case.
 
Hi

vlad9999 said:
Whatever_file file also contains other ident-s we do not want:
That is easy :
Code:
awk 'FNR==NR{l[$1]=NR;next}[highlight]$1 in l[/highlight]{sub($2,sprintf("%s %08d",$2,l[$1]))[highlight];print[/highlight]}' list_file whatever_file

[gray]# or[/gray]

awk 'FNR==NR{l[$1]=NR;next}{sub($2,sprintf("%s %08d",$2,l[$1]))}[highlight]l[$1][/highlight]' list_file whatever_file
vlad9999 said:
On the other hand, with this syntax I do not know whether whatever_file can be passed through a pipe, but admittedly I do not need that in this case.
One of the input files can be read from the standard input :
Code:
cat whatever_file | awk 'FNR==NR{l[$1]=NR;next}$1 in l{sub($2,sprintf("%s %08d",$2,l[$1]));print}' list_file [highlight]-[/highlight]

[gray]# or[/gray]

cat list_file | awk 'FNR==NR{l[$1]=NR;next}$1 in l{sub($2,sprintf("%s %08d",$2,l[$1]));print}' [highlight]-[/highlight] whatever_file
If you need to get more files from the standard input, you have to get one of them through explicit file reading :
Code:
cat whatever_file | awk 'BEGIN{while("cat list_file"|getline)l[$1]=++n}$1 in l{sub($2,sprintf("%s %08d",$2,l[$1]));print}'

[gray]# or[/gray]

cat list_file | awk '{l[$1]=NR}END{while("cat whatever_file"|getline)if($1 in l){sub($2,sprintf("%s %08d",$2,l[$1]));print}}'
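
A quick way to check the [tt]-[/tt] variant on a small case is sketched below. The temporary directory and the shortened sample lines are assumptions for illustration; the awk program is the one shown above, with the data arriving on the standard input.

```shell
tmp=$(mktemp -d)
printf '%s\n' ident1 ident2 > "$tmp/list_file"

# The data comes through the pipe; "-" tells awk to read it as the second input.
filtered=$(printf '%s\n' 'ident3 1635 ccc' 'ident1 1635 aaa' 'ident2 1635 bbb' |
  awk 'FNR==NR{l[$1]=NR;next}$1 in l{sub($2,sprintf("%s %08d",$2,l[$1]));print}' "$tmp/list_file" -)

printf '%s\n' "$filtered"
rm -r "$tmp"
```

The ident3 line is dropped because it is not in the list, and the two remaining lines get the numbers of their list positions.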

Feherke.
 
Wow! With all these examples, I see awk differently now. THANK YOU
 
This is off the original topic, but for the record: in my test, the "-" caused only the first word of the file to be read !
 