awk script to remove invalid chars in a variable length record

Status
Not open for further replies.

confuseddddd

Programmer
May 22, 2003
53
US
I used this script to remove the asterisk from the 2nd field, which sometimes contains an asterisk.
> awk 'BEGIN { FS=OFS="*" }
> {
> if ($1=="CLM") {
> while ( $3 !~ /^[0-9]*[.][0-9][0-9]$/ ) {
> $2=$2 $3;
> for (x=3; x<split($0,a); x++)
> $x=$(x+1);
> }
> print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25;
> } else {
> print $0;
> }
> }' $(date -u +%Y%m%d).IN > $(date -u +%Y%m%d).001

However, because the fields in the record can vary, I am getting strange results:
original:
CLM*SANAR00 7072*65.00*WC**11^^1******EM^^^TX
after script
CLM*SANAR00 7072*65.00*WC**11^^1******EM^^^TX*************

original:
CLM*GALKA001 9778*295.00*WC**11^^1******EM^^^TX
after script:
CLM*GALKA001 9778*295.00*WC**11^^1******EM^^^TX*************

original:
CLM*TATMI004569*100.00*WC**11^^1******EM^^^TX********29
after script
CLM*TATMI004569*100.00*WC**11^^1******EM^^^TX********29*****

original:
CLM*37494-002*97.70*WC**11^^1******EM^^^FL
after script
CLM*37494-002*97.70*WC**11^^1******EM^^^FL*************

original:
CLM*001129S5 FWC*615.00*WC**11^^1******EM^^^FL
after script
CLM*001129S5 FWC*615.00*WC**11^^1******EM^^^FL*************

original:
CLM*350049582*35*388.00*WC**11^^1******EM^^^KS
after script
CLM*350049582*35*388.00*WC**11^^1******EM^^^KS*************

Anyone have any ideas on how to remove the unwanted asterisk from the second field but still leave the rest of the record alone???
 
Try this:
Code:
}' $(date -u +%Y%m%d).IN |
 sed 's!\**$!!' > $(date -u +%Y%m%d).001

Hope This Helps
PH.
 
Create a file called, say, "rmstar" containing ...

#!/usr/bin/sed -f
s/^\(CLM\*[^\*]*\)\*\([^\*]*\*[0-9]*\.[0-9][0.9]\)/\1\2/
s/^\(CLM\*[^\*]*\)\*\([^\*]*\*[0-9]*\.[0-9][0.9]\)/\1\2/
s/^\(CLM\*[^\*]*\)\*\([^\*]*\*[0-9]*\.[0-9][0.9]\)/\1\2/
s/^\(CLM\*[^\*]*\)\*\([^\*]*\*[0-9]*\.[0-9][0.9]\)/\1\2/
s/^\(CLM\*[^\*]*\)\*\([^\*]*\*[0-9]*\.[0-9][0.9]\)/\1\2/

Use chmod to make it executable, and use it like this....

rmstar $(date -u +%Y%m%d).IN > $(date -u +%Y%m%d).001

Tested....

< CLM*274*1979731*4.00*WC**21^^1******EM^^^IL
> CLM*2741979731*4.00*WC**21^^1******EM^^^IL

< CLM*350049582*35*388.00*WC**11^^1******EM^^^KS
> CLM*35004958235*388.00*WC**11^^1******EM^^^KS

 
Okay, that worked great except it didn't catch all of the situations...

Have the following:
CLM*269605*73612*542.75*WC**11^^1******EM^^^NM
should be CLM*26960573612*542.75*WC**11^^1******EM^^^NM


CLM*30750****41.91*WC**11^^1******EM^^^TX
should be CLM*30750*41.91*WC**11^^1******EM^^^TX

CLM*210599*1*1*715.00*WC**22^^1******EM^^^TX
should be CLM*21059911*715.00*WC**22^^1******EM^^^TX

Any other ideas? I couldn't see what, if anything, was different between the s commands.
I thought I could just add another s command but wasn't sure....
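
The three records above all have more than one separator between the ID and the amount, while each s command in rmstar only matches the exact shape ID*fragment*amount, so repeating the same command does not help. A minimal awk sketch of a more general approach follows. It assumes the amount is always the first field from $3 onward that looks like digits, a dot, and two digits (the same test the original script used), and that every field between $2 and that amount belongs to the ID. The function name fix_clm is made up for illustration.

```shell
# fix_clm: hypothetical helper; reads the named files (or stdin) and writes
# the repaired records to stdout.
fix_clm() {
  awk 'BEGIN { FS = OFS = "*" }
  $1 == "CLM" {
      # Locate the first field from $3 on that looks like an amount (e.g. 65.00).
      amt = 0
      for (i = 3; i <= NF; i++)
          if ($i ~ /^[0-9]*[.][0-9][0-9]$/) { amt = i; break }
      # amt > 3 means stray separators split the ID across fields 2..amt-1.
      if (amt > 3) {
          id = $2
          for (i = 3; i < amt; i++) id = id $i
          rest = ""
          for (i = amt; i <= NF; i++) rest = rest OFS $i
          # Rebuild the record so no empty or duplicated fields are left behind.
          $0 = $1 OFS id rest
      }
  }
  { print }' "$@"
}

printf '%s\n' 'CLM*210599*1*1*715.00*WC**22^^1******EM^^^TX' | fix_clm
# prints: CLM*21059911*715.00*WC**22^^1******EM^^^TX
```

It could be run the same way as rmstar, e.g. fix_clm $(date -u +%Y%m%d).IN > $(date -u +%Y%m%d).001. Records where no field looks like an amount are passed through unchanged, and an ID fragment that itself looks like an amount (say 12.34) would be mistaken for one, so that case would need a stricter rule.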
 
I'm not a heavy sed or awk user, but to those who are, regarding the original script in your post: is there some way to tell awk to "print all the fields until you get to the end of the line" rather than listing $1,$2,$3...$25? Some of the lines don't have 25 fields, and that seems to be why those extra * were being printed.
 
print $0

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
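
To illustrate the difference on a made-up three-field record (the input a*b*c is an assumption for the demo, not data from the thread): naming fields past NF in a print statement prints them as empty strings, each preceded by OFS, whereas print $0 prints only what the record holds.

```shell
# Fields 4 and 5 do not exist, so they print as empty strings after an OFS each.
printf '%s\n' 'a*b*c' | awk 'BEGIN { FS = OFS = "*" } { print $1, $2, $3, $4, $5 }'
# prints: a*b*c**

# print $0 emits the record as it stands, with no padding.
printf '%s\n' 'a*b*c' | awk 'BEGIN { FS = OFS = "*" } { print $0 }'
# prints: a*b*c
```

Note that once a script assigns to any field, awk rebuilds $0 from the fields joined by OFS, so print $0 then reflects the modified fields.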
 
bi,

If you needed to start part way through the line you could do something like this:

[tt]echo one two three four five | awk '{
for (i=3;i<=NF;i++) {
printf "%s ",$i
}
print ""
}'[/tt]


Annihilannic.
 