parse file and delete field

Status
Not open for further replies.

schocku

Programmer
Nov 20, 2001
23
US
Hi everybody,

I have a file (a.dat) with the following data in it :

10/17/2003^F^555555^F^333^F^HELLO^F^04-15-2003^R^
10/17/2003^F^444444^F^11^F^^F^09-15-2003^R^
10/17/2003^F^555555^F^333^F^WATER^F^04-15-2003^R^
10/17/2003^F^^F^333^F^FIRE^F^04-15-2003^R^
10/17/2003^F^111^F^333^F^WIND^F^04-15-2003^R^
....

Given a field number from the user, I need to delete that field from every line in the file and write the result to another file. For example, if the user input is "a.dat 3" (meaning the 3rd field should be deleted from a.dat), then my output should be:

10/17/2003^F^555555^F^HELLO^F^04-15-2003^R^
10/17/2003^F^444444^F^^F^09-15-2003^R^
10/17/2003^F^555555^F^WATER^F^04-15-2003^R^
10/17/2003^F^^F^FIRE^F^04-15-2003^R^
10/17/2003^F^111^F^WIND^F^04-15-2003^R^
....

Could you please help me with a script to do this? Thanks for your help.
 
Something like this to get you started, using awk/nawk:

# to remove the THIRD field (by default)
nawk -f schocku.awk myFile.txt

# to remove the SECOND field
nawk -v fld2delete=2 -f schocku.awk myFile.txt

# to remove the FOURTH field
nawk -v fld2delete=4 -f schocku.awk myFile.txt

#------------------- schocku.awk
BEGIN {
    FS="\\^F"
    OFS="^F"

    if (!fld2delete)
        fld2delete=3
}

NF >= fld2delete {
    for (i=fld2delete; i < NF; i++) {
        nextField = i + 1;
        $i = $nextField;
    }
    NF--;
    print
    next;
}
1
#----------------------------------------------

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Assuming that the field separator is "^F^" and the record separator is "^R^", then try...

Col=3; sed -e 's/[^^]*^[FR]^//'$Col -e 's/F^$/R^/' a.dat

...it's not awk but gives the desired results....

10/17/2003^F^555555^F^HELLO^F^04-15-2003^R^
etc
 
vgersh99,

Thanks for your quick response.

I changed the FS to "\\^F\\^" and the OFS to "^F^" in your script and tried it. It worked very well, except for the last column. For example, when I specified fld2delete=5 it did not delete the 5th field from the sample file "a.dat". The reason might be that the row separator is ^R^. Any ideas on how to delete the last column as well, if need be?
 
Something like that, but I didn't know about the record separator. I thought it was just a newline.

Here's something a bit better:

BEGIN {
    FS="\\^F\\^"
    OFS="^F^"
    #ORS="^R^"

    if (!fld2delete)
        fld2delete=3
}

NF >= fld2delete {
    # shift each later field one position to the left;
    # when i == NF, $(NF+1) is empty, so the last field is blanked before NF-- drops it
    for (i=fld2delete; i <= NF; i++) {
        nextField = i + 1;
        $i = $nextField;
    }
    NF--;
    print
    next;
}
1


vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
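
For reference, here is a minimal sketch of another way to handle the last-column case. It is not the script posted above: it strips the trailing "^R^", splits the rest of the line on "^F^" with split(), and rebuilds the line without the unwanted field, so it does not depend on how a given awk handles NF--. The function name delete_field and its argument order are made up for illustration, and it assumes every data line ends in "^R^" (lines that do not, such as "....", are passed through unchanged).

```shell
#!/bin/sh
# Hypothetical helper: delete_field COL [FILE...]
# Assumes fields are separated by "^F^" and each record ends with "^R^".
delete_field() {
    col=$1; shift
    awk -v col="$col" '
    $0 !~ /\^R\^$/ { print; next }          # no terminator: leave the line alone
    {
        line = $0
        sub(/\^R\^$/, "", line)             # strip the record terminator
        n = split(line, f, /\^F\^/)         # f[1..n] now holds the fields
        out = ""; sep = ""
        for (i = 1; i <= n; i++) {
            if (i == col) continue          # skip the field being deleted
            out = out sep f[i]
            sep = "^F^"
        }
        print out "^R^"                     # put the terminator back
    }' "$@"
}

# Example: delete the 5th (last) field of one sample line
printf '%s\n' '10/17/2003^F^555555^F^333^F^HELLO^F^04-15-2003^R^' | delete_field 5
```

With a file, the call would look like `delete_field 3 a.dat > b.dat`.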
 
Thanks to both vgersh99 and Ygor. Both your solutions worked great.
 
Ygor,

I used your solution in my application for stripping a column and it worked fine. However if you have a blank column in between it does not strip successfully. For example in the below data :

10/17/2003^F^444444^F^11^F^^F^1333^F^09-15-2003^R^

If I want to strip column 1 it gets stripped fine. However if I want to strip column 5 it does not. If I want to strip column 4 it strips column 5.

I think the blank column in between (^F^^F^) causes the problem. Is there any way to work around it?
 
The problem is how to count columns. A column with a null value is still counted as a column, i.e. using your example...
col1="10/17/2003"
col2="444444"
col3="11"
col4=""
col5="1333"
col6="09-15-2003"

So to strip column 5...

Col=5; sed -e 's/[^^]*^[FR]^//'$Col -e 's/F^$/R^/' a.dat

Gives...

10/17/2003^F^444444^F^11^F^^F^09-15-2003^R^

...as expected.
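
A quick way to check the column numbering before deleting anything is to print each field with its index. The sketch below does that for the first line of its input; the function name show_fields is made up for illustration, and it assumes the same "^F^" and "^R^" separators as above.

```shell
#!/bin/sh
# Hypothetical helper: show_fields [FILE...]
# Prints colN="value" for every field of the first input line.
show_fields() {
    awk '
    NR == 1 {
        line = $0
        sub(/\^R\^$/, "", line)             # drop the record terminator
        n = split(line, f, /\^F\^/)         # empty fields are kept by split()
        for (i = 1; i <= n; i++)
            printf "col%d=\"%s\"\n", i, f[i]
    }' "$@"
}

printf '%s\n' '10/17/2003^F^444444^F^11^F^^F^1333^F^09-15-2003^R^' | show_fields
# col1="10/17/2003"
# col2="444444"
# col3="11"
# col4=""
# col5="1333"
# col6="09-15-2003"
```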
 
Ygor,

Sorry about that. It works fine for files with a small number of columns. I had a file with 49 columns in it and I was trying to delete the 49th column and was unable to do it. I read in the sed man pages that sed cannot edit very long lines. Is sed reliable for long lines, and how long is a long line?
 
I have a new requirement now. I want to be able to strip multiple columns from the flat file using awk. So it could be columns 3 and 5 in the above example.

If I pass the variable fld2delete with the value "3,5", would I be able to strip both within the awk program?

The other solution would be to strip fld 3 first using the awk script, write to a file and strip field 5 from that file and output to another new file.

Thanks.
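
One possible approach to the multi-column case, sketched here under the same assumptions ("^F^" between fields, "^R^" at the end of each line): pass the list as a comma-separated string, split it into a lookup array in BEGIN, and skip every listed field while rebuilding the line. The function name delete_fields and the variable name cols are made up for illustration and are not part of the scripts posted earlier in the thread. Column numbers refer to positions in the original line, so a single pass is enough and no intermediate file is needed.

```shell
#!/bin/sh
# Hypothetical helper: delete_fields COL[,COL...] [FILE...]
delete_fields() {
    cols=$1; shift
    awk -v cols="$cols" '
    BEGIN {
        m = split(cols, c, ",")
        for (j = 1; j <= m; j++)
            drop[c[j] + 0] = 1              # field numbers to remove
    }
    $0 !~ /\^R\^$/ { print; next }          # no terminator: leave the line alone
    {
        line = $0
        sub(/\^R\^$/, "", line)             # strip the record terminator
        n = split(line, f, /\^F\^/)
        out = ""; sep = ""
        for (i = 1; i <= n; i++) {
            if (i in drop) continue         # skip every listed field
            out = out sep f[i]
            sep = "^F^"
        }
        print out "^R^"
    }' "$@"
}

# Example: strip columns 3 and 5 from the six-column sample line
printf '%s\n' '10/17/2003^F^444444^F^11^F^^F^1333^F^09-15-2003^R^' | delete_fields 3,5
# 10/17/2003^F^444444^F^^F^09-15-2003^R^
```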
 