parse file and delete field

schocku · Oct 23, 2003

Hi everybody,

I have a file (a.dat) with the following data in it:

10/17/2003^F^555555^F^333^F^HELLO^F^04-15-2003^R^
10/17/2003^F^444444^F^11^F^^F^09-15-2003^R^
10/17/2003^F^555555^F^333^F^WATER^F^04-15-2003^R^
10/17/2003^F^^F^333^F^FIRE^F^04-15-2003^R^
10/17/2003^F^111^F^333^F^WIND^F^04-15-2003^R^
....

Based on a field number given by the user, I need to delete that field from every line in the file and write the result to another file. For example, if the user input is "a.dat 3" (meaning the user wants the 3rd field deleted from a.dat), then my output should be:

10/17/2003^F^555555^F^HELLO^F^04-15-2003^R^
10/17/2003^F^444444^F^^F^09-15-2003^R^
10/17/2003^F^555555^F^WATER^F^04-15-2003^R^
10/17/2003^F^^F^FIRE^F^04-15-2003^R^
10/17/2003^F^111^F^WIND^F^04-15-2003^R^
....

Could you please help me with a script to do this? Thanks for your help.

vgersh99 · Oct 23, 2003

Something like this, using awk/nawk, should get you started:

# to remove the THIRD field (by default)
nawk -f schocku.awk myFile.txt

# to remove the SECOND field
nawk -v fld2delete=2 -f schocku.awk myFile.txt

# to remove the FOURTH field
nawk -v fld2delete=4 -f schocku.awk myFile.txt

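#------------------- schocku.awk
BEGIN {
    # FS is a regular expression, so the caret must be escaped
    FS = "\\^F"
    OFS = "^F"

    # default to the third field when -v fld2delete=N is not given
    if (!fld2delete)
        fld2delete = 3
}

# only touch lines that actually have the field to delete
NF >= fld2delete {
    # shift every later field one position to the left
    for (i = fld2delete; i < NF; i++) {
        nextField = i + 1
        $i = $nextField
    }
    NF--    # drop the now-duplicated last field
    print
    next
}

# print shorter lines unchanged
1
#----------------------------------------------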

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

Ygor · Oct 23, 2003

Assuming that field separator is "^F^" and record separator is "^R^", then try...

Col=3; sed -e 's/[^^]*^[FR]^//'$Col -e 's/F^$/R^/' a.dat

...it's not awk but gives the desired results....

10/17/2003^F^555555^F^HELLO^F^04-15-2003^R^
etc

schocku · Oct 23, 2003

vgersh99,

Thanks for your quick response.

I changed the FS to be "\\^F\\^" and OFS to be "^F^" in your script and tried it. It worked very well, except for the last column. For example, when I specified fld2delete=5 it did not delete the 5th field from the sample file "a.dat". The reason might be that the row separator is ^R^. Any ideas on how to delete the last column as well, if need be?

vgersh99 · Oct 23, 2003

something like that, but I didn't know about the record separator - I thought it was just a 'new-line'.

here's something a bit better:

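BEGIN {
    # FS is a regular expression, so both carets must be escaped
    FS = "\\^F\\^"
    OFS = "^F^"
    #ORS="^R^"

    # default to the third field when -v fld2delete=N is not given
    if (!fld2delete)
        fld2delete = 3
}

# only touch lines that actually have the field to delete
NF >= fld2delete {
    # shift every later field one position to the left;
    # at i == NF this copies the empty $(NF+1) into the last field
    for (i = fld2delete; i <= NF; i++) {
        nextField = i + 1
        $i = $nextField
    }
    NF--    # drop the emptied last field
    print
    next
}

# print shorter lines unchanged
1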

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
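One possible way to handle the ^R^ terminator is to strip it before the line is split into fields and put it back on output, so that deleting the last field does not take the terminator with it. The following is only a minimal sketch under those assumptions, not the script posted above: the function name delete_field and the variables fld, term and has_term are illustrative, and the output line is rebuilt by hand instead of relying on NF-- so that it does not depend on how a particular awk handles assignments to NF.

```shell
# Hypothetical helper (not from the thread): delete_field N < input > output
# Assumes "^F^" separates fields and each line may end with a "^R^" terminator.
delete_field() {
    awk -v fld="$1" '
    BEGIN { FS = "\\^F\\^"; term = "^R^" }
    {
        # Remove a trailing ^R^ so it is not glued to the last field.
        # sub() returns 1 if it removed the terminator, 0 otherwise,
        # and changing $0 makes awk split the fields again.
        has_term = sub(/\^R\^$/, "")

        # Rebuild the line from every field except the one to delete.
        out = ""; first = 1
        for (i = 1; i <= NF; i++) {
            if (i == fld) continue
            if (!first) out = out "^F^"
            out = out $i
            first = 0
        }
        print out (has_term ? term : "")
    }'
}

# Usage with one line of the sample data: delete the 5th (last) field.
printf '%s\n' '10/17/2003^F^555555^F^333^F^HELLO^F^04-15-2003^R^' | delete_field 5
```

For a real file the call would look like "delete_field 5 < a.dat > b.dat", where b.dat is just a placeholder name for the output file.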

schocku · Oct 23, 2003

Thanks to both vgersh99 and Ygor. Both your solutions worked great.

schocku · Nov 19, 2003

Ygor,

I used your solution in my application to strip a column and it worked fine. However, if there is a blank column in between, it does not strip successfully. For example, in the data below:

10/17/2003^F^444444^F^11^F^^F^1333^F^09-15-2003^R^

If I want to strip column 1, it gets stripped fine. However, if I want to strip column 5, it does not. If I want to strip column 4, it strips column 5.

I think the blank column in between (^F^^F^) causes the problem. Is there any way to work around it?

Ygor · Nov 19, 2003

The problem is how to count columns. A column with a null value is still counted as a column, i.e. using your example...
col1="10/17/2003"
col2="444444"
col3="11"
col4=""
col5="1333"
col6="09-15-2003"

So to strip column 5...

Col=5; sed -e 's/[^^]*^[FR]^//'$Col -e 's/F^$/R^/' a.dat

Gives...

10/17/2003^F^444444^F^11^F^^F^09-15-2003^R^

...as expected.
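To check how the columns of a given line are being counted, it can help to print each field with its number. This is a rough sketch only, assuming "^F^" as the field separator and "^R^" as the line terminator; the function name list_columns is made up for illustration.

```shell
# Hypothetical helper (not from the thread): print every column with its number.
list_columns() {
    awk '
    BEGIN { FS = "\\^F\\^" }
    {
        sub(/\^R\^$/, "")    # drop the trailing terminator, fields are re-split
        for (i = 1; i <= NF; i++)
            printf "col%d=\"%s\"\n", i, $i
    }'
}

# Usage with the line containing the blank column.
printf '%s\n' '10/17/2003^F^444444^F^11^F^^F^1333^F^09-15-2003^R^' | list_columns
```

For that line the listing has six entries, with the empty column showing up as col4="".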

schocku · Nov 23, 2003

Ygor,

Sorry about that. It works fine for files with a small number of columns. I had a file with 49 columns in it and I was trying to delete the 49th column and was unable to do it. I read in the sed man pages that sed cannot edit very long lines. Is sed reliable for long lines, and how long is a long line?

schocku · Nov 23, 2003

I have a new requirement now. I want to be able to strip multiple columns from the flat file using awk. So it could be columns 3 and 5 in the above example.

If I pass the variable fld2delete with the value "3,5", would I be able to strip both within the awk program?

The other solution would be to strip fld 3 first using the awk script, write to a file and strip field 5 from that file and output to another new file.

Thanks.
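A comma-separated list such as "3,5" can be split inside awk and used to skip several fields in a single pass. The sketch below is one possible approach, not a tested answer from the thread: the function name delete_fields and the variables list, nums and drop are illustrative, and it again assumes "^F^" between fields and an optional "^R^" at the end of each line.

```shell
# Hypothetical helper (not from the thread): delete_fields "3,5" < input > output
delete_fields() {
    awk -v list="$1" '
    BEGIN {
        FS = "\\^F\\^"; term = "^R^"
        # Turn "3,5" into a lookup table: drop[3] and drop[5] exist.
        n = split(list, nums, ",")
        for (k = 1; k <= n; k++) drop[nums[k] + 0] = 1
    }
    {
        has_term = sub(/\^R\^$/, "")    # set the terminator aside
        out = ""; first = 1
        for (i = 1; i <= NF; i++) {
            if (i in drop) continue     # skip every listed column
            if (!first) out = out "^F^"
            out = out $i
            first = 0
        }
        print out (has_term ? term : "")
    }'
}

# Usage with one line of the sample data: delete columns 3 and 5.
printf '%s\n' '10/17/2003^F^555555^F^333^F^HELLO^F^04-15-2003^R^' | delete_fields 3,5
```

All numbers in the list refer to the original column positions. With the two-pass alternative, the second pass would have to delete field 4 rather than 5, because the original field 5 moves up one position once field 3 is gone.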
