Removing duplicate words in a file

Status
Not open for further replies.

ianicr

IS-IT--Management
Nov 4, 2003
230
GB
I have a file with records like this:
"MR","I","SMITH","1","NEW STREET","LONDON","LONDON"

I wish to remove the first LONDON but leave the second. Is there an easy way to do this?
 
Question:

Given data in this format,

"MR","I","SMITH","1","NEW STREET","LONDON","LONDON"

you would want to keep all the data in a record like this one,

"MR","R","JONES","1","NEW STREET","GREENWICH","LONDON"

correct?


 
Yep. That's just what I need. I'm not too bothered what goes into the field as long as it's not the same as the field after it.
 
Perhaps use sed....

sed 's/\(,"[^"]*"\)\1/\1/g' file1 > file2
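A quick way to see what that substitution does is to run it on the two sample records from this thread. This is only a sketch: the function name dedupe_adjacent is made up for illustration, and it assumes every field is double-quoted and contains no embedded double quotes. It drops a quoted field only when the very next field is identical, so the GREENWICH record should pass through unchanged.

```shell
# Hypothetical wrapper around the sed command above; reads stdin, writes stdout.
dedupe_adjacent() {
  # \(,"[^"]*"\) captures one ,"field"; \1 requires the same text to repeat.
  sed 's/\(,"[^"]*"\)\1/\1/g'
}

# Feed it the two sample records from this thread.
printf '%s\n' \
  '"MR","I","SMITH","1","NEW STREET","LONDON","LONDON"' \
  '"MR","R","JONES","1","NEW STREET","GREENWICH","LONDON"' |
  dedupe_adjacent
```

Because the pattern starts with a comma, a duplicate between the first and second fields would not be caught.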
 
If you want to blank field 6 (with "," as the separator) when it is the same as field 7 ...

with sed :

sed -e 's/^\(\([^,]*,\)\{5\}\)\([^,]*\),\3$/\1,\3/' input >output

with awk :

awk 'BEGIN {FS="," ; OFS=","} $6==$7 {$6=""} {print $0}' input >output

Jean Pierre.
 
If you need to preserve the field placeholder but not the value it contains, you could do the following:

sed 's/\(,"[^"]*"\)\1/,""\1/g' file1>file2
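As a rough check of the placeholder variant, here is the same command run on the sample records from this thread. The function name blank_adjacent is invented for the example, and the same assumption applies: every field is double-quoted with no embedded double quotes. The first of two identical neighbouring fields should become an empty "" and the field count should stay at seven.

```shell
# Hypothetical wrapper around the placeholder-preserving sed command above.
blank_adjacent() {
  # Replace the first copy of a repeated ,"field" with ,"" and keep the second.
  sed 's/\(,"[^"]*"\)\1/,""\1/g'
}

printf '%s\n' \
  '"MR","I","SMITH","1","NEW STREET","LONDON","LONDON"' \
  '"MR","R","JONES","1","NEW STREET","GREENWICH","LONDON"' |
  blank_adjacent
```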
 
Try this:
Code:
awk -F, -v OFS=, '
$6==$7{$6="\"\""}
{print}
' </path/to/inputfile

Hope this helps
PH.
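For comparison, a minimal sketch of the field-based awk approach run on both sample records from this thread. It assumes no field contains an embedded comma, since awk splits on every comma whether quoted or not, and the function name blank_field6 is made up for illustration. Setting OFS as well as FS keeps the commas in place when awk rebuilds a modified record, and the unconditional print lets non-matching records through.

```shell
# Hypothetical helper: blank field 6 when it equals field 7, print every record.
blank_field6() {
  awk 'BEGIN { FS = OFS = "," }   # split and rejoin on commas
       $6 == $7 { $6 = "\"\"" }   # duplicate becomes an empty quoted field
       { print }'                 # emit the record, changed or not
}

printf '%s\n' \
  '"MR","I","SMITH","1","NEW STREET","LONDON","LONDON"' \
  '"MR","R","JONES","1","NEW STREET","GREENWICH","LONDON"' |
  blank_field6
```

Unlike the sed versions, this only ever touches field 6, so repeated values elsewhere in the record are left alone.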
 