remove repeating groups with sed

Status
Not open for further replies.

emblem

Programmer
Jun 25, 2002
26
Hello experts,

I have lines in a pipe-delimited file like this:

SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|

I am trying to edit the multi-year fields in SCH rows so that they show only the first year of a ~-separated string of years, to get this:
SCH|Dartmouth College|2007|spring|

This is beyond my modest regular-expression substitution skills, although I do see that if the first year could be captured as a pattern, then I could replace $1 $2 $3 with just $1. I am not sure how to define multiple patterns within a single long string.

I have tried
sed -e 's/(\([0-9]{4})(~[0-9]{4}){1,4}\)\1/' input_file
and various other permutations of this, but without success, although I appreciate dbase77's example at thread822-1444480.

I also tried doing this in awk
$1 ~ /SCH/ {split($3,dates,"~")
$3=dates[1]
print $0 }
This seems like an elegant way to use arrays, but my awk and gawk got so confused with | delimiters in spite of FS declarations that it never finds the 3rd field to start with, so I would like to use a simpler tool if possible.

Many thanks for your suggestions.
 
Hi

emblem said:
but my awk and gawk got so confused with | delimiters in spite of FS declarations that it never finds the 3rd field to start with
Where did you set [tt]FS[/tt]? There is no such thing in the code you posted.

Anyway, why would you want to split the string? Just remove the unwanted part:
Code:
awk -F '|' -v OFS='|' '$1=="SCH"{sub(/~.*/,"",$3)}1' /input/file
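To see what that one-liner does, here is a small sketch that feeds it the sample rows from the question instead of a file. The variable names sample and awk_result are only made up for this illustration. The sub() call deletes everything from the first ~ to the end of field 3, and the trailing 1 prints every row, changed or not.

```shell
# Sample rows copied from the original question.
sample='SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|'

# On SCH rows only, strip field 3 from the first ~ onward.
# -F sets the input separator, OFS keeps | when awk rebuilds the row.
awk_result=$(printf '%s\n' "$sample" |
  awk -F '|' -v OFS='|' '$1=="SCH"{sub(/~.*/,"",$3)}1')

printf '%s\n' "$awk_result"
```

Setting OFS matters here: once a field is modified, awk rebuilds the whole record using OFS, which would otherwise be a single space.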
Regarding [tt]sed[/tt], it is a bit more complicated, but as long as the input looks like your sample, we can keep it simple:
Code:
sed '/^SCH|/s/~[^|]*|/|/' /input/file
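A quick sketch of that sed command on the sample rows, plus one made-up row to show why the "as long as the input looks like your sample" caveat matters. The names sample, sed_result, odd_row and odd_result are hypothetical, and the odd row is invented data, not something from the original file.

```shell
# Sample rows copied from the original question.
sample='SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|'

# On lines starting with "SCH|", replace the first "~...|" run with a bare "|".
sed_result=$(printf '%s\n' "$sample" | sed '/^SCH|/s/~[^|]*|/|/')
printf '%s\n' "$sed_result"

# Invented row with a ~ in the name field: the substitution hits the FIRST ~
# on the line, so it clips the name and leaves the years alone.
odd_row='SCH|Smith~Jones College|2007~2008|spring|'
odd_result=$(printf '%s\n' "$odd_row" | sed '/^SCH|/s/~[^|]*|/|/')
printf '%s\n' "$odd_result"
```

The substitution has no notion of fields, so it is only safe when ~ never appears before the year field.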

Feherke.
 
Thank you Feherke! This is tremendously helpful. I did not show the BEGIN{ IFS='|'} clause in my awk, but it was there anyway and not working. The -F'|' puts me back in business.
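For anyone who hits the same wall: the most likely reason the BEGIN clause did nothing is that awk's field separator variable is FS, not IFS (IFS belongs to the shell), and awk string constants take double quotes. A minimal sketch of the split() approach with the separator set inside the script follows. The names sample and fs_result are just for illustration, and unlike the original snippet this version prints every row, not only the SCH ones.

```shell
# Sample rows copied from the original question.
sample='SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|'

fs_result=$(printf '%s\n' "$sample" | awk '
  BEGIN { FS = OFS = "|" }   # FS, not IFS; awk strings use double quotes
  $1 == "SCH" { split($3, dates, "~"); $3 = dates[1] }   # keep first year
  { print }                  # print every row, edited or not
')

printf '%s\n' "$fs_result"
```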

Your awk and sed code both work, too.

Now that awk is functional, my original

$1 ~ /SCH/ {split($3,dates,"~");
$3=dates[1]
print $0}

code is working, but you have taught me something about regexps.
 
Of course for such a simple change you didn't necessarily need to use a scripting tool of any kind.
For example, if you were editing the file in vim you could simply have typed this:
Code:
:g/^SCH/s/^\([^|]*|[^|]*|[0-9]*\)[^|]*/\1/

The ":g/^SCH/" prefix limits the subsequent substitution to your SCH rows, and the regex keeps the first two fields plus the leading digits of the third while dropping the rest of that field (the ~ and everything after it).
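If you want to check the pattern outside the editor, it should carry over to sed unchanged, on the assumption that vim is in its default "magic" mode, where \( \) group and | is literal just as in a sed basic regular expression. A rough sketch on the sample rows (sample and vim_style_result are made-up names):

```shell
# Sample rows copied from the original question.
sample='SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|'

# Same address and pattern as the vim command: capture fields 1 and 2 plus
# the leading digits of field 3, then drop whatever is left of field 3.
vim_style_result=$(printf '%s\n' "$sample" |
  sed '/^SCH/s/^\([^|]*|[^|]*|[0-9]*\)[^|]*/\1/')

printf '%s\n' "$vim_style_result"
```

Because the pattern is anchored at the start of the line and counts the | separators, a ~ in an earlier field is not touched.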



Trojan.
 
hahaha
Yeah, ok.

But the point I was trying to make was that if you learn your editor you can do much of this kind of work directly within it.



Trojan.
 