remove repeating groups with sed

emblem · Aug 19, 2009

Hello experts,

I have lines in a pipe-delimited file like this:

SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|

I am trying to edit the multiple-year fields in SCH rows to show just the first year of a string of ~-separated years, to get this:
SCH|Dartmouth College|2007|spring|

This is beyond my modest regex substitution skills, although I do see that if the first year could be captured as a group, I could substitute $1 $2 $3 with just $1. I am not sure how to define multiple groups within a single long string.

I have tried
sed -e 's/($[0-9]{4})(~[0-9]{4}){1,4}$\1/' input_file
and various other permutations of it, but without success, although I appreciated dbase77's example in thread822-1444480.
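For reference, the sed attempt above cannot work as written. By default sed uses POSIX basic regular expressions, where `(`, `)`, `{` and `}` are literal characters unless written as `\(`, `\)`, `\{` and `\}`. The `$` inside the first group is read as a literal dollar sign or an end-of-line anchor, depending on the flavor, and neither precedes a year. The `$` after `{1,4}` appears to stand where the `/` separating pattern from replacement belongs, so sed sees an unterminated s command. A minimal corrected sketch, assuming the sample layout above; the function name strip_years_bre is hypothetical, and the `{1,4}` limit is relaxed to `\{1,\}` so any number of extra years is dropped:

```shell
#!/bin/sh
# strip_years_bre is a hypothetical helper name, not from the thread.
# POSIX basic regex: grouping and intervals need backslashes.
#   \([0-9]\{4\}\)           capture the first four-digit year as \1
#   \(~[0-9]\{4\}\)\{1,\}    one or more "~year" groups following it
# The /^SCH|/ address restricts the edit to SCH rows (| is literal in BRE).
strip_years_bre() {
    sed '/^SCH|/s/\([0-9]\{4\}\)\(~[0-9]\{4\}\)\{1,\}/\1/' "$@"
}

# Sample rows from the question
strip_years_bre <<'EOF'
SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|
EOF
```

On the sample this leaves the non-SCH rows and the single-year SCH row untouched, and reduces 2007~2008 to 2007 and 2003~2004~2005 to 2003.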

I also tried doing this in awk:
$1 ~ /SCH/ {split($3,dates,"~")
$3=dates[1]
print $0 }
It seems like an elegant way to use arrays, but my awk and gawk got so confused by the | delimiters, in spite of my FS declarations, that they never found the 3rd field in the first place, so I would like to use a simpler tool if possible.

Many thanks for your suggestions.

feherke · Aug 19, 2009

Hi

emblem said:
but my awk and gawk got so confused with | delimiters in spite of FS declarations that it never finds the 3rd field to start with

Where did you set FS? There is no such thing in the code you posted.

Anyway, why split the string at all? Just remove the unwanted part:

Code:

awk -F '|' -v OFS='|' '$1=="SCH"{sub(/~.*/,"",$3)}1' /input/file
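A quick way to try the one-liner above on the sample rows from the question; the heredoc stands in for /input/file, and the wrapper name first_year_awk is hypothetical. The `-v OFS='|'` part matters: once sub() changes $3, awk rebuilds the whole line using OFS, which defaults to a single space.

```shell
#!/bin/sh
# first_year_awk is a hypothetical wrapper around the one-liner above.
#   -F '|'              split input fields on the pipe character
#   -v OFS='|'          rebuild modified lines with pipes, not spaces
#   sub(/~.*/,"",$3)    delete the first ~ and everything after it in field 3
#   trailing 1          an always-true pattern, so every line is printed
first_year_awk() {
    awk -F '|' -v OFS='|' '$1=="SCH"{sub(/~.*/,"",$3)}1' "$@"
}

# Sample rows from the question
first_year_awk <<'EOF'
SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|
EOF
```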

With sed it is a bit more complicated, but as long as the input looks like your sample, we can keep it simple:

Code:

sed '/^SCH|/s/~[^|]*|/|/' /input/file
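The simple sed command above removes text from the first ~ anywhere on the line up to the next |, so a ~ in the school name (field 2) would clip the wrong field. If the input might not look like the sample, a sketch that anchors the edit to the third field could use extended regular expressions via `sed -E` (accepted by GNU and BSD sed); the function name strip_field3 is hypothetical.

```shell
#!/bin/sh
# strip_field3 is a hypothetical name. With -E (extended regex), an
# unescaped | means alternation, so a literal pipe is written \|.
#   ^(([^|]*\|){2}[^|~]*)   fields 1-2 plus field 3 up to any ~, kept as \1
#   ~[^|]*                  the first ~ in field 3 and the rest of that field
strip_field3() {
    sed -E '/^SCH\|/s/^(([^|]*\|){2}[^|~]*)~[^|]*/\1/' "$@"
}

# A school name containing ~ shows the difference from the simpler command
printf 'SCH|Foo~Bar College|2007~2008|spring|\n' | strip_field3
```

This prints SCH|Foo~Bar College|2007|spring|, whereas the simpler command would turn the same line into SCH|Foo|2007~2008|spring|.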

Feherke.

http://rootshell.be/~feherke/

emblem · Aug 20, 2009

Thank you, Feherke! This is tremendously helpful. I did not show the BEGIN{ IFS='|'} clause in my awk, but it was there all the same and not working. The -F '|' option puts me back in business.

Your awk and sed code both work, too.

Now that awk is functional, my original

$1 ~ /SCH/ {split($3,dates,"~");
$3=dates[1]
print $0}

code is working, but you have taught me something about regexps.
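For anyone who prefers to set the separators inside the awk program rather than with -F: the awk variable is FS (IFS is the shell's), and awk string constants use double quotes, so a BEGIN block like the one mentioned above sets nothing awk reads. A sketch of the split() approach with the separators set in BEGIN; the name first_year_split is hypothetical, and unlike the original it also prints the non-SCH rows.

```shell
#!/bin/sh
# first_year_split is a hypothetical name for the split() approach above.
first_year_split() {
    awk '
        BEGIN { FS = "|"; OFS = "|" }   # awk reads FS and OFS; IFS is a shell variable
        $1 ~ /SCH/ {
            split($3, dates, "~")       # dates[1] holds the first year
            $3 = dates[1]               # assigning a field rebuilds $0 with OFS
        }
        { print }                       # print every row, edited or not
    ' "$@"
}

# Sample rows from the question
first_year_split <<'EOF'
SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|
EOF
```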

TrojanWarBlade · Aug 20, 2009

Of course, for such a simple change you don't necessarily need a scripting tool of any kind.
For example, if you were editing the file in vim, you could simply have typed this:

Code:

:g/^SCH/s/^\([^|]*|[^|]*|[0-9]*\)[^|]*/\1/

The :g limits the substitution to lines starting with SCH, and the regex keeps the first two fields plus the leading digits of the third field, dropping the rest of that field (the ~ and any later years).
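The same pattern also works outside the editor: here vim's regex syntax and sed's POSIX basic syntax agree (`\(` `\)` for grouping, a plain | is literal), so the command carries over to sed almost verbatim, with the :g/^SCH/ range becoming a /^SCH/ address. A sketch, with the hypothetical function name vim_pattern_sed:

```shell
#!/bin/sh
# vim_pattern_sed is a hypothetical name; the regex is the vim one above.
#   /^SCH/                     only touch lines starting with SCH
#   ^\([^|]*|[^|]*|[0-9]*\)    fields 1-2 and the leading digits of field 3, kept as \1
#   [^|]*                      the rest of field 3, e.g. ~2008
vim_pattern_sed() {
    sed '/^SCH/s/^\([^|]*|[^|]*|[0-9]*\)[^|]*/\1/' "$@"
}

printf 'SCH|Dartmouth College|2007~2008|spring|\n' | vim_pattern_sed
```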

Trojan.

p5wizard · Aug 20, 2009

simply... <grin>

HTH,

p5wizard

TrojanWarBlade · Aug 21, 2009

hahaha
Yeah, ok.

But the point I was trying to make was that if you learn your editor you can do much of this kind of work directly within it.

Trojan.

