emblem
Programmer
- Jun 25, 2002
Hello experts,
I have lines in a pipe-delimited file like this:
SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|
I am trying to edit the multi-year fields in SCH rows so that only the first year of a ~-separated list remains. For example, I want the third line above to become:
SCH|Dartmouth College|2007|spring|
This is beyond my modest regular-expression substitution skills. I can see that if the first year were captured as its own group, I could replace the whole match ($1 $2 $3) with just $1, but I am not sure how to define several groups within a single long string.
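A capture group may not even be necessary here. A minimal sketch, assuming the file is named `input_file` as in the question's own commands and that a `~` followed by four digits only ever appears in the year field: restrict the substitution to lines starting with `SCH|` and delete every `~YYYY` suffix, which leaves only the first year. The here-document just recreates the sample lines from the question.

```shell
#!/bin/sh
# Recreate the sample data from the question (file name assumed).
cat > input_file <<'EOF'
SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|
EOF

# Only on lines that begin with "SCH|", delete every "~" followed by
# exactly four digits; the first year of each run is left untouched.
# Basic regex (sed's default): \{4\} is an interval, | is a plain character.
sed '/^SCH|/ s/~[0-9]\{4\}//g' input_file
```

On the sample data this turns the two multi-year SCH lines into `SCH|Dartmouth College|2007|spring|` and `SCH|Princeton|2003|` and passes the ACA and CON lines through unchanged. The trade-off is that it would also strip a `~YYYY` found in any other field of an SCH line.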
I have tried:
sed -e 's/(\([0-9]{4})(~[0-9]{4}){1,4}\)\1/' input_file
and various other permutations of it, without success, although I appreciate dbase77's example in thread822-1444480.
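For reference, the attempt above fails in sed's default basic regular expressions (BRE) for a few reasons: a bare `(` is a literal parenthesis (grouping needs `\(` and `\)`), `{4}` matches literal braces (an interval needs `\{4\}`), and the `s` command needs the form `s/regex/replacement/`, whereas the attempt has no separate replacement part. Also, sed refers to a captured group as `\1` rather than Perl-style `$1`. A corrected sketch of the same capture-group idea, again assuming a file named `input_file` recreated from the question's sample lines, anchors on the surrounding `|` characters so that only a whole year field is rewritten:

```shell
#!/bin/sh
# Recreate the sample data from the question (file name assumed).
cat > input_file <<'EOF'
SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|
EOF

# BRE: \( \) captures the first year; \(~[0-9]\{4\}\)\{1,4\} matches one to
# four further "~YYYY" parts. The replacement keeps only group 1.
sed '/^SCH|/ s/|\([0-9]\{4\}\)\(~[0-9]\{4\}\)\{1,4\}|/|\1|/' input_file

# Same pattern in extended regex syntax (-E, accepted by GNU and BSD sed).
# In ERE a bare | means alternation, so the literal pipe is written \|.
sed -E '/^SCH\|/ s/\|([0-9]{4})(~[0-9]{4}){1,4}\|/|\1|/' input_file
```

Either command should give the same result. An SCH line whose year field contains no `~`, such as the Dartmouth University row, does not match the pattern and is printed unchanged.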
I also tried doing this in awk:
$1 ~ /SCH/ {
    split($3, dates, "~")
    $3 = dates[1]
    print $0
}
It seems like an elegant use of arrays, but both awk and gawk get so confused by the | delimiters, despite my FS declarations, that they never find the third field in the first place, so I would like to use a simpler tool if possible.
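Two common causes of that symptom, offered as likely explanations rather than a diagnosis of the exact setup: an unquoted `-F|` is read by the shell as a pipe, and an `FS="|"` assignment made inside a main action (rather than in `BEGIN` or via `-F`) only takes effect from the next record, so the first line is still split on whitespace. There is also an output pitfall: assigning to `$3` rebuilds `$0` using `OFS`, which defaults to a space, so the pipes vanish from every changed line. A minimal sketch that sets both separators in `BEGIN`, again assuming a file named `input_file`:

```shell
#!/bin/sh
# Recreate the sample data from the question (file name assumed).
cat > input_file <<'EOF'
SCH|Dartmouth University|2006|fall|
ACA|Duke |2008|winter|
SCH|Dartmouth College|2007~2008|spring|
CON|49 Main Street|CT|
SCH|Princeton|2003~2004~2005|
EOF

# FS and OFS are both set in BEGIN, before the first record is read.
# $1 == "SCH" is an exact match; $1 ~ /SCH/ would also match any first
# field that merely contains "SCH".
# The final { print } keeps every line, not just the SCH rows.
awk 'BEGIN { FS = OFS = "|" }
     $1 == "SCH" { split($3, dates, "~"); $3 = dates[1] }
     { print }' input_file
```

The command-line equivalent is `awk -F'|' -v OFS='|' '...'`, with the pipe quoted so the shell leaves it alone. If only the SCH rows are wanted, as in the original snippet, move `print` back inside the SCH action.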
Many thanks for your suggestions.