sed if/else or branching?

Status
Not open for further replies.

blarneyme

MIS
Jun 22, 2009
160
US
This question came up in another thread so I wanted to see if
it could be done purely with sed instead of awk or shell.
# cat text
A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9
The output should be
A 0,1
B 2
C 3,4,5
D 6,7
E 8,9
I've come close, but I can't work out how to compare the first column of the next line with that of the previous line.
Code:
sed '/^$/!{
h
s/\(.*\) \(.*\)\n/\1/
N
s/\(.*\) \(.*\)\n/\1 \2,/
s/,[A-Z]/,/
s/, /,/
}' text

This is my output as it stands now
# ./s
A 0,1
B 2,3
C 4,5
D 6,7
E 8,9

Any ideas how to branch or do an if/else test to get it to print as expected?
 
I am closer, but it still breaks when the same value in column 1 appears on more than 2 lines.
Code:
# cat ss
sed '/^$/!{
h
bb
s/\(.*\) \(.*\)\n/\1 \2,/;
s/,[A-Z]/,/;
s/, /,/;
:b
/^A/{
 s/\(.*\) \(.*\)\n/\1/;
 N
 s/\(.*\) \(.*\)\n/\1 \2,/;
 s/,[A-Z]/,/;
 s/, /,/;
}
/^B/{
 s/\(.*\) \(.*\)\n/\1/p;
 N
 /^\(.*\)\n\1$/ {
    bb
 }
}
/^C/{
 s/\(.*\) \(.*\)\n/\1/;
 N
 s/\(.*\) \(.*\)\n/\1 \2,/;
 s/,[A-Z]/,/;
 s/, /,/;
 /^\(.*\)\n\1$/ {
   bb
 }
}
/^D/{
 s/\(.*\) \(.*\)\n/\1/;
 N
 s/\(.*\) \(.*\)\n/\1 \2,/;
 s/,[A-Z]/,/;
 s/, /,/;
}
/^E/{
 s/\(.*\) \(.*\)\n/\1/;
 N
 s/\(.*\) \(.*\)\n/\1 \2,/;
 s/,[A-Z]/,/;
 s/, /,/;
}
}' text
# cat text
A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9
output
# ./ss
A 0,1
B 2
C 3
C 4,5
D 6,7
E 8,9
 
Hello,

Here is my console:

bash$ cat test.txt
A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9
bash$ sed '1h;1!H;g;s/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/;h;$!d' test.txt
A 0,1
B 2
C 3,4,5
D 6,7
E 8,9


Do you need more explanation? :)
Wrapping part of the pattern in \( and \) makes sed capture that group of characters. You can then reuse the groups as \1, \2, \3, numbered in the order in which the \( appear.
 
If you have time, could you break down the one-liner command by command and show how each one works on the input?

Something like:
1h copies line A 0 from the pattern space to the hold space
1!H appends line A 1 from the pattern space to the hold space
g copies lines A 0 and A 1 from the hold space to the pattern space, replacing the current line
s/ what is the \n\2 ?

Also, my "A 0 A 1" reading must be wrong, because your substitution has (.*)([A-Z]) (.*)\n (.*) where I would have expected ([A-Z])(.*)([A-Z])(.*). I understand the newline, but I would have expected it after the second pair of parentheses, not the third, and I still don't understand the \2 after the newline.

If you have time I'd appreciate your explanations!
Thanks!
 
Ok. The idea is, on each cycle, to append the next line to the hold space, then drop the repeated capital and add a comma.

1h; When? On the first line. What? Copy the pattern space (working space) into the hold space.
1!H; When? On every line except the first. What? Append the pattern space to the hold space, after a newline. Why? To concatenate the new line onto what we have already built.
g; When? Always. What? Copy the hold space into the pattern space. Why? To work on it.
s/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/; When? Always. What? The substitution itself, examined below.
h; When? Always. What? Copy the pattern space into the hold space. Why? To prepare the next cycle.
$!d; When? On every line except the last. What? Delete the pattern space and start the next cycle. Why? So that the intermediate result is not printed. Remove this command to watch the result build up on screen.
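
For anyone who wants to replay the walkthrough, here is a minimal sketch of the same commands written as a commented multi-line script. It assumes the sample input from the thread, fed from a shell variable instead of test.txt (the names input and result are mine), and a sed that accepts \n in the pattern, as GNU sed does.

```shell
#!/bin/sh
# Sample input from the thread (held in a variable instead of test.txt).
input='A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9'

result=$(printf '%s\n' "$input" | sed '
# On line 1: copy the pattern space into the hold space.
1h
# On every other line: append the pattern space to the hold space.
1!H
# Copy the accumulated hold space back into the pattern space.
g
# Merge the last two lines if they start with the same capital.
s/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/
# Save the (possibly merged) text for the next cycle.
h
# Print nothing until the last line has been processed.
$!d
')

printf '%s\n' "$result"
```

Deleting the $!d line makes sed print the pattern space at the end of every cycle, which shows the result growing line by line.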


Let's examine the substitution:

When?
s/blabla/bloblo/ replaces the first match on the line
s/blabla/bloblo/2 replaces only the second match on the line
s/blabla/bloblo/g replaces every match on the line
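
A quick way to check what each flag does is to run all three forms on the same line. This is only a sketch; the test line and the variable names are made up for the illustration.

```shell
#!/bin/sh
# One line containing the word blabla three times.
line='blabla blabla blabla'

# No flag: only the first match is replaced.
first=$(printf '%s\n' "$line" | sed 's/blabla/bloblo/')
# Numeric flag 2: only the second match is replaced.
second=$(printf '%s\n' "$line" | sed 's/blabla/bloblo/2')
# Flag g: every match is replaced.
all=$(printf '%s\n' "$line" | sed 's/blabla/bloblo/g')

printf '%s\n' "$first" "$second" "$all"
# bloblo blabla blabla
# blabla bloblo blabla
# bloblo bloblo bloblo
```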

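. -> any single character
.* -> 0 or more characters
.\+ -> 1 or more characters (\+ is a GNU extension in sed's basic regular expressions; the portable form is .\{1,\})
.\? -> 0 or 1 character (\? is also a GNU extension; the portable form is .\{0,1\})
\n -> a newline inside the pattern space. There is only one in there after N, G or H has joined several lines. You can choose to substitute the \n or not.
^ -> beginning of the pattern space
$ -> end of the pattern space in regular expressions (but the last input line in the sed addressing system. Don't confuse the two)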
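
As a small illustration of the list above, the sketch below uses the interval forms \{0,1\} and \{1,\}, which basic regular expressions provide as standard, and shows $ used as an address instead of as a regex anchor. The three-line input and the variable names are invented for the example.

```shell
#!/bin/sh
words='b
ab
aab'

# a\{0,1\}: zero or one a before the b.
optional=$(printf '%s\n' "$words" | sed -n '/^a\{0,1\}b$/p')
# a\{1,\}: one or more a before the b.
one_or_more=$(printf '%s\n' "$words" | sed -n '/^a\{1,\}b$/p')
# $ as an address selects the last input line, not the end of a line.
last_line=$(printf '%s\n' "$words" | sed -n '$p')

printf '%s\n' "$optional" "--" "$one_or_more" "--" "$last_line"
```

The first command prints b and ab, the second prints ab and aab, and the third prints only aab.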

What is the first block? Anything. In practice, what we have already done.
What is the second block? A capital.
What is the third block? Anything. In practice, the numbers collected so far, separated by commas.
What is the fourth block? Anything. In practice, the newly read number.
What is the \2 after the \n? The same capital again. Why not [A-Z]? Because [A-Z] would match any pair of capitals, not only the same capital twice.
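
To see why the back-reference matters, here is a reduced sketch on two-line inputs. It uses a simplified pattern with only three groups (so the back-reference is \1 rather than \2); that simplification and the variable names are mine, not part of the one-liner.

```shell
#!/bin/sh
# Back-reference after the newline: merge only if both lines start with the SAME capital.
backref='s/^\([A-Z]\) \(.*\)\n\1 \(.*\)/\1 \2,\3/'
# [A-Z] after the newline: merge whenever the second line starts with ANY capital.
anycap='s/^\([A-Z]\) \(.*\)\n[A-Z] \(.*\)/\1 \2,\3/'

# N joins the two input lines with a newline before the substitution runs.
same_key=$(printf 'A 0\nA 1\n' | sed -e 'N' -e "$backref")
other_key=$(printf 'A 0\nB 1\n' | sed -e 'N' -e "$backref")
wrong=$(printf 'A 0\nB 1\n' | sed -e 'N' -e "$anycap")

printf '%s\n' "$same_key" "--" "$other_key" "--" "$wrong"
```

With the back-reference, "A 0" and "A 1" become "A 0,1" while "A 0" and "B 1" are left alone. With [A-Z], the B line is wrongly merged into "A 0,1".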

The substitution seems to delete the \n. So why do we still see several lines? Because when the capital changes, the pattern no longer matches, the substitution is not done, and that \n stays.
If I write 's/\n/ /g', will it put everything on a single line? No. By default the pattern space holds one input line without its trailing newline, so there is no \n to match. You only get a \n in there by using H, G or N.
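
The difference is easy to reproduce. In this sketch (three-line input and variable names chosen for the example), the plain substitution changes nothing, while a loop that first pulls every line into the pattern space with N does join them.

```shell
#!/bin/sh
text='A 0
A 1
B 2'

# Each cycle sees a single line with no \n in it, so nothing is replaced.
unchanged=$(printf '%s\n' "$text" | sed 's/\n/ /g')

# :a / $!N / $!ba collects all lines into the pattern space first;
# only then does s/\n/ /g have newlines to replace.
joined=$(printf '%s\n' "$text" | sed -e ':a' -e '$!N' -e '$!ba' -e 's/\n/ /g')

printf '%s\n' "$unchanged" "--" "$joined"
```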


Let's examine your code:

sed '/^$/!{ :) You said "if the line is not an empty line". Why not say "if there is something": sed '/./{ ?
h :) It is fine to copy into the hold space, but if you never get the text back (with g, G or x), it is useless.
N Remember that the script runs once per cycle, and then a new cycle starts on the next unread line. Here N consumes a second line in every cycle, so the lines are handled in fixed pairs and it only does what you want for one line out of two. The result has already been printed before the next N is executed.
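
The pairing effect can be made visible with a stripped-down sketch: run only N on the sample text and mark where the two lines were joined. The " | " marker and the variable names are mine, purely for illustration.

```shell
#!/bin/sh
text='A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9'

# One N per cycle: lines are consumed two at a time, whatever their first column is.
pairs=$(printf '%s\n' "$text" | sed -e 'N' -e 's/\n/ | /')

printf '%s\n' "$pairs"
# A 0 | A 1
# B 2 | C 3
# C 4 | C 5
# D 6 | D 7
# E 8 | E 9
```

B 2 ends up paired with C 3, which is the same grouping as the "B 2,3" and "C 4,5" lines in the first attempt's output.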


Any problems left? ;-)
 
Thank you very much for your answer. It is very clear and very understandable.
 
More powerful:

sed ':l N;s/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/;b l' test.txt

:l
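This is a label. It could just as well have been:
:yahoo

b l
This is an unconditional branch.
When does it stop? When N has no more lines to read (GNU sed then prints the pattern space and exits).
If you used :yahoo, write:
b yahoo

Is there a conditional branch? Yes:
t yahoo
This command branches to the label only if a substitution has succeeded since the last input line was read or the last t branch was taken.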

;-)
 
Very succinct indeed! Would you be willing to provide me with an example of using conditional branching (t yahoo)?

Thanks!
 
Code:
#define VAR_3_SPACES    "   "
#define VAR_4_SPACES    "    "
#define VAR_5_SPACES    "     "

pdhfnvb
:kdfjvn:df
:kdjfvn:dkvn
:kdjfvn:
Code:
sed 's/"$/\n"/;:loop s/"\n/"/;s/ \n/\nQ/;t loop' qqq.txt
Code:
#define VAR_3_SPACES    "QQQ"
#define VAR_4_SPACES    "QQQQ"
#define VAR_5_SPACES    "QQQQQ"

pdhfnvb
:kdfjvn:df
:kdjfvn:dkvn
:kdjfvn:

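The first substitution starts the process: it inserts a \n marker before the closing quote.
The second matches only at the end, once the marker has reached the opening quote, and removes it.
The third has to be repeated. Why? Because each run converts only one space: it moves the marker one space to the left and adds a Q after it. => loop
Nothing in the script says when to stop => conditional branching: t loops only as long as a substitution keeps succeeding.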
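
The same t idea can also be turned on the grouping problem from the start of the thread. The sketch below is one possible variant, not the script posted above: it adds the P and D commands, which have not been used in this thread, so that each finished group is printed as soon as the capital changes. It assumes sorted input like the sample and a sed that accepts \n in the pattern.

```shell
#!/bin/sh
text='A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9'

result=$(printf '%s\n' "$text" | sed \
  -e ':a' \
  -e '$!N' \
  -e 's/^\([A-Z]\) \(.*\)\n\1 \(.*\)/\1 \2,\3/' \
  -e 'ta' \
  -e 'P' \
  -e 'D')
# :a   label to loop back to
# $!N  append the next input line, unless we are already on the last one
# s    merge the two lines if they start with the same capital
# ta   if the merge happened, go back and try the next line too
# P    otherwise print the finished first line...
# D    ...delete it, and restart the script on what is left

printf '%s\n' "$result"
```

Because the pattern space never holds more than two lines here, this variant does not need the hold space at all.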
 
FlorianAwk, your sed script posted 30 Mar 12 14:15 is valid only with GNU sed ...
Did you try it with a legacy *nix sed?

In legacy sed:
1) the : command is limited to an 8-character label and does not accept ; as a command delimiter (workaround: replace ; with ^J, i.e. a newline)
2) \n in a pattern is treated as a plain n (no workaround AFAIK)
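
As a hedged sketch of the workaround for point 1, the loop can be written with one -e per command, so no ; follows the label, and with $!N and $!b so that it does not depend on what N does when the input runs out (POSIX specifies quitting without printing, while GNU sed prints the pattern space first). It still relies on \n in the pattern, so it does nothing about point 2, and I have only reasoned it through for GNU sed. The variable names are mine.

```shell
#!/bin/sh
text='A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9'

# Each command gets its own -e, so the label is not followed by ; on the same line.
# $!N only reads a next line when there is one; $!bl stops looping on the last line,
# after which sed prints the accumulated pattern space as usual.
result=$(printf '%s\n' "$text" | sed \
  -e ':l' \
  -e '$!N' \
  -e 's/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/' \
  -e '$!bl')

printf '%s\n' "$result"
```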

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Have you read the link I gave? There you can read:
Code:
REGULAR EXPRESSIONS 
(...)  The \n sequence in a regular expression matches the newline character, and similarly for \a, \t, and other sequences.
Can you cite an official source for your information? Would you say the UNIX man pages are wrong?

Furthermore, in my script, the \n could have been any other character. It would still work.
 
"Have you read the link I gave?"
Yes.
And a man page saying that a command accepts long options (--anything) isn't a UNIX man page but a GNU one.
 