sed if/else or branching?

blarneyme · Mar 27, 2012

This question came up in another thread, so I wanted to see if
it could be done purely with sed instead of awk or shell.

# cat text
A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9


The output should be

A 0,1
B 2
C 3,4,5
D 6,7
E 8,9


I've come close, but I'm missing how to compare the first column of the next line with that of the previous line.

Code:

sed '/^$/!{
h
s/\(.*\) \(.*\)\n/\1/
N
s/\(.*\) \(.*\)\n/\1 \2,/
s/,[A-Z]/,/
s/, /,/
}' text

This is my output as it stands now

# ./s
A 0,1
B 2,3
C 4,5
D 6,7
E 8,9


Any ideas how to branch or do an if/else test to get it to print as expected?

blarneyme · Mar 27, 2012

I am closer, but I still have a problem when a value in column 1 appears on more than two lines.

Code:

# cat ss
sed '/^$/!{
h
bb
s/\(.*\) \(.*\)\n/\1 \2,/;
s/,[A-Z]/,/;
s/, /,/;
:b
/^A/{
 s/\(.*\) \(.*\)\n/\1/;
 N
 s/\(.*\) \(.*\)\n/\1 \2,/;
 s/,[A-Z]/,/;
 s/, /,/;
}
/^B/{
 s/\(.*\) \(.*\)\n/\1/p;
 N
 /^\(.*\)\n\1$/ {
    bb
 }
}
/^C/{
 s/\(.*\) \(.*\)\n/\1/;
 N
 s/\(.*\) \(.*\)\n/\1 \2,/;
 s/,[A-Z]/,/;
 s/, /,/;
 /^\(.*\)\n\1$/ {
   bb
 }
}
/^D/{
 s/\(.*\) \(.*\)\n/\1/;
 N
 s/\(.*\) \(.*\)\n/\1 \2,/;
 s/,[A-Z]/,/;
 s/, /,/;
}
/^E/{
 s/\(.*\) \(.*\)\n/\1/;
 N
 s/\(.*\) \(.*\)\n/\1 \2,/;
 s/,[A-Z]/,/;
 s/, /,/;
}
}' text

# cat text
A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9

output

# ./ss
A 0,1
B 2
C 3
C 4,5
D 6,7
E 8,9

FlorianAwk · Mar 27, 2012

Hello,

Here is my console:

bash$ cat test.txt
A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9
bash$ sed '1h;1!H;g;s/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/;h;$!d' test.txt
A 0,1
B 2
C 3,4,5
D 6,7
E 8,9

Do you need more explanation?

Using \( and \) lets sed capture a group of characters and reuse it as \1, \2, \3, numbered in order of appearance.

blarneyme · Mar 28, 2012

If you have time, could you break down the one-liner by each command and how it works with the input.

Something like:
1h copies line A0 from pattern space to holding space
1!H appends line A 1 from the pattern space to the holding space
g copies line A 0 A 1 from holding space to pattern space replacing the current line
s/ what is the \n\2 ?

Also, my A 0 A 1 must not be correct, because your substitution has (.*)([A-Z])(.*)(.*) and I would have expected ([A-Z])(.*)([A-Z])(.*). I understand the newline, but I would have expected it after the second pair of ()(), not the third, and I still don't understand the \2 after the newline.

If you have time I'd appreciate your explanations!
Thanks!

FlorianAwk · Mar 28, 2012

OK. The idea is to append each new line to the holding space on every cycle, then delete the repeated capital and add a comma.

1h; When? On the first line. What? Copy the working space into the holding space.
1!H; When? On every line except the first. What? Append the working space to the holding space. Why? To concatenate the new line to what we have already built.
g; When? Always. What? Copy the holding space into the working space. Why? To work on it.
s/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/; The substitution itself.
h; When? Always. What? Copy the working space into the holding space. Why? To prepare the next cycle.
$!d; When? On every line except the last. What? Delete the working space and start the next cycle. Why? So that intermediate results are not printed. Remove it to watch the result evolve on screen.
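
As a runnable sketch of the walkthrough above, the same six commands can be written one per line with comments. The sample data is copied from the thread; the variable names `input` and `out` are only illustrative and not part of the original post.

```shell
# Sample data from the thread.
input='A 0
A 1
B 2
C 3
C 4
C 5
D 6
D 7
E 8
E 9'

out=$(printf '%s\n' "$input" | sed '
# first line: copy the working (pattern) space to the holding space
1h
# every other line: append it to the holding space after a newline
1!H
# bring the accumulated text back into the working space
g
# if the last two lines start with the same capital, merge them
s/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/
# save the merged text for the next cycle
h
# print nothing until the last line has been processed
$!d
')

printf '%s\n' "$out"
```

Dropping the final `$!d` prints the accumulated text after every input line, which makes it easier to follow how the holding space grows.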

Let's examine the substitution:

When?
s/blabla/bloblo/ replaces the first match in the working space
s/blabla/bloblo/2 replaces only the second match
s/blabla/bloblo/g replaces every match
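
A small check of the three forms listed above, on a made-up input string (the string and the variable names are assumptions chosen for illustration):

```shell
line='a-a-a-a'

first=$(printf '%s\n' "$line" | sed 's/a/X/')    # no flag: first match only
second=$(printf '%s\n' "$line" | sed 's/a/X/2')  # 2: the second match only
all=$(printf '%s\n' "$line" | sed 's/a/X/g')     # g: every match

printf '%s\n' "$first" "$second" "$all"
# X-a-a-a
# a-X-a-a
# X-X-X-X
```

Note that a numeric flag selects a single occurrence; it does not mean "the first N occurrences".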

. -> any character
.* -> 0 or more characters
.\+ -> 1 or more characters (GNU sed; plain + with sed -E; the portable form is ..*)
.\? -> 0 or 1 character (GNU sed; plain ? with sed -E)
\n -> a newline inside the working space. A newline only gets there through N, G or H, which is how you build multi-line text. You can choose to substitute the \n or not.
^ -> beginning of the working space
$ -> end of the working space in regular expressions (but the last line in sed addressing; don't confuse the two)
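
The "one or more" entry in the list above depends on which regex syntax sed is using. The following sketch compares three spellings on a made-up string; it assumes a sed that accepts `-E` (GNU and BSD sed do), and the variable names are illustrative.

```shell
text='aaa b'

# Basic syntax, portable: \{1,\} means "one or more".
bre=$(printf '%s\n' "$text" | sed 's/a\{1,\}/X/')

# Extended syntax with -E: + means "one or more".
ere=$(printf '%s\n' "$text" | sed -E 's/a+/X/')

# Without -E, a bare + is an ordinary character, so nothing matches here.
literal=$(printf '%s\n' "$text" | sed 's/a+/X/')

printf '%s\n' "$bre" "$ere" "$literal"
# X b
# X b
# aaa b
```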

What is the first block? Anything. In practice, what we have already built.
What is the second block? A capital.
What is the third block? Anything. In practice, the numbers so far, separated by commas.
What is the fourth block? Anything. In practice, the newly added number.
What is the \2 in the pattern? The same capital again. Why not [A-Z]? Because [A-Z] would match any pair of capitals, not only the same capital twice.

The substitution seems to delete the \n. Why do we still see several lines? Because when the capital changes, the pattern does not match and the substitution is not performed, so that \n stays.
If I write 's/\n/ /g', would it put everything on a single line? No. Each input line is read into the working space without its trailing newline, so there is no \n to replace until you add one with H, G or N.
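
A quick way to see this for yourself, on three made-up input lines. The `:a`/`N`/`$!ba` idiom used to gather the lines is a common one but does not come from the thread, so treat it as an illustrative assumption.

```shell
# One line per cycle: the working space never contains \n, so nothing changes.
unchanged=$(printf 'a\nb\nc\n' | sed 's/\n/ /g')

# Collect all lines with N first, then replace the embedded newlines.
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')

printf '%s\n' "$unchanged" "$joined"
# a
# b
# c
# a b c
```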

Let's examine your code:

sed '/^$/!{

You said "if the line is not an empty line". Why not say "if there is something": sed '/./{ ?

h

It is fine to copy into the holding space, but if you never get the text back, it is useless.

N

Remember that the script runs once per cycle, and then a new cycle starts. Here, N pairs the lines two by two, so it does what you want only for every other line: each pair has already been printed before the next N is executed.
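
The pairing effect is easy to reproduce on four made-up lines (the input and the variable name `pairs` are illustrative only):

```shell
# N joins lines 1+2 and 3+4; lines 2 and 3 are never in the
# working space together, so they can never be compared.
pairs=$(printf '1\n2\n3\n4\n' | sed -e 'N' -e 's/\n/,/')

printf '%s\n' "$pairs"
# 1,2
# 3,4
```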

Still any problem? ;-)

blarneyme · Mar 29, 2012

Thank you very much for your answer. It is very clear and very understandable.

FlorianAwk · Mar 29, 2012

More powerful:

sed ':l N;s/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/;b l' test.txt

:l
It is a label. It could have been:
:yahoo

b l
It is an unconditional branch.
When does it stop? When N has no more lines to read.
If you used :yahoo, use:
b yahoo
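
The loop relies on what sed does when N runs out of input, which differs between implementations. A hedged variant, not from the thread, guards both N and the branch with the `$!` address so the loop ends explicitly on the last line. It uses a shortened copy of the sample data.

```shell
out=$(printf 'A 0\nA 1\nB 2\nC 3\nC 4\nC 5\n' | sed '
:l
# append the next input line, unless this is already the last one
$!N
# merge the last two lines when they start with the same capital
s/\(.*\)\([A-Z]\) \(.*\)\n\2 \(.*\)/\1\2 \3,\4/
# loop back to :l while input remains; fall through on the last line
$!b l
')

printf '%s\n' "$out"
# A 0,1
# B 2
# C 3,4,5
```

Putting an address in front of `b` is one way to get an "if" in sed: the branch is taken only when the address matches.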

Is there any conditional branching? Yes:
t yahoo
This command branches to the label only if a substitution has succeeded since the last input line was read or since the last t.

;-)

blarneyme · Mar 30, 2012

Very succinct indeed! Would you be willing to provide me with an example of using conditional branching (t yahoo)?

Thanks!

FlorianAwk · Mar 30, 2012

Code:

#define VAR_3_SPACES    "   "
#define VAR_4_SPACES    "    "
#define VAR_5_SPACES    "     "

pdhfnvb
:kdfjvn:df
:kdjfvn:dkvn
:kdjfvn:

Code:

sed 's/"$/\n"/;:loop s/"\n/"/;s/ \n/\nQ/;t loop' qqq.txt

Code:

#define VAR_3_SPACES    "QQQ"
#define VAR_4_SPACES    "QQQQ"
#define VAR_5_SPACES    "QQQQQ"

pdhfnvb
:kdfjvn:df
:kdjfvn:dkvn
:kdjfvn:

The first substitution initializes the process.
The second matches only at the end.
The third has to be repeated. Why? Because a single s command never rescans the text it has just replaced. => loop
There is no explicit stop command => conditional branching
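
The same `t` mechanism can also be applied to the question that opened the thread. The following is only a sketch, not something posted by the participants: it assumes the key is the text before the first space, and it uses the P and D commands, which the thread does not discuss.

```shell
out=$(printf 'A 0\nA 1\nB 2\nC 3\nC 4\nC 5\nD 6\nD 7\nE 8\nE 9\n' | sed '
:a
# append the next line unless we are on the last one
$!N
# if: both lines share the first field, so fold the second value in
s/^\(\([^ ]*\) .*\)\n\2 /\1,/
# the substitution succeeded: go back and try the following line too
ta
# else: print the finished first line, drop it, restart with the rest
P
D
')

printf '%s\n' "$out"
# A 0,1
# B 2
# C 3,4,5
# D 6,7
# E 8,9
```

Because finished lines are printed and dropped as soon as the key changes, the working space holds at most two lines at a time instead of the whole file.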

PHV · Mar 30, 2012

FlorianAwk, your sed script posted 30 Mar 12 14:15 is valid only with GNU sed ...
Did you try it with a legacy *nix sed ?

FlorianAwk · Mar 30, 2012

No, I didn't.
But if you read the UNIX man pages,

http://unixhelp.ed.ac.uk/CGI/man-cgi?sed

you can find label, b and t. Everything should work. I wonder where you see a difference.

Does it work better with
\"
instead of
"
?

PHV · Mar 30, 2012

In legacy sed:
1) the : command is limited to labels of at most 8 characters and doesn't accept ; as a command delimiter (workaround: replace ; with ^J, i.e. a newline)
2) \n in a pattern is treated as a plain n (no workaround AFAIK)

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886

FlorianAwk · Mar 30, 2012

Have you read the link I gave? There you can read:

Code:

REGULAR EXPRESSIONS 
(...)  The \n sequence in a regular expression matches the newline character, and similarly for \a, \t, and other sequences.

Can you cite an official source for your information? Would you say the UNIX man pages are wrong?

Furthermore, in my script, \n could have been any other character that does not occur in the input. It would still work.
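
To illustrate that last remark together with the workarounds mentioned earlier, here is a hedged rewrite of the Q-filling script. It puts one command per line instead of using ;, keeps the label short, and uses ~ as the marker instead of \n. The choice of ~ is an assumption: it must not occur anywhere in the input. The three input lines are a shortened, made-up stand-in for qqq.txt.

```shell
out=$(printf '#define VAR_3_SPACES    "   "\n#define VAR_5_SPACES    "     "\nplain line\n' | sed '
# place the marker just before the closing quote
s/"$/~"/
:loop
# marker has reached the opening quote: remove it
s/"~/"/
# otherwise swap one space for a Q and move the marker left
s/ ~/~Q/
# repeat as long as one of the substitutions succeeded
t loop
')

printf '%s\n' "$out"
# #define VAR_3_SPACES    "QQQ"
# #define VAR_5_SPACES    "QQQQQ"
# plain line
```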

PHV · Mar 30, 2012

"Have you read the link I gave?"
Yes.
And a man page saying that a command accepts long options (--anything) isn't a UNIX man page but a GNU one.
