sed question 1

tikual · Nov 10, 2003

Hi all,

I would like to combine all lines of a file into one line with a sed script.

From:
This is the first line.
This is the second line.
This is the third line.

To:
This is the first line. This is the second line. This is the third line.

So I wrote a sed script like this:
/^$/d
1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}

It nearly meets my requirement, but it prints a line each time it combines two lines, and with a big file only the last output is what I want. How can I solve this problem, or is there a simpler method to do this?

Thanks!!

tikual

Ygor · Nov 11, 2003

Why not use the paste -s command? From man paste....

-s Merge subsequent lines rather than one from each input
file. Use tab for concatenation, unless a list is
specified with the -d option. Regardless of the list,
the very last character of the file is forced to be a
new-line.
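
Since the man page excerpt says tab is the default separator, the -d option is needed to get spaces. A minimal sketch, assuming the input arrives on standard input; the function name join_with_paste is made up for illustration:

```shell
join_with_paste() {
    # -s: merge all lines of one input serially
    # -d ' ': join with a space instead of the default tab
    # -: read standard input
    paste -s -d ' ' -
}

printf 'This is the first line.\nThis is the second line.\nThis is the third line.\n' | join_with_paste
```

Unlike the tr approach, this keeps a single newline at the end of the output.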

tikual · Nov 11, 2003

Why not? hehe.... because I am teaching myself sed. A few minutes ago I solved the problem, and now I would like to share it with all of you.

The modified script:
#n
/^$/d
$!{
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
D
}
${
g
P
}

tikual

gabele · Nov 11, 2003

For performance reasons I would suggest using tr here.
The command

Code:

 tr "\n" " " < myinputfile.txt

maps all newline characters (including the last one) to space characters.
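
To make the "including the last one" point visible, here is a small sketch that appends a sentinel character after the tr output. The sample input and the `|` sentinel are only for illustration:

```shell
# tr turns every newline into a space, so the result ends with a space
# and has no final newline. The '|' sentinel marks where the output stops.
joined=$(printf 'one\ntwo\nthree\n' | tr '\n' ' '; printf '|')
printf '%s\n' "$joined"
```

The space just before the sentinel is the former final newline.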

regards,

Tobias

Tobias Gabele
Bierwirth & Gabele SoftwareDesign GBR

http://uni-sql.de/gabele/projects.html

tikual · Nov 11, 2003

hehe.... all of you ignored my Subject too. Anyway, I also appreciate all of your replies. Thanks!

tikual

bigoldbulldog · Nov 11, 2003

Check out the last line of the script below, which prints only the final result. You could also play with the -n flag and use $p

/^$/d
1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}
$!d

This also works but is even less efficient (12 times slower!) with large files.

sed -e :a -e '$!N;s/\n/ /;t a'

Both are really slow. In the real world I'd use one of the other posters' ideas.

Cheers,
ND [smile]

bigoldbulldog@hotmail.com

bigoldbulldog · Nov 11, 2003

Whoops, I reversed my scripts when using timex. sed -e :a -e '$!N;s/\n/ /;t a' is the faster one.

Cheers,
ND [smile]

bigoldbulldog@hotmail.com

tikual · Nov 11, 2003

Wow... bigoldbulldog, you showed me another way. A star for you!

tikual

Annihilannic · Nov 11, 2003

Another one: sed -n 'H;${x;y/\n/ /;p;}'

Annihilannic.

tikual · Nov 11, 2003

Let me sum up all suggestions first.

Bigoldbulldog modified my script as

/^$/d
1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}
$!d

Failed: Actually this is caused by my original script. Neither of us kept an eye on the last line. It was missing!!

Bigoldbulldog's suggestion
sed -e :a -e '$!N;s/\n/ /;t a'

Great: It combines all lines and does not change the original text (including spaces!). How does it work? I don't know what label a is for.
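
On what the label is for: it is only a jump target, and the t command branches back to it whenever the preceding substitution succeeded. A commented sketch of the same one-liner, split into one -e per command; the function name join_with_loop is made up for illustration:

```shell
join_with_loop() {
    # :a       define a label named "a" (a branch target, nothing more)
    # $!N      on every line but the last, append the next input line to
    #          the pattern space, separated by an embedded newline
    # s/\n/ /  turn that embedded newline into a space
    # t a      if the s command replaced something, jump back to :a
    sed -e ':a' -e '$!N' -e 's/\n/ /' -e 't a'
}

printf 'This is the first line.\nThis is the second line.\nThis is the third line.\n' | join_with_loop
```

On the last line N is skipped, the substitution finds no newline, t does not branch, and sed prints the accumulated pattern space once.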

Annihilannic's suggestion
sed -n 'H;${x;y/\n/ /;p;}'

Good: One minor flaw: an extra space is added before the first line.
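
The extra space comes from H on line 1, which appends a newline plus the line to a still-empty hold space. One possible variation (not Annihilannic's original) copies the first line with h instead; the function name join_with_hold is made up for illustration:

```shell
join_with_hold() {
    # 1h    first line: copy it into the hold space (no leading newline)
    # 1!H   every later line: append newline + line to the hold space
    # ${..} last line: fetch the hold space, turn newlines into spaces, print
    sed -n -e '1h' -e '1!H' -e '${x;y/\n/ /;p;}'
}

printf 'This is the first line.\nThis is the second line.\nThis is the third line.\n' | join_with_hold
```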

Once again, thanks All!!

tikual

bigoldbulldog · Nov 12, 2003

Just to be academic, let's chew on this some more.

This will output nothing if the last line is empty.
/^$/d will delete, go to the end of script and output nothing. If there are no more lines to process then the script is done.

/^$/d
1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}
$!d

Now you get

1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}
$!d

Processing from lines 1 to the end ($) is done intrinsically in sed so 1,${...} is unnecessary.

G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
$!d

It appears that s/ *\(.*\)/\1/ is there to remove the initial space that is created when processing the first line. This happens because the empty hold space is concatenated to the pattern space. It can be avoided by doing the G command on every line but line 1.
So replace G with 1!G and remove the substitution, which should have been done once anyway instead of on every line (e.g. $s/ *\(.*\)/\1/). Note: the removed substitution also prevented multiple space characters from occurring between concatenated lines when there are intermediate blank lines. I'll maintain the assumption that non-blank lines are to be separated by a single space and will address it later.

1!G
s/^\(.*\)\n\(.*\)/\2 \1/
x
$!d

Now there is no initial space character on the output, but lines that are blank will result in extra spaces (1 for each blank line between non-blanks) between the concatenated lines. My earlier script doesn't handle this. You could consider adding a substitution that replaces two or more spaces with one (e.g. s/ \{2,\}/ /g). But then individual lines that contain similar multiples of space characters will be altered.

The fix is to check whether the hold space has been concatenated to an empty line in the pattern space (e.g. found via /^\n/) and do your substitution (#A) only when this is not the case (#Test A). You can then follow that with a clean and cheap substitution (#B) to remove the potentially remaining newline that is the first character in the pattern space.

1!G
/^\n/ !{ #Test A
s/^\(.*\)\n\(.*\)/\2 \1/ #A
}
s/\n// #B
x
$!d

What if the first line or lines of input are blank? Then a space character will precede the output. So remove it with a substitution (#C), and now we have

1!G
/^\n/ !{ #Test A
s/^\(.*\)\n\(.*\)/\2 \1/ #A
}
s/\n// #B
x
$s/^ // #C
$!d

But what if the first line is a non-blank line that starts with a space character? Then that space would be removed.

So the fix is not to use the substitution but, while eating up any initial blank lines, to check whether the hold space is also empty, so that empty is being concatenated to empty ( /^\n$/ ). If that case (#Test B) is found, jump past the substitution that puts in the space, and the #B substitution will deal with the newline.

1!G
/^\n/ !{ #Test A
/\n$/ b t #Test B
s/^\(.*\)\n\(.*\)/\2 \1/ #A
:t
}
/\n/ s/// #B
x
$!d

Voilà!
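
tikual's earlier remark about the missing last line seems to apply here as well. Because x swaps the freshly joined text into the hold space just before the final print, a hand trace suggests the last input line only shows up when the file ends with a blank line. A minimal sketch of one possible variation, using h (copy) in place of x (swap) and leaving the rest of the logic as above; the function name join_nonblank_lines and the label name skip are made up for illustration:

```shell
join_nonblank_lines() {
    sed '
1!G
/^\n/!{
/\n$/b skip
s/^\(.*\)\n\(.*\)/\2 \1/
:skip
}
s/\n//
h
$!d
'
}

# Blank lines are skipped, and a leading space on a real line is kept.
printf 'first\n\nsecond\nthird\n' | join_nonblank_lines
printf '\n indented\nnext\n' | join_nonblank_lines
```

With h, the pattern space still holds the complete joined text when the last line is reached, so the default print at the end of the script outputs everything.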

There are probably more what-ifs and dozens of ways to write these things, but such is the life of sed. Typically writing sed scripts is a cyclic process until the best fit is met.

btw:

My earlier script will behave similarly if re-written as

sed -e :a -e '$!N;s/\n$//;s/^\n//;s/\n/ /;t a'

or

:a
$!N
s/\n$//
s/^\n//
s/\n/ /
t a

Cheers,
ND [smile]

bigoldbulldog@hotmail.com

tikual · Nov 15, 2003

Hi bigoldbulldog,

You really are an expert in sed. Thanks for your explanation!!

tikual
