sed question 1

Status
Not open for further replies.

tikual

Technical User
Jun 10, 2003
HK
Hi all,

I would like to combine all the lines of a file into one line with a sed script.

From:
This is the first line.
This is the second line.
This is the third line.

To:
This is the first line. This is the second line. This is the third line.

So I wrote a sed script like this:
/^$/d
1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}

It nearly meets my requirement, but it prints the combined text once for every line it joins. With a big file, only the last output is what I want. How can I solve this problem, or is there a simpler way to do it?

Thanks!!

tikual
 
Why not use the paste -s command? From man paste....

-s Merge subsequent lines rather than one from each input
file. Use tab for concatenation, unless a list is
specified with the -d option. Regardless of the list,
the very last character of the file is forced to be a
new-line.
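
As a quick illustration of the paste suggestion, here is a minimal sketch. The helper name join_with_paste is made up for this example, and -d ' ' is passed because paste -s joins with tabs unless told otherwise.

```shell
# join_with_paste is a hypothetical helper name, not something from the thread.
# -s merges all input lines serially; -d ' ' uses a space instead of the default tab.
join_with_paste() {
    paste -s -d ' ' "$@"
}

# Feed the three sample lines from the question on standard input ("-").
printf '%s\n' 'This is the first line.' 'This is the second line.' 'This is the third line.' |
    join_with_paste -
```

The output is the three sentences on one line, separated by single spaces and ended by one newline.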

 
Why not? Hehe... because I am teaching myself sed. I solved the problem a few minutes ago, and now I would like to share the solution with all of you.

The modified script:
#n
/^$/d
$!{
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
D
}
${
g
P
}
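
Tracing this version by hand suggests that it still drops the final input line: on the last line, g overwrites the pattern space (which holds that line) with the hold space. The sketch below assumes GNU sed and uses -n on the command line in place of the #n first line. The helper names and the second variant are illustrations, not something posted in the thread.

```shell
# Helper names are hypothetical. This is the modified script as posted.
tikual_modified() {
    sed -n '
/^$/d
$!{
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
D
}
${
g
P
}'
}

# Variant (an assumption, not from the thread): on the last line, append the
# hold space with G and reorder it like every other line, then print.
tikual_keep_last() {
    sed -n '
/^$/d
$!{
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
D
}
${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
p
}'
}

sample() {
    printf '%s\n' 'This is the first line.' 'This is the second line.' 'This is the third line.'
}
sample | tikual_modified    # the third line is missing from the output
sample | tikual_keep_last   # all three lines on one line
```

Both versions still print nothing when the last input line is empty, because /^$/d ends the cycle before the final block runs.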

tikual
 
For performance reasons I would suggest using tr here.
The command

Code:
 tr "\n" " " < myinputfile.txt
maps all newline characters (including the last one) to space characters.
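
A small sketch of what "including the last one" means in practice: the output ends with a space and has no terminating newline. The helper name flatten_with_tr is hypothetical, and the brackets are only there to make the trailing space visible.

```shell
# flatten_with_tr is a hypothetical helper name used only for this sketch.
flatten_with_tr() {
    tr '\n' ' '
}

# Command substitution keeps the trailing space, so this prints
# "[first second third ]" with the space just before the closing bracket.
printf '[%s]\n' "$(printf '%s\n' 'first' 'second' 'third' | flatten_with_tr)"
```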

regards,

Tobias

Tobias Gabele
Bierwirth & Gabele SoftwareDesign GBR
 
hehe.... all of you ignored my Subject too. Anyway, I also appreciate all of your replies. Thanks!

tikual
 
Check out the last line of the script below, which prints only the final result. You could also play with the -n flag and use $p.

/^$/d
1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}
$!d

This also works but is even less efficient (12 times slower!) with large files.

sed -e :a -e '$!N;s/\n/ /;t a'

Both are really slow. In the real world I'd use one of the other posters' ideas.

Cheers,
ND [smile]

bigoldbulldog@hotmail.com
 
Whoops, I reversed my scripts when using timex. sed -e :a -e '$!N;s/\n/ /;t a' is the faster one.

Cheers,
ND [smile]

bigoldbulldog@hotmail.com
 
Wow... bigoldbulldog, you showed me another way. A star for you!

tikual
 
Let me sum up all suggestions first.

Bigoldbulldog modified my script as follows:

/^$/d
1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}
$!d

Failed: Actually this is caused by my original script. Neither of us kept an eye on the last line. It was missing!!

Bigoldbulldog's suggestion
sed -e :a -e '$!N;s/\n/ /;t a'

Great: It combines all lines without changing the original text (including spaces!). How does it work? I don't know what label a is for.

Annihilannic's suggestion
sed -n 'H;${x;y/\n/ /;p;}'

Good: One minor flaw found, which is an extra space added before the first line.
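
On the question of what label a is for, here is a commented sketch of the same loop, assuming GNU sed. The function names are hypothetical. The last function is one possible way (an assumption, not from the thread) to avoid the leading space in the hold-space version, by using h instead of H on line 1.

```shell
# Same commands as: sed -e :a -e '$!N;s/\n/ /;t a'
join_loop() {
    sed '
# :a only defines a label named a; it does nothing by itself.
:a
# Unless this is the last line, append the next input line after a newline.
$!N
# Turn that embedded newline into a space.
s/\n/ /
# t jumps back to label a only if a substitution has succeeded since the
# last jump, so the loop stops once there is no newline left to replace.
t a
'
}

# The hold-space version as quoted: H always prepends a newline, so the
# result starts with a space.
join_hold() {
    sed -n 'H;${x;y/\n/ /;p;}'
}

# Possible tweak: copy line 1 with h, append the rest with H.
join_hold_no_lead() {
    sed -n '1h;1!H;${x;y/\n/ /;p;}'
}

printf '%s\n' 'one' 'two' 'three' | join_loop
printf '%s\n' 'one' 'two' 'three' | join_hold
printf '%s\n' 'one' 'two' 'three' | join_hold_no_lead
```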

Once again, thanks All!!

tikual
 
Just to be academic, let's chew on this some more.

This will output nothing if the last line is empty.
/^$/d will delete, go to the end of script and output nothing. If there are no more lines to process then the script is done.

/^$/d
1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}
$!d

Now, without /^$/d, you get

1,${
G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
}
$!d

Processing from lines 1 to the end ($) is done intrinsically in sed so 1,${...} is unnecessary.

G
s/^\(.*\)\n\(.*\)/\2 \1/
s/ *\(.*\)/\1/
x
$!d
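
To see that the 1,$ address range adds nothing, here is a tiny comparison on an unrelated substitution. The function names and the sample text are made up for the illustration.

```shell
# with_range and without_range are hypothetical names for this comparison.
with_range() {
    sed '1,${
s/line/LINE/
}'
}

without_range() {
    sed 's/line/LINE/'
}

# Both commands touch every input line, so the two outputs are identical.
printf '%s\n' 'line one' 'line two' | with_range
printf '%s\n' 'line one' 'line two' | without_range
```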

It appears that s/ *\(.*\)/\1/ is there to remove the initial space that is created when the first line is processed. The space comes from concatenating the empty hold space to the pattern space. This can be avoided by doing the G command on every line but line 1.
So replace G with 1!G and remove the substitution, which should have been done only once anyway instead of on every line (e.g. $s/ *\(.*\)/\1/). Note: the removed substitution also prevented multiple space characters from occurring between concatenated lines when there are intermediate blank lines. I'll maintain the assumption that non-blank lines are to be separated by a single space and will address it later.

1!G
s/^\(.*\)\n\(.*\)/\2 \1/
x
$!d

Now there is no initial space character on the output, but lines that are blank will result in extra spaces (1 for each blank line between non-blanks) between the concatenated lines. My earlier script doesn't handle this. You could consider adding a substitution that replaces two or more spaces with one (e.g. s/ \{2,\}/ /g). But then individual lines that contain similar multiples of space characters will be altered.
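
The side effect of the space-squeezing substitution can be seen on a single line that already contains a run of spaces. This is only a sketch; squeeze_spaces is a hypothetical name and the sample text is invented.

```shell
# squeeze_spaces is a hypothetical helper wrapping the substitution mentioned above.
squeeze_spaces() {
    sed 's/ \{2,\}/ /g'
}

# The three spaces after the colon belong to the original line, but they are
# collapsed as well: this prints "name: value".
printf '%s\n' 'name:   value' | squeeze_spaces
```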

The fix is to check when the hold space is concatenated to an empty line in the pattern space (e.g. found via /^\n/) and do your substitution (#A) only when this is not the case (#Test A). You can then follow that with a clean and cheap substitution (#B) to remove the potentially remaining newline that is the first character in the pattern space.

1!G
/^\n/ !{ #Test A
s/^\(.*\)\n\(.*\)/\2 \1/ #A
}
s/\n// #B
x
$!d

What if the first line or lines of input are blank? Then a space character will precede the output. So remove it with a substitution (#C), and now we have

1!G
/^\n/ !{ #Test A
s/^\(.*\)\n\(.*\)/\2 \1/ #A
}
s/\n// #B
x
$s/^ // #C
$!d

But what if the first line is a non-blank line that starts with a space character? Then that space would be removed.

So the fix is not to use that substitution. Instead, while eating up any initial blank lines, check whether the hold space is also empty, so that empty is being concatenated to empty ( /^\n$/ ). If that case (#Test B) is found, then jump past the substitution that puts in the space, and the #B substitution will deal with the newline.

1!G
/^\n/ !{ #Test A
/\n$/ b t #Test B
s/^\(.*\)\n\(.*\)/\2 \1/ #A
:t
}
/\n/ s/// #B
x
$!d

Voilà!
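
One more what-if, from a hand trace: because the script ends with x, the pattern space printed on the last line is the previous accumulation, so the complete result appears only when the input ends with a blank line. The sketch below assumes GNU sed. The function names are hypothetical, the comments are left out of the posted script, and the second function is an illustrative variant that uses h instead of x and a negated address instead of the b t jump.

```shell
# The final script as posted, without the trailing comments.
join_posted() {
    sed '
1!G
/^\n/!{
/\n$/b t
s/^\(.*\)\n\(.*\)/\2 \1/
:t
}
s/\n//
x
$!d
'
}

# Variant (an assumption, not from the thread): h copies the accumulated text
# to the hold space and leaves it in the pattern space, so the last cycle
# prints everything. /\n$/! skips the reordering while the hold space is empty.
join_with_h() {
    sed '
1!G
/^\n/!{
    /\n$/!s/^\(.*\)\n\(.*\)/\2 \1/
}
s/\n//
h
$!d
'
}

printf '%s\n' 'a' 'b' 'c' | join_posted              # the last line is missing
printf '%s\n' 'a' 'b' 'c' | join_with_h              # all three lines
printf '%s\n' '' 'A' '' ' B' '' | join_with_h        # blank lines skipped, leading space of " B" kept
```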

There are probably more what-ifs and dozens of ways to write these things, but such is the life of sed. Typically writing sed scripts is a cyclic process until the best fit is met.


btw:

My earlier script will behave similarly if re-written as

sed -e :a -e '$!N;s/\n$//;s/^\n//;s/\n/ /;t a'

or

:a
$!N
s/\n$//
s/^\n//
s/\n/ /
t a
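
A short sketch of this loop on input with leading, intermediate and trailing blank lines, assuming GNU sed. The function name join_skip_blanks is hypothetical and the sample input is invented.

```shell
# join_skip_blanks is a hypothetical name for the loop above.
# s/\n$// drops the newline when the appended line is empty, s/^\n// drops it
# when only blank lines have been read so far, and s/\n/ / joins the rest.
join_skip_blanks() {
    sed -e :a -e '$!N;s/\n$//;s/^\n//;s/\n/ /;t a'
}

# Blank lines vanish and the leading space of " second" is kept, so the
# output is "first  second" with two spaces in the middle.
printf '%s\n' '' 'first' '' ' second' '' | join_skip_blanks
```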


Cheers,
ND [smile]

bigoldbulldog@hotmail.com
 
Hi bigoldbulldog,

You really are an expert in sed. Thanks for your explanation!!

tikual
 