Convert consecutive line breaks into a single space

Kirsle · Jun 23, 2007

Hey.

I'm still doing some work on my Tk::HyperText module, but there's a little thing I'm having trouble with on it: converting multiple line breaks into a single space.

For example, in this HTML code...

Code:

<b>The quick brown fox
jumps over the lazy dog.</b>

The page should display "The quick brown fox jumps over the lazy dog." in a real web browser, because the line break gets converted into a space.

However, when I try to run a regexp like this:

Code:

$sector =~ s/\x0a+/ /ig;

It converts every LF into a space, so with this HTML code:

Code:

<html>
<head>
<title>HyperText Demonstration</title>
</head>
<body bgcolor="#EEEEFF" link="#0000FF" vlink="#FF00FF" alink="#FF0000" text="#000000">

<h1>Tk::HyperText Demonstration</h1>

The page displays with 5 or 6 spaces on the top line, then the <h1> code. At most it should only have 1 space on the top line.

Here's my relevant bit of code:

Code:

		# If this was preformatted text, preserve the line endings and spacing.
		if ($style{pre} == 1) {
			# Leave it alone.
		}
		else {
			$sector =~ s/\x0d//smg;
			$sector =~ s/\x0a+//smg;
			$sector =~ s/\s+/ /sg;
		}

Help would be appreciated.

-------------
Cuvou.com | My personal homepage
Project Fearless | My web blog

MillerH · Jun 23, 2007

Greetings Kirsle,

Is there any reason why you can't just let the \s meta character capture all your explicit returns?

Code:

my $str = qq{<b>The quick brown fox\njumps over the lazy dog.</b>};

$str =~ s/\s+/ /g;

print "$str";

# Outputs: <b>The quick brown fox jumps over the lazy dog.</b>

- Miller
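A minimal sketch of how the `\s+` substitution might be combined with the `pre` check from Kirsle's snippet. The helper name `collapse_whitespace` and its `$is_pre` flag are hypothetical and not part of Tk::HyperText; the point is only that `\s` already matches CR, LF, tabs, and spaces, so a single pass can replace the three separate substitutions.

```perl
use strict;
use warnings;

# Hypothetical helper (not from Tk::HyperText): collapse runs of
# whitespace into one space unless the text is preformatted.
sub collapse_whitespace {
    my ($text, $is_pre) = @_;
    return $text if $is_pre;    # <pre> content keeps its line endings and spacing
    $text =~ s/\s+/ /g;         # \s matches CR, LF, tabs, and spaces
    return $text;
}

my $html = "<b>The quick brown fox\r\njumps over   the lazy dog.</b>";
print collapse_whitespace($html, 0), "\n";
```

Because the CR and LF are consumed by the same `\s+` match, a Windows-style line ending still becomes exactly one space.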

latigerlilly · Jun 24, 2007

Why ask why? Just use the following code to get rid of the extra line breaks:

chomp($sector);

If you really need to ask why, without looking at the rest of your code, a good guess would be that a visitor to your website would have to type enter or return to have the program accept the input. Therefore, you'll expect just one line break, but get 2 instead due to your visitors hitting the enter or return button.

Shameless plug
I've answered your question, now please answer mine:

http://www.tek-tips.com/viewthread.cfm?qid=1381216&page=1

Have a nice day,
Lilly

Kirsle · Jun 24, 2007

Actually, you didn't answer my question, latigerlilly. ;-) MillerH gave a better answer than you did.

MillerH: your code worked a lil better than mine did. I still ran into a bit of a problem with the spaces at the beginning of the page (which I've concluded is because of the line breaks around <html>, <head>, <title>, etc.), but I just added a bit of code to set $sector="" if it only contains whitespace.

But, as for having newlines in the middle of a paragraph, your code converts them correctly so that each newline in the paragraph becomes a space. One small remaining bug is that the first line of a new paragraph has one space in front of it. It's not too big of a deal but I'll keep working at it.
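The whitespace-only check described above could look roughly like the sketch below. The `@sectors` list is made-up sample data standing in for the text chunks found between tags, and the sketch skips blank chunks rather than setting them to an empty string, which is a small variation on what the post describes.

```perl
use strict;
use warnings;

# Hypothetical sample data: text chunks as they might appear between tags,
# including the whitespace-only chunks around <html>, <head>, and <body>.
my @sectors = ("\n", "\n", "HyperText Demonstration", "\n", "\n\n",
               "Tk::HyperText\nDemonstration");

my @rendered;
for my $sector (@sectors) {
    $sector =~ s/\s+/ /g;       # collapse whitespace runs into one space
    next if $sector !~ /\S/;    # skip chunks with no visible characters
    push @rendered, $sector;
}

print join("|", @rendered), "\n";
```

Only the two chunks that contain visible text survive, so the blank lines between the header tags no longer add spaces at the top of the page.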

latigerlilly: I wasn't talking about CGI or anything to do with user input, I was talking about HTML rendering, as I'm programming this module:

http://search.cpan.org/perldoc?Tk::HyperText

travs69 · Jun 24, 2007

s/^\s*//;

Gets rid of leading spaces.
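A short sketch of how that substitution might be applied to the start of a paragraph. The sample text is invented for illustration. Without the `/m` modifier, `^` anchors only at the very start of the string, so interior lines are untouched; the sketch uses `\s+` rather than `\s*`, which removes the same characters.

```perl
use strict;
use warnings;

# Hypothetical paragraph text whose first line picked up a stray space.
my $paragraph = " The quick brown fox\njumps over the lazy dog.";

$paragraph =~ s/\s+/ /g;    # collapse interior whitespace first
$paragraph =~ s/^\s+//;     # then drop whatever is left at the very start

print "[$paragraph]\n";
```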

MillerH · Jun 24, 2007

Hi Kirsle,

I think the biggest improvement that you could make to your module would be to revamp the HTML rendering. Instead of rolling your own parser, I would advise you to use HTML::TokeParser or some other related engine.

http://search.cpan.org/search?query=HTML::TokeParser

I've made my own markup language before, and while it can be fun creating your own rendering engine, there are too many special cases to make doing everything yourself very effective. I'm not sure which engine would work best for what you're trying to do, or what specification you would want to apply. Maybe only support HTML snippets? As in, fail on <html>, <body>, and <title> tags. But whatever spec you come up with, it's going to be easier if you use someone's existing code.

- Miller
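For illustration, a rough sketch of what pulling text out of a page with HTML::TokeParser might look like. It assumes the HTML-Parser distribution is installed from CPAN (it is not part of core Perl), the sample document is adapted from the one earlier in the thread, and the whitespace handling is one possible choice rather than how Tk::HyperText does it.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # from the HTML-Parser distribution on CPAN

my $html = <<'HTML';
<html>
<head>
<title>HyperText Demonstration</title>
</head>
<body>

<h1>Tk::HyperText Demonstration</h1>
<p>The quick brown fox
jumps over the lazy dog.</p>
</body>
</html>
HTML

# Passing a scalar reference makes the parser read from the string.
my $parser = HTML::TokeParser->new(\$html) or die "Cannot parse document";

my @chunks;
while (my $token = $parser->get_token) {
    my $type = $token->[0];
    next unless $type eq 'T';    # 'T' tokens carry text; tags are 'S' and 'E'
    my $text = $token->[1];
    $text =~ s/\s+/ /g;          # collapse whitespace runs
    $text =~ s/^ | $//g;         # trim the single space left at either end
    push @chunks, $text if length $text;
}

print "$_\n" for @chunks;
```

With the tokenizer handling tag boundaries, the whitespace-only text between `<html>`, `<head>`, and `<body>` shows up as separate text tokens that can simply be dropped.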
