Convert br tag to line break?

Status
Not open for further replies.

databarn

Programmer
Sep 19, 1998
202
US
Folk,
The task is to grab a web page, strip the HTML tags, and save it as a text file. Kind of a reverse CMS.

The problem is converting the br tags back into line breaks. I can't find a PHP function that does it, so I've tried str_replace(), but to no avail, i.e.,
[red]str_replace("<br />",xxx,$body)[/red] where [red]xxx[/red] is any of the items listed below. None of 'em worked <sigh />.

I've tried [red]chr(10)[/red] and [red]chr(13)[/red] alone and in tandem. I've tried [red]0x0A[/red] and [red]0x0D[/red] alone and in tandem. I've tried [red]\n[/red] and [red]\r[/red] alone and in tandem, quoted and unquoted. I've tried assigning each of these to the variable [red]$cr[/red], just for grins, when they didn't work directly, i.e.,
[red]str_replace("<br />",$cr,$body)[/red].

Does anyone have any idea how this might be done with a PHP script?

TIA,
make a good day . . .
. . . barn
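
One thing worth ruling out with the attempts listed above is quoting. In PHP a single-quoted '\n' is a backslash followed by the letter n, while a double-quoted "\n" is a real line feed, the same byte as chr(10). A minimal sketch of the difference; the $body value here is a made-up sample, not the poster's actual page:

```php
<?php
// Single-quoted strings keep the backslash: '\n' is two characters.
$literal = '\n';
// Double-quoted strings interpret the escape: "\n" is one line feed (LF).
$newline = "\n";

// Hypothetical sample input standing in for the fetched page.
$body = "First line<br />Second line";

// Inserts a visible backslash-n, not a line break.
$wrong = str_replace("<br />", $literal, $body);
// Inserts a real line break.
$right = str_replace("<br />", $newline, $body);

// Print the lengths of the two replacement strings for comparison.
echo strlen($literal), " ", strlen($newline), PHP_EOL;
```

Note also that str_replace() only matches the exact text it is given, so a page that uses <br> or <BR> rather than <br /> would come through unchanged.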
 
How are you determining success or failure?

According to the PHP manual page on nl2br(), the function doesn't remove the "\n" characters, so to go back you just have to remove the "<br>" or "<br />" tags. The following function was also posted on that manual page:
Code:
<?php
   /* br2nl for use with HTML forms, etc. */
   function br2nl($text)
   {
       /* Remove XHTML linebreak tags. */
       $text = str_replace("<br />","",$text);
       /* Remove HTML 4.01 linebreak tags. */
       $text = str_replace("<br>","",$text);
       /* Return the result. */
       return $text;
   }
?>

Ken
 
Sorry, Ken,

I should have mentioned that just eliminating the br tag doesn't work. That gives me one long string of text, no line breaks, no paragraphs, just a single string, essentially unreadable as an article. I suspect the articles were originally produced in an HTML editor - in fact I know that a couple of 'em were - and have no true line breaks embedded.

'Preciate the effort though - I'll file it away for future reference.

Make a good day . . .
. . . barn
 
Again, how are you displaying the text? If you are just echoing or printing it to the browser, you will not see the carriage returns, since browsers treat them as ordinary whitespace when rendering HTML. Surround your text with "<pre></pre>" tags and they will not be ignored.

Ken
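
To take the browser and the text editor out of the picture entirely, the line-break bytes can be counted in PHP itself. This is only a sketch: count_line_breaks() is a hypothetical helper name, the sample string is invented, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: report which kinds of line break a string contains.
function count_line_breaks(string $text): array
{
    $crlf = substr_count($text, "\r\n");
    return [
        'crlf' => $crlf,                                // CR+LF pairs
        'lf'   => substr_count($text, "\n") - $crlf,    // bare LF
        'cr'   => substr_count($text, "\r") - $crlf,    // bare CR
    ];
}

// Made-up sample with one CRLF pair and one bare LF.
$sample = "one\r\ntwo\nthree";
print_r(count_line_breaks($sample));

// json_encode() renders control characters as visible escapes,
// which is a quick way to eyeball a string.
echo json_encode($sample), PHP_EOL;
```

If all three counts are zero after the replacement step, the replacement did not match anything; if they are non-zero, the breaks are in the data and the problem lies in how the result is being viewed.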
 
Sorry,

Overlooked that part.

I've tried several different tags, <pre>, <code>, etc., to display results on the processing page.

Then I got smart - I thought - and wrote the result to a text file and looked at it with NoteTab Pro, ConTEXT, several other editors that either display, or allow me to see, different formats for line breaks. All I get is one long line, no breaks to be seen. I've had this result with several different articles, so it's not unique to one particular file - I'd had hopes <sigh />. That's why I suspect the articles have been created in some HTML editor or some sort of CMS editor.

If it makes it easier, the complete task is to extract articles from HTML files and massage them into shape for text-based email. I'm not restricted to line length, per se, but I do have to maintain paragraph and line spacing (white space), thus the need to convert <br /> and <p> to CRLF pairs.
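
For the complete task described above, one possible approach is sketched here. It is an assumption-laden sketch, not a tested solution for these particular articles: html_to_text() is a hypothetical name, the file names in the comment are placeholders, and it assumes the markup is simple enough for regular expressions (no <pre> blocks whose spacing must survive) and that PHP 7 or later is available.

```php
<?php
// Hypothetical helper: turn an HTML fragment into plain text with CRLF breaks.
function html_to_text(string $html): string
{
    // Source line breaks are just whitespace in HTML, so collapse them first.
    $text = preg_replace('/\s+/', ' ', $html);
    // <br>, <br/>, <br /> in any letter case become one line break.
    $text = preg_replace('/<br\s*\/?>/i', "\n", $text);
    // Each closing </p> becomes a blank line between paragraphs.
    $text = preg_replace('/<\/p\s*>/i', "\n\n", $text);
    // Remove the remaining tags and decode entities such as &amp;.
    $text = strip_tags($text);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Trim spaces around line breaks and cap runs at one blank line.
    $text = preg_replace('/ *\n */', "\n", $text);
    $text = preg_replace('/\n{3,}/', "\n\n", $text);
    // Convert to CRLF pairs for text-based email.
    return str_replace("\n", "\r\n", trim($text));
}

// Possible usage (placeholder file names):
// file_put_contents('article.txt', html_to_text(file_get_contents('article.html')));
```

Converting to "\n" first and switching to "\r\n" only in the last step keeps the intermediate patterns simple and avoids producing doubled carriage returns.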
 
Could you just paste one of the input articles here so that we can see what you're working with?
 
Maybe this will give you a start:

Code:
<?php
$str = "This is a string with a br tag.<br />And then we have a <p> tag and a </p> tag.";
$str = str_replace("<br />", "\n", $str);
$str = str_replace("<p>", "\n\n", $str);
$str = str_replace("</p>", "\n\n", $str);
echo "<pre>";
echo $str;
echo "</pre>";
?>

Thanks,
--Mark
 
Folk,

Did you ever try to fill a pool with a three inch drain with a garden hose? Only to discover that the drain was open?

I noticed some craziness in my email client, then in a text editor. Checked, found that I'd had the box up for a bit over three days. Rebooted. Things seem to be working a bit differently, now <grin />.

Went back to look at the text output, found line breaks all over the place . . . now it looks as though everything worked <sigh />. So, apparently this whole thread has been a false alarm. Sorry to have tied up your time.

I love technology <chortle />.

Make a good day . . .
. . . barn
 