
regex : best way to get only the <body>...</body>

Status
Not open for further replies.

kevinpham

Programmer
Dec 21, 2001
32
US
Hi,
What is the best way to extract only the <body>....</body> of an HTML file, and convert all links and graphics to their predefined URLs?
For example, if I have
<html><head></head><body>
junk <a href="/local/files/.html"><img src="/local/images/.gif"></a>

junk <a href=" src="</body></html>
-----------------------
What I want is to get the part from <body>(this part)</body>
and convert all links and images that do not start with http:// to if we know /local/files will be
It is not that hard, but I have some problems with it. When I convert all the local files to URLs, it gets messed up.

I really appreciate all your help.

Cheers
kevin
 
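You can capture the <body>...</body> chunk like this:
Code:
# m{...} delimiters avoid having to escape the slash in </body>;
# /i ignores case and /s lets . match newlines.
my ($body) = $html_text =~ m{(<body\b[^>]*>.*?</body>)}is;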
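
If you only want what is inside the body, and the opening tag might carry attributes, something along these lines may be closer to what you need. This is just a sketch: extract_body and the sample $page are made-up names for illustration, and a regex like this will not cope with every odd HTML file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between <body ...> and </body>,
# or undef if the page has no body element.
sub extract_body {
    my ($html) = @_;
    # \b[^>]* lets the opening tag carry attributes; /s lets . match newlines.
    return $html =~ m{<body\b[^>]*>(.*?)</body\s*>}is ? $1 : undef;
}

# Sample input, made up for this example.
my $page = qq{<html><head></head><BODY bgcolor="white">\njunk <b>text</b>\n</BODY></html>};

my $inner = extract_body($page);
print defined $inner ? $inner : "no body found", "\n";
```

Capturing into $1 also avoids $&, which is documented as slowing down every regex in the program on older versions of Perl.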
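Then,
Code:
# $base_url is a placeholder; set it to your real site root.
my $base_url = 'http://www.example.com';

# Keep '<a href="' and the closing quote; rewrite only the URL between them.
$body =~ s/(<a\s+href=")(.*?)(")/$1 . fix_url($2) . $3/egis;

sub fix_url
{
my $url = shift;
# Prefix anything that is not already an absolute http:// URL.
unless ($url =~ /^http:\/\//i)
  { $url = $base_url . $url; }
return $url;
}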

The 'e' switch makes Perl evaluate the right side of the substitution as code and use the result as the replacement. So the URL captured from the anchor tag's href attribute is passed to sub fix_url, which tweaks it if needed and returns it for the replacement.
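
To see the 'e' switch on its own, here is a tiny sketch that runs the same substitution with and without it. The helper double_it and the sample text are invented purely for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: double a number.
sub double_it {
    my ($n) = @_;
    return $n * 2;
}

my $text = "3 apples and 10 pears";

# Without /e the right side is an ordinary string, so the sub is never called.
(my $plain = $text) =~ s/(\d+)/double_it($1)/g;

# With /e the right side is run as Perl code and its result is substituted.
(my $evaluated = $text) =~ s/(\d+)/double_it($1)/ge;

print "$plain\n";
print "$evaluated\n";
```

The first line printed still contains the literal text double_it(3), while the second shows the doubled numbers.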

I have not run this, so it might need a syntax tweak, but hopefully it illustrates one approach that might do what you want.
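
Since the question also asked about images, here is a rough sketch that applies the same idea to both href and src attributes. Note the assumptions: $base_url is a placeholder (the real predefined URL is not given in the thread), fix_links is a made-up name, and the pattern only handles double-quoted attribute values.

```perl
use strict;
use warnings;

# Assumption: a placeholder site root; replace with the real predefined URL.
my $base_url = 'http://www.example.com';

# Leave absolute http:// and https:// URLs alone; prefix everything else.
sub fix_url {
    my ($url) = @_;
    return $url if $url =~ m{^https?://}i;
    $url = "/$url" unless $url =~ m{^/};
    return $base_url . $url;
}

# Hypothetical helper: rewrite href="..." and src="..." values in a fragment.
sub fix_links {
    my ($html) = @_;
    # Keep the attribute name and quotes; only the URL in between is changed.
    $html =~ s{\b((?:href|src)\s*=\s*")([^"]*)(")}{$1 . fix_url($2) . $3}gie;
    return $html;
}

# Sample fragment, made up for this example.
my $body = q{junk <a href="/local/files/a.html"><img src="/local/images/a.gif"></a> }
         . q{<a href="http://other.example.org/x.html">x</a>};

print fix_links($body), "\n";
```

For anything beyond simple, well-formed pages, a real HTML parser such as the HTML::Parser module from CPAN is likely to be more reliable than regexes.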



'hope this helps

If you are new to Tek-Tips, please use descriptive titles, check the FAQs, and beware the evil typo.
 
It helps a lot...

Thanks for the tips.

I had been working on this for hours, and after just a few minutes implementing your approach, it works perfectly.

Cheers
kevin
 