Tek-Tips is the largest IT community on the Internet today!

Simple Regex

Status
Not open for further replies.

kevinpham

Programmer
Dec 21, 2001
32
US
Hi,
I just want a simple regex to do three things:
1. Take out the part between the
<body> ... </body> tags in an HTML file.
2. Replace every relative image URL with its full URL:
<img src='aaa.jpg'> to be <img src='
3. Replace every relative URL with its full URL:
<a href='aaa.htm'> to be <a href='

I will then put the result back into my own header and footer tags, but that's the easier part afterward.

If anyone can help out, I would appreciate it. Cheers,
kevin
 
Code:
$domain = 'http://www.mydomain.com/';

# Empty the body; the /s modifier lets . match across newlines
s/<body>.*<\/body>/<body><\/body>/gis;
# Prefix the domain onto every img src and a href value
s/<img src=(['"])(.*?)\1>/<img src=$1$domain$2$1>/gi;
s/<a href=(['"])(.*?)\1>/<a href=$1$domain$2$1>/gi;
The domain needs the trailing slash / because it is simply prefixed onto the link itself, so it has to supply the delimiter. Also, this prefixes ALL links, not just the ones without an absolute path. You'd have to put in a little more logic for that, but it's still pretty simple: just test each link to see whether it starts with 'http'. Hope it helps. Yell if there are problems.
----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
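
The "little more logic" mentioned in the reply could look something like the sketch below. It is only one possible reading of the question, not the poster's own code. The sub names (body_content, full_url, absolutize_links) are made up for illustration. It assumes $domain ends in a slash, treats a link as absolute when it starts with a scheme such as http: or with //, and otherwise resolves it against the domain root (not against the page's own directory). The attribute values are assumed to be quoted.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between <body ...> and </body>,
# or undef when no body element is found.
sub body_content {
    my ($html) = @_;
    return $html =~ m{<body\b[^>]*>(.*?)</body\s*>}is ? $1 : undef;
}

# Hypothetical helper: leave $url alone if it already looks absolute,
# otherwise prefix it with $domain (assumed to end in a slash).
sub full_url {
    my ($url, $domain) = @_;
    return $url if $url =~ m{^(?:[a-z][a-z0-9+.-]*:|//)}i;  # http:, mailto:, //host
    $url =~ s{^/}{};    # avoid a double slash after the domain
    return $domain . $url;
}

# Hypothetical helper: rewrite quoted src/href values in <img> and <a> tags.
sub absolutize_links {
    my ($html, $domain) = @_;
    # Copy the captures into lexicals before calling full_url.
    $html =~ s{(<(?:img|a)\b[^>]*?\b(?:src|href)\s*=\s*)(['"])(.*?)\2}
              {my ($head, $q, $url) = ($1, $2, $3); $head . $q . full_url($url, $domain) . $q}gise;
    return $html;
}

# Example run on a small made-up page.
my $domain = 'http://www.mydomain.com/';
my $page   = "<html><body bgcolor='white'>\n"
           . "<img src='aaa.jpg'>\n"
           . "<a href=\"http://other.example/x.htm\">x</a>\n"
           . "</body></html>";
my $body = body_content($page);
print absolutize_links($body, $domain) if defined $body;
```

Regexes like these are fine for pages whose markup you control. For arbitrary HTML, a real parser is the safer route.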
 