Regular Expressions and HREF

Status
Not open for further replies.

PeterWhmrc

Programmer
Nov 12, 2011
1
Hi all,

Although they're supposed to be fairly simple, I'm definitely having problems getting my head around Regular Expressions and, more importantly, how they are used to replace or amend data.

I am trying to do something I think should be fairly simple.

I'm reading in a webpage using cURL, amending some text and outputting it to the browser (basically a simple censor-type page).

This works fine; however, I want to enhance it so that if a user clicks a link on the viewed page, it doesn't take them to that link as such but loads the page into the PHP script.

I'm probably not making much sense here!

MyPage1.html = simple page with a textfield to enter a URL (say MyPage2.php = php script that loads selected URL (via GET) and changes the text before displaying

If user clicks a link on that page (say then they are directed to the bbc site whereas I want them to be directed to

mypage2.php?theurl='
basically allowing the user to surf the net while still always being on mypage2.php

This is my first foray into Regular Expressions (I think I've shied away from them in the past).

So I basically want to use Regex to find any <a href=" (or <a href=') then add " between the href= and the " or '

That would work for the bulk of hrefs. The trouble is there are other possible combinations too: <a title="my site" href="mysite.com"> is also valid, along with numerous other possibilities.

If we can get the basic one, though, that would do for starters. If anyone could give me hints, pointers or a link to a good, easy-to-understand tutorial, I'd be grateful.

Also, to the admins: when searching for Regular Expressions I found numerous posts in numerous forums. Could this not have its own forum, seeing as it crosses languages?
 
Hi

There are various practices, whether forced by circumstances, intended to break robots or just plain stupidity, that will be hard to work around:
[ul]
[li]There are syntactically invalid [tt]href=no-quote-around.html[/tt] attributes.[/li]
[li]There are relative links affected by the [tt]href[/tt] of a [tt]base[/tt] tag.[/li]
[li]There are compromised links like [tt]<a href="#" onclick="location='whatever.html'">[/tt][/li]
[li]There are things that navigate without being links, like [tt]<form action="whatever.html"><input type="submit" value="click here"></form>[/tt][/li]
[li]There is content loaded with AJAX.[/li]
[/ul]
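For the basic quoted case from the question, a minimal sketch might look like the following. It is written in Python purely for illustration, not in the PHP of the original script. The [tt]mypage2.php?theurl=[/tt] endpoint comes from the question; the names [tt]PROXY[/tt], [tt]HREF_RE[/tt] and [tt]rewrite_links[/tt] are made up for this example. It deliberately handles none of the cases listed above.

```python
import re
from urllib.parse import quote

# Endpoint name taken from the question; treat it as a placeholder.
PROXY = "mypage2.php?theurl="

# Matches href="..." or href='...' anywhere inside an <a ...> tag,
# so <a title="my site" href="mysite.com"> is covered as well.
# Group 1: everything up to and including "href=", group 2: the quote
# character, group 3: the original URL.
HREF_RE = re.compile(
    r"""(<a\b[^>]*?\shref\s*=\s*)(["'])(.*?)\2""",
    re.IGNORECASE | re.DOTALL,
)


def rewrite_links(html: str) -> str:
    """Point every quoted <a href> at the proxy script (hypothetical helper)."""

    def repl(match: re.Match) -> str:
        prefix, q, url = match.group(1), match.group(2), match.group(3)
        # URL-encode the original target so it survives as a query value.
        return f"{prefix}{q}{PROXY}{quote(url, safe='')}{q}"

    return HREF_RE.sub(repl, html)


if __name__ == "__main__":
    print(rewrite_links('<a title="my site" href="mysite.com">my site</a>'))
    # Unquoted attributes are left untouched by this pattern:
    print(rewrite_links("<a href=no-quote-around.html>missed</a>"))
```

Relative links are passed through as-is here, so the receiving script would still have to resolve them against the page they came from.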
What you are trying to build is a proxy, so it is better to read up on that subject first. Parsing HTML with regular expressions rarely leads to success.
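As a rough illustration of the parser-based alternative, here is a sketch using Python's standard [tt]html.parser[/tt] module (again Python only for illustration; the same idea applies to any real HTML parser). The [tt]mypage2.php?theurl=[/tt] endpoint is from the question, while [tt]LinkRewriter[/tt] and [tt]rewrite_page[/tt] are hypothetical names. It copes with unquoted attributes and with relative links under a [tt]base[/tt] tag, but does nothing about [tt]onclick[/tt] handlers, forms or AJAX.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote, urljoin

PROXY = "mypage2.php?theurl="  # endpoint name from the question (placeholder)


class LinkRewriter(HTMLParser):
    """Re-emits the page, rewriting <a href> targets (hypothetical helper)."""

    def __init__(self, page_url: str):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.base = page_url  # updated if the page carries a <base href>
        self.out = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "base" and href:
            self.base = urljoin(self.base, href)
        if tag != "a" or not href or href.startswith("#"):
            self.out.append(self.get_starttag_text())  # copy tag verbatim
            return
        parts = ["<a"]
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links, then URL-encode for the query string.
                value = PROXY + quote(urljoin(self.base, value), safe="")
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        parts.append(">")
        self.out.append("".join(parts))

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)  # avoid emitting a stray end tag

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_page(html: str, page_url: str) -> str:
    parser = LinkRewriter(page_url)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<base href="http://example.com/dir/"><a href=page.html>x</a>'
    print(rewrite_page(page, "http://example.com/"))
```

One caveat with this sketch: the [tt]base[/tt] tag is copied through unchanged, so a browser would resolve the relative [tt]mypage2.php[/tt] link against it. A real script would need an absolute proxy URL or would have to drop the tag.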


Feherke.
 