Extract link from value 'href="..."' 2

Status
Not open for further replies.

solepixel

Programmer
May 30, 2007
111
US
I'm trying to figure out how to extract the part between the 'href="' and '"' in the string:

'href="/pagename/?stuff"'

So basically I've gotten the position of 'href="', then added 6 so it starts after that string, but how do I get the substr from that position to the NEXT quote (")?

something like:
Code:
$start = strpos($link_text,'href="') + 6;
$something = get first (") AFTER 'href="';
$url = substr($link_text,$start,$something);
 
Hi

Code:
$html="<a class=\"menu\" href=\"http://example.com/path/\" target=\"_top\"></a>";
$url=ereg_replace(".*href=\"([^\"]*)\".*","\\1",$html);
echo "$url\n";
Output:
http://example.com/path/
Note that the above code does exactly what you asked for. Unfortunately people may enclose their attributes in single quotes too.

( Yes jpadie, I know, regular expressions should be avoided as much as possible... ;-) )

Feherke.
 
Try a strpos() with the $offset set to ($start + 1). It would look something like this:

Code:
$something = strpos($link_text,'"',($start+1));
$length = $something - $start;

$url = substr($link_text, $start, $length);

Notice the change to the way substr() is called. The third parameter is the length of the string to be returned, not the stop point as you seem to be using it.
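
The offset approach can be wrapped in a small helper. This is only a minimal sketch, assuming the attribute is always written as href="..." with double quotes; the function name extract_href_strpos is made up for illustration and is not from the thread. It searches for the closing quote from $start itself rather than $start + 1, so that an empty href="" is still handled, and it returns null when either marker is missing.

```php
<?php
// Hypothetical helper (not from the thread): pull the value out of href="..."
// using only strpos() and substr(). Assumes double-quoted attributes.
function extract_href_strpos(string $link_text): ?string
{
    $marker = 'href="';
    $pos = strpos($link_text, $marker);
    if ($pos === false) {
        return null; // no href=" in the text
    }
    $start = $pos + strlen($marker);        // first character of the value
    $end = strpos($link_text, '"', $start); // closing quote, searched from $start
    if ($end === false) {
        return null; // unterminated attribute
    }
    // substr() takes a length as its third argument, not an end position
    return substr($link_text, $start, $end - $start);
}

echo extract_href_strpos('target="_top" href="/pagename/?stuff" onclick="go()"'), "\n";
```

With the sample string above this should print /pagename/?stuff, even though href is not the first attribute.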
 
How about something like:

Code:
$href='href="/pagename/?stuff"';
$linkparts=explode('"',$href);

$thepartyouwant=$linkparts[1];
echo $thepartyouwant;

----------------------------------
Ignorance is not necessarily Bliss, case in point:
Unknown has caused an Unknown Error on Unknown and must be shutdown to prevent damage to Unknown.
 
I think I'll go with the offset method. For some stupid reason I thought offset meant how many matches to skip, so I set it to 1, 2, 3 and saw no change. Now I know why.

Code:
$start = strpos($link_text,'href="') + 6;
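$end = strpos($link_text,'"',$start+1);
// substr() wants a length; subtract $start (a fixed 6 is only right when href=" is at position 0)
$single_url = substr($link_text,$start,$end - $start);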

Thanks for all the help, but unfortunately, I can't use:

$url=substr($html,6,strlen($html)-7);

because sometimes there are more things inside the "link text", like onclick or target=...

The explode thing wouldn't work since the href=" part might not necessarily be the first thing in the text.

Lastly, Regex... I would consider that a last resort and since the offset thing worked, I will put it away for another day. Thanks again everyone.
 
that does not look like a great method. you cannot allow for tags like this
Code:
<img class="someclass" href="someimage.jpg">
nor for tags that use single quotes instead of double quotes.

I would go for feherke's method but substitute preg_match for ereg. My understanding is that ereg will be taken out (at least by default) of php6 for release and PCRE will be pre-loaded (by default). Also preg_*, i believe, is quite a bit faster than ereg.

the pattern i'd recommend looks like this
Code:
$pattern = '/<.*?href *?=[\'"](.*?)[\'"].*?>/i';

$text = "<img class='someclass' href=\"http://www.domain.com/somewhere.php\" tag=\"something else\">";
preg_match($pattern, $text, $matches);
the url is contained in $matches[1].

the pattern is not perfect since the second quote should really look back to match the first quote, but having a quote within a url is invalid anyway: so hopefully this won't matter in practice.
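
For anyone trying this pattern, here is a hedged sketch that wraps it in a function and checks the return value of preg_match() before reading $matches. The function name extract_href_preg is hypothetical; the pattern is the one given in this post.

```php
<?php
// Hypothetical wrapper around the pattern suggested in this post.
function extract_href_preg(string $text): ?string
{
    $pattern = '/<.*?href *?=[\'"](.*?)[\'"].*?>/i';
    // preg_match() returns 1 on a match, 0 on no match, false on error
    if (preg_match($pattern, $text, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$text = "<img class='someclass' href=\"http://www.domain.com/somewhere.php\" tag=\"something else\">";
echo extract_href_preg($text), "\n";
```

This should print http://www.domain.com/somewhere.php. The imperfection mentioned above is easy to see: for <a href="bla'bla"> the function returns only bla, because either kind of quote is accepted as the closing one.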
 
Hi

Well, life is a bit unkind, and these are valid in both HTML 4.01 Strict and XHTML 1.0 Strict :-( :
Code:
<a href="bla'bla"></a>
<a href='bla"bla'></a>
So your pattern should be like below and the URL will be in $matches[2] :
Code:
$pattern = '/<.*?href *?=([\'"])(.*?)\\1.*?>/i';
Man, I have to read more about regular expressions. I have not yet mastered the non-greedy ?. Thanks jpadie.

Feherke.
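
A small sketch of the backreference idea, assuming PCRE through preg_match(). The helper name extract_href_backref is invented for illustration. The value group is written here as (.*?) because inside a character class \1 is read as an octal character escape rather than a backreference; the trailing \1 outside the class is what forces the closing quote to match the opening one.

```php
<?php
// Hypothetical helper: the opening quote is captured in group 1 and \1
// requires the same quote character to close the value (group 2).
function extract_href_backref(string $text): ?string
{
    $pattern = '/<.*?href *?=([\'"])(.*?)\\1.*?>/i';
    return preg_match($pattern, $text, $matches) === 1 ? $matches[2] : null;
}

echo extract_href_backref('<a href="bla\'bla"></a>'), "\n"; // bla'bla
echo extract_href_backref("<a href='bla\"bla'></a>"), "\n"; // bla"bla
```

Both mixed-quote examples from this post come back whole, since the value only ends at the same kind of quote that opened it.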
 
neat fix. of course it doesn't work for the scenario that we both missed: the non-compliant complete lack of quotes with no spaces!

Code:
<img href=somepage.php class=someclass>

I only dabble with regular expressions after a coffee. without the caffeine hit my brain does not twist well enough...
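
One possible way to cover the unquoted case as well, offered as a sketch rather than a complete answer: a PCRE branch-reset group (?|...) lets three alternatives (double-quoted, single-quoted, unquoted) all capture into group 1. The function name extract_href_any and the decision to stop an unquoted value at whitespace or > are assumptions for illustration.

```php
<?php
// Hypothetical helper: accepts href="...", href='...' and href=value.
// (?|...) is a branch-reset group, so each alternative captures into group 1.
function extract_href_any(string $text): ?string
{
    $pattern = '/\bhref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    return preg_match($pattern, $text, $matches) === 1 ? $matches[1] : null;
}

echo extract_href_any('<img href=somepage.php class=someclass>'), "\n"; // somepage.php
echo extract_href_any('<a href="bla\'bla"></a>'), "\n";                 // bla'bla
```

Unlike the earlier patterns, this one does not anchor on the surrounding < and >, so it also works on a bare attribute string such as href="/pagename/?stuff".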
 
haha, thank you all for trying to fix this for every possible scenario. What I failed to mention is that the markup this runs against is something I also wrote, so I know it will always contain double quotes. This is definitely useful for someone else needing the same thing tho.
 