Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

find an image url within raw html

Status
Not open for further replies.

JamesGMills

Programmer
Aug 15, 2005
157
GB
Hi guys,

I am using curl to get the raw html, within this html I want to get a URL to an image.

The URL will change each time, but it's wrapped in a class, which I hope will help...

<span class="art">
<img width="64" src="</span>

So I basically want to get
into a string somehow?

I have tried using preg_match() but I am getting nowhere with it...

Can anyone help or point me in the right direction?

Thanks,

James


------------------------
 
Hi

With the HTML source in the variable $html:
Code:
preg_match("/<span class=\"art\">\s*<img.+?src=\"(.+?)\"/",$html,$match);
echo "The URL is ",$match[1];
So continue reading about preg_match() in the manual at http://php.net/preg_match/ ...

Feherke.
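A minimal, self-contained sketch of the regular expression above, run against a made-up sample (the real image URL was lost from the original post, so http://example.com/art/cover.jpg is purely hypothetical). It also checks the return value of preg_match() before reading $match[1], and uses a single-quoted pattern so the double quotes need no escaping.

```php
<?php
// Hypothetical sample markup; the real image URL is not known
$html = '<span class="art">
<img width="64" src="http://example.com/art/cover.jpg" alt="" /></span>';

// \s* lets the line break between the two tags match,
// (.+?) captures everything up to the first closing quote after src="
if (preg_match('/<span class="art">\s*<img.+?src="(.+?)"/', $html, $match)) {
    echo "The URL is ", $match[1], "\n";
} else {
    echo "No image URL found\n";
}
```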
 
Thank you very much...

I get the preg_match function and what it does... it's the regular expression in the search which I don't get.

For example... I am trying to do the same sort of thing again...

The HTML I have is a full page, and within it there is the following source:

<div class='post-body entry-content'>
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href=" style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 395px; height: 263px;" class="my_blog_image" src="

From this I just want the src of the image...

I have added class="my_blog_image" just before the src of the image to try and make it easier, but ideally this would be done just by searching for
<div class='post-body entry-content'> and getting the src of the first image?

Can this be done?

Thanks in advance,

James


------------------------
 
Hi

James said:
ideally this would be done
Ideally the requirement should not change. In your first sample there was a <span> before the <img>. Now there are a <div> and an <a>. Have you tried modifying the regular expression accordingly?
Code:
preg_match("/<div class='post-body entry-content'>\s*<a .*?>\s*<img .+?src=\"(.+?)\"/",$html,$match);

Feherke.
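The <div>/<a>/<img> pattern above can be tried on a hypothetical Blogger-style fragment like the one in the question (both URLs below are invented, since the originals were lost). One assumption worth stating: without the s modifier, "." does not match a newline, so <a .*?> only works if the whole <a ...> tag sits on one line.

```php
<?php
// Hypothetical markup modelled on the question; URLs are made up
$html = "<div class='post-body entry-content'>\n"
      . '<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://example.com/full.jpg">'
      . '<img style="display: block; width: 395px; height: 263px;" class="my_blog_image" src="http://example.com/thumb.jpg" /></a>';

// <a .*?>              the <a ...> tag, stopping at its first ">"
// \s*                  any whitespace (or none) between tags
// <img .+?src="(.+?)"  the src attribute of the following <img>
$pattern = "/<div class='post-body entry-content'>\s*<a .*?>\s*<img .+?src=\"(.+?)\"/";

if (preg_match($pattern, $html, $match)) {
    echo $match[1], "\n";
}
```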
 
I did try to modify the one you sent me before, but was not successful...

I just don't get the commands...

\s*

.*?

Etc... very powerful...

Thanks :)

------------------------
 
Hi

Quoting http://pcre.org/pcre.txt:
\s   any whitespace character
.    match any character except newline (by default)
*    0 or more quantifier
?    extends the meaning of (
     also 0 or 1 quantifier
     also quantifier minimizer


Feherke.
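To make the "quantifier minimizer" entry concrete, here is a small sketch on an invented two-image string. It contrasts greedy .* with lazy .*? and shows that \s* matches whether or not there is whitespace between tags.

```php
<?php
$html = '<img src="a.jpg"><img src="b.jpg">';

// Greedy: .* takes as much as it can, then backs off to the LAST quote
preg_match('/src="(.*)"/', $html, $greedy);

// Lazy: the ? after * makes it stop at the FIRST closing quote
preg_match('/src="(.*?)"/', $html, $lazy);

echo $greedy[1], "\n"; // a.jpg"><img src="b.jpg
echo $lazy[1], "\n";   // a.jpg

// \s* matches zero or more whitespace characters (spaces, tabs, newlines),
// so the same pattern matches with or without a line break between tags
$withNewline = preg_match('/<span>\s*<img/', "<span>\n  <img>");
$withoutGap  = preg_match('/<span>\s*<img/', '<span><img>');
echo $withNewline, $withoutGap, "\n"; // 11
```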
 
Hi,

Use the following function to return an array with all image src attribute values from a string of html.

Code:
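function GetImageSource($strHtml)
{
	// Capture the src value of every <img> tag (case-insensitive).
	// [^>]*? keeps each match inside a single tag, and \1 requires the
	// closing quote to be the same as the opening one, so a value such as
	// src="it's.jpg" is not cut short at the apostrophe.
	$strRegExp = '/<img\s[^>]*?src=(["\'])(.*?)\1/is';
	
	$arrMatches = array();
	
	// preg_match_all() returns the number of matches, or false on error
	$intMatched = preg_match_all($strRegExp,$strHtml,$arrMatches);
	
	if( $intMatched )
	{
		// group 1 is the quote character, group 2 is the src value
		return $arrMatches[2];
	}
	else
	{
		return false;
	}
}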
Example:
Code:
$html = '<div id="baz"><img class="foo" src="http://bar.com/img.jpg" /></div>';
$html.= "<a href='/link/'><img class='bar' onmouseover='someJsFunc();' src='/images/example.jpg' width='100'></a>";

$arr = GetImageSource($html);

if( $arr )
{
   print_r($arr);
}
else
{
   echo 'nothing was matched';
}
This example will print:
Code:
Array
(
    [0] => http://bar.com/img.jpg
    [1] => /images/example.jpg
)

You can now just loop through the array and use simple functions like stristr to match a particular src you want to use.

There is certainly a performance penalty incurred using this method for web pages with many image tags, but if you don't know regular expressions it's a reasonable alternative.
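A minimal sketch of the "loop through the array and use stristr" step. The array stands in for what GetImageSource() returns; /images/logo.png and the search term 'EXAMPLE' are hypothetical values chosen to show the case-insensitive match.

```php
<?php
// Sample values standing in for the array returned by GetImageSource()
$arrSources = array('http://bar.com/img.jpg', '/images/example.jpg', '/images/logo.png');

$arrWanted = array();
foreach ($arrSources as $strSrc) {
    // stristr() is case-insensitive and returns false when the needle is absent;
    // compare with !== false, because a found substring such as "0" is falsy
    if (stristr($strSrc, 'EXAMPLE') !== false) {
        $arrWanted[] = $strSrc;
    }
}

print_r($arrWanted);
```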
 
That's a useful function, and a nice idea if you need to get all the images and then do some further checking on them...

One use I can think of: if you wanted the largest image on the page, you could collect all the images in this array and then loop through them to find the largest size... that would work, right?

Something I may need soon for a website which would let you store 'products' in a wish list sort of thing in your own account...

Thanks,
James
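That approach should work. One possible sketch, assuming "largest" means the biggest width times height: GetLargestImage() is a hypothetical helper, and its size lookup defaults to PHP's getimagesize(), which returns an array starting with width and height, or false on failure. The lookup is passed in as a parameter so it can be replaced, for example with a stub when testing.

```php
<?php
// Hypothetical helper: return the src with the largest area (width x height),
// or null if no size could be determined for any of them.
function GetLargestImage(array $arrSources, $fnGetSize = 'getimagesize')
{
    $strBest = null;
    $intBestArea = -1;
    foreach ($arrSources as $strSrc) {
        // getimagesize() gives array(width, height, ...) or false;
        // @ hides its warning for unreachable or non-image sources
        $arrSize = @call_user_func($fnGetSize, $strSrc);
        if (!is_array($arrSize) || !isset($arrSize[0], $arrSize[1])) {
            continue;
        }
        $intArea = $arrSize[0] * $arrSize[1];
        if ($intArea > $intBestArea) {
            $intBestArea = $intArea;
            $strBest = $strSrc;
        }
    }
    return $strBest;
}
```

With the real getimagesize(), each call fetches the image (remote URLs need allow_url_fopen), so this is slow on image-heavy pages, and relative src values such as /images/example.jpg would first have to be turned into absolute URLs; that step is not shown.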



------------------------
 
Hi again,

I wonder if anyone could help with this.

I have the following line:
Day 0 - Test post for trip
Is it possible to get all the <a> href values into an array?

Something like this:

$myTitle = "Day 0 - Test post for trip
preg_match("/http:\/\/(.)\s/",$myTitle,$matches);
print_r($matches);

------------------------
 
Hi

To extract both URLs you need preg_match_all(). And that regular expression is a bit strange. Maybe this:
PHP:
preg_match_all("/http:\/\/\S+/",$myTitle,$matches);

Feherke.
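If the URLs sit inside href="..." attributes, \S+ would also swallow the closing quote, the ">" and the start of the link text, because none of those are whitespace. A sketch of a variant that stops at quotes and angle brackets, run on a hypothetical string (the original one with the links was lost from the post); the https? part, which also accepts https:// links, is an addition.

```php
<?php
// Hypothetical input; the real string from the question is not known
$myTitle = '<a href="http://example.com/day0">Day 0</a> - '
         . '<a href="http://example.com/trip">Test post for trip</a>';

// Stop at whitespace, quotes and angle brackets instead of using \S+
preg_match_all('/https?:\/\/[^\s"\'<>]+/', $myTitle, $matches);

// $matches[0] holds every full match, in order of appearance
print_r($matches[0]);
```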
 
Yes that works perfectly...

I just do not get this preg_match function at all

If I had ...</noscript><title>51.5199,-0.0983 - Google Maps</title>...

preg_match("/<title>(.+?) -/",$myGoogleRaw,$myCoordsMatches);

I just want the coordinates, but what I have is not working... I must be close, right?

Thanks,

James

------------------------
 
Hi

Sorry. I get dumber and dumber after a day of work...

Of course, it should be: "Well, it works for me. I get "51.5199,-0.0983" in $myCoordsMatches[1]."


Feherke.
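A short sketch of where the result ends up: $myCoordsMatches[0] is the whole match and $myCoordsMatches[1] is the first captured group. The sample string is the one from the question (the "..." stands for the rest of the page), and splitting it into two floats assumes the title always holds latitude,longitude in that order.

```php
<?php
$myGoogleRaw = '...</noscript><title>51.5199,-0.0983 - Google Maps</title>...';

if (preg_match('/<title>(.+?) -/', $myGoogleRaw, $myCoordsMatches)) {
    // [0] is the whole match ("<title>51.5199,-0.0983 -"),
    // [1] is what the first (...) group captured
    list($strLat, $strLng) = explode(',', $myCoordsMatches[1]);
    $fltLat = (float) $strLat;
    $fltLng = (float) $strLng;
    echo "Latitude: $fltLat, longitude: $fltLng\n";
}
```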
 
lol, I guessed that was what you meant, but I still could not see the correct result in my app... turns out I was also getting dumber as the day went on!

Thanks for all your help,

James

------------------------
 