HTML parser

Status
Not open for further replies.

defdefdef

Technical User
Jul 4, 2005
FR
Hi,
I would like to extract information from an HTML page:
the alt and scr attributes of every IMG tag on the page.
How can I do this?
Thanks,
 
Thanks a lot for your help.
I am just starting out with Perl and am having some problems with HTML::TokeParser.
I would like to extract all the ALT and SCR information from the page.
I tried the following code:

use HTML::TokeParser;
$p = HTML::TokeParser->new("page.html") || die "Can't open";

while ($p->get_tag("IMG")) {
my $img = $p->get_trimmed_text;
print "img: $img\n";
}

but it prints nothing. Where is the error, please?
How can I extract the alt and scr information separately
(something with the $p->{textify} attribute, perhaps)?
Thanks,
 
I'll see your TokeParser and raise you a TokeParser::Simple...

Code:
require HTML::TokeParser::Simple;
my $url = 'http://www.arsenal.com/';
my $p   = HTML::TokeParser::Simple->new( url => $url ) || die "Can't open: $!";
while (my $token = $p->get_tag('img')) {
  print $token->get_attr('src')." => ".$token->get_attr('alt')."\n";
}

I'm assuming you meant the SRC attribute, not SCR, for an image tag.
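
For anyone who wants to stay with plain HTML::TokeParser, here is a minimal sketch of the same idea. As far as I know, the parser reports tag and attribute names in lowercase, so get_tag("IMG") never matches and the loop in the question never runs; asking for 'img' should fix that. The attributes are in the hash reference that get_tag returns as the second element, so get_trimmed_text is not needed. The sample markup and the extract_images helper are made up for illustration; the document is passed as a scalar reference instead of reading page.html.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup standing in for page.html.
my $html = <<'HTML';
<html><body>
<IMG SRC="logo.png" ALT="Site logo">
<p>Some text</p>
<img src="photo.jpg" alt="A photo">
<img src="spacer.gif">
</body></html>
HTML

# Illustrative helper (not part of the module): returns one hash
# per IMG tag, with the src and alt values kept separately.
sub extract_images {
    my ($doc) = @_;

    # A reference to a scalar is parsed as the literal document text.
    my $p = HTML::TokeParser->new(\$doc) or die "Can't parse document";

    my @images;
    # Tag names are matched in lowercase, whatever case the page uses.
    while (my $tag = $p->get_tag('img')) {
        my $attr = $tag->[1];    # hash ref of attributes, keys lowercased
        push @images, {
            src => $attr->{src} // '',
            alt => $attr->{alt} // '',    # empty string when alt is missing
        };
    }
    return @images;
}

for my $img (extract_images($html)) {
    print "src: $img->{src}  alt: $img->{alt}\n";
}
```

To read from a file as in the question, pass the file name to new() instead of the scalar reference; the rest of the loop should stay the same.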
 