html parser

defdefdef · Apr 6, 2007

Hi,
I would like to extract information from an HTML page:
for every IMG tag on the page, the alt and scr values.
How can I do this?
Thanks,

KevinADC · Apr 6, 2007

Look into HTML::TokeParser:

http://search.cpan.org/~gaas/HTML-Parser-3.56/lib/HTML/TokeParser.pm

------------------------------------------
- Kevin, perl coder unexceptional!

defdefdef · Apr 6, 2007

Thanks a lot for your help.
I am just starting out with Perl and am having some trouble with HTML::TokeParser.
I would like to extract all the ALT and SCR values from the page.
I tried the following code:

use HTML::TokeParser;
$p = HTML::TokeParser->new("page.html") || die "Can't open";

while ($p->get_tag("IMG")) {
  my $img = $p->get_trimmed_text;
  print "img: $img\n";
}

but it prints nothing. Where is the error, please?
How can I extract the alt and scr values separately
(something with the $p->{textify} attribute, perhaps)?
Thanks,

brigmar · Apr 7, 2007

I'll see your TokeParser and raise you a TokeParser::Simple ..

Code:

use HTML::TokeParser::Simple;

my $url = 'http://www.arsenal.com/';
# url => ... tells the module to fetch the page before parsing it
my $p   = HTML::TokeParser::Simple->new( url => $url ) || die "Can't open: $!";
# get_tag matches lower-case tag names
while (my $token = $p->get_tag('img')) {
  print $token->get_attr('src')." => ".$token->get_attr('alt')."\n";
}

I'm assuming you meant the SRC attribute of the image tag, and not SCR.
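
As for why the first attempt printed nothing: HTML::TokeParser reports tag names in lower case, so get_tag("IMG") never matches, and get_trimmed_text returns the text that follows a tag rather than the tag's attributes. The attributes are in the second element of the array reference that get_tag returns. Below is a minimal sketch of the plain HTML::TokeParser version under those assumptions. The helper name extract_images and the sample markup are made up for illustration; pass "page.html" instead of the string reference to read the file from the question.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not part of the module): returns one [src, alt]
# pair per <img> tag found in $source. $source may be a file name, an
# open filehandle, or a reference to a string holding the HTML.
sub extract_images {
    my ($source) = @_;
    my $p = HTML::TokeParser->new($source)
        or die "Can't open: $!";

    my @images;
    # Ask for "img", not "IMG": tag names come back in lower case.
    while (my $tag = $p->get_tag('img')) {
        my $attr = $tag->[1];    # hash ref: attribute name => value
        # Attribute names are lower-cased too; missing ones become ''.
        my $src = defined $attr->{src} ? $attr->{src} : '';
        my $alt = defined $attr->{alt} ? $attr->{alt} : '';
        push @images, [$src, $alt];
    }
    return @images;
}

# Sample markup for illustration only.
my $html = '<p><IMG SRC="logo.png" ALT="Logo"><img src="photo.jpg"></p>';
for my $img (extract_images(\$html)) {
    print "src: $img->[0]  alt: $img->[1]\n";
}
```

Each pair keeps src and alt separate, so they can be printed or stored independently. The $p->{textify} setting only affects how get_text and get_trimmed_text render tags as text, so it is not needed here.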
