WWW::Mechanize or HTML::TokeParser or what?

Status
Not open for further replies.

Tarnish

Technical User
Nov 13, 2006
221
US
Hi all,

I'm not proficient with Perl at all, but I have a wide array of other programming experience...

I'm trying to scrape a web page. Specifically, I'm trying to scrape data from a table with two columns. The first column holds one date per row (e.g. 9/14/2009). The second column holds a link. From the second column, I need to put the href value in one variable and the text between the <a> and </a> tags in another.

I found some basic code that pulls links from a page and it works fine, but I can't find any indication that Mech can navigate to a specific part of the page returned by get($url).

What I am trying to do is go to a table with a certain id or class (the table described above), loop through each row, pull the data, add it to an RSS feed (I know how to add it to the feed), and then move on to the next row until there are no more rows in the table.

Would Mech do that? I found some old code that supposedly does that, but it isn't set up for a table, and I think the fact that I'm using a table is breaking the code.

The mech code is here:

The 'old' code is here:

Any help would be greatly appreciated.
T
 
WWW::Mechanize seems to be aimed at getting the page, its links and its form field contents, not at picking apart the page text. Perhaps HTML::Parser is what you need to parse the contents of $mechanize->content, or you could parse the contents of the table by hand using your own regular expressions.

Annihilannic.
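The hand-rolled regular-expression route mentioned above could look roughly like the sketch below. It uses only core Perl. The sample HTML, the table id releases and the sub name scrape_table_by_regex are hypothetical stand-ins, since the thread's own code links are not available. Regexes like these are brittle against nested tables, comments or unusual markup, so this is a quick sketch rather than a robust parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample page. In the real script this string would come from
# $mech->content after $mech->get($url).
my $html = <<'HTML';
<html><body>
<table id="releases">
  <tr><th>Date</th><th>Link</th></tr>
  <tr><td>9/14/2009</td><td><a href="/notes/1.html">Release notes</a></td></tr>
  <tr><td>9/21/2009</td><td><a href='/notes/2.html'>Patch <b>2</b></a></td></tr>
</table>
</body></html>
HTML

# Returns a list of [date, href, link text] array refs taken from the table
# whose id attribute equals $id. Assumes the table is not nested inside
# another table and that each data row is <td>date</td><td><a ...>text</a></td>.
sub scrape_table_by_regex {
    my ($page, $id) = @_;
    my @rows;

    # Capture everything between the matching <table ...> tag and the next </table>.
    my ($table) = $page =~ m{<table\b[^>]*\bid\s*=\s*["']?\Q$id\E(?=["'\s>])[^>]*>(.*?)</table>}is
        or return;

    # Visit each <tr>; a header row made of <th> cells yields no <td> cells.
    while ($table =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
        my $tr    = $1;
        my @cells = $tr =~ m{<td\b[^>]*>(.*?)</td>}gis;
        next unless @cells >= 2;

        # First cell: a date such as 9/14/2009.
        my ($date) = $cells[0] =~ m{(\d{1,2}/\d{1,2}/\d{4})} or next;

        # Second cell: the href value and the text between <a> and </a>.
        my ($href, $text) =
            $cells[1] =~ m{<a\b[^>]*\bhref\s*=\s*["']?([^"'\s>]+)["']?[^>]*>(.*?)</a>}is
            or next;

        $text =~ s/<[^>]*>//g;      # drop nested tags such as <b>
        $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @rows, [$date, $href, $text];
    }
    return @rows;
}

for my $row (scrape_table_by_regex($html, 'releases')) {
    my ($date, $href, $text) = @$row;
    print "$date\t$href\t$text\n";    # one tab-separated line per data row
}
```

Run as-is, this prints one line per data row, for example 9/14/2009, /notes/1.html and Release notes separated by tabs; the header row is skipped because it has no <td> cells.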
 
Thanks for the reply, Annihilannic.

I finally got something pieced together. The important piece is the HTML::TableExtract module.

Thanks again.
T
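Since the thread doesn't include the final code, here is a minimal sketch of how HTML::TableExtract might be applied to this table; it is not Tarnish's actual script. It assumes the table can be picked out by an id attribute (the id releases, the sample HTML and the sub name scrape_table are hypothetical). It relies on the module's attribs option to select the table and keep_html to keep the <a> markup in each cell, then pulls the href and link text out of that markup with a regular expression.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample page. With WWW::Mechanize the real string would be
# $mech->content after $mech->get($url).
my $html = <<'HTML';
<html><body>
<table id="releases">
  <tr><th>Date</th><th>Link</th></tr>
  <tr><td>9/14/2009</td><td><a href="/notes/1.html">Release notes</a></td></tr>
  <tr><td>9/21/2009</td><td><a href='/notes/2.html'>Patch <b>2</b></a></td></tr>
</table>
</body></html>
HTML

# Returns one hash ref per data row of the table whose id attribute is $id.
sub scrape_table {
    my ($page, $id) = @_;

    # attribs selects the table by its HTML attributes; keep_html leaves the
    # <a ...> markup in each cell so the href is still available afterwards.
    my $te = HTML::TableExtract->new(attribs => { id => $id }, keep_html => 1);
    $te->parse($page);
    $te->eof;

    my @items;
    for my $ts ($te->tables) {
        for my $row ($ts->rows) {
            my ($date_cell, $link_cell) = @$row;

            # Empty cells come back undef; the header row fails the date check.
            next unless defined $date_cell && defined $link_cell;
            my ($date) = $date_cell =~ m{(\d{1,2}/\d{1,2}/\d{4})} or next;

            my ($href, $text) =
                $link_cell =~ m{<a\b[^>]*\bhref\s*=\s*["']?([^"'\s>]+)["']?[^>]*>(.*?)</a>}is
                or next;
            $text =~ s/<[^>]*>//g;      # drop nested tags such as <b>
            $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

            push @items, { date => $date, href => $href, title => $text };
        }
    }
    return @items;
}

for my $item (scrape_table($html, 'releases')) {
    # Each hash ref is what would be handed to the RSS-building code.
    print "$item->{date}\t$item->{href}\t$item->{title}\n";
}
```

Relative links such as /notes/1.html would still need to be resolved against the page URL (for example with the URI module's new_abs method) before going into a feed. If the table has no usable id, HTML::TableExtract can also select a table by its column headers through the headers option.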
 