Parsing HTML Tables

Status
Not open for further replies.

FinnMan

Technical User
Feb 20, 2001
75
US
I'm working with some web pages that contain a lot of tables. Apart from writing some extensive regex, does anybody know of any tools or scripts for parsing tables from web pages? Perl has a module for it, but Perl isn't really up my alley, so to speak.

Thx,
FM
 
Perhaps lynx?

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
I have a small awk program that will do simple parsing of tables; it won't handle tables within tables.
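
The awk program itself was not posted, so the following is only a rough sketch of the same idea in Python, using the standard library's html.parser. The names SimpleTableParser and parse_tables are made up for this illustration. Like the awk approach described above, it assumes flat tables and does not handle tables within tables.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Collect the cell text of flat (non-nested) HTML tables."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _end_cell(self):
        # Store the current cell with runs of whitespace collapsed.
        if self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        # Close any open cell first, then store the row.
        self._end_cell()
        if self._row is not None:
            self._rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # A nested <table> would reset the state here; nesting is not supported.
            self._rows, self._row, self._cell = [], None, None
        elif tag == "tr" and self._rows is not None:
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag == "tr":
            self._end_row()
        elif tag == "table" and self._rows is not None:
            self._end_row()
            self.tables.append(self._rows)
            self._rows = None

    def handle_data(self, data):
        # Only text inside an open cell is kept.
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(html):
    """Return every table in `html` as a list of rows of cell strings."""
    parser = SimpleTableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Calling parse_tables on a page's HTML gives one list of rows per table, which can then be filtered or printed as delimited text.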
 
I can't believe I racked my brain on this. Lynx was the solution. I used the -dump option. The end result is that I get one nice long page instead of the double-column page that existed. This makes it much easier to strip and parse.
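
For anyone who wants to script this, here is a minimal sketch of calling lynx with -dump from Python. It assumes lynx is installed and on the PATH. The -nolist flag, which drops the numbered link list lynx appends to a dump, is an addition of this sketch and was not part of the original solution. The function names and the example URL are placeholders.

```python
import subprocess


def lynx_dump_command(url):
    """Build the argument list for a plain-text dump of `url`."""
    # -dump writes the rendered page to stdout; -nolist (an assumption,
    # not mentioned in the thread) omits the trailing list of links.
    return ["lynx", "-dump", "-nolist", url]


def dump_page(url):
    """Run lynx and return the rendered page as text (requires lynx)."""
    result = subprocess.run(
        lynx_dump_command(url),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if lynx exits with an error
    )
    return result.stdout


def non_blank_lines(text):
    """Strip each line and drop empty ones, ready for further parsing."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Placeholder URL; replace with the page that holds the tables.
    for line in non_blank_lines(dump_page("http://example.com/")):
        print(line)
```

From there the stripped lines can be split on whitespace or matched with a short regex, depending on how the dumped table happens to be laid out.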

On a side note, I did discover html2text. It's a wonderful utility, although its output was not formatted in a way that is useful for parsing tables.

./FM
 