
help parsing html tables

Status
Not open for further replies.

terminal2 (Programmer, US)
Jul 5, 2006
I'm trying to write a shell/awk script to parse the table below. I'd like to extract the contents of every cell, i.e. category, title, seeders, leechers, etc.

Any help would be appreciated.

<a href="/browse/101" title="More from this category">Audio &gt; Music</a></td>
<td><a href="/torrent/3324221/_ystein_Sunde" class="detLink" title="Details for Øystein Sunde">Øystein Sunde</a></td>
<td>05-01&nbsp;2005</td>
<td><a href=".torrent" title="Download this torrent"><img src="/img/dl.gif" class="dl" alt="Download" /></a><img src="y.org/img/icon_comment.gif" alt="This torrent has 5 comments." title="This torrent has 5 comments." /></td>
<td align="right">378.45&nbsp;MiB</td>
<td align="right">0</td>
<td align="right">2</td>
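A minimal sketch of the shell/awk approach asked about, assuming simple, regular markup like the sample: cells end with a lowercase </td>, there are no nested tables, and no attribute value contains a ">". The name td_cells is a hypothetical helper, not an existing tool. It joins all input lines (so a cell that wraps across lines stays whole), splits on </td>, strips the remaining tags, decodes the few entities seen in the sample, and prints each non-empty cell with its column number so positions such as seeders and leechers stay identifiable.

```shell
#!/bin/sh
# td_cells (hypothetical name): print the text of every <td> cell read from
# stdin, one per line, prefixed by its column number.
# Assumes simple markup: lowercase </td>, no nested tables, no '>' inside
# attribute values.
td_cells() {
  awk '
    # Join all lines so a cell that wraps across lines stays in one piece.
    { buf = buf $0 " " }
    END {
      n = split(buf, cells, "</td>")
      # The last piece is whatever follows the final </td>, so skip it.
      for (i = 1; i < n; i++) {
        c = cells[i]
        gsub(/<[^>]*>/, "", c)       # drop every tag, keep the text between them
        gsub(/&nbsp;/, " ", c)       # decode the entities seen in the sample
        gsub(/&gt;/, ">", c)
        gsub(/&lt;/, "<", c)
        gsub(/&amp;/, "\\&", c)      # "\\&" yields a literal & in the replacement
        sub(/^[ \t]+/, "", c)        # trim surrounding whitespace
        sub(/[ \t]+$/, "", c)
        if (c != "") printf "%d\t%s\n", i, c
      }
    }'
}
```

Usage: td_cells < table.html (the file name is just an example). Cells that contain only icons, like the download/comment column in the sample, come out empty and are skipped, but the column numbers of the remaining cells are preserved. Regex-based tag stripping is fragile on arbitrary HTML; it is only reasonable for machine-generated, regular pages like this one.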
 
Try html2txt (just Google it).

Mike

"Whenever I dwell for any length of time on my own shortcomings, they gradually begin to seem mild, harmless, rather engaging little things, not at all like the staring defects in other people's characters."
 
Hi

Mike said:
Try html2txt (Just google)
There are dozens of tools with that name, but lynx is a single, well-known program. (In fact, some html2txt "tools" are only wrappers around lynx.)
Code:
lynx -dump -nolist /input/file
The problem is that the OP wanted to parse the table, not convert it to text.
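To illustrate the parse-versus-convert distinction: with -nolist, lynx -dump keeps only the rendered text, so attributes such as the detail-page href are not in its output. The following is a minimal awk sketch that keeps them, assuming (as in the sample) that the title link is an <a> tag carrying class="detLink" with plain text inside and a single href. The name det_links is a hypothetical helper.

```shell
#!/bin/sh
# det_links (hypothetical name): for every <a ... class="detLink" ...>text</a>
# on stdin, print the link target and the link text separated by a tab.
# Assumes the link text itself contains no further tags.
det_links() {
  awk '
    # Join all lines so an opening tag that wraps across lines stays whole.
    { buf = buf $0 " " }
    END {
      while (match(buf, /<a [^>]*class="detLink"[^>]*>[^<]*<\/a>/)) {
        m = substr(buf, RSTART, RLENGTH)
        buf = substr(buf, RSTART + RLENGTH)   # continue after this link
        href = m                              # keep only the href value
        sub(/^.*href="/, "", href)
        sub(/".*$/, "", href)
        text = m                              # keep only the text between the tags
        sub(/^<a [^>]*>/, "", text)
        sub(/<\/a>$/, "", text)
        printf "%s\t%s\n", href, text
      }
    }'
}
```

On the sample this should pair the /torrent/... path with the title text. The same match/substr loop can be pointed at other attributes, for example the alt text that carries the comment count, by changing the pattern.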

But we will probably never find out more about this.

Feherke.
 