How to extract hrefs from a table?

vvv · Jan 11, 2001

Can I extract hrefs from a table
using the modules HTML::TableExtract and HTML::LinkExtor?

thanks.

tanderso · Jan 11, 2001

Sorry, I don't know anything about those two modules. However, if you explain your question, I can try to give you a method to achieve it. You have an HTML table and you want to retrieve the links from it?
Sincerely,

Tom Anderson
CEO, Order amid Chaos, Inc.

http://www.oac-design.com

vvv · Jan 12, 2001

Yes, you're right,

I've got an HTML document that contains tables like this:

<table .....
<td
<div
<a href="...."> TEXT DATA1 </a>
<a href="...."> TEXT DATA2 </a>
<a href="...."> TEXT DATA3 </a>
and so on

HTML::TableExtract from CPAN makes it easy to extract only the cell text (TEXT DATA).
How can I quickly pull the links out of a given table?

goBoating · Jan 12, 2001

A little pattern matching will do this fairly quickly, with fewer system resources than loading and using the modules. It is pretty easy once you've played with pattern matching a little. Maybe this should be my next FAQ.

open(HTML,"<HTMLFILE_to_open") or die "Failed to open file, $!\n";
while (<HTML>) { $buffer .= $_; }
close HTML;

# while we match <table ... some stuff... /table>, catch the table chunk in $&.
# do this in a 'while' in case there are multiple tables.
# <table>.....</table>
while ($buffer =~ /<table.*?\/table>/gis) # find and match all table chunks
{
    $table = $&;
    # while we match <td ......><a href......>....</a>....</td>, catch the <a href...>
    # note: this catches only the first <a href...> in each cell
    while ($table =~ /<td.*?(<a href.*?\/a>).*?<\/td>/gis)
    {
        $href = $1;
        print "$href\n"; # do something with what we caught.
    }
}

I have not run this, but I think it is good. It might take a little tweaking to make it match the exact structure of the file you are trying to parse.
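As one example of that kind of tweaking, here is a minimal sketch for the structure vvv posted, where a single cell holds several links inside a div. It drops the td part of the inner match and simply collects every <a href...>...</a> found inside each table chunk. The sample HTML, the example.com URLs, and the @anchors array are made up for illustration, and it assumes tables are not nested.

```perl
use strict;
use warnings;

# Hypothetical sample input modelled on the structure vvv posted;
# the URLs and link text are invented for this sketch.
my $buffer = <<'HTML';
<table border="1">
<tr><td><div>
<a href="http://example.com/one">TEXT DATA1</a>
<a href="http://example.com/two">TEXT DATA2</a>
</div></td></tr>
</table>
<p><a href="http://example.com/outside">not in a table</a></p>
<table>
<tr><td><a href="http://example.com/three">TEXT DATA3</a></td></tr>
</table>
HTML

my @anchors;

# Outer loop: one pass per <table>...</table> chunk (assumes no nested tables).
while ($buffer =~ /(<table\b.*?<\/table>)/gis) {
    my $table = $1;
    # Inner loop: every <a href...>...</a> inside that chunk,
    # however many there are per cell.
    while ($table =~ /(<a href.*?<\/a>)/gis) {
        push @anchors, $1;
    }
}

print "$_\n" for @anchors;
```

The link that sits between the two tables is skipped because the inner match only ever sees text captured by the outer one.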

'hope this helps.

keep the rudder amid ship and beware the odd typo

luciddream · Jan 12, 2001

i would change one thing:

<a href.*?

to:

<a.*?href.*?

you never know how many spaces these silly web designers put in between their a and href, if they put href first at all.
adam@aauser.com
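One caveat: <a.*?href can also start matching at a tag like <abbr, and the .*? can run past the end of one tag to reach an href further on. A slightly stricter pattern, sketched below, requires whitespace after the a, keeps the search inside a single tag with [^>], and captures the URL itself instead of the whole anchor. This is a swapped-in variant, not the pattern from the thread; the helper name extract_hrefs and the sample markup are hypothetical, and it assumes the href value is quoted.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is invented for this sketch): returns the
# href values of all <a> tags in a chunk of HTML, tolerating extra
# whitespace, other attributes before href, and either quote style.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;
    # <a, whitespace, anything but '>', then href = "..." or '...'
    while ($html =~ /<a\s[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gis) {
        push @urls, $2;
    }
    return @urls;
}

# Made-up sample covering the cases discussed above.
my $cell = <<'HTML';
<a href="one.html">plain</a>
<a   href = 'two.html'>extra spaces, single quotes</a>
<A CLASS="nav" HREF="three.html">href not first</A>
<abbr title="href example">not a link</abbr>
HTML

print "$_\n" for extract_hrefs($cell);
```

Unquoted href values and attributes such as data-href are not handled; for anything messier than this, a real HTML parser is the safer choice.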

goBoating · Jan 12, 2001

Yup. Good idea. I expect that there will need to be a few other tweaks when applied to the specific file structure vvv is parsing. But, thanks for the critique.

keep the rudder amid ship and beware the odd typo

tanderso · Jan 12, 2001

I'm not sure how TableExtract works or what its functions return, but if you are using it to get the content from your cells, then you won't need to match the entire table as shown above, since the module will do that for you. Instead, just do the matching on the content returned by your TableExtract functions. For example, let's say that TableExtract::function() (a placeholder name, not an actual function of the module) returned the content of one of your cells, then

my $content = TableExtract::function();

while ($content =~ /(<a.*?href.*?\/a>)/gis)
{
my $href = $1;
print "$href\n"; # do something with what we caught.
}

Of course, if you understand the pattern matching code written by goBoating and you haven't yet committed yourself to using the module, then I would suggest using that instead since you will have more power and flexibility than with using a module.
Sincerely,

Tom Anderson
CEO, Order amid Chaos, Inc.

http://www.oac-design.com
