Sorry, I don't know anything about those two modules. However, if you explain your question, I can try to give you a method to achieve it. You have an HTML table and you want to retrieve the links from it?
Sincerely,
<table .....>
<td>
<div>
<a href="....">TEXT DATA1</a>
<a href="....">TEXT DATA2</a>
<a href="....">TEXT DATA3</a>
and so on
HTML::TableExtract from CPAN makes it easy to extract just the cell data (TEXT DATA),
but how can I quickly pull the links out of a given table?
A little pattern matching will do this fairly quickly, with fewer system resources than loading and using the modules, and it is pretty easy once you've played with pattern matching a little. Maybe this should be my next FAQ.
open(HTML, "<HTMLFILE_tpen") or die "Failed to open file, $!\n";
while (<HTML>) { $buffer .= $_; }
close HTML;
# while we match <table ... some stuff... /table>, catch the table chunk in $&.
# do this in a 'while' in case there are multiple tables.
# <table>.....</table>
while ($buffer =~ /<table.*?\/table>/gis) # find and match all table chunks
{
$table = $&;
# while we match <td ......><a href......>....</a>....</td>, catch the first <a href...> in each cell.
while ($table =~ /<td.*?(<a href.*?\/a>).*?<\/td>/gis)
{
$href = $1;
print "$href\n"; # do something with what we caught.
}
}
I have not run this, but I think it is good. It might take a little tweaking to make it match the exact structure of the file you are trying to parse.
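For anyone who wants to try the idea without a file on disk, here is a minimal self-contained sketch of the same two-level loop. It collects every link inside each table, including cells that hold several links. The sub name extract_table_links, the sample HTML, and the file names in it are made up for illustration and are not from the original poster's page.

```perl
use strict;
use warnings;

# Hypothetical helper: return every <a ...>...</a> chunk found inside any <table>.
sub extract_table_links {
    my ($html) = @_;
    my @links;
    # Outer loop: each <table ...>...</table> chunk, captured in $1.
    while ($html =~ /(<table\b.*?<\/table>)/gis) {
        my $table = $1;
        # Inner loop: each anchor tag that carries an href attribute.
        while ($table =~ /(<a\b[^>]*\bhref\b.*?<\/a>)/gis) {
            push @links, $1;
        }
    }
    return @links;
}

# Made-up sample input; not the original poster's file.
my $sample = <<'HTML';
<p><a href="outside.html">not in a table</a></p>
<table border="1">
<tr><td><div>
<a href="one.html">TEXT DATA1</a>
<a href="two.html">TEXT DATA2</a>
</div></td></tr>
<tr><td><a href="three.html">TEXT DATA3</a></td></tr>
</table>
HTML

print "$_\n" for extract_table_links($sample);
```

Matching the anchors directly inside the table chunk, rather than anchoring on <td>, is a design choice here: it avoids depending on how many links sit in one cell or on what wraps them.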
Yup. Good idea. I expect that there will need to be a few other tweaks when applied to the specific file structure vvv is parsing. But, thanks for the critique.
I'm not sure how TableExtract works or what its functions return, but if you are using it to get the content of your cells, then you won't need to match the entire table as shown above, since the module will do that for you. Instead, just do the matching on the content returned by your TableExtract functions. For example, let's say that TableExtract::function() returned the content of one of your cells; then:
my $content = TableExtract::function();
while ($content =~ /(<a.*?href.*?\/a>)/gis)
{
my $href = $1;
print "$href\n"; # do something with what we caught.
}
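If you want the URL and the link text rather than the whole tag, capture groups can pull them apart in the same loop. This is only a sketch: it assumes the href value is wrapped in single or double quotes, and both the sub name links_with_text and the $content string are invented stand-ins for whatever your cell content really looks like.

```perl
use strict;
use warnings;

# Hypothetical helper: return [url, text] pairs for each anchor in a chunk of HTML.
# Assumes the href value is quoted with " or '.
sub links_with_text {
    my ($content) = @_;
    my @pairs;
    # $1 = the quote character, $2 = the URL, $3 = the text between <a ...> and </a>.
    while ($content =~ /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1[^>]*>(.*?)<\/a>/gis) {
        my ($url, $text) = ($2, $3);
        $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @pairs, [$url, $text];
    }
    return @pairs;
}

# Made-up cell content for illustration.
my $content = '<div><a href="one.html"> TEXT DATA1 </a> <a href=\'two.html\'>TEXT DATA2</a></div>';

for my $pair (links_with_text($content)) {
    print "$pair->[0] => $pair->[1]\n";
}
```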
Of course, if you understand the pattern matching code written by goBoating and you haven't yet committed yourself to using the module, then I would suggest using that instead since you will have more power and flexibility than with using a module.
Sincerely,