
Problem with TBODY and XPaths

Status
Not open for further replies.

yeddish

IS-IT--Management
Jul 18, 2007
40
US
OK. I am seriously making some headway here, but now I have another funky issue. Maybe someone can shed some light on this one for me.

When I look at my XPath in XPather in Firefox, it displays it with a /tbody after a /table, but that tbody does not appear in the actual source code. When I use the XPath with or without the /tbody there, it doesn't work. The only place that my XPaths get hung up (that I can tell) is on tbody "tags".

Has anyone else ever had an issue like this with XPaths?

Thanks.
-Joel
 
Quick update...

It seems that the tbody tags only appear when I view the code through the DOM Inspector (which XPather works inside of) in Firefox or through another FF addon called "View formatted source".

But either way I try it, my XPath does not work.

Odd...

Any suggestions?

===============
Code:
===============
use strict;
use warnings;

use LWP::Simple;
use HTML::TreeBuilder::XPath;

my $tree= HTML::TreeBuilder::XPath->new;
my $html = get("
$tree->parse($html);
$tree->eof;    # signal end of document so the tree is finalized
# q{} avoids having to escape the single quotes inside the XPath
my $xpath = q{/html/body/div[@id='wrapper']/div[@id='page']/div[@id='content']/div[@id='area2']/table[3]/tbody/tr[2]/td[2]/text()};

if ($tree->exists($xpath)) {
    print "$xpath: Exists!\n";
} else {
    print "$xpath: Doesn't exist!\n";
}
 
Unless I'm being stupid about what you are trying to do (which has been known to happen :) ), try colgroup in place of tbody:
Code:
my $xpath = q{/html/body/div[@id='wrapper']/div[@id='page']/div[@id='content']/div[@id='area2']/table[3]/colgroup/tr[2]/td[2]/text()};
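
To see which intermediate step the parser actually produces, one option is to test each candidate path against the same tree. The sketch below is only an illustration: the inline HTML is a hypothetical stand-in for the real page (its URL is not shown in this thread), and the path is shortened to table[1]. Which variants match depends on how HTML::TreeBuilder builds the tree for the markup at hand.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup: a table with a <colgroup> and no <tbody>.
my $html = <<'HTML';
<html><body>
<table>
  <colgroup><col><col></colgroup>
  <tr><td>a1</td><td>a2</td></tr>
  <tr><td>b1</td><td>b2</td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Try the path with no wrapper, with tbody, and with colgroup.
my %found;
for my $step ('', 'tbody/', 'colgroup/') {
    my $xpath = "/html/body/table[1]/${step}tr[2]/td[2]";
    $found{$step} = $tree->exists($xpath) ? 1 : 0;
    printf "%-45s %s\n", $xpath, $found{$step} ? 'exists' : 'missing';
}

$tree->delete;
```

Running the same loop against the real page would show directly whether tbody, colgroup, or neither belongs in the path.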

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
 
I would also note these two tidbits.
$tree->dump
will show you the exact tree breakdown as XPath sees it (and it shows some other funky numbers that might be usable to go directly to the data?).
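
As far as I can tell, the numbers printed by dump are element addresses (dotted paths from the root), and HTML::Element's address method accepts one to look the element up again. A minimal sketch, assuming a small inline table instead of the real page:

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, not the page from the thread.
my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<html><body><table><tr><td>one</td><td>two</td></tr></table></body></html>'
);

# Prints every element with its address, in a form such as "@0.1".
$tree->dump;

# Find a cell by XPath, read its address, then fetch it again by address.
my ($cell)  = $tree->findnodes('//td[2]');
my $address = $cell->address;
my $same    = $tree->address($address);
my $text    = $same->as_text;    # copy the text out before deleting the tree
print "$address => $text\n";

$tree->delete;
```

Comparing the dump output with a failing XPath is a quick way to spot an unexpected parent element in the path.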

I also saw this in the documentation:

$h->delete()
Detaches this element from its parent (if it has one) and explicitly destroys the element and all its descendants. The return value is undef.

Perl uses garbage collection based on reference counting; when no references to a data structure exist, it's implicitly destroyed -- i.e., when no value anywhere points to a given object anymore, Perl knows it can free up the memory that the now-unused object occupies.

But this fails with HTML::Element trees, because a parent element always holds references to its children, and its children elements hold references to the parent, so no element ever looks like it's not in use. So, to destroy those elements, you need to call $h->delete on the parent.
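
In practice that means copying whatever you need out of the tree and then deleting it explicitly. A small sketch of that pattern follows; cell_text is a hypothetical helper name, and the markup is made up for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: parse, extract one value, then free the tree.
sub cell_text {
    my ($html, $xpath) = @_;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
    my $text = '' . $tree->findvalue($xpath);    # force a plain string copy
    $tree->delete;                               # break the parent/child cycles
    return $text;
}

print cell_text('<table><tr><td>x</td><td>y</td></tr></table>', '//td[2]'), "\n";
```

Keeping the delete call next to the parse makes it harder to leak trees when the extraction runs in a loop over many pages.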



 
travs69, you are a gangster ninja...

It was the colgroup thing. That did it.
That is so weird that these other programs are replacing colgroup with tbody. I suppose that they are doing it to clean the HTML up and make it better code...

Thanks again, man!
-Joel
 
No problem - and I will have to say that's the first time I've ever been called a gangster ninja!
[pimp] [karate]

I'm not sure about the tbody thing. When I look up XPath, everyone is talking about it, but I'm not sure where it is coming from.
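
For what it's worth, browsers add an implied tbody when they build the DOM for a table whose source omits it, which is why DOM-based tools show it while the raw source does not. One way to sidestep the question is to use the descendant axis (//) between the table and the row, so the path matches whatever element sits in between. A sketch under that assumption, using made-up markup and table[1] in place of the real path:

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup with no <tbody> in the source.
my $html = '<table>'
         . '<tr><td>a1</td><td>a2</td></tr>'
         . '<tr><td>b1</td><td>b2</td></tr>'
         . '</table>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# "//tr[2]" matches the second row under any wrapper (tbody, colgroup, or none).
my $value = '' . $tree->findvalue('//table[1]//tr[2]/td[2]');
print "cell: $value\n";

$tree->delete;
```

The trade-off is that // is less strict, so on pages with nested tables it can match rows you did not intend.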
