
Can't get content from URL!

Status
Not open for further replies.

NashTrump

Technical User
Jul 23, 2005
38
GB
Hi guys,

I'm trying to look at the HTML behind a website using the $mech->get($url) command.
Below is the code I'm using and an example output.

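use WWW::Mechanize;   # assumed: the thread later refers to Mechanize

my $mech = WWW::Mechanize->new;   # $url is set earlier in the script
$mech->get($url);
$mech->success or die "Can't open page\n";
my $content = $mech->content;
print "$url\n";
print $content;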


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

"
<html xmlns="
<head>

<title>Betting</title>

<script type="text/javascript">

/*<![CDATA[*/

function createCookie(name,value,minutes) {

if (minutes) {

var date = new Date();

date.setTime(date.getTime()+(minutes*60*1000));

var expires = "; expires="+date.toGMTString();

}

else var expires = "";



document.cookie = name+"="+value+expires+"; path=/";

}

/*]]>*/

</script>

</head>

<body>

<script type="text/javascript">

/*<![CDATA[*/

var nRfrPos = location.search.indexOf("saarfr=");

var nCntPos = location.search.indexOf("saacnt=");

var sRfr = location.search.substring(nRfrPos+7, nCntPos-1);

var nCnt = parseInt(location.search.substring(nCntPos+7), 10);

if (isNaN(nCnt)) nCnt = 0;



var sHref;



if (nRfrPos != -1) {

createCookie('jbhckrFix', 'fInSoFt', 1440);

createCookie('ok2prcd', '1', 5); // 5min should be enough

sHref = unescape( sRfr );



if (sHref.indexOf("?") == -1) sHref += "?"; else sHref += "&";

sHref += "saacnt=" + (nCnt+1);

location.href = sHref;

} else {

sHref = location.href.toLowerCase();

if (sHref.indexOf("index.html") != -1)

sHref = location.href.replace("index.html","index.asp");

else {

var sTmp = location.host + "/betting/";

sHref = location.href.replace(sTmp, sTmp + "index.asp");

}



if (sHref.indexOf("?") == -1) sHref += "?"; else sHref += "&";

sHref += "saacnt=" + (nCnt+1);

location.href = sHref;

}

/*]]>*/

</script>

</body>

</html>


However, the actual source on the website is nothing like this!

Does anyone know what I am doing wrong?

Kind regards

Nash
 
Hi there,

I also sometimes get this:

HTTP::Response=HASH(0x2a0c32c)

I have absolutely no idea what this means!

Regards

nash
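
A note on that line: a string of the form Class=HASH(0x...) is what Perl prints when an object reference is used as a string. $mech->get returns an HTTP::Response object, so output like this most likely means the return value of get was printed instead of the page body. The sketch below shows the effect with a stand-in class; My::Response is hypothetical and only imitates the one method needed here.

```perl
use strict;
use warnings;

# Stand-in for HTTP::Response (hypothetical class, for illustration only)
package My::Response;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

sub content { return $_[0]{content} }

package main;

my $res = My::Response->new(content => "<html>...</html>");

my $as_object = "$res";          # stringifies the reference, not the page
my $as_body   = $res->content;   # the body text that was actually wanted

print "$as_object\n";
print "$as_body\n";
```

With Mechanize itself, the equivalent fix would be to print $mech->content (or call content on the response object) rather than the value returned by get.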
 
That is because the HTML is generated by executing JavaScript commands. The HTML does not come from the server (JS only). Once the JS hits the browser, the browser executes the code and produces the HTML.

See


for a full description of the problem. This is not a limitation inherent to Mechanize only. It can also be found in LWP, HTML::TokeParser::Simple and numerous other modules that are designed for connecting to and parsing the web. AFAIK there is nothing written yet to handle this.
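
For the specific output posted above, a closer look at the script suggests it does not build the page itself: it sets two cookies (jbhckrFix and ok2prcd) and then assigns location.href, so the real HTML would come from a second request. Below is a minimal sketch, not a tested fix, of repeating those steps in Perl. The helper names follow_stub and fetch_past_stub are hypothetical. It assumes the stub page is reached with a saarfr= parameter in its URL, that $mech is a WWW::Mechanize object with its default cookie jar, and that the site accepts cookies set this way.

```perl
use strict;
use warnings;

# Hypothetical helper: return the URL the stub's JavaScript would navigate to,
# or undef when the URL carries no saarfr= parameter.
sub follow_stub {
    my ($url) = @_;
    my ($query) = $url =~ /\?(.*)\z/s;
    return undef unless defined $query;

    # Like the script, take everything between "saarfr=" and "&saacnt="
    return undef unless $query =~ /(?:^|&)saarfr=(.*?)(?:&saacnt=|\z)/s;
    my $target = $1;
    my $count  = $query =~ /(?:^|&)saacnt=(\d+)/ ? $1 : 0;

    # Rough equivalent of JavaScript unescape() for %XX sequences
    $target =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # Same rule as the script: append with '?' or '&', then bump the counter
    $target .= ($target =~ /\?/ ? '&' : '?') . 'saacnt=' . ($count + 1);
    return $target;
}

# Hypothetical wrapper: $mech is assumed to be a WWW::Mechanize object.
sub fetch_past_stub {
    my ($mech, $url) = @_;
    $mech->get($url);
    for (1 .. 3) {    # give up after a few hops
        my $next = follow_stub($mech->uri->as_string);
        last unless defined $next;
        my $host = $mech->uri->host;
        # The two cookies the script sets, with its lifetimes converted to seconds
        $mech->cookie_jar->set_cookie(0, 'jbhckrFix', 'fInSoFt', '/', $host, undef, 1, 0, 1440 * 60, 0);
        $mech->cookie_jar->set_cookie(0, 'ok2prcd',   '1',       '/', $host, undef, 1, 0, 5 * 60,    0);
        $mech->get($next);
    }
    return $mech->content;
}
```

Calling fetch_past_stub($mech, $url) would stand in for the get/content pair in the original snippet. The branch of the script that rewrites the address to index.asp is not covered, and if the site depends on anything else the browser does, this will not be enough.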

Raklet
 
So in short, is there anything I can do to guarantee the return of the HTML?

I can use the other modules; however, with my current version I get the HTML every now and then, but not every time...
 
I'm sorry, but I can't really say. At this point I am foundering in deep water. Do a Google search for "lwp javascript" and you may find something of interest. There are lots of articles that point to other things, but I never found any definitive answers, just lots of little hints that require more reading and more exploring. Two possibilities that seemed to stand out:



Not much more I can say about this topic.

Raklet
 