Activating Script on another Url

Status
Not open for further replies.

NashTrump

Technical User
Jul 23, 2005
38
GB
Hi there,

I've been learning Perl for a while now, and I'm trying to teach myself how to scrape websites.

As bookmakers have a great deal of information on their sites, I started there.

However, I noticed they have a script which changes the display of odds.

The change-display dropdown is halfway down the page. If I want to extract odds or other information in a particular display format, how do I trigger that script so I can extract the correct info?

I tried get $url once I had changed the URL to what I thought the script was accessing, but it didn't work.

Has anyone got any ideas on how to action a script on a URL?

Kind regards

Nash
 
What kind of script are you talking about? If it's some kind of client-side scripting (i.e. JavaScript), you may not be able to do much about it. If it's server-side scripting (Perl, PHP, etc.) then...

If you get the URL of the page printed out by the server-side script, you'll wind up with a bunch of HTML code (the same code you'd get if you did View Source on the page you want). Then it's just a matter of using regular expressions to find what you want in that data (or you can go as far as using an HTML parser such as HTML::Parser).

Here's an example of an HTML parsing regexp from a module I was making for MySpace:

Code:
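# Patterns keyed by what they extract; single quotes keep the backslashes intact.
my %regs = (
	messageFrom     => '<span class=\"text\"><a href=\"http:\/\/profile\.myspace\.com\/index\.cfm\?fuseaction=user'
		. '\.viewprofile\&friendID=(\d+)\">(.*?)<\/a><\/span>',
);

# $html holds the fetched page source.
if ($html =~ /$regs{messageFrom}/i) {
   my $FriendID = $1;   # numeric ID from the profile link
   my $Username = $2;   # text of the link
}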
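
To collect every match on a page rather than just the first one, the same idea can be run with /g in a while loop. This is only a minimal sketch: the sample HTML, the example.com link pattern, and the names $link_re and @found are made up for illustration and are not taken from any real site.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for HTML fetched from a page.
my $html = <<'HTML';
<span class="text"><a href="http://example.com/profile?id=101">Alice</a></span>
<span class="text"><a href="http://example.com/profile?id=202">Bob</a></span>
HTML

# qr// precompiles the pattern; /x allows whitespace and comments inside it.
my $link_re = qr{
    <a \s+ href="http://example\.com/profile\?id=(\d+)">   # capture 1: numeric ID
    (.*?)                                                   # capture 2: link text
    </a>
}xi;

# In scalar context with /g, each loop iteration picks up the next match.
my @found;
while ($html =~ /$link_re/g) {
    push @found, { id => $1, name => $2 };
}

print "$_->{id}: $_->{name}\n" for @found;
```

Each element of @found is a small hash, so the results can be sorted, filtered, or printed later without running the regexp again.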

 
I prefer the excellent WWW::Mechanize module for scraping websites. In order to retrieve the values from StanJames in Fraction, Decimal, or American format, you have to know a few things: the position of the form on the page (its form number), the name of the form field, and the value to be passed to the form. Then you must use submit_form to submit the form. Once it is submitted, you can retrieve the results. Here is an example script that retrieves the results for Greg Biffle from the Motorsports section, once each in Fraction, Decimal, and American notation. The script only grabs the relevant section of the HTML source. I will leave the extraction of the text values themselves as an exercise for you.

Code:
use warnings;
use strict;
use WWW::Mechanize;
$|++;
my $mech = WWW::Mechanize->new;
$mech->get("http://www.stanjames.com/index.asp");
$mech->success or die "Can't open page\n";
foreach ('Fractions', 'Decimals', 'American') {
    $mech->submit_form(
        form_number => 5,
        fields => { 'priceFormatsDD' => $_, },
    );
    my $content = $mech->content;
    $content =~ s/.+?(\<h3\>Motorsports\<\/h3\>)/$1/si;
    $content =~ s/\<\!-- botmpl --\>.+$//si;
    $content =~ s/.+?(\<tr class\=\"row0\"\>)/$1/si;
    $content =~ s/(\<\/tr\>).+$/$1/si;
    print $content;
    print "\n\n***************************************\n\n";
}
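
For the extraction step left as an exercise, one possible approach is to pull the text out of each <td> cell of the trimmed row. This is a rough sketch under assumptions: the sample row below is invented (the real StanJames markup may look quite different), and cell_texts is a hypothetical helper name, not part of WWW::Mechanize.

```perl
use strict;
use warnings;

# Hypothetical row, standing in for the trimmed $content from the script above.
my $content = '<tr class="row0"><td><b>Greg Biffle</b></td><td> 9/2 </td></tr>';

# Return the text of each <td> cell found in a chunk of HTML.
sub cell_texts {
    my ($row) = @_;
    my @cells;
    while ($row =~ m{<td\b[^>]*>(.*?)</td>}gis) {
        my $text = $1;                # copy before the next regexp overwrites it
        $text =~ s/<[^>]+>//g;        # drop any nested tags
        $text =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
        push @cells, $text;
    }
    return @cells;
}

my ($name, $odds) = cell_texts($content);
print "$name => $odds\n";
```

Inside the foreach loop above, calling cell_texts($content) in place of print $content would give the name and the odds in whichever format was just submitted. For messier pages a real HTML parser is likely to be more robust than regexps.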
 
Thank you Raklet!!

Your post was extremely informative!
It wasn't exactly how I wanted my script to be built, but hey, you guys are the experts!! :)

I shall now try to learn everything I can about WWW::Mechanize.
Thank you again, I really appreciate it.

Nash
 
How do you want it built? Give a little detail and a sample of your code and I will see what I can do to accommodate you. If you want to pursue Mechanize, make sure you read through the Cookbook, Examples, and Documentation, all found on CPAN.



A very good example to get you started is quotes.pl

 