Extracting only Certain Links from a search

Status
Not open for further replies.

NashTrump

Technical User
Jul 23, 2005
38
GB
Hi there,

I need to search a website and extract only certain links from it.

I want to search for URLs containing something like:

Also, some of the links are about two clicks away from the start URL. How do I follow those links and search the linked pages?

Thanks for your help with this!

Nash
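Following links that sit two clicks from the start page is essentially a depth-limited, breadth-first crawl. Below is a minimal Perl sketch, not a finished tool. It assumes a pluggable fetch function (for a live site this could wrap LWP::UserAgent's get) and uses a deliberately naive regex to pull out href values. The names crawl and extract_links are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pull href values out of a page with a simple regex.
# Rough on purpose; a real HTML parser such as HTML::LinkExtor copes better
# with messy markup, and relative links would need resolving (e.g. with
# URI->new_abs) before they can be fetched.
sub extract_links {
    my ($html) = @_;
    return $html =~ /href\s*=\s*["']([^"']+)["']/gi;
}

# Breadth-first crawl: follows links at most $max_clicks away from $start
# and returns every URL discovered, in the order it was first seen.
# $fetch is an assumed code ref (URL in, HTML string or undef out); for a
# live site it could wrap LWP::UserAgent's get().
sub crawl {
    my ($start, $max_clicks, $fetch) = @_;
    my %seen  = ($start => 1);
    my @found = ($start);
    my @queue = ([$start, 0]);              # [url, clicks from start]

    while (my $item = shift @queue) {
        my ($url, $clicks) = @$item;
        next if $clicks >= $max_clicks;     # don't open pages past the limit
        my $html = $fetch->($url);
        next unless defined $html;
        for my $link (extract_links($html)) {
            next if $seen{$link}++;         # each URL is queued only once
            push @found, $link;
            push @queue, [$link, $clicks + 1];
        }
    }
    return @found;
}

# Usage with a tiny in-memory "site" instead of real HTTP requests
# (the example.com URLs are made up for illustration).
my %site = (
    'http://example.com/'  => '<a href="http://example.com/a">A</a>',
    'http://example.com/a' => '<a href="http://example.com/b">B</a>',
    'http://example.com/b' => '<a href="http://example.com/c">C</a>',
);
my @urls = crawl('http://example.com/', 2, sub { $site{ $_[0] } });
print "$_\n" for @urls;
```

With a limit of 2 clicks, /b is discovered but /c (three clicks away) is never reached. The collected URLs can then be filtered with grep. Against a real site, pausing between requests keeps the load reasonable.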
 
something like ... I see [3eyes]

What have you got so far? And is this liable to violate the site's Terms of Service?

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
Sorry Paul, I have no idea what you're talking about.
 
What code have you got so far?

And is what you're proposing to do with the site "like" stanjames.com liable to violate the site's terms of service?

From what you've posted, I could assume many things, but a site like this doesn't "do work for free": we help people do their own work, and we don't help if it's likely to prove unethical, illegal or amoral.

You make an effort, let us know what you need it for, and we help. It's that simple.

Hopefully that clears it up ;-)

Paul
 
OK then...
I'm looking to understand how to search for a certain piece of text...

How do I check whether what I'm proposing is ethical?

I heard about reading the robots.txt file.
Is that the way?

Can you help me understand what's legal and what crosses the line, so I don't do anything I'm not supposed to do?
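Reading robots.txt is a reasonable first step: it lists the paths automated clients are asked to stay away from (it is a separate thing from the site's terms and conditions). Below is a deliberately simplified Perl sketch that checks a path against the Disallow rules addressed to all robots. The function names are hypothetical. For real use, libwww-perl's WWW::RobotRules (or LWP::RobotUA) handles the rules more completely.

```perl
use strict;
use warnings;

# Simplified robots.txt reader: returns the Disallow rules that apply to
# every robot ("User-agent: *"). It ignores Allow lines, wildcards and
# agent-specific groups, so it is a rough first check only.
sub disallowed_paths {
    my ($robots_txt) = @_;
    my (@rules, $in_star_group, $last_was_agent);
    for my $line (split /\r?\n/, $robots_txt) {
        $line =~ s/#.*//;                               # drop comments
        next unless $line =~ /^\s*([A-Za-z-]+)\s*:\s*(.*?)\s*$/;
        my ($field, $value) = (lc $1, $2);
        if ($field eq 'user-agent') {
            # A User-agent line that follows rule lines starts a new group.
            $in_star_group = 0 unless $last_was_agent;
            $in_star_group = 1 if $value eq '*';
            $last_was_agent = 1;
            next;
        }
        $last_was_agent = 0;
        # An empty "Disallow:" means nothing is disallowed, so skip it.
        push @rules, $value
            if $field eq 'disallow' && $in_star_group && length $value;
    }
    return @rules;
}

# A path is allowed unless it starts with one of the disallowed prefixes.
sub is_allowed {
    my ($path, @rules) = @_;
    for my $rule (@rules) {
        return 0 if index($path, $rule) == 0;
    }
    return 1;
}

# Usage with a made-up robots.txt.
my $robots = <<'END';
User-agent: *
Disallow: /cgi-bin/
Disallow: /private

User-agent: BadBot
Disallow: /
END
my @rules = disallowed_paths($robots);
for my $path ('/betting/index.asp', '/private/odds.html') {
    printf "%s => %s\n", $path, is_allowed($path, @rules) ? 'allowed' : 'disallowed';
}
```

The "Disallow: /" under BadBot is ignored here because it only applies to that one robot.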

 
OK, I think I need to use the grep function:

@imgs = grep(!/m/http:\/\/ @imgs);

where @imgs is my array of links.
I need to check each link, and if it starts with:


then keep it; otherwise, discard it from the array.
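Since the goal is a plain "starts with" test, one option that avoids regex syntax entirely is Perl's built-in index(). Below is a minimal sketch. The sample links are made up, and the prefix is assumed to be the betting-page URL from the regex quoted in the reply.

```perl
use strict;
use warnings;

# Keep only the links that begin with a given prefix and discard the rest.
# index() returns the position of $prefix inside the link, so == 0 means
# "starts with"; no regex (and therefore no escaping) is involved.
my $prefix = 'http://www.stanjames.com/betting/index.asp?';
my @imgs = (                                   # hypothetical sample links
    'http://www.stanjames.com/betting/index.asp?id=1',
    'http://www.stanjames.com/help.asp',
    'http://www.example.com/betting/index.asp?id=2',
);
@imgs = grep { index($_, $prefix) == 0 } @imgs;
print "$_\n" for @imgs;
```

Only the first link survives. Characters such as "." and "?" in the prefix need no special treatment because index() compares plain text.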

I'm getting this error:
Bareword found where operator expected at /perl/site/lib//findlinks.pm line 42, near "/m/http"
(Missing operator before http?)
syntax error at /perl/site/lib//findlinks.pm line 42, near "/m/http"

I'm obviously very crap at this, so I'd appreciate some help.
To let you know what I'm doing here: I'm simply searching the site for links that contain bets, because I want to extract the odds from the site so that I can compare them with other sites and see which site I should bet on.

I've read the terms and conditions, and I can't find anything saying that this is not allowed.
 
Well, this line:
Code:
@imgs = grep(!/m/http:\/\/www.stanjames.com\/betting\/index.asp\?/i, @imgs);

should maybe be:

Code:
@imgs = grep(m#http://www\.stanjames\.com/betting/index\.asp#i, @imgs);

You have '!' at the beginning of your regexp, which means "if the following does not match", so the lines that did not match the regexp would be kept in the array instead of the lines that do match. The syntax was also wrong: Perl reads !/m/ as a complete (negated) match against the letter m, then trips over the bareword http, which is what the error message is telling you. I used # as the regexp delimiter so the forward slashes don't need escaping, which makes it easier to read; with any delimiter other than / the m must be written explicitly (m#...#), because a bare # starts a comment. The dots should be escaped, although the regexp would probably still work if they weren't. A dot has a special meaning in a regexp: it matches any single character (except a newline). To match a literal dot you escape it: \.
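To see these points side by side (what a leading ! does, why escaping the dots matters, and the m#...# form), here is a small self-contained Perl check. The sample links are made up. The ^ anchor, which ties the match to the start of the link as asked, and the \Q...\E variant are additions not used in the reply.

```perl
use strict;
use warnings;

# Hypothetical sample links, for illustration only.
my @links = (
    'http://www.stanjames.com/betting/index.asp?id=1',
    'http://www.stanjames.com/help.asp',
    'http://wwwXstanjames.com/betting/index.asp?id=2',   # odd host
);

# m#...# lets the slashes stay unescaped; the leading "m" is required
# because a bare # would start a comment. ^ anchors at the start.
my @kept    = grep {  m#^http://www\.stanjames\.com/betting/index\.asp#i } @links;

# A leading "!" inverts the test: only NON-matching links survive.
my @dropped = grep { !m#^http://www\.stanjames\.com/betting/index\.asp#i } @links;

# Unescaped dots match any character, so the odd host slips through.
my @loose   = grep {  m#^http://www.stanjames.com/betting/index.asp#i } @links;

# \Q...\E escapes every metacharacter in an interpolated string.
my $prefix  = 'http://www.stanjames.com/betting/index.asp';
my @quoted  = grep { /^\Q$prefix\E/i } @links;

printf "kept=%d dropped=%d loose=%d quoted=%d\n",
    scalar @kept, scalar @dropped, scalar @loose, scalar @quoted;
```

This prints kept=1 dropped=2 loose=2 quoted=1. The unescaped pattern accepts "wwwXstanjames.com" because each "." matches any character.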
 