How do you programmatically expand a url link to its true location?

Status
Not open for further replies.

Athanasopolous

Programmer
Jun 25, 2005
40
US
Do you know about tinyurl.com and baidu.com? TinyURL.com is a web site that takes a long link and produces a tiny URL you can present to someone instead. Baidu.com is a search engine that tries to discourage people from building metasearch engines on top of its site by hiding its result links, much as tinyurl.com does.

Anyway, what I want to do is find a way to programmatically take the link (the first link in a search for Jessica Alba using baidu.com) and have it return the actual link. That is just one example: what I want is not specific to Jessica, since I am using Baidu.com as one of the search engines in my metasearch engine project.

Maybe there is a way of using the WebBrowser class, but I did not see a member that exposes the URL.

Maybe there is a way of using WebRequest and WebResponse.
 
Looks to me like Baidu is encrypting the actual URL, passing it around as a querystring parameter, and decrypting it as required. You'd have to know how they're encrypting/decrypting the URL to get it back out of the querystring, and unfortunately I doubt they're going to share that with you.

Rhys

"Technological progress is like an axe in the hands of a pathological criminal"
"Two things are infinite: the universe and human stupidity; and I'm not sure about the universe"
Albert Einstein
 
These URL services just redirect from the shorter/encrypted URL to the actual URL. Open up Fiddler and watch what happens.
A GET is sent to the link, and the response comes back with a status of 302 (which means "redirect") and a header called "Location" that says where to redirect to.
Code:
HTTP/1.1 302 Found
Via: 1.1 WEB-DH-PROXY
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Content-Length: 222
Expires: Fri, 30 Nov 2012 17:06:41 GMT
Date: Thu, 29 Nov 2012 17:06:41 GMT
Location: http://baike.baidu.com/view/270790.htm
Content-Type: text/html; charset=iso-8859-1
Server: Apache
Cache-Control: max-age=86400

The browser will then request the address given in the Location header, as it has been told to.

You can use the WebRequest object to pretend to be a browser, make the request, and read the headers back from the response. You don't need to follow the link if you don't want to.
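As a rough illustration of the idea (make the request, do not follow the redirect, read the Location header), here is a minimal sketch in Python rather than .NET. The function name `expand_once` and the User-Agent string are my own choices for the example, not anything Baidu or TinyURL requires. In .NET the same approach should be possible by setting `AllowAutoRedirect = false` on an `HttpWebRequest` and reading `Headers["Location"]` from the response.

```python
import http.client
from urllib.parse import urlsplit, urljoin

# Status codes that carry a redirect target in the Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def expand_once(url, timeout=10.0):
    """Return the address `url` redirects to, or None if there is no redirect.

    Hypothetical helper: it sends one GET and inspects the response headers
    without following the redirect.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # http.client never follows redirects on its own, so a 302 is
        # returned to us as-is instead of being chased automatically.
        conn.request("GET", path, headers={"User-Agent": "Mozilla/5.0"})
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the request URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()
```

If a service redirects through several hops, calling `expand_once` in a loop until it returns None (with a cap on the number of hops) should get you to the final address.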

hth

Ben

----------------------------------------------
Ben O'Hara
David W. Fenton said:
We could be confused in exactly the same way, but confusion might be like Nulls, and not comparable.
 