
Web Proxy

Status
Not open for further replies.

guitardave78

Programmer
Sep 5, 2001
1,294
GB
For various reasons I am writing a web-based proxy (you know the idea: bypass the firewall, etc.).
There are no dubious reasons; we are just having firewall issues at work and I am unable to test my work!

So far I can get to the pages and replace all the hrefs in a page with the new proxy href and querystring.
What I need to be able to do is check for relative images, stylesheets and javascript sources and replace them with the full url of the site.

Any ideas how I can do this? The code I have so far is

Code:
<%
function getPage(url)
	on error resume next
	'getPage is the function's own return value, so it must not be dimmed again
	dim xmlDoc,f
	set xmlDoc = server.createObject("MSXML2.ServerXMLHTTP")
	xmlDoc.Open  "GET", url, false 
	'xmlDoc.setRequestHeader "Cache-Control","no-cache"
	xmlDoc.Send
	getPage = xmlDoc.responseText
	'route every href back through the proxy
	getPage = replace(getPage,"href=""","href=""http://www.YOURDOMAIN.co.uk/proxy/?q=")
	getPage = replace(getPage,"href='","href='http://www.YOURDOMAIN.co.uk/proxy/?q=")
	
	'address bar that is injected into the proxied page
	f = "<style>body{padding-top:0px;margin-top:0px;}</style><form method='get'><input type='text' name='q' id='q' value='"&url&"' size='70'> <input type='submit' value='Go' ></form>"
	
	
	if instr(getPage,"</head>") > 0 then
		getPage = replace(getPage,"</head>","</head>"&vbcrlf & f)
	else
		getPage = f & getPage
	end if
	
	'xmlDoc.close
	set xmlDoc = nothing
	'any non-zero error number means something failed
	if err.number <> 0 then
		response.write(err.description)
	end if
end function

dim q
q = request.querystring("q")
if q <> "" then
	if left(q,7) <> "http://" then q = "http://" & q
	response.write("")
	response.write(getPage(q))
else
	response.write(getPage("http://www.yahoo.com"))
end if
%>

}...the bane of my life!
 
1. First you will need to parse the requested URL properly and split it up into three parts: the full URL, the domain and the directory.

2. Then use a regular expression to return an array of URLs in context (e.g. src="..", href=".." etc.). This will give you something to iterate through.

3. Then you can prepend the above: use the full URL to request the page, the domain URL for root-relative URLs (those starting from the root dir, e.g. /path/script.js), and the directory URL for relative URLs (../../etc). Any fully qualified URLs, which already include the protocol and domain, can be left as is.

4. Add your own proxy link to the front of this, then locate the original element and replace it (e.g. use replace()).
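
Steps 1 and 3 could be sketched roughly as below. This is only a minimal sketch, not a full URL resolver: the function names (StripQuery, GetDomainUrl, GetDirUrl, ResolveUrl) are made up for illustration, the page URL is assumed to start with a scheme such as http://, and things like mailto: links, #fragments and "./" segments are not handled.

```vbscript
' Hypothetical helpers, not from the original posts.
' Assumption: fullUrl always starts with a scheme such as "http://".

' Drops the query string so slashes inside it cannot confuse the splitting
Function StripQuery(u)
	Dim p
	p = InStr(u, "?")
	If p > 0 Then StripQuery = Left(u, p - 1) Else StripQuery = u
End Function

' Scheme and host only, without a trailing slash
Function GetDomainUrl(fullUrl)
	Dim base, hostStart, slashPos
	base = StripQuery(fullUrl)
	hostStart = InStr(base, "://") + 3
	slashPos = InStr(hostStart, base, "/")
	If slashPos = 0 Then GetDomainUrl = base Else GetDomainUrl = Left(base, slashPos - 1)
End Function

' Directory of the requested page, with a trailing slash
Function GetDirUrl(fullUrl)
	Dim base, domainUrl, lastSlash
	base = StripQuery(fullUrl)
	domainUrl = GetDomainUrl(base)
	lastSlash = InStrRev(base, "/")
	If lastSlash > Len(domainUrl) Then GetDirUrl = Left(base, lastSlash) Else GetDirUrl = domainUrl & "/"
End Function

' Turns a link found in the page into a fully qualified URL
Function ResolveUrl(fullUrl, link)
	Dim rel, dirUrl, domainUrl, upPos
	rel = link   ' work on a copy: VBScript passes arguments ByRef by default
	domainUrl = GetDomainUrl(fullUrl)
	If LCase(Left(rel, 7)) = "http://" Or LCase(Left(rel, 8)) = "https://" Then
		ResolveUrl = rel                                        ' already fully qualified
	ElseIf Left(rel, 2) = "//" Then
		ResolveUrl = Left(fullUrl, InStr(fullUrl, "://")) & rel ' reuse the page's scheme
	ElseIf Left(rel, 1) = "/" Then
		ResolveUrl = domainUrl & rel                            ' relative to the site root
	Else
		dirUrl = GetDirUrl(fullUrl)
		' climb one directory per leading "../", but never above the site root
		Do While Left(rel, 3) = "../"
			rel = Mid(rel, 4)
			upPos = InStrRev(dirUrl, "/", Len(dirUrl) - 1)
			If upPos > Len(domainUrl) Then dirUrl = Left(dirUrl, upPos)
		Loop
		ResolveUrl = dirUrl & rel
	End If
End Function
```

For example, with the page URL "http://www.example.com/news/today/index.html" (a made-up address), ResolveUrl gives "http://www.example.com/news/img/a.gif" for "../img/a.gif" and "http://www.example.com/css/site.css" for "/css/site.css". The result is what you would then put after your own proxy prefix in step 4.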

Also, you may want to use the content type returned by the remote URL to decide whether to do any parsing at all, and also pass it back to the requesting browser. For example, images have a different content type from HTML pages and contain no links, so they don't need parsing. (Also note that you will probably want binary writes for image data.)
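
A rough sketch of that content-type check is below. NeedsRewriting and RelayResponse are hypothetical names, and the sketch assumes xmlDoc is an MSXML2.ServerXMLHTTP object whose Send has already completed. It only treats HTML as worth parsing, which is an assumption on my part; stylesheets can also contain URLs.

```vbscript
' Hypothetical helper: True when the response is HTML and may contain links to rewrite
Function NeedsRewriting(contentType)
	Dim mediaType, p
	mediaType = contentType & ""          ' coerce Null/Empty to a string
	p = InStr(mediaType, ";")             ' drop parameters such as "; charset=utf-8"
	If p > 0 Then mediaType = Left(mediaType, p - 1)
	mediaType = LCase(Trim(mediaType))
	NeedsRewriting = (mediaType = "text/html" Or mediaType = "application/xhtml+xml")
End Function

' Sketch of relaying a finished ServerXMLHTTP request to the browser (ASP only)
Sub RelayResponse(xmlDoc)
	Dim contentType
	contentType = xmlDoc.getResponseHeader("Content-Type") & ""
	' tell the browser what it is getting, same as the remote server did
	If contentType <> "" Then Response.ContentType = contentType
	If NeedsRewriting(contentType) Then
		' HTML: this is where the link rewriting would happen before writing
		Response.Write xmlDoc.responseText
	Else
		' images and other non-HTML data: pass the raw bytes through untouched
		Response.BinaryWrite xmlDoc.responseBody
	End If
End Sub
```

Using responseBody with Response.BinaryWrite avoids pushing image bytes through a text conversion, which is the reason for the binary-write note above.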

You will need to think of other things along the way as these are just rough thoughts, but this should get you moving.

You may be able to do a lot of the replacement within a regex but it would probably be quite complicated - and may not necessarily be as efficient.
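
To give an idea of what the regex-only route might look like, here is a sketch using the VBScript RegExp object with backreferences and a negative lookahead (both need VBScript 5.5 or later). AbsolutizeLinks is a made-up name, and the sketch assumes domainUrl has no trailing slash, dirUrl has one, and neither contains a "$" (which would be read as a replacement token). It leaves "../" segments in place for the remote server to resolve, and it will also touch anything that merely looks like an attribute inside scripts or comments.

```vbscript
' Hypothetical helper: makes src/href/action values absolute in two regex passes.
' Assumptions: domainUrl looks like "http://www.example.com" (no trailing slash),
' dirUrl looks like "http://www.example.com/news/" (trailing slash).
Function AbsolutizeLinks(html, domainUrl, dirUrl)
	Dim re, result
	Set re = New RegExp
	re.IgnoreCase = True
	re.Global = True

	' Pass 1: root-relative values such as /path/script.js (but not //host/...)
	re.Pattern = "\b(src|href|action)\s*=\s*([""'])(/(?!/).*?)\2"
	result = re.Replace(html, "$1=$2" & domainUrl & "$3$2")

	' Pass 2: document-relative values. The lookahead skips anything that
	' starts with a scheme (http:, mailto:, ...), a slash or a # fragment.
	re.Pattern = "\b(src|href|action)\s*=\s*([""'])(?![a-z][a-z0-9+.\-]*:|/|#)(.*?)\2"
	result = re.Replace(result, "$1=$2" & dirUrl & "$3$2")

	AbsolutizeLinks = result
End Function
```

Called with "http://www.example.com" and "http://www.example.com/news/" (made-up values), this turns src="/img/a.gif" into src="http://www.example.com/img/a.gif" and href='page2.html' into href='http://www.example.com/news/page2.html', while links that are already fully qualified are left alone. It is compact, but as said above it gets hard to read quickly, so iterating through the matches may still be easier to maintain.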

Hope that helps


A smile is worth a thousand kind words. So smile, it's easy! :)
 
OK, I am on the right route with this sort of thing:
Code:
dim RegularExpressionObject, Matches, Match, i, pos
	Set RegularExpressionObject = New RegExp
	
	'find the start of src and action attributes
	With RegularExpressionObject
		.Pattern = "img src=""|action='|action="""
		.IgnoreCase = True
		.Global = True
	End With
	set Matches = RegularExpressionObject.execute(getPage)
	'work backwards so the FirstIndex of earlier matches stays valid after each insert
	For i = Matches.Count - 1 To 0 Step -1
		Set Match = Matches(i)
		'FirstIndex is 0-based, so pos is the number of characters up to and including the opening quote
		pos = Match.FirstIndex + Match.Length
		'only prefix values that do not already start with http://
		if lcase(mid(getPage,pos + 1,7)) <> "http://" then
			getPage = left(getPage,pos) & url & "/" & mid(getPage,pos + 1)
		end if
	Next

}...the bane of my life!
 