Relative URLs - question from a software programmer

Status
Not open for further replies.

BobbaFet

Programmer
Feb 25, 2001
903
NL
Hi guys, thanks for reading my question!

So there is a software project I am working on that requires me to resolve relative URLs, and I thought I had it down pat until the client contacted me saying that it doesn't work for some sites. The client sent along one of the web addresses where this occurred, and sure enough it fails there, even though both FF and IE succeed. That makes me think the problem is in my software.

Now my question is: how do you merge base paths and relative paths for websites?

Having read the following RFCs:
- -
I came to the conclusion that relative URLs are resolved as follows:

Step 1: Take the directory that the currently visited web page is in.

For example: " is where the page is at, then I should remember to use " as a base absolute path.

So when that index.htm has a picture in it as such:
img src="images/mypic.jpg"

then the full URL of this picture should be:

When there is a BASE HREF="" tag, it should override the web directory the page is in.

For example: base href=" then the URL should be:
rather than

This works for every site except this one, yet that site works in both FF and IE (even though they resolve the pictures to different URLs, and both of those work!), and it is bugging the living hell out of me.
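The two rules described above (merge a relative path with the page's directory, and let a `<base href>` replace the page URL as the base) can be sketched in Python as follows. This is a minimal sketch, not the poster's code: the function names and the example.com / example.net URLs are hypothetical, since the original URLs were not preserved. The hand-written directory merge is checked against the standard library's `urllib.parse.urljoin`, which implements RFC 3986 reference resolution.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit, urlunsplit


def merge_with_directory(base_url: str, ref: str) -> str:
    """Step 1 only: keep the base path up to and including its last '/',
    then append the reference (the "merge" step of RFC 3986, section 5.2.3).
    Only handles plain relative paths such as 'images/mypic.jpg':
    no '../', no leading '/', no scheme, query or fragment."""
    parts = urlsplit(base_url)
    directory = parts.path[: parts.path.rfind("/") + 1]  # "/dir/index.htm" -> "/dir/"
    return urlunsplit((parts.scheme, parts.netloc, directory + ref, "", ""))


def effective_base(page_url: str, base_href: Optional[str]) -> str:
    """A <base href> (itself resolved against the page URL) replaces the page URL as the base."""
    return urljoin(page_url, base_href) if base_href else page_url


page = "http://www.example.com/dir/index.htm"  # hypothetical page URL
print(merge_with_directory(page, "images/mypic.jpg"))
# http://www.example.com/dir/images/mypic.jpg
print(urljoin(effective_base(page, "http://cdn.example.net/gallery/"), "images/mypic.jpg"))
# http://cdn.example.net/gallery/images/mypic.jpg
```

One detail worth checking in any implementation: the last path segment is dropped from a `<base href>` too, so a base of `http://cdn.example.net/gallery` (no trailing slash) makes `images/mypic.jpg` resolve to `http://cdn.example.net/images/mypic.jpg`.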

I am going to give you some fictional data, because the site concerned is a pornographic site and I do not think it is appropriate to post it here, but the paths will be accurate.

----------------------------------------------------------
Website:
Base HREF:
Image: /promo/089_Skyline/pics/01.jpg
----------------------------------------------------------

This leads me to believe that the images should be found at:

Now, even though this works for every other site in existence, it doesn't for this one.

IE can load the images and says the URL should be:
FF can load the images and says the URL should be:

Now my question is, where am I going wrong with this?

BobbaFet
Code:
if not (Programming = 'Severe Migraine') then
                       ShowMessage('Eureka!');
 
Hi

BobbaFet said:
IE can load the images and says the URL should be:
FF can load the images and says the URL should be:
I am with Firefox.

However, I am unsure what you mean by "says". If you are referring to the status bar text, I would not be 100% sure that it is really the URL being requested. I would look at the actual request: in Firefox with Live HTTP Headers or Firebug, and in Explorer with ieHTTPHeaders or IEWatch.

Also, that page could contain horrible HTML syntax errors or dirty JavaScript tricks, which are handled differently by the browsers and by your... your... Really, what is your "software project"? A browser? A spider? Does it handle HTML and HTTP through some library?


Feherke.
 
Image: /promo/089_Skyline/pics/01.jpg
[...]
Now my question is, where am I going wrong with this?
What you're missing is the leading slash in the image URL. That tells the browser to resolve the URL relative to the domain root, rather than to the current page's directory. A lot of people (myself included) write URLs like this - it offers many of the advantages of relative and absolute URLs at the same time.
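In RFC 3986 terms, a reference starting with a single `/` is an "absolute-path reference": only the scheme and host are taken from the base, and the base's path is discarded entirely. A minimal Python sketch of just that case, checked against `urljoin`; the function name and host are illustrative, and the RFC's dot-segment removal is deliberately left out:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def resolve_root_relative(base_url: str, ref: str) -> str:
    """Resolve a reference like '/promo/089_Skyline/pics/01.jpg'.
    Only scheme and host come from the base; the base path is thrown away.
    (RFC 3986 would also remove '.' and '..' segments from ref's path; omitted here.)"""
    if not ref.startswith("/") or ref.startswith("//"):
        raise ValueError("expected a reference starting with a single '/'")
    base = urlsplit(base_url)
    target = urlsplit(ref)
    return urlunsplit((base.scheme, base.netloc, target.path, target.query, target.fragment))


base = "http://www.example.com/members/gallery/index.htm"  # hypothetical base URL
print(resolve_root_relative(base, "/promo/089_Skyline/pics/01.jpg"))
# http://www.example.com/promo/089_Skyline/pics/01.jpg
print(urljoin(base, "/promo/089_Skyline/pics/01.jpg"))
# http://www.example.com/promo/089_Skyline/pics/01.jpg
```

A reference starting with `//` is a different case again (a "network-path reference"): it keeps only the scheme from the base and supplies its own host.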

IE has chosen to ignore the <base> element when resolving the /-led URL, while FF has resolved it relative to the <base> domain. Like feherke, I'm with FF. Does the page have a full DOCTYPE? IE's behaviour may be due to quirks mode.
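To make the FF/IE difference concrete, here is a sketch that computes both candidates for a root-relative image path: the RFC 3986 result against the `<base href>` (what Firefox did here) and the result against the page's own URL (matching IE ignoring `<base>`, as described above). The hosts are hypothetical stand-ins for the fictional data, and the function name is illustrative. Returning the spec result first and the other as a fallback is one possible design for a spider, an assumption rather than anything the thread prescribes.

```python
from typing import List, Optional
from urllib.parse import urljoin


def candidate_urls(page_url: str, base_href: Optional[str], src: str) -> List[str]:
    """Return the spec-conformant URL first, then (if different) the URL you get
    when <base href> is ignored, as IE apparently did for this page."""
    spec_base = urljoin(page_url, base_href) if base_href else page_url
    candidates = [urljoin(spec_base, src)]
    ignoring_base = urljoin(page_url, src)
    if ignoring_base not in candidates:
        candidates.append(ignoring_base)
    return candidates


page = "http://www.example.com/tour/index.htm"     # hypothetical page URL
base_href = "http://content.example.net/members/"  # hypothetical <base href>
for url in candidate_urls(page, base_href, "/promo/089_Skyline/pics/01.jpg"):
    print(url)
# http://content.example.net/promo/089_Skyline/pics/01.jpg
# http://www.example.com/promo/089_Skyline/pics/01.jpg
```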


-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
Ah, ok, lots of questions! Time for some answers!


So when the first character of a path is a /, it should resolve to the domain name (ignoring any path in the base URL after the domain)? But why? Why not use the whole ../ approach? It seems weird to do it like that.
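Both kinds of path go through the same resolution procedure. `../` is handled by RFC 3986's "remove dot segments" step (section 5.2.4), which runs after the relative path has been merged with the base directory. The sketch below is a stack-based restatement of that step for paths starting with `/` (not the RFC's literal buffer algorithm), compared against `urljoin`; the URLs are hypothetical.

```python
from urllib.parse import urljoin


def remove_dot_segments(path: str) -> str:
    """Stack-based equivalent of RFC 3986 section 5.2.4 for paths starting with '/'.
    '.' segments are dropped, '..' pops the previous segment, and '..' at the
    root is ignored (you cannot climb above the domain root)."""
    segments = path.split("/")[1:]  # drop the empty string before the leading '/'
    stack = []
    for seg in segments:
        if seg == "..":
            if stack:
                stack.pop()
        elif seg != ".":
            stack.append(seg)
    result = "/" + "/".join(stack)
    # A path ending in '.' or '..' names a directory, so keep a trailing slash.
    if segments and segments[-1] in (".", "..") and not result.endswith("/"):
        result += "/"
    return result


merged = "/news/archive/2010stories/" + "../images/logo.gif"  # base directory + reference
print(remove_dot_segments(merged))  # /news/archive/images/logo.gif
print(urljoin("http://example.com/news/archive/2010stories/page.htm", "../images/logo.gif"))
# http://example.com/news/archive/images/logo.gif
```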

Also, the document type is: loose.dtd

The project is an image spider for a search engine.
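For an image spider, the pieces combine into: find the first `<base href>` in the page, collect each `<img src>`, and resolve every src against the effective base. A minimal sketch using Python's standard `html.parser`; the class name, function name and sample HTML are hypothetical. It assumes reasonably well-formed markup: as feherke pointed out, broken HTML or images inserted by JavaScript would need a more forgiving parser or a real browser engine.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the first <base href> and every <img src> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None
        self.img_srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; valueless attributes map to None.
        attributes = dict(attrs)
        if tag == "base" and self.base_href is None and attributes.get("href"):
            self.base_href = attributes["href"].strip()  # only the first <base href> counts
        elif tag == "img" and attributes.get("src"):
            self.img_srcs.append(attributes["src"].strip())


def image_urls(page_url: str, html: str) -> List[str]:
    parser = ImageLinkParser()
    parser.feed(html)
    parser.close()
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    return [urljoin(base, src) for src in parser.img_srcs]


sample = (
    '<html><head><base href="http://content.example.net/members/"></head>'
    '<body><img src="/promo/089_Skyline/pics/01.jpg"><img src="thumbs/02.jpg"/></body></html>'
)
for url in image_urls("http://www.example.com/tour/index.htm", sample):
    print(url)
# http://content.example.net/promo/089_Skyline/pics/01.jpg
# http://content.example.net/members/thumbs/02.jpg
```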

BobbaFet
 
see: Relative URLs

among other places

Greg
People demand freedom of speech as a compensation for the freedom of thought which they seldom use. Kierkegaard
 
Why not use the whole ../ bit?
Suppose your site is spread around a hierarchy of directories:

example.com
example.com/images
example.com/news
example.com/news/archive/2010stories

You don't want to have to change all the internal links on a page if you move it from the main directory into the archive.

Also, if you're using SSI (or similar) to put standard parts onto every page, it makes life much simpler to use the same urls for things regardless of where you are in the tree. For example: "/images/logo.gif" instead of "images/logo.gif" or "../images/logo.gif" or "../../images/logo.gif" etc.

Of course, you can get the same effect by using absolute URLs, but that takes extra typing/bandwidth and gives you a big job to do should you choose to change domain names in future.
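The maintenance argument can be checked mechanically with `urljoin`: the root-relative string resolves to the same file from every level of the hierarchy, while a document-relative string breaks once the page sits deeper than the level it was written for. The page file names under example.com are hypothetical.

```python
from typing import List
from urllib.parse import urljoin

# Hypothetical pages at three depths of the example.com hierarchy.
PAGES = [
    "http://example.com/index.htm",
    "http://example.com/news/index.htm",
    "http://example.com/news/archive/2010stories/story.htm",
]
LOGO = "http://example.com/images/logo.gif"


def works_at_each_depth(ref: str) -> List[bool]:
    """For each page in PAGES, does this src value resolve to the logo?"""
    return [urljoin(page, ref) == LOGO for page in PAGES]


print(works_at_each_depth("/images/logo.gif"))    # [True, True, True]
print(works_at_each_depth("images/logo.gif"))     # [True, False, False]
print(works_at_each_depth("../images/logo.gif"))  # [True, True, False]
```

The `True` for the top-level page in the last line comes from RFC 3986 resolution silently ignoring surplus `..` segments at the root; moving a page deeper, as into the archive directory, is what breaks the relative forms.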

-- Chris Hunt
 