Tek-Tips is the largest IT community on the Internet today!

Question about doorway pages, and one about .htaccess DirectoryIndex...

Status
Not open for further replies.

spewn

Programmer
May 7, 2001
1,034
I run all my pages from a .pl file that generates the HTML seen on each page.

My site runs like this:

A request for the site root goes to the default index file, index.html. The index.html page contains the meta tags, followed by a JavaScript that changes the location to site.pl with the parameters.

For example:


<HTML>
<HEAD>
<TITLE></TITLE>
<META NAME="KEYWORDS" CONTENT="">
<META NAME="DESCRIPTION" CONTENT="">
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
</HEAD>
<BODY>
<form name="go" method="post" action="/cgi/count.cgi?eyc">
</form>
<script type="text/javascript">document.go.submit();</script>
</BODY>
</HTML>


Now my question is this: will the spiders (especially Google!) read the index page and follow the JavaScript to the file? Of course, this goes to a counter that then passes the visitor on to the main site.pl file.

Is this wrong?

What if this is in there instead:


<BODY>
<script>location.href='wherever.html'</script>
</BODY>


Will it follow then?

Or what if I have a meta tag that redirects using refresh? Will it follow then?

And finally:

If you know about .htaccess: what if the DirectoryIndex points to site.pl directly? Will the spiders follow?

Any help is appreciated.

- g
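On the refresh question: a refresh tag has the form <META HTTP-EQUIV="refresh" CONTENT="0; URL=...">, so a spider that chooses to honour it has to pull the target out of the CONTENT value. The JavaScript sketch below shows only that parsing step; the function name parseMetaRefresh is hypothetical, and nothing here says whether Google or any other engine actually follows such redirects.

```javascript
// Minimal sketch: extract the delay and target URL from a META refresh
// CONTENT value such as "0; URL=/cgi/count.cgi?eyc".
// Whether a given spider honours refresh tags at all is up to that spider.
function parseMetaRefresh(content) {
  // "<seconds>" optionally followed by ";" or "," and "URL=<target>"
  const match = /^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?['"]?([^'"]*)['"]?)?\s*$/i.exec(content);
  if (!match) return null; // not a valid refresh value
  return {
    delay: Number(match[1]),
    // no target means "reload this same page"
    url: match[2] ? match[2].trim() : null,
  };
}

console.log(parseMetaRefresh("0; URL=/cgi/count.cgi?eyc"));
// { delay: 0, url: '/cgi/count.cgi?eyc' }
```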
 
Hi mate,

DirectoryIndex index.pl

That's the best way to do it. Spiders will not read JavaScript; it would be too easy for them to get redirected all over the place and end up in an endless loop.

The set-up with the JavaScript is pointless; I don't understand why you would want to do it that way.
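To illustrate the point that spiders don't run JavaScript, here is a toy model of a link-following spider that only reads href values out of <a> tags in the raw HTML. It is a deliberately simplified sketch, not how Google or any real crawler works; the name extractLinks and the /cgi-bin/site.pl path are hypothetical. A redirect that lives only inside a script yields no links at all, while a plain link is visible.

```javascript
// Toy spider that does NOT execute JavaScript: it only collects the
// href values of <a> tags found in the raw HTML text.
function extractLinks(html) {
  const links = [];
  const anchorHref = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = anchorHref.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Redirect-only page: the target appears only inside a script
const scriptOnly = "<BODY><script>location.href='wherever.html'</script></BODY>";
// Plain link (hypothetical path) that a non-scripting spider can see
const plainLink = '<BODY><a href="/cgi-bin/site.pl">Enter the site</a></BODY>';

console.log(extractLinks(scriptOnly)); // []
console.log(extractLinks(plainLink)); // [ '/cgi-bin/site.pl' ]
```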

Hope this helps

Wullie


The pessimist complains about the wind. The optimist expects it to change.
The leader adjusts the sails. - John Maxwell
 

Well, the site was set up before I even knew what .htaccess was, and using the JavaScript to take visitors to the site.pl file was the best solution at the time.

So I have to use index.pl instead of site.pl, huh?

Thanks!

- g
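On the index.pl vs. site.pl question: Apache's DirectoryIndex is not tied to one file name. It takes a list, tries each name in order, and serves the first one that exists, so a line such as "DirectoryIndex site.pl index.html" should work as well, assuming .pl files are allowed to run as CGI in that directory (typically via Options +ExecCGI and AddHandler cgi-script .pl). The JavaScript below is only a model of that lookup order; resolveDirectoryIndex and the file set are hypothetical stand-ins for Apache and the real filesystem.

```javascript
// Model of DirectoryIndex lookup: try each listed name in order and
// return the first one that exists. fileExists stands in for the disk.
function resolveDirectoryIndex(indexNames, fileExists) {
  for (const name of indexNames) {
    if (fileExists(name)) return name;
  }
  // No match: Apache then shows a generated listing (if Options Indexes
  // is on) or an error; this model just returns null
  return null;
}

// Hypothetical document root that has site.pl but no index.pl
const files = new Set(["site.pl", "index.html"]);
const exists = (name) => files.has(name);

// Like:  DirectoryIndex site.pl index.html
console.log(resolveDirectoryIndex(["site.pl", "index.html"], exists)); // site.pl
// Like:  DirectoryIndex index.pl index.html
console.log(resolveDirectoryIndex(["index.pl", "index.html"], exists)); // index.html
```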
 
One more thing:

Will Google read the whole URL, including the parameters? I read that it will only read up to two variables, and anything beyond that will not be read. True?


For example:


(this would be good)

vs.


(this would be bad)


I don't see how limiting the parameters would make sense.

Any help?

- g
 
Hi spewn,

Your question is a frequently asked one in the Google forum, and Google's techs answer it with the same text:

"If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small."

The general consensus in the Google forum is to NOT use parameters. There isn't a written guideline from Google about this, or at least nothing that explicitly states how many parameters, if any, they will crawl. In fact, they don't even state that they index .pl pages; I know they do because some of mine are indexed. Here is about the only documentation I can find at Google on the subject (aside from the Google forum questions and answers):

"Fiction: Sites are not included in Google's index if they use ASP (or some other non-html file-type.)
Fact: At Google, we are able to index most types of pages and files with very few exceptions. File types we are able to index include: pdf, asp, jsp, hdml, shtml, xml, cfm, doc, xls, ppt, rtf, wks, lwp, wri". You would think they would have included Perl's .pl extension. Hehe.

I have tracked Google crawling some of my forum's .pl pages with fewer than three parameters, but have not seen it crawl pages with more.

If you have questions along these lines and don't find the answer here, you can check out the Google User Support Discussion Forum at
If you find an exact answer to your question, there are thousands of people, *g* maybe hundreds of thousands, who would like to know it too. Until the answer is found, the general unwritten rule is: the fewer parameters the better, and none is best.

mike
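The "fewer parameters the better" rule of thumb can be expressed as a quick check on a URL's query string. In this JavaScript sketch, MAX_PARAMS = 2 simply mirrors the two-variable figure spewn read about and what mike observed; it is not a documented Google limit. The helper names and example.com URLs are hypothetical.

```javascript
// Rule-of-thumb check, NOT a documented search-engine limit:
// 2 reflects what posters in this thread read or observed.
const MAX_PARAMS = 2;

function countQueryParams(url) {
  // URLSearchParams splits "?a=1&b=2" into name/value pairs
  return [...new URL(url).searchParams.keys()].length;
}

function looksCrawlFriendly(url) {
  return countQueryParams(url) <= MAX_PARAMS;
}

// Hypothetical URLs
const shortUrl = "http://www.example.com/cgi-bin/site.pl?page=about";
const longUrl = "http://www.example.com/cgi-bin/site.pl?page=about&lang=en&id=7&sort=asc";

console.log(countQueryParams(shortUrl), looksCrawlFriendly(shortUrl)); // 1 true
console.log(countQueryParams(longUrl), looksCrawlFriendly(longUrl)); // 4 false
```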
 
Hi guys,

Set up the server to process the URLs as "fixed" rather than dynamic addresses.

You can do this in every server-side language that I know of, and it displays URLs in the form of etc. etc.

To a search engine, this looks like a directory structure but it is actually variables in the URL and those &quot;directories&quot; do not exist.
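One possible way to build such "fixed" URLs is to turn each name/value pair into two path segments after the script name. The scheme, the toPathUrl name and the /cgi-bin/site.pl path are all hypothetical; this JavaScript sketch only shows the shape of the idea. Values containing "/" are a known snag with this scheme, since Apache rejects encoded slashes in the path by default.

```javascript
// Hypothetical "fixed URL" scheme: /script/name1/value1/name2/value2
function toPathUrl(scriptPath, params) {
  const segments = [];
  for (const [name, value] of Object.entries(params)) {
    // Percent-encode so spaces and other reserved characters stay legal
    segments.push(encodeURIComponent(name), encodeURIComponent(String(value)));
  }
  return segments.length ? `${scriptPath}/${segments.join("/")}` : scriptPath;
}

console.log(toPathUrl("/cgi-bin/site.pl", { page: "about", lang: "en" }));
// /cgi-bin/site.pl/page/about/lang/en
```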

One frequent problem with this method is that you must either specify a base href tag in every document or use absolute paths to images, because the browser resolves relative paths against those fake "directories".

Hope this helps

Wullie
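The image-path problem is easy to see with the standard URL resolution rules, which JavaScript's URL class implements. In this sketch the host www.example.com and the image paths are hypothetical: a relative path resolves against the fake directory in the address bar, while a root-relative path or a base URL of the real site root lands on the real location.

```javascript
// Page address produced by a path-style URL scheme (hypothetical)
const page = "http://www.example.com/cgi-bin/site.pl/page/about";

// Relative path: resolved against the fake "directory" /cgi-bin/site.pl/page/
console.log(new URL("images/logo.gif", page).href);
// http://www.example.com/cgi-bin/site.pl/page/images/logo.gif

// Root-relative (absolute) path: always the real location
console.log(new URL("/images/logo.gif", page).href);
// http://www.example.com/images/logo.gif

// With <base href="http://www.example.com/">, relative paths resolve
// against the base instead of the page address
console.log(new URL("images/logo.gif", "http://www.example.com/").href);
// http://www.example.com/images/logo.gif
```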


 

Oddly enough, this is where I was going with this. But how do you make the server understand that it's a URL and not a file location? I saw the Apache URL-rewriting module, but is there something less complicated?

- g
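For CGI scripts, a URL-rewriting module is often unnecessary: under the CGI/1.1 convention the server passes whatever follows the script name (for /cgi-bin/site.pl/page/about/lang/en, that is "/page/about/lang/en") to the script in the PATH_INFO environment variable, and Apache's CGI handler accepts such extra path by default in typical setups. The script then splits it back into parameters. This JavaScript sketch assumes the hypothetical name/value-pair scheme and a hypothetical parsePathInfo helper; in Perl the same string is available as $ENV{PATH_INFO}.

```javascript
// Turn PATH_INFO such as "/page/about/lang/en" into { page, lang }.
// PATH_INFO arrives already percent-decoded per the CGI spec, so no
// further decoding is done here.
function parsePathInfo(pathInfo) {
  const params = {};
  // Drop empty pieces caused by leading, trailing or doubled slashes
  const segments = pathInfo.split("/").filter((s) => s.length > 0);
  for (let i = 0; i + 1 < segments.length; i += 2) {
    params[segments[i]] = segments[i + 1];
  }
  return params; // a trailing name without a value is ignored
}

// In a real CGI script: parsePathInfo(process.env.PATH_INFO || "")
console.log(parsePathInfo("/page/about/lang/en")); // { page: 'about', lang: 'en' }
```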
 