Robot files 3

robert89 · Sep 7, 2004

I have a bilingual splash page. I want to know if it is possible to create a robots.txt file that has the spiders start at the index page for each of the languages. In other words, have the robots skip the splash page.

Any thoughts would be appreciated.

Thanks,
Bob

Foamcow · Sep 7, 2004

As mentioned in your previous thread, you don't need to make the spider skip your splash page.

Just make sure there are plain HTML links through to your content pages from the splash page.
The spider will then crawl to them and index them.

Search engines index and list pages and not sites as a whole.
If the search engine doesn't find content on your splash page then the page won't show up in search results.

Just make sure that the spider can get to your other pages.
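
To illustrate the point, here is a toy sketch in Python of a breadth-first crawl that starts at a splash page and reaches the language index pages and content pages purely by following href values. The site, its URLs (/en/index.html, /fr/index.html and so on) and the helper names are made up for the example. A real spider fetches pages over HTTP and does a great deal more; this only shows why plain links from the splash page are enough.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag; scripts and onclick are never run."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(site, start_url):
    """Breadth-first crawl of an in-memory site given as {url: html}.

    Returns the URLs in the order the toy spider visits them.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = site.get(url)
        if html is None:
            continue  # link points outside the toy site
        visited.append(url)
        collector = HrefCollector()
        collector.feed(html)
        collector.close()
        for href in collector.hrefs:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(url, href)).url
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical bilingual site: the splash page only links to the two home pages.
SITE = {
    "http://www.mydomain.com/": '<a href="/en/index.html">English</a> '
                                '<a href="/fr/index.html">Francais</a>',
    "http://www.mydomain.com/en/index.html": '<a href="products.html">Blue Widgets</a>',
    "http://www.mydomain.com/fr/index.html": '<a href="produits.html">Widgets bleus</a>',
    "http://www.mydomain.com/en/products.html": "<p>All about blue widgets.</p>",
    "http://www.mydomain.com/fr/produits.html": "<p>Tout sur les widgets bleus.</p>",
}

for page in crawl(SITE, "http://www.mydomain.com/"):
    print(page)
# http://www.mydomain.com/
# http://www.mydomain.com/en/index.html
# http://www.mydomain.com/fr/index.html
# http://www.mydomain.com/en/products.html
# http://www.mydomain.com/fr/produits.html
```

The splash page itself has no content worth ranking, but every content page is still reached. If the splash page linked to the language pages only through JavaScript, this crawl would stop at the splash page.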

http://www.foamcow.com

LFI · Sep 9, 2004

foamcow,

by plain HTML links, do you mean the HREF in Anchor tags or does the URL need to appear in the visible content?

More specifically, will all four of the following be crawled (and, if so, what is an example of something that WOULDN'T be crawled)?

(1)

Code:

<a href="http://www.mydomain.com/page1.html">http://www.mydomain.com/page1.html</a>

(2)

Code:

<a href="http://www.mydomain.com/page1.html">Page 1</a>

(3)

Code:

<a href="#" onclick="document.location='http://www.mydomain.com/page1.html'">Page 1</a>

(4)

Code:

<a href="#" onclick="window.open('http://www.mydomain.com/page1.html')">Page 1</a>

Thanks.

--Dave

ChrisHirst · Sep 9, 2004

1 & 2 will be crawled and followed by the spiders. For a text link, it is better to use descriptive text or a key phrase as the anchor text, so:

Code:

<a href="http://www.mydomain.com/page1.html">Blue Widgets</a>

3 & 4 would be crawled but not followed, because crawlers do not trigger JavaScript events or run scripts.
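
A rough sketch of that behaviour, assuming a crawler that reads only the href attribute (plus the anchor text) of each <a> tag and never runs scripts. The LinkExtractor class and links_in helper are hypothetical, built on Python's standard html.parser; this is not how any particular search engine is implemented.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Records (href, anchor text) for each <a> tag.

    Only the href attribute is read; onclick is ignored and no
    JavaScript is ever executed.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_in(snippet):
    extractor = LinkExtractor()
    extractor.feed(snippet)
    extractor.close()
    return extractor.links


# Dave's four examples, with the forum markup removed.
EXAMPLES = [
    '<a href="http://www.mydomain.com/page1.html">http://www.mydomain.com/page1.html</a>',
    '<a href="http://www.mydomain.com/page1.html">Page 1</a>',
    '<a href="#" onclick="document.location=\'http://www.mydomain.com/page1.html\'">Page 1</a>',
    '<a href="#" onclick="window.open(\'http://www.mydomain.com/page1.html\')">Page 1</a>',
]

for number, snippet in enumerate(EXAMPLES, start=1):
    print(number, links_in(snippet))
# 1 [('http://www.mydomain.com/page1.html', 'http://www.mydomain.com/page1.html')]
# 2 [('http://www.mydomain.com/page1.html', 'Page 1')]
# 3 [('#', 'Page 1')]
# 4 [('#', 'Page 1')]
```

Examples (1) and (2) yield the same URL and differ only in anchor text, which is why descriptive anchor text such as "Blue Widgets" is the better choice. Examples (3) and (4) yield nothing but "#".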

robots.txt is an exclusion protocol: it can only tell the bots what not to crawl, not where to start.
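
To make the "exclusion only" point concrete, here is a sketch using Python's standard urllib.robotparser. It assumes a hypothetical layout where the splash page lives at /splash.html (not that you need to block it, as Foamcow says). A Disallow line can keep a crawler off that page, but nothing in the file tells the crawler to begin at /en/index.html or /fr/index.html; those pages still have to be found through links.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str) -> bool:
    """Return True if a generic crawler ("*") may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch("*", "http://www.mydomain.com" + path)


# Hypothetical layout: splash page at /splash.html, language home pages below it.
SKIP_SPLASH = "User-agent: *\nDisallow: /splash.html\n"

for path in ["/splash.html", "/en/index.html", "/fr/index.html"]:
    print(path, allowed(SKIP_SPLASH, path))
# /splash.html False
# /en/index.html True
# /fr/index.html True

# In this parser, as in the original protocol, rules are prefix matches:
# if the splash page is the site root, "Disallow: /" shuts the crawler
# out of every page, not just the splash page.
print(allowed("User-agent: *\nDisallow: /\n", "/en/index.html"))  # False
```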

Chris.

Indifference will be the downfall of mankind, but who cares?

http://www.candsdesign.co.uk

A website that proves the cobbler's kids adage.

http://www.cram-system.com

Nightclub counting systems

So long, and thanks for all the fish.

LFI · Sep 9, 2004

Very informative, Chris. Thanks.

When you say "crawled but not followed," do you mean that, in my examples (3) and (4), 'http://www.mydomain.com/page1.html' would be recognized as a related site, but links ON page1.html would not be included (since the spider doesn't follow the link to that page)?

Thanks again.

--Dave

ChrisHunt · Sep 9, 2004

What he means is that the crawler will read the <a> tags, but it's only going to look at the href attribute. It won't execute the code in an onclick attribute, so it won't follow any links that are embedded in that attribute. So the crawler will interpret (3) and (4) as links back to the same page (with a # tacked on the end).
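
A small sketch of that last step, using Python's urllib.parse: each href is resolved against the URL of the page it sits on, and the #fragment is dropped. Dropping the fragment is a common crawler normalization assumed here, not something stated in the thread, and http://www.mydomain.com/index.html is a hypothetical stand-in for the page holding the links.

```python
from urllib.parse import urldefrag, urljoin


def resolve(page_url: str, href: str) -> str:
    """Turn an href into the absolute URL a crawler would queue, minus any #fragment."""
    absolute = urljoin(page_url, href)
    return urldefrag(absolute).url


PAGE = "http://www.mydomain.com/index.html"  # hypothetical page holding the links

print(resolve(PAGE, "http://www.mydomain.com/page1.html"))  # examples (1) and (2)
print(resolve(PAGE, "#"))                                   # examples (3) and (4)
# http://www.mydomain.com/page1.html
# http://www.mydomain.com/index.html
```

So (3) and (4) give the crawler nothing new: they resolve to the page it is already on.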

-- Chris Hunt

http://www.mcgonagall-online.org.uk

http://www.napitalia.org.uk

http://www.leicesteryha.org.uk

LFI · Sep 9, 2004

That makes more sense. Thanks, Chris!

Thanks both of you. *'s!

--Dave

Foamcow · Sep 10, 2004

Yeah... I was away yesterday, so I didn't get to answer.

But by plain HTML links I meant normal HREFs without any JavaScript. You should use descriptive text for the link as Chris said.

http://www.foamcow.com

Robot files 3

robert89

Technical User

Foamcow

Programmer

LFI

Programmer

ChrisHirst

IS-IT--Management

LFI

Programmer

ChrisHunt

Programmer

LFI

Programmer

Foamcow

Programmer
