Crawl dynamic pages, search dynamic and static pages

Status
Not open for further replies.

markask

Programmer
Apr 28, 2005
167
0
0
GB
Hi,

I have many static pages (their content never changes). How can I stop Google from re-crawling those pages while still letting people find them through Google search?
That is, how do I tell Google that my site has two parts, a static part and a dynamic part, so that it crawls only the dynamic pages and updates that part of my site in its database without throwing away the static part?

Thank you in advance.
 
OK, here are all the points I'm aware of off the top of my head...

1. Dynamic pages are ranked lower than static pages.
2. The more complex the query string (for dynamic pages), the lower the chance that the page gets ranked.
3. Spiders prefer not to read the cgi-bin directory.
4. Spiders also prefer not to read beyond the '?' character in the URL.
5. There are many ways to improve the ranking of your dynamic pages, for example using mod_rewrite, using the server variables Path_Info or Script_Name, or using other patented solutions.
6. To keep certain parts of your website from being crawled, you need to make changes in the robots.txt file, something like this:
Code:
User-Agent: *
Disallow: /yourpath1/
Disallow: /mypages/
Disallow: /HumanResources/

I mean, just list the paths that you don't want the spiders to crawl.

7. To exclude all robots from crawling your site, do this:
Code:
User-Agent: *
Disallow: /
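
If you want to check what the rules in point 6 would block before uploading them, a rough sketch using Python's standard urllib.robotparser is below. The helper name allowed, the "Googlebot" agent string, and the test paths are just illustrative assumptions, not anything specific to your site.

```python
from urllib.robotparser import RobotFileParser

# The example rules from point 6, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /yourpath1/
Disallow: /mypages/
Disallow: /HumanResources/
"""


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT.

    `allowed` and the default agent name are hypothetical, for illustration.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/mypages/index.html", "/news/today.php"):
        print(path, "->", "allowed" if allowed(path) else "blocked")
```

Note that this only tells you whether a well-behaved crawler would fetch a path; it says nothing about what the search engine keeps in its index.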

Hope this helps

-DNG

 
Thanks a lot for your help. I will try what you suggested.

However, one thing about:
>6. To keep certain parts of your website from being crawled, you need to make changes in the robots.txt file...
For example, suppose Google crawled a page a year ago and the page has not changed since. I don't want Google to crawl this page again and again, but I do want Google to keep it in its database so that people can still reach the page through a Google search.
If I don't allow Google to crawl this page, Google will throw the page away (I guess)!
 
Well, I have to refute some of these.

1. Dynamic pages are ranked lower than static pages.
2. The more complex the query string (for dynamic pages), the lower the chance that the page gets ranked.
3. Spiders prefer not to read the cgi-bin directory.
4. Spiders also prefer not to read beyond the '?' character in the URL.
5. There are many ways to improve the ranking of your dynamic pages, for example using mod_rewrite, using the server variables Path_Info or Script_Name, or using other patented solutions.
1/ Not in the slightest.
2/ Not at all.
Both of the above relate to crawling and the rate at which pages are crawled, not to ranking.

3/ Crawlers will quite readily crawl any folder they can find a link to. They should be disallowed from the cgi-bin if only executables are in there. Many shopping carts, however, have all their files in the cgi-bin and never have a problem with the crawlers getting in there.

4/ Again, not at all. The "?" is used as an indicator to slow the page request rate. See answer 1.

5/ Nope. Using URL rewrites will not improve rankings. It may however enable you to get more pages indexed.
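
To show what a URL rewrite actually does, here is a minimal sketch in Python (not mod_rewrite itself) that maps a path-style URL onto the query-string URL the script really receives. The /catalog.php script and the category and id parameters are made-up names for illustration only.

```python
from urllib.parse import urlencode


def rewrite_path(path: str) -> str:
    """Map /catalog/<category>/<id> to /catalog.php?category=...&id=...

    The URL layout and parameter names are hypothetical examples.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) == 3 and parts[0] == "catalog":
        query = urlencode({"category": parts[1], "id": parts[2]})
        return f"/catalog.php?{query}"
    # Anything that does not match the pattern is passed through unchanged.
    return path


if __name__ == "__main__":
    print(rewrite_path("/catalog/shoes/42"))
```

The crawler only ever sees the path-style form, which is why a rewrite can help more pages get requested and indexed without being a ranking factor in itself.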

more info HR thread on static pages

If you exclude bots from pages, those pages will be removed from the index and therefore cannot rank. You cannot control how frequently your pages get crawled. Simply leave it alone. There may come a point where the static pages get crawled less often, but what does it matter? The navigation on your static pages adds weight to the internal pages they link to. Leave the decision on whether to crawl a page to the SEs.

SEO forum828

Chris.

Indifference will be the downfall of mankind, but who cares?
Woo Hoo! the cobblers kids get new shoes.
People Counting Systems

So long, and thanks for all the fish.
 
ChrisHirst,

Thank you very much for your help.

1. Did you mean that my site is completely refreshed in Google's database each time Google crawls it? If that is true, Google can only ever hold a small part of my site. What I want is for Google to append the newly crawled content to the content of my site that already exists in its database.
2. Every day Google crawls about 300 pages on my site, and only about 10 of them are new content. Other search engines do much the same. That is a big waste.

Any idea?

 
Chris, thanks for your insight and for correcting me. I reread what I wrote and realized that some of my wording was wrong.

Thanks...Have a star.

-DNG
 
It may not always update the page at each crawl; it depends on several factors.
The thing to understand is that the crawlers do not index the returned code. A spider/crawler is purely a retrieval agent; its only function is to return the source code to the database. Several other software elements do the processing and decide what is new, updated, or old content.
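
As a rough illustration of that split between retrieval and processing, the sketch below takes source code that a crawler has already fetched and has a separate step decide whether it is new, updated, or unchanged. This is only an assumption about how such a step could work (comparing content hashes); the classify function and the in-memory index are hypothetical and not how any particular search engine does it.

```python
import hashlib


def classify(url: str, source: str, index: dict[str, str]) -> str:
    """Decide whether fetched `source` for `url` is new, updated, or unchanged.

    `index` stands in for the search engine's database and maps each URL
    to the hash of the last copy seen. All names here are hypothetical.
    """
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    previous = index.get(url)
    index[url] = digest  # remember the latest copy either way
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "updated"


if __name__ == "__main__":
    index: dict[str, str] = {}
    print(classify("/static/page.html", "<html>v1</html>", index))
    print(classify("/static/page.html", "<html>v1</html>", index))
    print(classify("/static/page.html", "<html>v2</html>", index))
```

The point is that fetching an unchanged page again does not replace or discard what is already stored; the processing step simply finds nothing new.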

Don't worry about the crawlers just grabbing pages; as more pages get indexed, it will help with the overall indexing of your site.

Chris.

 