google shows my site links (site:http://...) but only the links...

Status
Not open for further replies.

spewn

Programmer
May 7, 2001
1,034
so, yesterday i was only able to find my main page on google by doing:

site:
it showed only 1 result, my main page. now today i checked, and it shows 115 results, so i assume a google 'bot' made it to my site and indexed(?) it...

however, the 'newly' found links only show the page address and the 'similar page' option, unlike the original main site result which shows the page title and a little description of it, plus the page address, the page size, and the 'cached' and 'similar page' options (like normal).

so, my question is, why doesn't it show the newly found links as a regular result, complete with all the regular info?

does google crawl in stages? it seems like it took the page i originally submitted, then a couple weeks later it crawls and adds the links from that page, then a couple weeks later it actually crawls each linked page?

any ideas?

- g
 
They are known as PIPs (Partially Indexed Pages). There are many causes of this.

To check if they are really PIPs (pages found but not indexed) do a
site:hostname word_found_on_all_pages

Then see if a snippet, a description or Supplemental Index is shown.
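
If you want to run that check for several words, the query can be put together with a small helper. This is only a sketch: the function names are made up for illustration, and example.com and "widgets" stand in for your own hostname and a word that appears on every page.

```python
from urllib.parse import urlencode


def build_site_query(hostname: str, word: str) -> str:
    """Return a query of the form: site:hostname word_found_on_all_pages."""
    return f"site:{hostname} {word}"


def build_search_url(query: str) -> str:
    """URL-encode the query so it can be pasted into a browser address bar."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Placeholder hostname and word; substitute your own.
    query = build_site_query("example.com", "widgets")
    print(query)
    print(build_search_url(query))
```

Open the printed URL by hand and look at each result for a snippet or description, as described above.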

The sequence is apparently

The page is crawled and added to the DB, and the crawler goes away.
The page in the DB is analysed by several different software algos and its words are indexed. It's generally accepted that an inverted index is used.
The links found are added to a crawl scheduler. This happens within hours: my logs show that new links or pages added to a site can get visited around 12 hours later. The timing depends entirely on the site, though; I have seen it take weeks.
Once analysed, the page is added to the live DB and replication begins. This can take between 18 and 48 hours.
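
To make the sequence above concrete, here is a toy model of it. It is a guess at the general shape only, not how Google actually works: the PAGES dictionary, the ToyIndexer class and all of its method names are invented for the example. It shows how a URL can be known to the scheduler (and so turn up as a bare address) before its words have reached the inverted index.

```python
from collections import defaultdict, deque

# Hypothetical three-page site: URL -> (page text, outgoing links).
PAGES = {
    "/": ("welcome to my site", ["/about", "/contact"]),
    "/about": ("about my site", []),
    "/contact": ("contact me", []),
}


class ToyIndexer:
    def __init__(self):
        self.inverted_index = defaultdict(set)  # word -> URLs containing it
        self.known_urls = set()                 # every URL discovered so far
        self.crawl_queue = deque()              # scheduler: URLs awaiting a visit

    def discover(self, url):
        """Record a newly found URL and schedule it for crawling."""
        if url not in self.known_urls:
            self.known_urls.add(url)
            self.crawl_queue.append(url)

    def crawl_next(self):
        """Visit one scheduled URL: index its words, schedule its links."""
        url = self.crawl_queue.popleft()
        text, links = PAGES[url]
        for word in text.split():
            self.inverted_index[word].add(url)
        for link in links:
            self.discover(link)
        return url

    def partially_indexed(self):
        """URLs that are known but have no words in the index yet."""
        indexed = set().union(*self.inverted_index.values())
        return self.known_urls - indexed


if __name__ == "__main__":
    idx = ToyIndexer()
    idx.discover("/")        # the page that was originally submitted
    idx.crawl_next()         # first visit: main page indexed, links scheduled
    print(sorted(idx.partially_indexed()))   # linked pages known, not indexed
    while idx.crawl_queue:   # later visits pick up the linked pages
        idx.crawl_next()
    print(sorted(idx.inverted_index["site"]))
```

After the first visit only the main page has words in the index while the two linked pages are merely known, which matches the address-only results described in the question. Once the queue has been worked through, every page is indexed.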

The anomalies you see, with different results at different times, are the result of Google's "Round Robin" load sharing across the DCs while the datacentres are out of sync. This is what caused the monthly "Google Dance", which is now defunct; these days it's a continuous process.
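
A rough illustration of why round-robin load sharing across unsynched datacentres gives different answers to the same query. The datacentre names and the make_round_robin helper are hypothetical; the counts 1 and 115 are simply the two figures from the question.

```python
from itertools import cycle


def make_round_robin(result_counts):
    """Return a function that answers each query from the next datacentre in turn.

    result_counts maps a (hypothetical) datacentre name to the number of
    results its copy of the index currently holds for the query.
    """
    rotation = cycle(result_counts)  # iterates over the names in insertion order

    def run_query():
        dc = next(rotation)
        return dc, result_counts[dc]

    return run_query


if __name__ == "__main__":
    # dc-a has not received the replicated update yet; dc-b and dc-c have.
    run_query = make_round_robin({"dc-a": 1, "dc-b": 115, "dc-c": 115})
    for _ in range(4):
        print(run_query())
```

The same site: query lands on a different datacentre each time, so the count flips between the old and new figures until replication catches up everywhere.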

Chris.

Indifference will be the downfall of mankind, but who cares?
Woo Hoo! the cobblers kids get new shoes.
People Counting Systems

So long, and thanks for all the fish.
 