
search engine friendly urls

Status
Not open for further replies.

pushyr

Programmer
Jul 2, 2007
159
GB
can you tell me whether a major search engine like google news would prefer one of the following urls over the other?




i would rather produce the bottom url using apache's RewriteRule, but i'm having some trouble with it.
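
A RewriteRule pairs a regular expression matched against the request path with a substitution that names the real script. The two urls in question did not survive in this thread, so the following is only a minimal Python sketch of that match-and-substitute idea, assuming a hypothetical friendly shape of /news/<id>/<slug> and a hypothetical internal target of /article.php?id=<id>&slug=<slug>. It is not Apache configuration and not the poster's actual rule.

```python
import re

# Hypothetical friendly-URL shape (an assumption, not from the thread):
#   /news/<numeric id>/<slug>
# Hypothetical internal target (also an assumption):
#   /article.php?id=<id>&slug=<slug>
FRIENDLY = re.compile(r"^/news/([0-9]+)/([a-z0-9-]+)/?$")


def rewrite(path):
    """Return the internal URL for a friendly path, or None if it does not match."""
    match = FRIENDLY.match(path)
    if match is None:
        # Paths that do not look like an article are left alone.
        return None
    article_id, slug = match.groups()
    return f"/article.php?id={article_id}&slug={slug}"
```

Under these assumptions, rewrite("/news/1234/some-headline") gives "/article.php?id=1234&slug=some-headline", while an unrelated path such as "/about" gives None. Trying a pattern out this way before putting it into a RewriteRule can make a rule that is giving trouble easier to debug.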

i produced the top url using $PATH_INFO (apache's "look back" feature). if i stick to this url format, i would like to know whether placing the parameters after the php extension is a bad idea. do you think search engines stop reading after the php extension? i believe they stop reading once they start seeing the query string beginning with the '?' question mark. please correct me if i'm wrong.
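
With the PATH_INFO approach, whatever follows the script name in the request path is handed to the script as a single string, and the script has to split it into parameters itself. The original url is missing from this thread, so the sketch below assumes a hypothetical request like /article.php/news/1234, for which PATH_INFO would be "/news/1234". It is written in Python purely to illustrate the parsing step; the function names are made up for this example.

```python
import re

# Matches a segment made up only of ASCII digits.
DIGITS_ONLY = re.compile(r"[0-9]+")


def parse_path_info(path_info):
    """Split a PATH_INFO string such as '/news/1234' into its non-empty segments."""
    return [segment for segment in path_info.split("/") if segment]


def article_id(path_info):
    """Return the first all-digit segment as an int, or None if there is none."""
    for segment in parse_path_info(path_info):
        if DIGITS_ONLY.fullmatch(segment):
            return int(segment)
    return None
```

For the assumed PATH_INFO of "/news/1234", parse_path_info returns ["news", "1234"] and article_id returns 1234. An empty PATH_INFO yields an empty list and None, which the script should treat as "no article requested".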
 
do you think search engines stop reading after the php extension?
No.
i believe they stop reading once they start seeing the query string beginning with the '?' question mark. please correct me if i'm wrong.
You're wrong.

There are plenty of blogs and CMSes out there that use supposedly "unfriendly" URLs which search engines index just fine. It's easy to test this: try searching for "chrishunt html" (without the quotes). You'll see hits for threads in Tek-Tips with a query string in their URLs.

-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
hey chris,

i think i read somewhere that google news is particular about the urls that it would index. along with other tight criteria, it would index urls without query strings, as those are deemed more permanent.

while reading further about google news's permanent urls, i also came across this, which seems to support that...

Thank you for your reply. As you may know, Google News is highly unusual in that it offers a news service compiled solely by computer algorithms without human intervention. While the sources of the news vary in perspective and editorial approach, their selection for inclusion is without regard to political viewpoint or ideology. Please be aware, however, that we do not currently include sites that are purely news aggregators. Similarly, we do not include sites that are written and maintained by a single individual.

Google News currently gathers articles by crawling online news sites. Following the general technical guidelines below should help our crawler find and index articles from your site correctly:

1. In order for our crawler to correctly gather articles, each page that displays an article's full text needs to have a unique URL that does not change. We cannot include sites in Google News that display multiple articles at the same URL.

2. The URL for each article must contain a unique number consisting of at least three digits.

For example, our news crawler would not crawl articles with the following URLs:

It would crawl these pages:

3. Keep in mind that we cannot include sites for which the URL of the main page includes a date. URLs with dates in them often change on a daily or weekly basis. This prevents us from crawling the site for new content, as we are unable to detect the most current URL to be crawled.

For example, if a URL changes from /novembernews.html to /decembernews.html, Google will continue to crawl the novembernews.html page, and thus not find any new content.

4. Our automated crawler is currently best able to crawl regular HTML links. We are unable to crawl image links or links embedded in JavaScript.

An example of a site that we are able to crawl successfully is Please note that each article on this site has a highly unique and unchanging URL.

Regards,
The Google Team
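
Guideline 2 in the reply above (a number of at least three digits somewhere in the article url) is easy to check mechanically. The following is a minimal Python sketch under two assumptions that the reply does not settle: that digits in the host name do not count, and that digits in either the path or the query string do. The example.com urls in the usage note are placeholders, and the check says nothing about whether the number is unique across articles.

```python
import re
from urllib.parse import urlsplit

# A run of three or more ASCII digits.
THREE_DIGITS = re.compile(r"[0-9]{3,}")


def has_article_number(url):
    """Return True if the path or query string contains a number of 3+ digits.

    Assumption: the host name is ignored, so digits in the domain do not count.
    """
    parts = urlsplit(url)
    # Join path and query with '?' so digits from the two parts cannot run together.
    return THREE_DIGITS.search(f"{parts.path}?{parts.query}") is not None
```

Under these assumptions, "http://example.com/news/story-12345.html" and "http://example.com/article.php?id=12345" both pass, while "http://example.com/news/story.html" does not.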

so i'm really concerned if i should try to use...


or if i can get away with...

 
There's a huge amount of misleading, outdated or just plain wrong SEO advice out there, so you need to experiment and find out the truth instead of relying on what you "think you read somewhere".

A search of news.google.co.uk right now for "obama" returns this URL as its #1 hit:


So it clearly copes perfectly well with query strings.

The main barrier to getting listed in Google News is for Google to recognise your site as a bona fide news source. You can read about submitting a site to GN, including URL and other technical requirements, at



-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
interesting... i certainly see those query strings. i guess a bit of common sense and experimentation, as you point out, is needed.
 