
Web Site with No Access to Outsiders


Chinyere
Programmer
Mar 9, 2001
Hello,

I need to set up a development web site. This is a test web site and should not be available to outsiders, search engines, and the like. The web site will be viewed by members of my company and our client ONLY.

How can I do this?

I am thinking that the best way to do this is by setting up a firewall. Does anyone out there have any ideas on how I should proceed?

It is very important that outsiders and search engines are not able to view this site. Thanks.

Chinyere
 
The search engine issue really isn't an issue. If you don't submit the URL to be indexed into their databases, they will not have it listed in search results.

Are you able to use ASP?
I may not get it the 1st or 2nd time,
but how sweet that 15th time can be.
admin@onpntwebdesigns.com
 
There is also a meta tag that you can place in your files that will prevent search engines and directories from indexing your pages. Just because you don't submit them to a search engine or directory does NOT prevent them from eventually becoming part of their databases.
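For reference, that is the robots meta tag. It goes in the head of every page you want kept out of compliant engines' indexes, along these lines:

    <head>
        <title>Internal test page</title>
        <meta name="robots" content="noindex,nofollow">
    </head>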

 
Lexus,

For a crawler to find your pages, there must first be a link to that page somewhere on the internet. A spider will never just stumble across a page that has no links pointing to it and has never been submitted to their database.

The robots meta tag and robots.txt do not stop all search engines from indexing a site. Both depend on the engine honouring them, and a lot of smaller engines do not respect the robots exclusion protocol, so against those engines the tags are useless.
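For completeness, the robots.txt side of the protocol is a plain text file served from the root of the site. A minimal one that asks all compliant crawlers to stay away from everything:

    # robots.txt, served from the site root
    User-agent: *
    Disallow: /

But as said above, only well-behaved spiders honour it.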

Hope this helps,

Wullie

sales@freshlookdesign.co.uk

 
Wullie:
You say: "For a crawler to find your pages, there must first be a link to that page somewhere on the internet. A spider will never just stumble across a page that has no links pointing to it and has never been submitted to their database."

Interesting! I had two identical internal files on our corporate website with NO links pointing to them and nothing anywhere linking to them. One had the NOINDEX meta tag and the other had no meta tags at all. The only difference between the files was the content of the title tags and the actual titles on the pages.

About 6 months later, I was able to find the one with no meta tags in a search on Google and another search engine! These were test pages for a new design I was working on and contained only text and HTML. No applets, scripts, or graphics were on these pages. No other employee even knew the files existed, as I am the only one with access to the web server. They were basically stripped-down files.

Any clue how the one became indexed while the other didn't? Thanks...

Lexus

 
Hi mate,

I cannot comment on a specific case like this because I don't know all the facts about it.

The most common reason that spiders find pages that nothing links to is that the spider encounters a log file of some sort.

The log file could be on your own site or on an external site that one of your pages links to. When you click the link, the referring page shows up in that site's logs; if those logs are publicly readable, a spider can find them and index the URLs they contain.

I can definitely tell you that a spider will NEVER stumble upon an unknown page that is not linked from anywhere unless someone submits it to be indexed; it is simply not possible.

A crawler reads a page, extracts the links, and may later index them. It does not guess URLs and therefore will never find a page that is unknown to the rest of the internet.

I have had this discussion with loads of people. Most people think that the large engines contain EVERY page on the internet; they don't. Even if a spider knows about a page, that does not mean it will be included in the index.

I have test areas that spiders have indexed, but that was my fault: I posted the URL here to show someone the page, and the spiders found the URL here and followed it. I also have other test areas that have never been found by a search engine because, as I said, they are not linked from anywhere.

Hope this helps,

Wullie

sales@freshlookdesign.co.uk

 
Chinyere, if you have ColdFusion support then do it that way.
What CF allows you to do is grant certain visitors authority to view a page, or deny certain visitors access to certain pages.
It's also not that difficult to do it that way. I can give a sample of how I'm doing it for one of my clients; see the sketch below.
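A minimal sketch of one common way to do this in CF, assuming a hypothetical allow-list of client IP addresses (the addresses are made up; Application.cfm runs before every page request):

    <!--- Application.cfm: turn away any visitor not on the allow-list --->
    <cfset allowedIPs = "10.0.0.5,192.168.1.20">
    <cfif NOT ListFind(allowedIPs, CGI.REMOTE_ADDR)>
        <cfabort showerror="Access denied">
    </cfif>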
If you don't know CF then you can also do it via .htaccess, like Wullie suggested...

I have not failed; I merely found 100,000 different ways of not succeeding...
 
If you go to google.com and type this instead of the normal URL:

http://www.google.com:80/

you will still access the web site like you would normally. What you are doing is specifying the port you are connecting on. By default HTTP is set to port 80.

I have two servers set up: one on port 80 (the standard port for HTTP) and another on port 8080.

To go to the normal one I type the plain URL, and for the second I add :8080 after the host name. For testing this is good enough.
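If the servers are Apache, a second test instance on port 8080 can be as simple as an extra Listen directive and a virtual host in httpd.conf (the path and server name below are hypothetical):

    # httpd.conf: serve the test copy on a second port
    Listen 80
    Listen 8080

    <VirtualHost *:8080>
        ServerName dev.example.com
        DocumentRoot "/var/www/devsite"
    </VirtualHost>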

If it is very important that outsiders cannot access this information, you can use Apache. Ask in a forum how to set it up with .htaccess so the site is protected with a username and password.
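The usual .htaccess approach looks something like this (the AuthUserFile path is hypothetical, and the password file is created with the htpasswd utility, e.g. htpasswd -c /home/private/.htpasswd someuser):

    # .htaccess in the directory to protect
    AuthType Basic
    AuthName "Development Site"
    AuthUserFile /home/private/.htpasswd
    Require valid-user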

I hope this helps.

Gary Haran
 