Index Server Catalog / Directory Help

Status
Not open for further replies.

fcoomermd

Programmer
Nov 20, 2002
218
CA
This is the scenario:
I have a hierarchy of folders, with files at the very bottom of the folder tree.
It looks like this
1996->19960101->fileFolder->files
19960202->fileFolder->files
19960302->fileFolder->files
19960402->fileFolder->files
19960502->fileFolder->files
and so on...
In 19960101, 1996 is the year, 01 is the month, and 01 is the day.
Initially I had these all grouped in one catalog at the root of 1996,
so the catalog contained the directory 1996 and all underlying folders and files.
The problem was that querying was slow and inaccurate, because the directory contained over 200,000 documents.

Here is what I want to do, but I have tried many ways and have been unable to do it: I want to break it down into quarters,
i.e.
catalog 1996q1 would have the following included in the directory:
1996->19960101->fileFolder->files
19960202->fileFolder->files
19960302->fileFolder->files

catalog 1996q2 would have the following included in the directory:
1996->19960401->fileFolder->files
19960502->fileFolder->files
19960602->fileFolder->files

Now, I cannot restructure the files, because they reside on a separate machine, and other systems are
dependent on how they are set up.

Is there a way to do this? If more information is needed, reply and ask. I have been stumped on this one for a while.
Thanks in advance for any help.
 
You don't say what sort of files these are. If you have the ability to modify the files and they are HTML files then it may be that you can add META information to them and use META searches to improve accuracy.

If speed is an issue then look at putting Index Server on its own dedicated server (usually it lives on the same box as the web server) and give it plenty of RAM :)

Make sure you only index documents that you need. Index Server, by default, will index files with unknown extensions. Disabling this will improve performance.
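Before deciding what to exclude, it may help to see which extensions actually occur in the tree. The following is only a rough sketch in Python, not anything Index Server provides; the function name `count_extensions` and the example path in the comment are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count files under root by lowercased extension ('' means no extension)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


# Hypothetical usage; the path below is an assumption, not from the thread:
# for ext, n in count_extensions(r"D:\docs\1996").most_common():
#     print(ext or "(none)", n)
```

Extensions that show up in large numbers but never need to be searched are candidates for exclusion.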

You can tune Index Server to reflect whether there are lots or few queries and whether the document base changes a lot or infrequently.

To improve the accuracy of the results, Index Server has its own Query Language as well as supporting SQL queries. You can use this to help limit the results. For example, we index HTML files on our intranet, but we want to exclude files that consist purely of hyperlinks (menus and so on). These files all contain "menu" in the file name, so we filter those out with the Query Language. It seems to me that you have some basis for filtering based on file names.
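To illustrate the point about names: the quarter each dated folder belongs to can be worked out from the folder name alone, without moving any files. This is a minimal Python sketch that assumes the folders are named YYYYMMDD as described in the question; `quarter_of` and `group_by_quarter` are hypothetical helper names, and the resulting lists would still have to be turned into catalog directories or query restrictions by hand.

```python
import re
from collections import defaultdict

# Assumption: folder names are exactly eight digits, YYYYMMDD.
DATE_FOLDER = re.compile(r"(\d{4})(\d{2})(\d{2})")


def quarter_of(folder_name):
    """Return a label such as '1996q1', or None if the name is not a dated folder."""
    match = DATE_FOLDER.fullmatch(folder_name)
    if match is None:
        return None
    year, month = match.group(1), int(match.group(2))
    if not 1 <= month <= 12:
        return None
    return f"{year}q{(month - 1) // 3 + 1}"


def group_by_quarter(folder_names):
    """Group dated folder names by quarter label, skipping names that do not match."""
    groups = defaultdict(list)
    for name in folder_names:
        label = quarter_of(name)
        if label is not None:
            groups[label].append(name)
    return dict(groups)


print(group_by_quarter(["19960101", "19960202", "19960302", "19960402"]))
# {'1996q1': ['19960101', '19960202', '19960302'], '1996q2': ['19960402']}
```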

I'm not sure if any of that helps or if it gives you some other areas to think about.

Good luck

Rob W

 