Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

googl.com style keyword search 1

Status
Not open for further replies.

dnador

Technical User
Jun 8, 2002
5
US
i have an situation im wondering if anyone can point me in the right direction...they are wanting to do a googl.com style keyword search on some varchar2 fields. I have no idea how google can get a return so fast. I have several ideas in oracle. For instance...if I take the keywords provided in a list and break them out into every combination of and/or clauses with a constant attached that is the number of and's divided by the number of total keywords, then run them all as a big union. This would allow me to return results with a relevancey rating based on the constant. I would have to remove dupes as well before returning a cursor type. Im pretty sure I can figure this out, but im not sure the best way to design the tables. Initially we have a big description field that would have to be searched with the big union above. This seems to me like it would take lots of time and I don't think indexing the description field would do any good. I also thought of breaking the description field down into single words with id's attached so I can use equajoins instead of likes. Then I could index the varchar2 field, but it would be adding a ton of rows. Have you ever done this sort of thing or know anyone who has?

I appreciate your help...

Dave
 
It seems that you are on your merry way to build a search engine.
However, several good ones already exist, like Autonomy and Verity.

Good Luck,
Dan
 
Thanks...but I want to do it internally if possible...
 
Have you looked into Oracle Text (Known as Intermedia Text in 8i)? This is part of Oracle, not an extra cost product. I haven't used it and don't really know much about it, but since it comes with Oracle, it might be worth checking out. You can view the manuals at:

 
What version of the database are you using?

Oracle 7/8 has a Context Cartridge
Oracle 8i has Intermedia text
and 9i has Oracle Text...

If you are using 9i you have a couple options. There are
three main index types. Context, ctxcat and ctxrule.

Context indexes are the most flexible, but they are not
transactional. That means that you need to run a seperate
command every so often [ctxsys.ctx_ddl.sync_index()] to
keep the dml changes to the data in sync. Context indexes
may be partitioned (globally or locally) and they support
the widest query options (NEAR, about, fuzzy, etc) though
in 9iR2 you can use context grammer on a ctxcat index.
Context indexes use the CONTAINS() operator for searching.

Ctxcat indexes are transactional, but they require more
space and take longer to generate than context indexes.
They do not support partitioning of the index so if you
have partitioned indexes that would benefit from partition
elimination on the index, it might be better to go with
context indexes. Ctxcat indexes make "structured" queries
faster. Those are queries where you need to find records
containing some text but also limit other colums in the
table with a where clause. Ctxcat indexes use the
CATSEARCH() operator for searching.

Ctxrule indexes index xml data. I've never used them
so I don't know much about them. Ctxrule indexes use the
MATCHES() operator for searching.

simple example:
Code:
/* create a context index */
CREATE INDEX EMP_T1 ON emp(resume)
INDEXTYPE IS ctxsys.context 
PARAMETERS ('datastore ctxsys.default_datastore 
             filter ctxsys.null_filter 
             section group ctxsys.null_section_group');

/* do a text query on the table */
select * from emp
where contains(resume, 'dba', 1) > 0
order by score(1)

If you have more questions let me know. The text application guide and text reference on tahiti.oracle.com are very useful guides to using Oracle text.

There are a ton of parameters that you can set by creating
text index attributes. Text indexes consist of underlying
tables that contain the keywords and you can specify
storage options such as tablespace with the text parameters
ctxsys.default_stoarage just sticks all the tables in the
same tablespace as the data. Using the TABLESPACE option
of create index will have no effect on where the tables
are created.

If you have more questions let me know. Depending on the
number of records, the format of the data and the way you
set up your indexes, oracle text can be very fast. I
search through 9 million short documents in < 1 sec.

.
.. Eat, think and be merry .
... ....................... .
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top