Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

text searching -- 3

Status
Not open for further replies.

link9

Programmer
Nov 28, 2000
3,387
0
0
US
hello --

Can someone give me a few pointers on the most efficient way to search out text in fields in my SQL Server database?

I've never been asked to do this before, and before I go read up on it, I would like some input from experts that might be able to suggest the best way to do it -- because I'm sure there's more than one way (isn't there always?)

Basically, I have a field that contains open ended survey questions -- free text that respondents are free to type whatever they want...

I have been asked to provide a text box (this is over the web) where a user can type in a word, and the resulting recordset would contain all records where the field in question contains that word(s)--

Thanks for any input --
Paul Prewett
 
The only way I know of - from a database perspective - is to use the like operator (select * from table where text_field like '%variable%')

Don't expect performance to be great. I would start with like and go from there.

There must be utilities to parse out & index key words. Anybody know of any?
 
Thx, foxdev -- that's exactly what I needed.

If anyone is interested on exactly how this is done, I'll post what I've found. Otherwise, I'll assume I was the only one that didn't know how to do this, and consider this thread closed. :)

Paul Prewett
 
Paul, I keep waiting for a requirement where I'll need Full Text Indexing, but as yet it hasn't appeared. I had to learn the intricacies of SQLS' full-text search capabilities for the test, but I must admit I've forgotten 92% of it.

It would be neat if, once you conquer the mountain, you FAQ'd it so when the rest of us do get the requirement, we can stand on your shoulders. Robert Bradley
Coming Soon:
 
hi link9. i could use this feature also. please go ahead and post your findings...
thanks!
 
Ok then --

Full text searching is not a feature installed by default with SQL Server 7.0 -- (i.e. it doesn't come with the 'typical' or 'minimal' installs), and if you don't have a "full text catalogs" option on your expanded database folder (the one that also has your tables and stored procedures on it), then you will need to install it. It's easy enough. Took me about two minutes. Stick the disk in, run the setup program, and select it on the right on the first screen that gives you options.

Then, there are two minimum requirements of a table that is to be indexed using this feature:
(1) There must be a unique identifier column
(2) There must be onr or more character based columns to be searchable through the full text search

I'll go through the steps to make the full text catalog here using the wizards available on client or server machines. It's worth noting that you can do all of what I'm about to explain using SPROCS -- but needless to say, it's easier this way, so that's what I'll talk about.

Once the feature is installed, restart SQL Server and Enterprise Manager, if needed, and you will see the "Full Text Catalogs" option on your expanded database folder (as mentioned above)--

Right click on it, and create the new catalog -- just go through the motions and follow the instructions telling the wizard what table to use, what column to index on, and what column(s) to search and catalog. It's real simple.

Once you have created the catalog, you must populate it.
Just right click on the catalog, itself, and select the 'start population' option and the 'full population' option under that. After you populate the catalog, you are ready to use it.

Please note that you can set up a schedule for re-population on the same right click menu for the catalog, and that a full population should only be run when you create the catalog, and if all data in the table in question has been changed. Otherwise, an incremental population is available, and can be scheduled to run however often you wish only on records that have changed or been added since your last population.

*phwew*

Ok --
Now, executing a query using the catalog is super easy, too (don't you love this stuff?!?).

There are four keywords that trigger the use of the catalog and they are used with regular old T-SQL statements
CONTAINS
FREETEXT
CONTAINSTABLE
FREETEXTTABLE

I'll hit the CONTAINS keyword here and trust that anyone reading is resourceful enough to his msdn at microsoft for more detailed explanations of the other --

CONTAINS
({column|*}, <'contains_search_condition'>)

is the syntax for building a query, and you use it following the WHERE keyword in a query--

Here's an example that searches my field, 'verbatims', and is looking for the word, 'kids':

SELECT verbatims FROM voc WHERE CONTAINS (verbatims, 'kids')

you can use AND, OR, & AND NOT boolean operators within the 'kids' portion, such as 'kids and pool' or 'kids or pool', which will return exactly what you would expect from such a query.

Anyways -- there are a multitude of advanced search parameters that you can add as you wish to the query to make your results more meaningful and the search more powerful.

I hope this helps someone out, and thanks again to foxdev for pointing me in the right direction.

Cheers! :)
Paul Prewett
 
Hello,
I've tried full-text indexing a column. From my experience it can only be used to find the exact word appeared in the string, but not a sub-string of it. For example, if one record contains the word 'Microsoft', this record will be returned if the query is
Code:
select * from table where contains (' &quot;Microsoft&quot; ')
but not
Code:
select * from table where contains (' &quot;Micro&quot; ')

Is there any way to make use of the full-text index so that it will still return this record using the second query? Thanks a lot! :> Please visit my WebCam!!
going to be
 
Yes, if you enclose the phrase in %'s such as:

%Micro%

then it will find it...

And I still intend to write that FAQ I promised, I've just been busy. I'll get to it.

:)
Paul Prewett
 
IC... thanks..
but how can SQL server do this? what's the algorithm behind the full-text search? Coz I'm testing full-text index on a column of 255 length and 370,000 records. SQL Server is able to return just the relevant records in a bang. This is quite amazing to me and boss. I've tried to find in BOL and on the web but isn't able to find a description of the algorithm, any ideas? Please visit my WebCam!!
going to be
 
You mean what's actually going on behind the scenes of the catalog???

Woooweee --

The whole thing is facilitated by Microsoft's Search Service, and more or less what's going on is that when you create the full text catalog, a new &quot;table&quot; of sorts is created that contains all the words in your database along with references to the records where those records exist -- so when you ask it to find a word, it isn't actually looking in the table you've asked it to search -- rather, it's searching the catalog that it created -- which is why it works so much faster... it's searching exponentially less real estate for the answer to your question.

The MS Search Service then returns all the references to the records that contain the word you searched for, and then the database takes that information, and returns the rows from the database that correspond to those references.

More or less, that's what's happening when you execute a full text search.

I wouldn't even attempt to explain it further.

:)
Paul Prewett
 
oic.. :>
I'll try to look more about the search service then..
I've tried to use the wild card search but it can't return the rows I expected. Using the previous example, my query has changed to this:

select * from table where contains (' &quot;Micro%&quot; ')

but it still can't locate the record. neither can

select * from table where contains (' &quot;_icrosoft&quot; ')

However, this query can return the record:

select * from table where contains (' &quot;%Microsoft_&quot; ')

Does it mean that the full-text index only catalog the whole word, i.e those delimited by a space , but not an actual substring? The reason I'm wondering behind the catalog is that if it can return the record for my (' &quot;Micro%&quot; ') query, then it is really powerful because there is an enormous number of combinations. If it can only cater for discrete words, then the number of combinations are limited for the full-text index.

On the other hand, how full-text index compared with Oracle's context option? I heard of that before but don't know what exactly it can do. can it do a real substring search? Thanks! Please visit my WebCam!!
going to be
 
Oops -- I supplied you with the % operator from memory -- I just cracked my book, and it says to use the * operator for wildcard searches...

i.e. I want to retrieve all records containing psychology and psychiatry --

my query would look like:

CONTAINS (tableName, '&quot;psych*&quot;')

or so I suppose in your case

CONTAINS (tableName, '&quot;Micro*&quot;')


I don't know anything about Oracle, but I'm sure someone else here can answer that for you.

:)
paul
 
Thanks! In fact I just found the usage of * from the book 'Inside SQL Server 2000'. And it can only be used at the end of the word, not in the front. therefore ' &quot;Micro*&quot; ' is OK but ' &quot;*icrosoft&quot; ' is not OK. for Oracle's context option, the same applies. I think the logic behind both is more or less the same. Thanks very much! Please visit my WebCam!!
going to be
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top