Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Oracle Text: Using CONTAINS

Status
Not open for further replies.

Ascalonian

Programmer
Jan 4, 2008
264
US
I am trying to use Oracle's Text searching but running into an issue.

Say I have a table that has a CUSTOMER_NAME column in it.

Now say I have several values in here, all starting with "Bank Of" and some name. For a good example, let me use "Bank Of America".

Now, if I do the following, I get results
Code:
SELECT *
  FROM some_table
 WHERE CONTAINS(customer_name, 'Bank Of') > 0

However, if I remove the 'f' from 'Of', it breaks
Code:
SELECT *
  FROM some_table
 WHERE CONTAINS(customer_name, 'Bank O') > 0

I do not understand why removing the 'f' breaks this. Any ideas?
 
I am sorry, i forgot part of my code...

Works:
Code:
SELECT *
  FROM some_table
 WHERE CONTAINS(customer_name, 'Bank Of') > 0
   AND customer_name like 'Bank Of%'

Broken:
Code:
SELECT *
  FROM some_table
 WHERE CONTAINS(customer_name, 'Bank O') > 0
   AND customer_name like 'Bank O%'
 
I'll assume you mean 'breaks' to mean 'stops returning rows' rather than 'has an error message' since you didn't report an error message.

"Contains" is not "Instr".

You might try using "Instr()" instead. And remember it is case sensitive so use a lower() or an upper() on the field:

Code:
SELECT *
  FROM some_table
 WHERE instr(lower(customer_name), 'bank o') > 0

Contains is used for a different purpose. It is there to look for similarities. Its use is substantially more complicated. While Instr will return a position of the string within the first string, or a zero if the item isn't present, Contains will return a number from zero to 100 based on how good of a score the string got when looking at the field instances.

I've not worked with this feature, but...
what is going on is that when a term is used frequently in a dataset the system considers it noise. So if you had a set of articles dedicated to chemistry the system would ignore extremely common terms as noise, such as 'chemical' or 'reaction'. In the case of 'Bank Of' you are seeing enough records that don't have 'Bank Of' in them to get a score. But there are so many that have 'Bank' that when you use 'Bank O' the system is ignoring 'O' and testing against 'Bank' and considering it noise.

Just to see more of what is going on you could try this and see what it does:

Code:
SELECT SCORE(1),
instr(lower(customer_name), 'bank o') position, 
s.*
  FROM some_table s
 WHERE CONTAINS(s.customer_name, 'Bank Of') > 0

The score function should, in tandem with the Contains function, give you a number indicating what score got returned by the contains function. Theoretically.

But the position column should return a 1 for all or almost all the names. The ones that give a higher number might get you thinking about your data too.

If you realy want to dig into it, you want to look at Salton's formula, which is what they are using to score.

A copy is here:

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top