Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SOUNDEX in TEXT field 1

Status
Not open for further replies.

tinapa

Technical User
Nov 12, 2008
81
GB
Hi guys, I am experimenting on the SOUNDEX feature of SQL Server 2K and came across with a roadblock.

I have this code and it's not returning a record but I've checked that there's a content in the TEXT data type field called Description the word "alcohol"(without the quotes).

SELECT myTable.Drinks
FROM myTable
WHERE
SOUNDEX('alcohol') = SOUNDEX(CONVERT(varchar, Description))

When I tried this code and modified it a bit and used a varchar field, it returned something. The DescriptionVarchar field also has "alcohol" in some of its rows.

SELECT myTable.Drinks
FROM myTable
WHERE
SOUNDEX('alcohol') = SOUNDEX(DescriptionVarchar)

Am I missing something? I have searched the internet and have not been successful yet in finding some clues.

Any help is appreciated.

Thanks guys.


 
Am I missing something?

Yes. Of course you are (or the functionality would work). The problem is not what you expect it to be, though. The problem is with your convert function. When you convert to varchar without specifying a size, it converts to 30 characters.

Ex:

Code:
Declare @Test VarChar(1000)

Set @Test = '123456789012345678901234567890ABC...XYZ'

Select Convert(varchar, @Test)

When you run the previous code, you will NOT see ABC...XYZ because the data is truncated at 30 characters.

to fix your problem....

Code:
SELECT myTable.Drinks
FROM myTable
WHERE
SOUNDEX('alcohol') = SOUNDEX(CONVERT(varchar[!](8000)[/!], Description))

Also.... you should be aware of how the Soundex function works. Basically, it creates a 4 character code for each word. It uses the first letter, and then a 3 digit numeric code. The problem is, it looks at the entire string and creates a soundex for it.

Ex:

Code:
Select Soundex('This drink contains alcohol')
Select Soundex('This')
The soundex for the previous phrase is the same as the soundex of 'This'.

Make sense?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
You're missing size of varchar in your first query.
 
Hi guys, I have specified the "(8000)" bit, but I am still not returning any records.
 
Can you show the data in the text column that contains the word alcohol?



-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Perhaps what you really need is full-text index and use contains instead of soundex? I mean if the filed contains something like:
"beer|wine|rum|other alcohols" then the soundex value for that wouldn't be close to the soundex for "alcohol"

"NOTHING is more important in a database than integrity." ESquared
 
Let me say it again....

Soundex will only look at the first word in a phrase. So, "This drink contains alcohol" would return the same Soundex as "This". Also, since it looks for words, things like "non-alcoholic" will produce a different soundex.

What you could do is create a better soundex function that allows you to search each word within a string.

First, what does a Soundex code look like? Well... let's find out. Run the following code in a query window.

Code:
Declare @Temp Table(Data VarChar(100))

Insert Into @Temp Values('This')
Insert Into @Temp Values('Drink')
Insert Into @Temp Values('Contains')
Insert Into @Temp Values('Alcohol')
Insert Into @Temp Values('This drink contains alcohol')

Select Data, Soundex(Data) As SoundexCode
From @Temp

You will see...

[tt][blue]
Data SoundexCode
---------------------------- -----------
This T200
Drink D652
Contains C535
Alcohol A424
This drink contains alcohol T200
[/blue][/tt]

The problem is the phrase (multiple words). If the last one ("This drink contains alcohol") would create a soundex code making up each word ("T200 D652 C535 A424") then we could do a nice like comparison. Unfortunately, Soundex doesn't do this, but we could write our own function, like this:

Code:
Create Function dbo.SoundsLike(@OriginalText VarChar(8000))
Returns VarChar(8000)
As
Begin

  Declare @Output VarChar(8000)
  Set @Output = ''

  While Not @OriginalText Is NULL
    Begin
      Set @Output = @Output + Soundex(Left(@OriginalText, PatIndex('%[ ]%', @OriginalText + ' ')-1))
      Set @OriginalText = Right(@OriginalText, NullIf(Len(@OriginalText) - PatIndex('%[ ]%', @OriginalText + ' '), -1))
    End

  Return @Output
End

Now, when we use our function on the same data, we get...

[tt][blue]
Data SoundexCode SoundsLikeCode
---------------------------- ----------- --------------
This T200 T200
Drink D652 D652
Contains C535 C535
Alcohol A424 A424
This drink contains alcohol T200 T200D652C535A424
[/blue][/tt]

Now, with the new function, we can get your query to work.

Code:
Declare @Temp Table(Data VarChar(100))

Insert Into @Temp Values('This')
Insert Into @Temp Values('Drink')
Insert Into @Temp Values('Contains')
Insert Into @Temp Values('Alcohol')
Insert Into @Temp Values('This drink contains alcohol')

Select Data, 
       Soundex(Data) As SoundexCode,
       dbo.SoundsLike(Data) As SoundsLikeCode
From   @Temp
Where  dbo.SoundsLike(Data) Like '%' + Soundex('Alcohol') + '%'

This query returns the correct data.

Don't expect this query to run very fast. It's non-sargable, contains a user defined function, and operates on (potentially) large strings.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
hi gmmastros, the function worked! thank you very much.

there are not going to loads of rows in the table.

once again many thanks!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top