Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Query for Best Leading Match 2

Status
Not open for further replies.

YOUNGCODGER

Programmer
Jul 9, 2007
102
GB

Hello,

I have a table with various telephone destinations. What I am trying to achieve is a lookup for the best match. For instance the entries include '07' - 'Personal numbers, messaging and mobiles', '077' - 'Mobile phone', '07702' - 'O2', etc. Whilst I can write a lookup to match for 'n' characters is there an easy way to match on a variable number of characters. I have coded a tree search in C++ in the past but was hoping to avoid this.

Many thanks,

YoungCodger [bigglasses]
 
sorry youncoder, your question is not clear. Some sample data can help us in understanding this better.

but from what i get try using the LIKE clause...

Known is handfull, Unknown is worldfull
 

Hi Vbkris,

The examples are just a few of many thousand entries:-

'07' - 'Personal numbers, messaging and mobiles'
'077' -
'07702' - 'O2'
,etc.

I then have tens of thousands of telephone numbers that I wish to categorise. For instance using the above data '0751234xxx' would return 'Personal numbers, messaging and mobiles', '07704567yyy' would return 'Mobile phone' and '07702987zzz' would return 'O2'. The last number is a match for all three but what I want to return is the best leading match to the telephone number. Hopefully this makes it clearer what I am trying to achieve.

Reagrds,

YoungCodger [bigglasses]

"The truth about action must be known and the truth of inaction also must be known; even so the truth about prohibited action must be known. For mysterious are the ways of action
 
wow,

this is a bit challenging. You have to do this when you join the master table with the data table.

The only issue here is that 07 and 077 will be a little bit tricky (if a LIKE clause is used). Will try to create a script for this and send it across to you...

Known is handfull, Unknown is worldfull
 
Is this what you are looking for?

I created a table variable to store your sample data. When you implement this in your database, you will need to remove the table variable stuff and replace with your actual table names.

Code:
[COLOR=blue]Declare[/color] @Temp [COLOR=blue]Table[/color](Data [COLOR=blue]VarChar[/color](10), Description [COLOR=blue]VarChar[/color](50))

[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @Temp [COLOR=blue]Values[/color]([COLOR=red]'07'[/color]   ,[COLOR=red]'Personal numbers, messaging and mobiles'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @Temp [COLOR=blue]Values[/color]([COLOR=red]'077'[/color]  ,[COLOR=red]'Mobile phone'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @Temp [COLOR=blue]Values[/color]([COLOR=red]'07702'[/color],[COLOR=red]'O2'[/color])

[COLOR=blue]Declare[/color] @Search [COLOR=blue]VarChar[/color](10)
[COLOR=blue]Set[/color] @Search = [COLOR=red]'07702987zzz'[/color]
[COLOR=blue]Set[/color] @Search = [COLOR=red]'07704567yyy'[/color]
[COLOR=blue]Set[/color] @Search = [COLOR=red]'0751234xxx'[/color]

[COLOR=green]-- Query starts here
[/color][COLOR=blue]Select[/color] 	Top 1 *
[COLOR=blue]From[/color] 	@Temp
[COLOR=blue]Where[/color]	[COLOR=#FF00FF]CharIndex[/color](Data, @Search) > 0
[COLOR=blue]Order[/color] [COLOR=blue]By[/color] Len(Data) [COLOR=#FF00FF]Desc[/color]

Try playing around with different values for the variable @Search to see how this query works.

-George

"the screen with the little boxes in the window." - Moron
 
George, that won't work:

Code:
Declare @Temp Table(Data VarChar(10), Description VarChar(50))

Insert Into @Temp Values('07'   ,'Personal numbers, messaging and mobiles')
Insert Into @Temp Values('077'  ,'Mobile phone')
Insert Into @Temp Values('07702','O2')
Insert Into @Temp Values('234'  ,'Electric pickles')

Declare @Search VarChar(10)
Set @Search = '071234zzz'

-- Query starts here
Select   Top 1 *
From     @Temp
Where    CharIndex(Data, @Search) > 0
Order By Len(Data) Desc
That returns 'Electric pickles' instead of the right thing.

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.[/color]
 
Fruit monkeys! Here's a fix:

Code:
Declare @Temp Table(Data VarChar(10), Description VarChar(50))

Insert Into @Temp Values('07'   ,'Personal numbers, messaging and mobiles')
Insert Into @Temp Values('077'  ,'Mobile phone')
Insert Into @Temp Values('07702','O2')
Insert Into @Temp Values('234'  ,'Electric pickles')

Declare @Search VarChar(10)
Set @Search = '071234zzz'

-- Query starts here
Select   Top 1 *
From     @Temp
Where    CharIndex(Data, @Search) [!]= 1[/!]
Order By Len(Data) Desc

-George

"the screen with the little boxes in the window." - Moron
 
I was not sure if you wanted something that started with 07702 to fall into all of the categories or not, if you did not, you might be about to use the following if you have 2005:
Code:
DECLARE @Phone TABLE (PhoneNumber VARCHAR(11))
INSERT @Phone
SELECT '07702987zzz'
UNION ALL SELECT '07704567yyy'
UNION ALL SELECT '0751234xxx'

Declare @Dest Table(Data VarChar(10), Description VarChar(50))
INSERT @Dest
SELECT '07', 'Personal numbers, messaging and mobiles'
UNION ALL SELECT '077', 'Mobile phone'
UNION ALL SELECT '07702', 'O2'


SELECT
	*
FROM
	(
	SELECT
		*,
		ROW_NUMBER() OVER (PARTITION BY P.PhoneNumber ORDER BY LEN(D.Data) DESC) AS RowNum
	FROM 
		@Phone P
	INNER JOIN
		@Dest D
		ON CHARINDEX(D.Data, P.PhoneNumber) = 1
	) Temp
WHERE
	RowNum = 1
 
Fruit monkeys!
lol! Thief.

I knew you could fix it, but I thought you'd rather do it than me. And a simple fix it was, indeed.

What do you propose if you need to do a join against many rows at once where TOP 1 isn't so convenient? A correlated subquery?

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.[/color]
 
I thought you'd get a kick out of that. [smile]

There was a thread within the last year or so. Boris had posted a response that was very clever and worked well for this situation. Of course, I'm unable to find it.

I don't think I'd use a correlated subquery. In most situations, performance suffers. I'll think about this a little, and may post back with a response.

-George

"the screen with the little boxes in the window." - Moron
 
Found it.

thread183-1292362

I don't give out too many stars, but I really learned something from Boris that day. It really was a clever way of dealing with this problem.

-George

"the screen with the little boxes in the window." - Moron
 
Here's a working example:

Code:
[COLOR=blue]Declare[/color] @TelephoneDestinations [COLOR=blue]Table[/color](Data [COLOR=blue]VarChar[/color](10) [COLOR=blue]Primary[/color] [COLOR=blue]Key[/color], Description [COLOR=blue]VarChar[/color](50))

[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TelephoneDestinations [COLOR=blue]Values[/color]([COLOR=red]'07'[/color]   ,[COLOR=red]'Personal numbers, messaging and mobiles'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TelephoneDestinations [COLOR=blue]Values[/color]([COLOR=red]'077'[/color]  ,[COLOR=red]'Mobile phone'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TelephoneDestinations [COLOR=blue]Values[/color]([COLOR=red]'07702'[/color],[COLOR=red]'O2'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TelephoneDestinations [COLOR=blue]Values[/color]([COLOR=red]'234'[/color]  ,[COLOR=red]'Electric pickles'[/color])

[COLOR=blue]Declare[/color] @TelephoneCalls [COLOR=blue]Table[/color](Search [COLOR=blue]VarChar[/color](20))
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TelephoneCalls [COLOR=blue]Values[/color]([COLOR=red]'071234zzz'[/color])
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TelephoneCalls [COLOR=blue]Values[/color]([COLOR=red]'0751234xxx'[/color]) 
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TelephoneCalls [COLOR=blue]Values[/color]([COLOR=red]'07704567yyy'[/color]) 
[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @TelephoneCalls [COLOR=blue]Values[/color]([COLOR=red]'07702987zzz'[/color])

[COLOR=green]-- Query starts here
[/color]
[COLOR=blue]Select[/color] [COLOR=blue]D[/color].*,
       Amount = [COLOR=#FF00FF]COALESCE[/color](Tbl5.Data,Tbl4.Data,Tbl3.Data,Tbl2.Data,Tbl1.Data)
[COLOR=blue]FROM[/color]   @TelephoneCalls [COLOR=blue]D[/color]
       [COLOR=#FF00FF]LEFT[/color] [COLOR=blue]JOIN[/color] @TelephoneDestinations Tbl1 
          [COLOR=blue]ON[/color] [COLOR=#FF00FF]LEFT[/color]([COLOR=blue]D[/color].Search,1) = tbl1.Data
       [COLOR=#FF00FF]LEFT[/color] [COLOR=blue]JOIN[/color] @TelephoneDestinations Tbl2
          [COLOR=blue]ON[/color] [COLOR=#FF00FF]LEFT[/color]([COLOR=blue]D[/color].Search,2) = tbl2.Data
       [COLOR=#FF00FF]LEFT[/color] [COLOR=blue]JOIN[/color] @TelephoneDestinations Tbl3
          [COLOR=blue]ON[/color] [COLOR=#FF00FF]LEFT[/color]([COLOR=blue]D[/color].Search,3) = tbl3.Data
       [COLOR=#FF00FF]LEFT[/color] [COLOR=blue]JOIN[/color] @TelephoneDestinations Tbl4
          [COLOR=blue]ON[/color] [COLOR=#FF00FF]LEFT[/color]([COLOR=blue]D[/color].Search,4) = tbl4.Data
       [COLOR=#FF00FF]LEFT[/color] [COLOR=blue]JOIN[/color] @TelephoneDestinations Tbl5
          [COLOR=blue]ON[/color] [COLOR=#FF00FF]LEFT[/color]([COLOR=blue]D[/color].Search,5) = tbl5.Data

Those joins are not sargable, so you will end up with table/index scan. I honestly have no idea how this would perform compared to a correlated sub-query.



-George

"the screen with the little boxes in the window." - Moron
 
Here's my stab at it. I changed my table names to match George's, for everyone's convenience.
Code:
Declare @TelephoneDestinations Table(Data VarChar(10), Description VarChar(50))

Insert Into @TelephoneDestinations Values('07'   ,'Personal numbers, messaging and mobiles')
Insert Into @TelephoneDestinations Values('077'  ,'Mobile phone')
Insert Into @TelephoneDestinations Values('07702','O2')
Insert Into @TelephoneDestinations Values('234'  ,'Electric pickles')

Declare @TelephoneCalls Table(SearchString VarChar(12))
Insert into @TelephoneCalls Values('07702987zzz')
Insert into @TelephoneCalls Values('07704567yyy')
Insert into @TelephoneCalls Values('0751234xxx')
Insert into @TelephoneCalls Values('071234zzz')

-- SQL 2000
SELECT
	S.*,
	T.*
FROM
	@TelephoneCalls S
	INNER JOIN (
		SELECT
			S.SearchString,
			MaxLen = Max(Len(T.Data))
		FROM
			@TelephoneCalls S
			INNER JOIN @TelephoneDestinations T ON S.SearchString LIKE T.Data + '%'
		GROUP BY
		  S.SearchString
	) M ON S.SearchString = M.SearchString
	INNER JOIN @TelephoneDestinations T ON S.SearchString LIKE T.Data + '%' AND Len(T.Data) = M.MaxLen

--SQL 2005
;WITH Matches AS (
	SELECT
		S.SearchString,
		T.*,
		Rank = ROW_NUMBER() OVER (PARTITION BY S.SearchString ORDER BY Len(T.Data) DESC)
	FROM
		@TelephoneCalls S
		INNER JOIN @TelephoneDestinations T ON S.SearchString LIKE T.Data + '%'
)
SELECT * FROM Matches WHERE Rank = 1
George, I compared these to the left joins method. First, I put more rows in to get higher values for better comparison:

Code:
insert @telephonecalls select * from @telephonecalls
insert @telephonecalls select * from @telephonecalls
insert @telephonecalls select * from @telephonecalls
insert @telephonecalls select * from @telephonecalls
insert @telephonecalls select * from @telephonecalls
insert @telephonecalls select * from @telephonecalls
insert @telephonecalls select * from @telephonecalls
insert @telephonecalls select * from @telephonecalls
The execution plans appear to be of dubious value, I looked at CPU time and reads.

My SQL 2005 version kicks the pants off everything else. My SQL 2000 version is still much better than the multiple left joins:

[tt] CPU Reads Duration
E2 2000 15 3085 366
E2 20005 0 23 128
LEFTJONI 16 15365 205[/tt]

Even though the 2000 version had a higher duration, it was easier on the server, which is more important for busy databases. Duration may be more important for databases that have few clients.

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.[/color]
 
EE,

Try changing this...

[tt][blue]Declare @TelephoneDestinations Table(Data VarChar(10) [!]Primary Key[/!], Description VarChar(50))[/blue][/tt]

Then, re-run the numbers.

My thinking here is that telephone destination code is probably unique within that table. I assume this will cause all the queries to execute faster, but I wonder how they compare against each other.

-George

"the screen with the little boxes in the window." - Moron
 
Also... when you ran your test, was it all done on SQL 2K5? I wonder how the results would compare when run on a SQL 2000 engine.

-George

"the screen with the little boxes in the window." - Moron
 
Yes I missed that PK. With clustered PK now:

[tt] CPU Reads Duration
E2 2000 16 2061 276
E2 20005 15 22 119
LEFTJOIN 15 10245 223[/tt]

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.[/color]
 
Actually, George, I fixed the LEFT JOIN query by adding some conditions:

Code:
Select D.*,
       Amount = COALESCE(Tbl5.Data,Tbl4.Data,Tbl3.Data,Tbl2.Data,Tbl1.Data)
FROM   @TelephoneCalls D
       LEFT JOIN @TelephoneDestinations Tbl1 ON LEFT(D.SearchString,1) = tbl1.Data AND Len(tbl1.Data) = 1
       LEFT JOIN @TelephoneDestinations Tbl2 ON LEFT(D.SearchString,2) = tbl2.Data AND Len(tbl1.Data) = 2
       LEFT JOIN @TelephoneDestinations Tbl3 ON LEFT(D.SearchString,3) = tbl3.Data AND Len(tbl1.Data) = 3
       LEFT JOIN @TelephoneDestinations Tbl4 ON LEFT(D.SearchString,4) = tbl4.Data AND Len(tbl1.Data) = 4
       LEFT JOIN @TelephoneDestinations Tbl5 ON LEFT(D.SearchString,5) = tbl5.Data AND Len(tbl1.Data) = 5
[tt] CPU Reads Duration
E2 2000 16 2061 276
E2 20005 15 22 119
LEFTJOIN 15 10245 223
LEFTJOIN2 0 2053 126[/tt]

Careful when reading the CPU column, you never get anything below about 15 except 0, so 0, 15, and 16 all mean effectively the same amount of time. More sample data would be needed to get more clarity on that part. The reads appear to be the best guide here. So maybe the left joins are okay (depending on how many are required... maybe more than 5 in actuality).

Oh, and something I forgot to add:

The LEFT JOIN method requires another join to get the Description, if that's what's desired. Removing Description from my two queries yields nothing really different, although you can see how the duration fluctuates and isn't such a good measure.

[tt] CPU Reads Duration
E2 2000 16 2061 291
E2 20005 0 22 127
LEFTJOIN2 0 2053 433[/tt]

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.[/color]
 
Many thanks for your input guys. I will have to have a closer look when I get a few hours to myself (we are in a monthly accounting cycle at the moment). What I am unsure of at a quick glance is if it is reading in the 'telephone destinations' table. The 'numbers table' could be one field of various tables (not at same time) as there are numerous sources of data.

Thanks again,

Young Codger [bigglasses]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top