
Is there any way of speeding up queries that use LIKE clauses?


Connatic (Programmer)
Apr 22, 2004 (GB)
I have a query (stored proc) that uses a LIKE clause on a varchar column and it takes 6 seconds to run. As this returns a result set to an application, this performance is unacceptable.

Query :
Code:
SELECT m.m_mat_id, m.m_clt_code, m.m_mat_num, m.m_clt_name, 
       m_mat_desc, t.at_desc, a.add_type_id, a.add_id 
FROM matters m, addresstypes t, addresses a, mat_addr_lnk l WITH (NOLOCK) 
WHERE (m.m_clt_name LIKE '%' + @Variable + '%')
and m.m_mat_id = l.mat_id 
and a.add_id = l.add_id 
and a.add_type_id = t.at_type_id 
ORDER BY m.m_clt_code, m.m_mat_num ASC
The @Variable is obviously passed into the stored proc!

If I take out the wildcard character that comes before the variable, it only takes about 1 second to run (which is fine), but I would really rather not do this, as users WILL want to be able to do partial matches on both sides of the entered word.

e.g. entering 'Brown' would return (for the Client Name column):

1 Brown Associates
2 Brown & Jones
3 Mitchell Brown Group
4 Davis, Brown & Partners

instead of

1 Brown Associates
2 Brown & Jones

So is there any other way to speed up this type of query?

Any help will be greatly appreciated, as I have spent all day looking at this and got nowhere!

Regards
Richard
 
Just for giggles, did you try the query using "charindex" instead of LIKE? I doubt it would make a difference, and charindex might actually be slower, but who knows?

Assuming your db collation is case insensitive, the following should be identical to using LIKE:

Code:
SELECT m.m_mat_id, m.m_clt_code, m.m_mat_num, m.m_clt_name, 
       m_mat_desc, t.at_desc, a.add_type_id, a.add_id 
FROM matters m, addresstypes t, addresses a, mat_addr_lnk l WITH (NOLOCK) 
WHERE charindex(@Variable, m.m_clt_name)>0
and m.m_mat_id = l.mat_id 
and a.add_id = l.add_id 
and a.add_type_id = t.at_type_id 
ORDER BY m.m_clt_code, m.m_mat_num ASC

 

No difference I'm afraid, TJRTech; still 6 seconds!

Thanks for answering though!

Before anybody starts mentioning full-text searching using the CONTAINS keyword: I have already tried that, and yes, it was significantly quicker; in fact the stored proc took approximately 0 seconds to run. But UNFORTUNATELY you cannot search for PARTIAL words, only full words, and I have to be able to let users search for partial words in this app, so I can't use it (gutted!).
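(For anyone curious: full-text only supports prefix wildcards, so something like the sketch below works, but there is no full-text equivalent of a leading %.)

Code:
-- Prefix matching works in full-text:
SELECT m_mat_id, m_clt_name
FROM matters
WHERE CONTAINS(m_clt_name, '"Brow*"')
-- ...but there is no CONTAINS equivalent of LIKE '%rown%'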

If anybody out there wants the code to enable full-text searching on a database for a column of your choice, I would be happy to post it!

Richard

 
How many records are in the table in question?

Have you considered FORCING the application to enter a short, descriptive, unique name of the client, and storing that in a column with a clustered index? You may not have this latitude, but I have seen just this type of approach in many systems when a short, sortable, descriptive and searchable "name" is needed for customers, etc.

For example, see how a full name would map to (->) a short name below:

1 Brown Associates -> BrownAssoc
2 Brown & Jones -> BrownJones
3 Mitchell Brown Group -> MitchellBrownGrp
4 Davis, Brown & Partners -> DavisBrownPrts

Then, you search with strings like: Brown, Mitch, Davis....
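A rough sketch of the idea (the column and index names here are made up):

Code:
-- Hypothetical short, searchable name column
ALTER TABLE matters ADD m_clt_shortname varchar(30) NULL

CREATE NONCLUSTERED INDEX ix_matters_shortname
    ON matters (m_clt_shortname)

-- A prefix-only search can then seek on the index:
SELECT m_mat_id, m_clt_name
FROM matters
WHERE m_clt_shortname LIKE @Variable + '%'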

Just a thought, but I doubt you have the ability to make that change at this point.

TR
 
YOUR problem is that you are using a leading %. This forces SQL Server to compare the search clause to EVERY row in the table (a table scan, 100% of the time).

GET rid of the leading % and your query can again use indexes.

You might be able to use a nonclustered index on the m_clt_name column, but that will, more likely than not, still end in a table scan.
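To illustrate (simplified to a single-table query):

Code:
-- Trailing wildcard only: the optimizer can seek on an index over m_clt_name
SELECT m_mat_id FROM matters WHERE m_clt_name LIKE @Variable + '%'

-- Leading wildcard: every row must be examined, so you get a scan
SELECT m_mat_id FROM matters WHERE m_clt_name LIKE '%' + @Variable + '%'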

Your best option would be to build a covering index.
Code:
CREATE NONCLUSTERED INDEX nc_cov ON matters
    (m_mat_id, m_clt_code, m_mat_num, m_clt_name, m_mat_desc)

Because every column of the matters table referenced by the query is included in the index, the query optimizer will evaluate it and use it for the query.

By doing this you will be able to query a smaller amount of data. Yes, you will still have to scan the index, but it doesn't carry as many columns as the table, so it should run much faster.
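You can check in Query Analyzer whether the optimizer actually picks the index; for example (the varchar length here is a guess):

Code:
SET SHOWPLAN_TEXT ON
GO
DECLARE @Variable varchar(50)
SELECT m_mat_id, m_clt_code, m_mat_num, m_clt_name, m_mat_desc
FROM matters
WHERE m_clt_name LIKE '%' + @Variable + '%'
GO
SET SHOWPLAN_TEXT OFF
GO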

My 2c


Rob
 
For sure, the search with %blah% is inefficient. But I think your query may be optimized by using the joins efficiently.

Try keeping all the equalities like a.id = t.id as join conditions, and apply the LIKE filter to the matters table first, in a derived table:

Code:
SELECT m.m_mat_id, m.m_clt_code, m.m_mat_num, m.m_clt_name,
       m.m_mat_desc, t.at_desc, a.add_type_id, a.add_id
FROM (SELECT m_mat_id, m_clt_code, m_mat_num, m_clt_name, m_mat_desc
      FROM matters
      WHERE m_clt_name LIKE '%' + @Variable + '%') AS m
     , addresstypes t, addresses a, mat_addr_lnk l WITH (NOLOCK)
WHERE m.m_mat_id = l.mat_id
  AND a.add_id = l.add_id
  AND a.add_type_id = t.at_type_id
ORDER BY m.m_clt_code, m.m_mat_num ASC


--------------------------------------------------
Django
bug exterminator
tips'n tricks addict
 
In the past I have found that when confronted by a nasty LIKE or IN (subselect) query, it can be optimised by first doing the LIKE/IN query and putting the results in a temporary table. You can then use the temporary table in an inner join (depending on your data), and this will usually be quicker/more efficient.
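For example, applied to the query in this thread (an untested sketch):

Code:
-- Do the expensive LIKE once, into a temp table
SELECT m_mat_id, m_clt_code, m_mat_num, m_clt_name, m_mat_desc
INTO #matched
FROM matters
WHERE m_clt_name LIKE '%' + @Variable + '%'

-- Then join the (much smaller) temp table to the rest
SELECT x.m_mat_id, x.m_clt_code, x.m_mat_num, x.m_clt_name,
       x.m_mat_desc, t.at_desc, a.add_type_id, a.add_id
FROM #matched x
INNER JOIN mat_addr_lnk l ON x.m_mat_id = l.mat_id
INNER JOIN addresses a ON a.add_id = l.add_id
INNER JOIN addresstypes t ON a.add_type_id = t.at_type_id
ORDER BY x.m_clt_code, x.m_mat_num ASC

DROP TABLE #matched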

Gavin
 
Gavin, I'm surprised the optimizer doesn't do the LIKE (in this case) prior to the joins. I thought the general rule for the optimizer was to reduce the size of any table, if possible, prior to performing the joins.
Richard, can you test Gavin's theory and let us know the result? His suggestion, if proved correct, has tremendous value.
-Karl
 

Thanks All for replying,

I've only seen all the posts this morning, as due to severe tiredness I slept all yesterday evening!

Unfortunately TJRTech, I do not have the latitude to create a "short, sortable, descriptive and searchable name", as you put it.

This is an internal system I am working on, and the users will need (and demand; I work for a firm of solicitors) to be able to search by typing in partial names.

Thanks NoCoolHandle, I did not know that using a preceding % meant the index wasn't used!

Could you tell me more about this covering index stuff? Will it affect insert and update speed?


I tried TekTipDjango's code, but once again it runs at exactly 6 seconds! (Thanks anyway.)

And finally Gavinhuet, that sounds interesting; I think I will try it out. Gotta check out all the options, after all!

Oh yes, there are around 200,000 records in the Matters table!





 
Six seconds for 200,000 records does sound out of line. I have a similar table that takes a split second to do a full table scan. Have you tried just running the LIKE clause on the matters table in Query Analyzer? That's a first step toward Gavin's idea. If it takes more than a second, I would suspect that there's something wrong with the server: maybe not enough RAM, overworked, slow processor, overheating processor, etc. Also try rebooting the server; I've noticed a dramatic improvement in execution times after a reboot.
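i.e., something along these lines (the varchar size is a guess):

Code:
DECLARE @Variable varchar(50)
SET @Variable = 'Brown'

-- Just the LIKE, with no joins, to isolate its cost
SELECT m_mat_id, m_clt_name
FROM matters
WHERE m_clt_name LIKE '%' + @Variable + '%'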
-Karl
 

Cheers Donutman. I have found out (new job and all that) that the live server has much more RAM than the development server, and dual processors, so it should run faster there.

I have just gotta convince my boss to let me do some testing on the live server (obviously when it's not being used)!

-Richard
 
A final thought, Richard. I think a full scan of a nonclustered index is faster than a full table scan, so you might want to put an index on m_clt_name. I'd be very interested in hearing if that makes a difference on your "six-second" development server.
-Karl
 

Hi Karl,

m_clt_name is already indexed; that's why I have been so annoyed that it takes 6 seconds to run!

Cheers
Richard
 
AFAIK this case requires a nonclustered index scan plus a bookmark lookup. Because LIKE '%blah' is unpredictable and bookmark lookups can be I/O heavy, the engine may decide not to use the NC index. Try forcing the scan with SELECT ... FROM table WITH (INDEX(indexname)) ...
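Applied to the matters table that would look something like this (the index name is hypothetical):

Code:
-- Force the named index on m_clt_name to be used
SELECT m_mat_id, m_clt_name
FROM matters WITH (INDEX(ix_matters_clt_name))
WHERE m_clt_name LIKE '%' + @Variable + '%'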
 
<snip>Six seconds for 200,000 records does sound out of line</snip>

What type of system are they using, to believe that?

I have a query that searches 20,000,000 records in less than .01 seconds on a Pentium 4 600 with 372 MB of RAM.

<snip>
Could you tell me more about this Covering Index stuff, will it effect Insert & Updating speed ???
</snip>

Yes, there will be an insert, delete, and possibly update performance impact, but my guess is that this will be minimal.

The key to indexing is: don't go overboard. More than about 15 to 20 indexes won't show much of a performance improvement, as at some point the optimizer will just plain give up evaluating indexes and do a table scan.

What a covering index is (and it will always be evaluated, even in a '%ksdk%' situation) is a miniature version of the table. Nonclustered indexes must contain an entry for every row that exists in the table. Because you have "filtered" out the columns that the query doesn't reference, it is like the temp table that someone else suggested; however, it doesn't need to be created every time you run the query.

Order is important, even critical. Without knowing your data structures it would be difficult to know exactly which columns you should put first, but if you have one column with excellent selectivity (very few duplicates), it would be a great candidate. Another good choice is the join column, as you will then have the index (read: miniature table) presorted for finding the matching records.
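For example, leading with the join column (whether this ordering suits your data is a guess):

Code:
-- Same covering columns, but ordered with the join key first
CREATE NONCLUSTERED INDEX nc_cov_ordered ON matters
    (m_mat_id, m_clt_name, m_clt_code, m_mat_num, m_mat_desc)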

HTH


Rob
 
Hello again vongrunt & NoCoolHandle,

Firstly, I have tried using index hints, but this did nothing for the performance.

I also tried NoCoolHandle's method of using a covering index; once again, no improvement.

Let me give a little description of the tables involved:

4 tables

1, Address Table
2, Matters Table
3, Link table, between the two !
4, Address Type (e.g. Correspondence)

The reason for this structure is that one client can have many matters, each of which may have one or many addresses.

The Matters table has around 200,000 records, the Address table has around 9,000 records, and the link table also has around 9,000 records.

The link table has only 3 columns. ID / MatterID / AddressID
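(So presumably something like this, using the column names from the query above:)

Code:
-- Sketch of the link table as described
CREATE TABLE mat_addr_lnk (
    id     int IDENTITY PRIMARY KEY,
    mat_id int NOT NULL,  -- joins to matters.m_mat_id
    add_id int NOT NULL   -- joins to addresses.add_id
)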

The Matters table has around 17 columns, and the Address table 21.

The Matters table has 3 indexes on it: on id, on clt_name, and on a Ref column that is also used.

Hope this info helps. I am going home now; I will log on later to see if anybody has had any thoughts.

Cheers All

Richard

 
Do your related tables have FK constraints, and are the FKs joined on PKs?

If so, I can't understand why the joins and the LIKE operations on 200K records would take 6 seconds, even on a low-end development server.

Try the query from Query Analyzer, doing the following:

1. Try it once, note the execution time.

2. Try it again immediately. Does it still take 6 seconds?

3. Turn on the execution plan and make sure the query is using indexes, and that the only table scan is for the LIKE operation (see the SET options below).
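For example, these standard SET options in Query Analyzer cover all three checks:

Code:
-- Exact timings and I/O per statement (steps 1 and 2)
SET STATISTICS TIME ON
SET STATISTICS IO ON

-- Text version of the execution plan (step 3)
SET STATISTICS PROFILE ON

-- ...run the query here...

SET STATISTICS PROFILE OFF
SET STATISTICS IO OFF
SET STATISTICS TIME OFF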


TR
 
I notice you're using 'with (nolock)' in the statement; what for?

--------------------------------------------------
Django
bug exterminator
tips'n tricks addict
 