SQL 2008 express - slower query than on SQL 2000 1

POSAPAY · Jun 13, 2013

I've got a Classic ASP vbscript application that I'm testing on two new Windows 2008 R2 servers, with one running SQL 2000, and the other running SQL 2008 express

The SQL 2000 seems to run it just fine.
But the SQL 2008 express times out. I set in the script the query timeout to be 0 (disabled), and now my script times out.
But instead of putting a band-aid on it, I'd rather figure out why the 2008 express is taking forever to process it.
Any ideas anyone?

djj55 · Jun 13, 2013

Do both 2000 and 2008 have the same indexes?

djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!

gmmastros · Jun 13, 2013

I would recommend that you recreate all the indexes. This is especially true if you just recently migrated the database to sql 2008.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

POSAPAY · Jun 13, 2013

Please forgive me for not being very knowledgeable on SQL.
I wasn't sure how to reindex, so I looke dit up and followed method 1 from this link:
Reindex methods

Since I still didn't see any improvement, and my script was still timing out, I looked around and changed the database's compatibility level from SQL 2000 to SQL 2008(100).
Now it seems to complete the query and present the results.

But I'm still not sure if I'm doing it right, and it still takes a while to get the answer back.
Any pointers on re-indexing and other options would be appreciated!

POSAPAY · Jun 13, 2013

BTW, on SQL 2000 and 2008. I took from an older server with SQL 2000 the database, detached and simply just attached at the new location. Fixed user accounts and that's it.

POSAPAY · Jun 13, 2013

Anyone with ideas/tips or help on efficient indexing?

gmmastros · Jun 13, 2013

Can you post the query?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

POSAPAY · Jun 13, 2013

Code:

SELECT TOP 101 tblresumes.ResumeID, tblresumes.ResumeEncID, tblresumes.DesiredOccupation, tblresumes.City, tblresumes.State, tblresumes.DateAvailable, 
tblresumes.FirstName, tblresumes.LastName, tblresumes.Relocation, tblresumes.Skills, tblresumes.DesiredField, tblresumes.Polygraph, 
tblresumes.SecurityClearance, tblresumes.FullTime, tblresumes.PartTime, tblresumes.WorkRemote, tblresumes.EmploymentStatus, 
tblresumes.Citizenship, tblresumes.DateModified, tblresumes.DateCreated, tblresumes.active, tblresumes.activated  
FROM tblresumes 
WHERE ((tblresumes.active Like 't') and (tblresumes.activated Like 't')) and ((tblresumes.DesiredOccupation Like '%engineer%' ) or 
(tblresumes.Skills Like '%engineer%' ) or (tblresumes.DesiredField Like '%engineer%' )) and (tblresumes.State Like '%CA%') 
ORDER BY DateModified DESC

gmmastros · Jun 13, 2013

Can you post the results of this query?

Code:

sp_helpindex 'tblresumes'

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

POSAPAY · Jun 13, 2013

Code:

index_name	index_description	index_keys
IX_tblresumes	nonclustered located on PRIMARY	DateModified(-), DesiredOccupation, FullTime, PartTime, State
PK_tblResumes	nonclustered, unique, primary key located on PRIMARY	ResumeEncID

gmmastros · Jun 14, 2013

My guess.... Since you where clause isn't able to use any indexes to speed up the query, your query was probably relying on statistics to help speed this up. I also suspect that the statistics on the SQL2000 version are different than the statistics on the 2008 version.

I find it a little strange that the primary key is nonclustered. Normally it is. In fact, I would encourage you to change it to a clustered index. This will likely help speed up your query.

This is a difficult query to improve the performance for because you have a lot of like comparisons which are difficult to improve. You also have a couple of other columns (active and activated) that are probably true/false columns (guessing here).

If it's true that most of the data in this table does not have 't' for active and activated, then you will probably see good performance improvements by adding this index.

Code:

CREATE NONCLUSTERED 
INDEX  idx_ActiveResumes
On     tblresumes(ResumeEncId)
Where  tblresumes.active = 't'
       and tblresumes.activated = 't'

That may be all you need to do without changing your original query. However, I would encourage you to also try this query, which should produce identical results, but may (or may not) be any faster.

Code:

SELECT	TOP 101 
        tblresumes.ResumeID, 
        tblresumes.ResumeEncID, 
        tblresumes.DesiredOccupation, 
        tblresumes.City, 
        tblresumes.State, 
        tblresumes.DateAvailable, 
        tblresumes.FirstName, 
        tblresumes.LastName, 
        tblresumes.Relocation, 
        tblresumes.Skills, 
        tblresumes.DesiredField, 
        tblresumes.Polygraph, 
        tblresumes.SecurityClearance, 
        tblresumes.FullTime, 
        tblresumes.PartTime, 
        tblresumes.WorkRemote, 
        tblresumes.EmploymentStatus, 
        tblresumes.Citizenship, 
        tblresumes.DateModified, 
        tblresumes.DateCreated, 
        tblresumes.active, 
        tblresumes.activated  
FROM    tblresumes
        Inner Join (
          Select ResumeEncId
          From   tblresumes
          Where  tblresumes.active = 't'
	         and tblresumes.activated = 't'
        ) As ActiveResumes
          On tblresumes.ResumeEncId = ActiveResumes.ResumeEncId
WHERE	tblresumes.State Like '%CA%'
        And (tblresumes.DesiredOccupation Like '%engineer%'
             or tblresumes.Skills Like '%engineer%'
             or tblresumes.DesiredField Like '%engineer%'
        ) 
ORDER BY DateModified DESC

This query should use the new index create to quickly narrow down the result set prior to applying the additional where clause criteria that has a lot of like comparisons. I suspect this will only make a difference if a small subset of rows match active = 't' and activated = 't'.

If this does not improve the performance, then I would encourage you to drop the index. Indexes are great for speeding things up, but having additional indexes on a table that are never used is not helpful.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

POSAPAY · Jun 17, 2013

Hi George,
Thanks for the tips.
I tried removing all but the primary and changed primary to clustered.
Also added your suggested index.

There wasn't much speed improvement. After login, it runs up to 10 seconds long, subsequent query soon after can be as quick as 6.125 seconds.

The active and activated are T/F (single char... probably should have been type bit at design time)
Most records are active = t and activated = t (very few are false / inactive)
Got over 100,000 records

Looking through the design of the table, only the variable Skills is data type text. Most others are varchar(50) or varchar(100)

Obviously the point of searching through resumes is to find key words that an employer is looking for in a candidate. So basically searching as many fields as possible is the key for helpfulness of the results.
I am not sure what would be the ideal way of searching through a large amount of data quickly by keywords in multiple columns.

In other words, if the database grows a lot, what should I consider as first steps in keeping it functional and fast?

BTW, a side issue is that I'm running it on a VPS windows 2008 R2 server with only 2GB of memory. Budget constraints!

Once Windows and SQL 2008 express with this large table is loaded up, it is using 1.8GB of memory.
Any suggestions to keep an SQL query going fast and be efficient on memory?

Is resume table query is the bottleneck. it actually peaks the processor to 100% for 6-8 seconds with each time the query is run.
Any thoughts or ideas would be most appreciated!
Thanks,
-Peter

gmmastros · Jun 17, 2013

I would suggest that you implement some sort of "Keyword" system.

There is probably a finite number of "words" that users will search on. It's probably a shorter list than you think.

What I would do is add another table named "ResumeKeywords". This table would store the unique identifier for each resume, and a list of keywords, where each row is a single word. Then, when a resume is added or updated, you repopulate the ResumeKeywords for that particular resume.

You would then add an index in the ResumeKeywords table For (Keyword, ResumeId) and join that with your real resume table. This process should scale up a lot while giving you really good performance.

You could also normalize this one step further by keeping a list of Keywords with an int primary key value and then have an association table.

For example:

Code:

Keywords
KeywordId  Word
---------  ----------
1          Engineer
2          Technician
3          Software
4          Hardware
5          Microcontroller

Code:

ResumeKeywords
ResumeID   KeywordId
---------  ---------
1          4
1          2
2          3
2          1
3          4
3          1

This structure will be ridiculously fast while also giving you more flexibility. For example, if users search on "Hardware Engineer". You would split up their search in to individual words and then search on each word. ResumeId = 1 would match on "Hardware". ResumeId = 2 would match on "Engineer", and ResumeId = 3 would match on "Hardware" and "Engineer". You could count the number of matches and present the list ordered by number of matches so that the most relevant matches are at the top of the list.

The nice thing about this approach is that each keyword/resume match only requires 8 bytes of storage, which is super efficient.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

POSAPAY · Jun 17, 2013

George,
Thanks! Those are really great ideas!

The current search form gives a couple of options to search on:
Keywords (one or more words, and using logic such as + means must have it and - meaning excluding it.
Also from a drop-down selection to search All of the words, or Any of the words, or Exact Phrase.

Other options include:
Full time, or Part time, or both
Searching Brief, detailed or none of the description
City and/or state

And obviously the user can select what field to sort by in ascending or descending order, such as date last modified, date created, or alphabetically by resume title.

With this complexity, I'm not sure if forcing the keyword structure would still work.
The only efficiency feature is that the user must select a state... so each search is always limited to a single state.

I was wondering if cutting search of text data type fields out, or having a particular column data type might benefit the speed of a query. Any thoughts?

I am thinking about your keyword table idea...I'll probably try it on a development box. I should track the keywords users use and perhaps offer candidates to select per resume their top 5 keywords.
While it will probably be faster, it may not offer as much flexibility of possible candidates per employers...but obviously too many results is also not helpful.

So I'm just wondering on the better data types to use...thoughts?

POSAPAY · Jun 18, 2013

Any thoughts on my last comment?
Any particular data types for a column to be used for queries with "LIKE" comparison?

I am still not sure what is the best index setup..been reading more on it and trying stuff, but still couldn't get the response under 5 seconds.

gmmastros · Jun 19, 2013

The changing the data type won't help. I also doubt that indexes will help much either.

Think of an index like an old fashioned telephone book. Telephone books are great for finding phone numbers because everyone knows that the data is sorted by last name and then first name. If you want to find the phone number for George Washington, you flip to the back of the book and quickly find the W's, then then Wa's and so on until you find the entry you're looking for. In DB terms, this is called a seek.

Now, imagine you wanted to find all the phone numbers for people with a first name of George. You would need to examine every entry to see if it qualifies. In DB terms, this is called a scan.

With you write a where clause with a like search where you are looking anywhere within the string, you have to scan every entry.

Ex:

Code:

Select * 
From   PhoneBook
Where  Name Like '[!]%[/!]George%'

Since you have a percent at the beginning of the search string, you need to scan all the data (think slow).

There is only one way that you can make a like search fast. It can only occur when you start your search from the beginning, like this:

Code:

Select *
From   PhoneBook
Where  Name Like 'Washin%'

If there is an index on the name column, then SQL Server can perform a seek on the data instead of a scan, which will be much faster.

One thing you could do to speed up your query is to index the state column and then change the query to:

Code:

and (tblresumes.State [!]=[/!] 'CA')

This will allow the query to narrow down the search to just those resumes with state = 'CA'. Now... if most of your rows have state = 'CA', then this will probably not help either, but it's certainly worth a try.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

POSAPAY · Jun 19, 2013

Good point! I'll update my query to not use LIKE and percent on state... as roughly less then one tenth of the database has "CA" for state.

Thanks for the explanations regarding index... I am beginning to understand it better.

I'll see if I can also add couple of other limiting items to be required... "Full Time" vs. "Part Time" candidates, and "Citizen","Permanent Resident" or "non-citizen/non-perm.resident"

Should I build an index for this separately?
State Desc
Full Time Asc
Part Time Asc
LegalStatus Asc

Is it worth to create a non clustered index as such? AM I doing it right?

gmmastros · Jun 19, 2013

I would encourage you to do a little research regarding "Index Selectivity".

Basically, SQL Server can decide to ignore certain indexes if the data is not selective enough. 1/10 of your data has state='CA', which is good.

I'm guessing FullTime only contains t or f. This is NOT selective.

Think of it this way...

Suppose you had a table of people that included every person living in the united states. Then you write a query like this...

Code:

Select Name
From   People
Where  gender = 'female'

In most cases, an index would NOT be used even if you had an index on gender because approximately 1/2 of your data would satisfy the index requirements. Instead, it would actually be faster to scan the table data to determine which rows to include.

That being said, you could still create an index on State, but INCLUDE the other columns. Ex:

Code:

Create nonclustered Index idx_Resume_State 
On tblResume(State) 
Include(FullTime, PartTime, LegalStatus)

The include columns will cause the index data to include those columns, but it will not affect the sorting of the data in the column.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

SQL 2008 express - slower query than on SQL 2000 1

IS-IT--Management

Programmer

Programmer

IS-IT--Management

IS-IT--Management

IS-IT--Management

Programmer

IS-IT--Management

Programmer

IS-IT--Management

Programmer

IS-IT--Management

Programmer

IS-IT--Management

IS-IT--Management

Programmer

IS-IT--Management

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor