Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Why is SQL Select and Update sooooooo slow?

Status
Not open for further replies.

stanlyn

Programmer
Sep 3, 2003
945
US
Hi,

Using SQL Server 2008r2 on an I7 quad processor machine...

This code is so slow, as it took 6 minutes and 38 seconds to do.

It also caused all other inserts and updates to stop for the whole duration...

This is insane as VFP takes 1-2 seconds...

The table has 2.6 million records total, while there is about 325,000 records that matches the (county_id = 'KY066') test, and less than a 100 records that matches the (_Ocr_Text is not null) test.

The full text index includes this _Ocr_text field. All indexes for this table has been rebuilt.

Why is everything else locked as this is only a select statement returning only a 5 char field?


use [Library]
go
select top (100) county_id from Assets.Page
WHERE county_id = 'KY066'
and _Ocr_Text is not null
go


If I take the null test out, results are instant...
The (_ocr_text) field is a vartext memo field.

Any ideas?

Thanks, Stanley
 
6 minutes is an eternity for such a small table. Sounds to me like there may be some blocking going on. Meaning... something else is in the process of insert, update, or deleting from the table while you are running your select query.

9 times out of 10, when things are slow like this, it's because you are missing indexes. Can you run the following query and post the results?

Code:
sp_helpindex 'Assets.Page'

This will show you what indexes are on that table.

Lastly, you said, "vartext memo field". SQL Server does not have a vartext memo field. I suppose you mean varchar(max)? Is this correct?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Hi George,

Indexes are generally the answer to speed, but how do you index the problem field here which is a text field named _ocr_text that is particapating in a full text index?

It is currently of type: memo and was wondering if changing it to type varchar(max) would be better. And can I get rid of the whole null thing for this fulltext indexed field as it is not allowing me to uncheck the "allow nulls" column in the designer. Wonder whats going on with that?

There are no other inserts, updates, or deletes going on here. If I take out the test on whether _ocr_text is null or not, I get immediate results.

I'm trying to create a FAST query that:
1. find records where _ocr_text IS NULL or not,
2. or ''''''''''''''''''''''''' is empty or not,
3. or ''''''''''''''''''''''''' ='' or !=''
I can change the table structure to whatever is best for performance...

Here is the output from exec sp_helpindex 'Assets.Page'
Index_name Index_description Index_keys
IX_Page_BookNum nonclustered located on PRIMARY BOOK_NUM
IX_Page_CountyId nonclustered located on PRIMARY county_id
IX_Page_CountyIDPageKey clustered located on PRIMARY county_id, PAGE_KEY
IX_Page_LibSort nonclustered located on PRIMARY LIB_SORT
IX_Page_PageDate nonclustered located on PRIMARY Page_Date
IX_Page_PageNum nonclustered located on PRIMARY Page_Num
IX_Page_PageType nonclustered located on PRIMARY Page_Type
IX_Page_Password nonclustered located on PRIMARY SECURE_KEY
PK__Pages__D51AF5F47F60ED59 nonclustered, unique, primary key located on PRIMARY identity_id

Screenshot 18 below shows the reulting command, time and results for the query with the _ocr_text removed that takes 00:00:00 to run. Screenshot 19 below shows it using the _ocr_text evaluation and it takes 00:11:47 to run...

Thanks, Stanley

 
 http://www.mediafire.com/?6bluvupz4u9qndy
I don't know much about full text indexes, so take all of my advice with a "grain of salt".

Basically, there should be no good reason to ever use Text (or nText) columns anymore. The problem is, there are certain functions that were built in to SQL Server for handling text columns. If you change your column's data type to varchar(max), you may inadvertently break some of your other code.

However, if you can change the data type, I strongly recommend it.

Now... for your particularly query, there are several things you can do.

The first, and simplest, may not improve your performance, but it's worth a try.
Code:
select top (100) county_id from Assets.Page  
WHERE county_id = 'KY066'  
    and [!]_Ocr_Text > ''[/!]

My query will not return rows that have an empty string or null. Your query would only return rows that are not null (but could be an empty string).

I doubt this will make much difference in performance, but it's worth a try.


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
There is another way you can get better performance out of this query.

You could create a computed column that determines whether there is any data in the ocr text column. You could then include this computed column in an index. Something like this.

First, create the computed column with this:

Code:
Alter 
Table  Assets.Page 
Add    HasOcrText As (Case When SubString(_ocr_text, 1, 8000) > '' Then Convert(Bit, 1) Else Convert(Bit, 0) End )

Then create a new index like this:
Code:
Create Index idx_Page_HasOcrText On Assets.Page(county_id, HasOcrText)

Finally, change your query to:
Code:
select top (100) county_id 
from   Assets.Page  
WHERE  county_id = 'KY066'  
       and HasOcrText = 1

If this doesn't return your data instantly, I'll eat my left shoe.


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George,

When trying:

select top (100) county_id from Assets.Page
WHERE county_id = 'KY066'
and _Ocr_Text > ''

I get "The data types text and varchar are incompatible in the greater than operator." error...

Ok, the computed column you mention would be self maintaining when values change as in adding data to that field or emptying it?

I can see this being very fast... the fastest possible!,

However, I would'nt want to create these computed indexes everytime I done a quick lookup for a null, not null, empty or not on a varchar(max) or text field. Can you think of a way that yields reasonable speed on that query for general ad-hoc quering, as I need to know the best way to layout the structure for a query as simple as this one with reasonable performance. However, if there is no other way, then I'll create these computed indexes on all columns in my control, but it just seams that there has to be reasonable performance related to doing this.

Also, how would implement the computed column that would lend itself to answering both the null and empty values? What would your case statement look like if we changed it from bit to char(2) with possible values as NN=NotNull, IS=IsNull, NE=NotEmpty, and IE=IsEmpty? At the expense of an additional character, we could answer all those questions in a query... correct? We can change the column name to OcrTextStatus reflect any of the 4 possible values.

Its fine to create the computed index on theses field types if the user UI or reports needed this frequently, such as a report that listed all items that needed the ocr text validated.

George, another related question, if the record is actually null, is the word null anywhere in the record, or is there some magical bit set that means null? I suspect a magical bit...

Thanks, Stanley

BTW, This query took 08:59 minutes to do...
and SubString(_ocr_text, 1, 2) > ''


 
The computed column would be self maintaining, so would the index.

There's a flaw with your logic regarding the status. It's possible to be not null and empty, or not null and not empty at the same time. Basically, you only need 3 status values. NULL, Empty, or Not Empty.

Code:
Alter 
Table  Assets.Page 
Add    OcrTextStatus As (
           Case When _ocr_text is NULL Then 'N'
                When SubString(_ocr_text, 1, 8000) > '' Then 'D'
                Else 'E' End)

N = NULL
D = Data
E = Empty

Code:
Create Index idx_Page_HasOcrText On Assets.Page(county_id, OcrTextStatus)

Your query
Code:
select top (100) county_id 
from   Assets.Page  
WHERE  county_id = 'KY066'  
       and OcrTextStatus <> 'N'


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Still too slow when evaluating OcrTextStatus = 'N' on a table with 2.6 million records total, but 760,000 records that matches the county)id column. In this case there are not any null records and it appears it is making a trip thru the entire table. I cancelled it after 4 minutes...

If I change it to OcrTextStatus = 'D', then it takes 6 seconds, as almost all of the 760,000 records matching the county_id has data in them and we are stopping at 1000...

I'm currently changing the computed field to be persistent, as it was not originally. If the computed field's value is not persistent, would'nt it take a while to evaluate that many records on the fly?

Thanks, Stanley

 
If the computed field's value is not persistent, would'nt it take a while to evaluate that many records on the fly?

Yes, but....

I suggested that you create an index on the computed column. This means that the data would be persisted in the index. And, unless you're not showing the entire query, all of the data you need is in the index. If you look at the execution plan, you'll probably see an index scan, but you should not see a book mark lookup.

Did you create the index like I suggested?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Yes, I did all that you suggested on the 1st go around... I did however change your

Create Index idx_Page_HasOcrText On Assets.Page(county_id, OcrTextStatus)
to
Create Index idx_Page_HasOcrText On Assets.Page(OcrTextStatus)

I removed the county_id from the index as I only trying to speedup queries using the ocr_text field, and thought it was irrevelant, unless you needed the county_id in the results. I used it in my select statement just to show something happened...

and later I changed
SubString(_ocr_text, 1, 8000) > ''
to
SubString(_ocr_text, 1, 10) > ''

I'm still waiting to test... hoping for even faster execution...

How much speed increase can I expect to see if I enter single user mode when making table changes to large tables. The system has been applying some table changes I've made 9+ hours ago and it still isn't done...

I come from vfp where everything is about speed and so far sql server is natively slower in almost all areas... simple queries like seeing all non null records taking 10 minutes, a table restructure taking 9 hours.

I know there is ways around these speed issues and that knowledge is what I'm seeking.

Thanks George for sharing...

Stanley

 
There is no doubt in my mind that SQL Server is faster then VFP. Since you're not seeing it, there must be something else going on here that we haven't identified.

I had suggested an index on County_ID, OcrTextStatus because that index would completely cover the query you are running.

When an index covers a query, it means that all of the data needed by the query can be found in the index. Given that this table is pretty large, if you don't need to use any table data, you are likely going to significantly speed up the query because you are searching a much smaller table (actually the index).

Think about it this way, each non-clustered index contains the data that makes up the key, and a pointer back to the row in the actual table. CountyId appears to be a 5 character string, the OcrTextStatus column would be a single character. Add in the row pointer and each row of the index would be approximately 22 bytes. With 8060 bytes per page 8K page, you should be able to get approximately 366 rows per page. That's approximately 7100 8k pages. That means this index should only be 58 megabytes.

This is one of those situations where I wish I could get my hands on the database because I am certain there are huge performance gains to be had.

Did you recently move the data from VFP to SQL Server? If so, how did you actually move the data?

Can you run the following query and post the results?

[tt]sp_helpdb 'YourDatabaseNameHere'[/tt]

Also, I would encourage you to run SQLCop. This is a free SQL Server utility that I created a couple years ago. This application can analyze various aspects of your database and configuration. Please download and run this application. Once you log in to your database, expand indexes -> fragmented indexes, please also expand all of the nodes under configuration. Once you have done this, click the "i" button at the top, and a report will be created for you. Copy/paste the contents of the report and post it here.

If you do all of this, it will go a long way towards diagnosing whatever problem is occurring in your DB.

The link to download SQLCop can be found in my signature.



-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Still waiting for table restructure finish... Its taking days, hopefully it will finish tonight...

Over the last 2 days, I've moved the log file to its own disk, and the backups to their own disk in an effort to speedup things.

Then it got into recovery mode and it came out sometimes this morning.
i then done a log backup with truncate, and then a full backup, so days won't be wasted if this restructure fails, and I have to rollback.

I'm putting timestamps and progress messages in the script so I can see what stage its in, however it seams that the messages are buffered, which is not a good indicator, so

? How can I make this not buffered and show immediately:
PRINT CAST('Creating Tmp Table...' AS CHAR) + CONVERT(VARCHAR(20),current_timestamp,100) + CHAR(13);

Another question,
I started this script and stopped it... and ran it again and now it has generated:

Starting Conversion... Feb 27 2012 4:33PM

Dropping Constraint... Feb 27 2012 4:33PM

Msg 3733, Level 16, State 2, Line 3
Constraint 'DF_Page_is_deleted' does not belong to table 'Page'.
Msg 3727, Level 16, State 0, Line 3
Could not drop constraint. See previous errors.
Creating Tmp Table... Feb 27 2012 4:33PM

Msg 2714, Level 16, State 5, Line 2
There is already an object named 'DF_Page_is_deleted' in the database.
Msg 1750, Level 16, State 0, Line 2
Could not create constraint. See previous errors.

I'm wondering if this will timeout or continue processing? I have the timeout very high like 30000 seconds. I previously had it set to 12000, and it timed out there, so I bumped it. And of-course the 12000 number was bumped from default.

Why is SQL so lame on generic and default feedback to the user on whats happening, like status bars, progress bars, and other messaging and/or output windows?

Thanks, Stanley
 
To show immediately, move it to its own batch with the keyword GO after it...

PRINT CAST('Creating Tmp Table...' AS CHAR) + CONVERT(VARCHAR(20),current_timestamp,100) + CHAR(13);
GO

Works as expected...
 
George,

I almost have sql server back where I can resume some testing. I've made major changes such as bumping the ram from 6gb to 24gb, adding a raid 1+0 subsystem and placing the mdf file on it, moved the ldf file to a different disk, moved the backups to a different disk, created 2 file groups, where one stores the .tif images and the other stores the .sla images. I then removed the image fields from the main table, reindexed everything, done both log and full backups and shrinking.

The problem is the database size should actually be 12-15GB, when its currently over 250GB, and it refuses to shrink.

1. What should I do?

2. If I copy the tables to new ones and delete the old ones, will I get back the disk space?

I've shrunk both the database and all the file groups.

Thanks, Stanley
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top