Simpler code to speed up process?


EmilyPaterson

Technical User
May 14, 2010
Hi all,

I'm quite new to SQL, so please bear with me.

I've managed to write some code which does what I want it to do, but it is taking a really long time to process. I don't fully understand the process SQL goes through to execute a query, so I was hoping that someone could help me combine/simplify the process and make the run time shorter.

Essentially what I'm trying to do is update a table with an index of whether the windspeed for a given event is above a given threshold or below. The code I wrote takes approx 1 hour to run currently for each threshold, except that I have over 50 thresholds, so as you can imagine, it's taking a long time to run.

Here's the code that I've written (I've only included 3 thresholds for sake of simplicity):

update dbo.Event_WindSpeed
SET WS_200 = (CASE WHEN (PEAKGUST >= 200) THEN 'OVER' ELSE 'UNDER' END);

update dbo.Event_WindSpeed
SET WS_210 = (CASE WHEN (PEAKGUST >= 210) THEN 'OVER' ELSE 'UNDER' END);

update dbo.Event_WindSpeed
SET WS_220 = (CASE WHEN (PEAKGUST >= 220) THEN 'OVER' ELSE 'UNDER' END);

Thanks!
 
You can write one update command doing all these updates, e.g.

update dbo.Event_WindSpeed
SET WS_200 = (CASE WHEN (PEAKGUST >= 200) THEN 'OVER' ELSE 'UNDER' END),
WS_210 = (CASE WHEN (PEAKGUST >= 210) THEN 'OVER' ELSE 'UNDER' END),
WS_220 = (CASE WHEN (PEAKGUST >= 220) THEN 'OVER' ELSE 'UNDER' END)

etc.

So, you'll update all your fields.

Also, the structure of this table looks un-normalized.

It would be much simpler to have a Threshold column and a WindSpeed column instead, with one row per threshold, e.g.

Threshold  WindSpeed
200        500
220        300
etc.
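
One rough sketch of that kind of row-per-threshold layout (all of the table and column names here are only illustrative, and this is just one possible way to normalize it):

Code:
-- One row per threshold instead of one column per threshold (illustrative names)
CREATE TABLE dbo.WindSpeed_Threshold
(
    Threshold int NOT NULL PRIMARY KEY
)

INSERT INTO dbo.WindSpeed_Threshold (Threshold) VALUES (200)
INSERT INTO dbo.WindSpeed_Threshold (Threshold) VALUES (210)
INSERT INTO dbo.WindSpeed_Threshold (Threshold) VALUES (220)

-- Over/under then falls out of a join instead of 50 separate columns
SELECT e.PEAKGUST,
       t.Threshold,
       CASE WHEN e.PEAKGUST >= t.Threshold THEN 'OVER' ELSE 'UNDER' END AS OverUnder
FROM dbo.Event_WindSpeed e
CROSS JOIN dbo.WindSpeed_Threshold t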


PluralSight Learning Library
 
You have 50 of these to do? I bet I can make it 50 times faster. [smile]

Code:
update dbo.Event_WindSpeed
SET WS_200 = (CASE WHEN (PEAKGUST >= 200) THEN 'OVER' ELSE 'UNDER' END),
    WS_210 = (CASE WHEN (PEAKGUST >= 210) THEN 'OVER' ELSE 'UNDER' END),
    WS_220 = (CASE WHEN (PEAKGUST >= 220) THEN 'OVER' ELSE 'UNDER' END)

Please take note of the code above. It shows you how to update multiple columns within the same query. This is an important technique to learn.

However, I am more concerned about your table structure. Your question implies that you have a table with AT LEAST 51 columns. I suspect there are many more columns that you are not showing here (like location, date, etc...). There are MANY better ways of doing this.

First and foremost, you are storing 4 or 5 characters per column, with 50 columns. This requires approximately 250 bytes per row. If it's taking 1 hour to update your table, that implies a lot of rows (millions maybe?). Instead of storing OVER/UNDER, you could use a bit column instead. 1 (or true) could indicate over, and 0 (or false) would indicate under. Also... you could forget about having 50 columns and use 'bit' packing instead. This would allow you to have one column varbinary(50) with 50 bits (approximately 6 bytes).
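
For example, a bit version of the update above might look something like this (reusing your column names, but assuming they have been recreated as bit columns):

Code:
-- assumes WS_200 / WS_210 are now bit columns instead of character columns
update dbo.Event_WindSpeed
SET WS_200 = CASE WHEN PEAKGUST >= 200 THEN 1 ELSE 0 END,
    WS_210 = CASE WHEN PEAKGUST >= 210 THEN 1 ELSE 0 END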

Personally, I would forget about storing this data at all. It violates the best practice that you should not store calculated values, and your 50 columns are just that.

I suspect that you want these columns for reporting purposes. As such, I would suggest you create computed columns in your table for this purpose. Example:

Code:
Alter Table dbo.Event_Windspeed 
Add WS_200 As (CASE WHEN (PEAKGUST >= 200) THEN 'OVER' ELSE 'UNDER' END)

With a computed column, you do not need to worry about updating the value because the calculation occurs at the time you select the data. This occurs each time, so you can be sure that the data is always accurate.
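
For example, you could then query the computed column just like a stored one (EventID below is only a placeholder for whatever key column your table actually has):

Code:
Select EventID, PEAKGUST, WS_200
From   dbo.Event_Windspeed
Where  WS_200 = 'OVER'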



-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
You are updating every record every time (including the ones you updated earlier), so this process will get slower over time. Try adding a where clause such as where WS_200 is null.
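
For example, the first update with that where clause added:

Code:
update dbo.Event_WindSpeed
SET WS_200 = (CASE WHEN (PEAKGUST >= 200) THEN 'OVER' ELSE 'UNDER' END)
where WS_200 is null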

Alternatively, you could ditch the case statement altogether. I don't know if this will be faster, but it's worth trying:

Code:
-- work from the highest threshold down; ws_220 still being null marks rows not yet processed
update dbo.Event_WindSpeed
set ws_200 = 'over',
ws_210 = 'over',
ws_220 = 'over'
where peakgust >= 220 and ws_220 is null

update dbo.Event_WindSpeed
set ws_200 = 'over',
ws_210 = 'over',
ws_220 = 'under'
where peakgust >= 210 and ws_220 is null


update dbo.Event_WindSpeed
set ws_200 = 'over',
ws_210 = 'under',
ws_220 = 'under'
where peakgust >= 200 and ws_220 is null

update dbo.Event_WindSpeed
set ws_200 = 'under',
ws_210 = 'under',
ws_220 = 'under'
where peakgust < 200 and ws_220 is null

Include all your thresholds in each query and start from the highest one and work down. At least this way the case statement doesn't have to figure out what to do, and since you are processing all the fields at once, you probably only need the one field to indicate whether the record has already been updated or not.
Another thing to consider for performance: are there any triggers on the table? It could be that they are making it slow, especially if someone wrote them to process row-by-row and you are updating a large number of records at a time. It could be that the triggers need to be rewritten more than the query. I fixed a trigger like that on our database and took a 40,000 record insert from 40 minutes to around 45 milliseconds.






"NOTHING is more important in a database than integrity." ESquared
 
Hi everyone,

Thanks everyone for responding so quickly, and for helping out a newbie. I think I've got it sorted now.

I created a new table and formatted the threshold columns as bits, and then combined all the update queries into one.

I tested it out on 2 thresholds, and it took 8 mins to update the two columns on over 24 million rows. Much faster than the 2 hours it previously took!

Thanks for all your help!
Emily



 
So I just ran the query on all 50 columns, and it actually took under 5 mins. (Not sure why it took 8 mins on 2 columns...)

You guys are great!

Thanks again.
 
You created a new table, so it was probably stored as efficiently as possible (given that the 50 columns were empty). By running the update on 2 columns, you probably caused table fragmentation because SQL Server needed to make room in the table for the new data. When you then updated all 50 columns, you probably ended up with less additional fragmentation, which meant fewer page splits, which let the query run faster.

I know this may seem a little complicated, but rest assured that many of us are not surprised that your 2nd run was faster even though it updated more data.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 