Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Optimize Sql Query

Status
Not open for further replies.

mholbert

IS-IT--Management
Jan 10, 2006
105
US
I am building a stored procedure against a SQL 2005 database and need some help optimizing the query. It will end up running several hundred times a day so I want to make it as efficient as possible.

I ended up having to use temp tables to get where i was going, and that is what i would like to eliminate. I couldn't get the logic to pull the Top(1) appointments without first pulling the data together, ordering it appropriately, and then using Max to get the appropriate appointment returned.

The query is as follows, any assistance is appreciated in cleaning it up. Thanks in advance. MH

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO


ALTER PROCEDURE [dbo].[cuslrdc_LOV_NOV]
@PersonID UNIQUEIDENTIFIER,
@EncID UNIQUEIDENTIFIER,
@ProvID UNIQUEIDENTIFIER,
@EntID VARCHAR(5),
@PracID VARCHAR(4),
@LOV VARCHAR(8) OUTPUT,
@NOV VARCHAR(8) OUTPUT
AS

select a.appt_date
into #LOV
from appointments a
where a.person_id = @PersonID
and a.rendering_provider_id = @ProvID
and a.event_id in
(select event_id
from events e
where e.[event] like 'office%'
or e.[event] like 'consult%'
or e.[event] like 'reev%'
or e.[event] like 'complete ph%')
and (a.cancel_ind is null or a.cancel_ind = 'N')
and (a.delete_ind is null or a.delete_ind = 'N')
and (a.resched_ind is null or a.resched_ind = 'N')
and a.appt_kept_ind = 'Y'
and a.appt_date !=
(select CONVERT(varchar(8), pe.enc_timestamp, 112) from patient_encounter pe where pe.enc_id = @EncID)
order by a.appt_date desc

select @LOV = max(appt_date)
from #LOV

select a.appt_date
into #NOV
from appointments a
where a.person_id = @PersonID
and a.rendering_provider_id = @ProvID
and a.event_id in
(select event_id
from events e
where e.[event] like 'office%'
or e.[event] like 'consult%'
or e.[event] like 'reev%'
or e.[event] like 'complete ph%')
and (a.cancel_ind is null or a.cancel_ind = 'N')
and (a.delete_ind is null or a.delete_ind = 'N')
and (a.resched_ind is null or a.resched_ind = 'N')
and a.appt_kept_ind != 'Y'
and a.appt_date > (CONVERT(varchar(8), getdate(), 112)-1)
order by a.appt_date asc

select @NOV = max(appt_date)
from #NOV

GO
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER OFF
GO



 
How long does it take for this query to run?

How many rows end up in the #LOV table?

How many rows end up in the #NOV table?

How many rows do you have in the appointments table?

I'm gonna guess that the appointments table is the biggest one involved here. So.... can you share with us the indexes on that table? (run this)

sp_helpindex 'appointments'

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
The query takes 00:00:00.078 seconds to run.

Both #LOV and #NOV only return the # of appointments for a given patient, so they could contain a hundred records each, but generally it would be far less than that.

There are 1,902,333 records in the appointments table, and it grows by about 600 records per weekday.

The results of the sp_helpindex are pasted below.

index_name index_description index_keys
ak_appointments clustered, unique located on NEXTGEN_CORE appt_date, practice_id, appt_id, cancel_ind, delete_ind
ak_appointments1 nonclustered, unique located on NEXTGEN_INDEX_1 appt_nbr, site_id
CusLRDC_12642 nonclustered located on NEXTGEN_CORE practice_id, enc_id, casemgt_case_id
CusLRDC_2873 nonclustered located on NEXTGEN_CORE practice_id, resched_ind, appt_link_id
CusLRDC_4991 nonclustered located on NEXTGEN_CORE enc_id, casemgt_case_id, appt_date
cusLRDC_54870 nonclustered located on NEXTGEN_CORE rendering_provider_id, resched_ind, delete_ind
inx_appointments_25 nonclustered located on NEXTGEN_INDEX_1 practice_id, appt_id, appt_date, person_id, enc_id
inx_appointments1 nonclustered located on NEXTGEN_INDEX_1 practice_id, appt_date, appt_id
inx_appointments10 nonclustered located on NEXTGEN_INDEX_1 appt_link_id
inx_appointments11 nonclustered located on NEXTGEN_INDEX_1 orig_appt_id
inx_appointments2 nonclustered located on NEXTGEN_INDEX_1 practice_id, event_id
inx_appointments3 nonclustered located on NEXTGEN_INDEX_1 practice_id, begintime, endtime
inx_appointments4 nonclustered located on NEXTGEN_INDEX_1 enc_id, practice_id, location_id, appt_date, appt_kept_ind
inx_appointments5 nonclustered located on NEXTGEN_INDEX_1 person_id, practice_id, appt_type, cancel_ind
inx_appointments6 nonclustered located on NEXTGEN_INDEX_1 appt_date, practice_id, appt_id, event_id, begintime, duration, description, appt_status, cancel_ind, delete_ind, appt_type
inx_appointments7 nonclustered located on NEXTGEN_INDEX_1 last_name, appt_date
inx_appointments8 nonclustered located on NEXTGEN_INDEX_1 appt_id
inx_appointments9 nonclustered located on NEXTGEN_INDEX_1 appt_recur_id
pk_appointments nonclustered, unique, primary key located on NEXTGEN_INDEX_1 practice_id, appt_id


MH

 
It currently takes 78 milliseconds, and you want to improve on this because it is executed several hundred times a day?

In my opinion, I wouldn't bother optimizing the query because 78 milliseconds is already pretty good. That being said, there is room for improvement (but probably not much).

First, I would suggest a cte or a table variable instead of a temp table. You are not guaranteed to get better performance, but you *probably* will.

It looks like you are returning 2 scalar values (both are dates converted to strings, yuck). I took a close look at the 2 queries and the only difference I see is "and a.appt_kept_ind != 'Y'". One query has != and the other has =. In this case, I would create a table variable with 2 columns, the date and the appt_kept_ind. Then I would write a single query that populates but output variables simultaneously.

For example:
Code:
Declare @Temp Table(appt_date DateTime, appt_kept_ind Char(1))

Insert
Into    @Temp(appt_date, appt_kept_ind)
select  a.appt_date, a.appt_kept_ind
from    appointments a
where   a.person_id = @PersonID
        and a.rendering_provider_id = @ProvID
        and a.event_id in
          (select event_id
           from events e
           where e.[event] like 'office%'
                 or e.[event] like 'consult%'
                 or e.[event] like 'reev%'
                 or e.[event] like 'complete ph%')
        and (a.cancel_ind is null or a.cancel_ind = 'N')
        and (a.delete_ind is null or a.delete_ind = 'N')
        and (a.resched_ind is null or a.resched_ind = 'N')
        and a.appt_date != 
          (select    CONVERT(varchar(8), pe.enc_timestamp, 112) from patient_encounter pe where pe.enc_id = @EncID)

Select	@LOV = Max(Case When appt_kept_ind = 'Y' Then appt_date end),
        @NOV = Max(Case When appt_kept_ind <> 'Y' Then appt_date End)
From    @Temp

Notice that we cut in half the number of times we go to the tables. This will undoubtedly speed things up a bit. Of course, there are other ways to accomplish this without a temp table, too.

The cte version would look like this:

Code:
; with AppointmentStuff As
(
  select  a.appt_date, a.appt_kept_ind
  from    appointments a
  where   a.person_id = @PersonID
          and a.rendering_provider_id = @ProvID
          and a.event_id in
            (select event_id
             from events e
             where e.[event] like 'office%'
                   or e.[event] like 'consult%'
                   or e.[event] like 'reev%'
                   or e.[event] like 'complete ph%')
          and (a.cancel_ind is null or a.cancel_ind = 'N')
          and (a.delete_ind is null or a.delete_ind = 'N')
          and (a.resched_ind is null or a.resched_ind = 'N')
          and a.appt_date != 
            (select    CONVERT(varchar(8), pe.enc_timestamp, 112) from patient_encounter pe where pe.enc_id = @EncID)
)
Select	@LOV = Max(Case When appt_kept_ind = 'Y' Then appt_date end),
        @NOV = Max(Case When appt_kept_ind <> 'Y' Then appt_date End)
From    AppointmentStuff

You could also do a derived table, like this:
Code:
Select	@LOV = Max(Case When appt_kept_ind = 'Y' Then appt_date end),
        @NOV = Max(Case When appt_kept_ind <> 'Y' Then appt_date End)
From    (
        select  a.appt_date, a.appt_kept_ind
        from    appointments a
        where   a.person_id = @PersonID
                and a.rendering_provider_id = @ProvID
                and a.event_id in
                  (select event_id
                   from events e
                   where e.[event] like 'office%'
                         or e.[event] like 'consult%'
                         or e.[event] like 'reev%'
                         or e.[event] like 'complete ph%')
                and (a.cancel_ind is null or a.cancel_ind = 'N')
                and (a.delete_ind is null or a.delete_ind = 'N')
                and (a.resched_ind is null or a.resched_ind = 'N')
                and a.appt_date != 
                  (select    CONVERT(varchar(8), pe.enc_timestamp, 112) from patient_encounter pe where pe.enc_id = @EncID)
      ) As AppointmentStuff


Please try these variations and respond back with the execution time. We may be able to improve a little more yet, but I'd like to see where we stand before trying anything else.


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Awesome examples. I created the stored procedure 3 more times, one for each of your examples above. I then tested them each by right clicking on them and choosing execute, then populating the appropriate values for each input variable, and then running the resulting logic. I used the same values for the variables each time. When I reran my original logic, it took significantly longer (0.294), which leads me to believe that the 0.078 value I gave you was for a second run with the same values, not the initial run.

Is this the appropriate way to test the speed of the stored procedure?

The results are as follows:

SP Details Time
LOVNOV Original 0.294
LOVNOV1 Table Variable 0.023
LOVNOV2 CTE 0.018
LOVNOV3 Derived Table 0.019

Assuming my testing logic is sound, the CTE is the most efficient. When testing should I be using fresh input values each time to make sure the DB isn't caching the original stuff and reusing it?
 
Performance testing is a complicated topic. First of all, you should probably not *necessarily* assume that the CTE version is the fastest. In the numbers you present, the difference between the CTE version and the derived table version differ by 1 millisecond. I never trust differences this small. There's a million side affecting reasons why a query could be several milliseconds faster or slower. For example, someone else could be querying the database. Your computer could be running a virus scan, checking for emails, etc...

When testing should I be using fresh input values each time to make sure the DB isn't caching the original stuff and reusing it?

This, too, is a difficult question to answer. You may have heard that SQL Server is a memory hog. This is true, but only because it's trying to help you. You see... hard drives are many times slower than memory. To allow your queries to be faster, SQL Server will store data in memory. It will continue storing data in memory until it runs out of memory. At this point, it will unload some data to make room for more data. It does this in a "smart" way by unloading data that hasn't been used recently.

A similar process occurs for execution plans. Execution plans are created by SQL Server when you create a stored procedure. You can think of this as the compile process for a desktop app. When you run a stored procedure, sql server checks to see if the execution plan is already loaded in memory. If it is, it will use it. If not, sql server will load the execution plan from disk and then run it.

Bottom line... the first time you run a procedure, sql server needs to do more work, so it takes longer. Usually, the execution time for the first one does not matter because when you put this in to production, it's the long term, repeatable performance that matters the most.

There are several metrics that you can look at, specifically "buffer cache hit ratio" and "Page Life Expectancy". There is a free application that you can download that will explain this in more detail. Download and install SQLCop. When you run the application, click "Health" on the left side of the screen and then expand Buffer cache hit ratio and Page life expectancy.

Anyway... I usually test performance in several ways. First, for timing. Copy/paste this to a query window:

Code:
Declare @Start DateTime
Declare @LOV VarChar(8)
Declare @NOV VarChar(8)

Set @Start = GetDate()
Exec LOVNOV  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT
Select 'Original',DateDiff(Millisecond, @Start, GetDate())

Set @Start = GetDate()
Exec LOVNOV1  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT
Select 'LovNov1',DateDiff(Millisecond, @Start, GetDate())

Set @Start = GetDate()
Exec LOVNOV2  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT
Select 'LovNov2',DateDiff(Millisecond, @Start, GetDate())

Set @Start = GetDate()
Exec LOVNOV3  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT
Select 'LovNov3',DateDiff(Millisecond, @Start, GetDate())


Run the code above several times before you actually look at the numbers.

It's also helpful to compare execution plans. To do this, open a query window and copy/paste this code:

Code:
Declare @LOV VarChar(8)
Declare @NOV VarChar(8)

Exec LOVNOV  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT

Exec LOVNOV1  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT


Exec LOVNOV2  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT


Exec LOVNOV3  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT

Now hold the CTRL key and tap the M key. CTRL-M will show you the actual execution plan after you run the query. Pressing CTRL-M will not appear to do anything until after you run the query. Running the query now will take a little longer (don't be concerned about this) but you will now see another tab for the execution plan. There are some confusing things in there, but, you should see 4 sections (one for each query). There will be a percentage next to each one. The query with the lowest percentage will be the faster one.

Next, you may want to compare how many reads each query does. Many people think that the number of reads is the best indication of performance. The least number of reads the better. To do this...

Code:
Declare @LOV VarChar(8)
Declare @NOV VarChar(8)

Set Statistics IO On

Exec LOVNOV  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT

Exec LOVNOV1  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT


Exec LOVNOV2  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT

Exec LOVNOV3  'Paramvalue1','paramvalue2',etc..., @LOV OUT, @NOV OUT

Set Statistics IO On

Run the code above by pressing F5. When it's done, look at the messages tab. You will see a line for each query that shows the number of reads, physical, logical, etc...

The last thing I would like to mention is the procedure cache and the memory buffers. There are commands you can use to clear out the memory without restarting the service.

[!]!!! CAUTION !!![/!]
Do not run the following commands on a production server.
[tt][blue]
DBCC DropCleanBuffers
DBCC FreeProcCache
[/blue][/tt]

The previous 2 commands will cause your server to be slower until the buffers and caches are repopulated. On a production server, this can cause very real problems with performance for a while. Do not do this!

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top