Query Optimisation

idd · Nov 2, 2007

I am using SQL Server 2000
and have created a stored procedure so that it can be called in a crystal report, I used the stored procedure to possible speed up the report.

however it still takes an awful long time to run and resource utilisation on the Server (Processor usage) goes sky high while the report is querying the database.

There are other processes running at the same time and other users in the org could be setting off all sorts of other jobs at a similar time, making use of some of the same table I have used in this procedure.

How could I optimise this stored procedure which accepts one parameter, which is a branch name. The parameter (branch name) supplied by the user could vary each time however there are only 28 branches.

The code I used to create the Stored Procedure is given below

Code:

Create Proc PRS_PortalUser @Branchname varchar(255)
as

Select mem_ref
, mem_id
, mep_id
, Pn_id
, pn_first_name
, Pn_surname
, prole_id
, prod_name as Grade
, pds_name
, pd_status
, mep_status
, br_name						
, PD_Start_Date						
, PD_End_Date						

From pd_scheme 
inner join PD on pds_id = pd_pds_id 
Inner join membership on PD_mem_id = mem_id
Inner join member_period on Mem_id  = Mep_mem_id 
Inner join member_location as Grade on Mep_id = mel_mep_id 
Inner join member_location as mem_loc on Mep_id = mem_loc.mel_mep_id 
Inner join Product on grade.mel_prod_id = prod_id
Inner join person_role on mem_loc.mel_prole_id = prole_id
Inner join person on Prole_person_id = pn_Id
Left Join branch_xref					
	on mep_id = brxref_record_id			
left join branch					
	on brxref_br_id = br_id
Where pn_id not in (select Pn_id from person 
                            Inner join person_role on pn_id = prole_person_id
                            Inner join portal_user on prole_id = pu_prole_id)  
and pds_name = 'Continuing Professional Development'
and pd_status = '30-Current'
and mep_prod_id = '1-ish' 
and left(mep_status, 2) between '02' and '24'
and grade.mel_sy_type = '2'
and grade.mel_current = '1'
and mem_loc.mel_sy_type = '0'
and BR_Name = @branchname

Any help or suggestions appreciated

philhege · Nov 2, 2007

The first place to look for query performance is your index usage. Show the execution plan in Query Analyzer and see if any of the operations are using a table scan. Create indices for those based on the field(s) being compared.

Make sure the statistics for relevant indices are updated.

If you can live with dirty data in the results, consider setting your TRANSACTION ISOLATION level to READ UNCOMMITTED. This takes the other operations out of the performance mix.

If you are querying a database that is transaction-intensive, you should replicate the data to an analysis instance, and run SELECT queries on the analysis data, NOT the production OLTP database.

Phil Hegedusich
Senior Programmer/Analyst
IIMAK

http://www.iimak.com

-----------
Pity the insomniac dyslexic agnostic. He stays up all night, wondering if there really is a dog.

SQLSister · Nov 2, 2007

Not in can be a poorer performer than using a derived table and a left join. But you would need to test to see for sure.

Make sure you have indexes on all the fields in your joins and where clauses.

"NOTHING is more important in a database than integrity." ESquared

Crowley16 · Nov 2, 2007

try reordering your joins so you have the most restrictive joins at the top, i.e. joins which filters out the most records.

also try reordering your where tests so the most restrictive and fastest tests are at the top i.e. the direct = tests first, then the between and in.

there's other stuff you can do as well but I don't immediately see any obvious ones...

usually worth trying this stuff before considering indexes/schema changes...

--------------------
Procrastinate Now!

idd · Nov 5, 2007

Thanks to all of ou for your advice,

I will look into trying this techniques.

Idd

fredericofonseca · Nov 5, 2007

also if mep_status happens to be part of a index used, then the following should also be rewritten.
and left(mep_status, 2) between '02' and '24'
as
and mep_status between '02'||x'00' and '24'||x'FF'

as the "left" implies that column is not used to access the index.

Probably of less importance though than what has been said so far by the other posters

Regards

Frederico Fonseca
SysSoft Integrated Ltd

http://www.syssoft-int.com

idd · Nov 5, 2007

FredericoFonseca,
I not too sure what you mean by
and mep_status between '02' ||x'00' and '24'||x'FF'

Mep Status is not part of an index,

It is a varchar field,

and the status in the fields are along the lines of

10 - renewal created
12 - reminder 1
13 - reminder 2
22 - current
25 - renewing
90 - deceased

So to filter out the ones I want I can add a line for each status I would like, I think that might be the fastest as it is all equals and definate values rather than a between operator. something like

(mep status = "10 - renewal created"
or mep status = "12 - reminder 1"
or mep status = "13 - reminder 2"
or mep status = "22 - current")

would that be a better alternative to the
left(mep_status, 2) between '02' and '24'

fredericofonseca · Nov 5, 2007

If the field is not part of an index, then its of no importance.

As for the explanation.

your sql
and left(mep_status, 2) between '02' and '24'
is basically stating all values of mep_status where the first 2 digits are between '02' and '24' e.g. they will include the following (just a minor sample)
02azzz
02c888
05gggg
068888
20bbbb
21xxx
24zzzzz

my suggested code
and mep_status between '02'||x'00' and '24'||x'FF'
,which by the way is invalid in TSQL. My mistake, and it should be instead as follows.
and mep_status between '02' + cast(0x00 as varchar) and '24' + cast(0xff as varchar)
means get everyting that lays between a value that starts with a "02" followed by a "low value" (0x00), and ends with a "24" followed by a "high value" (0xff)

gives you exactly the same results, but still allowing the field to be used for index access IF it was part of an index. Mute point on this case.

Only so you can see the difference, I did simulate the above situation on one of my tables, using a indexed field, and the differences between the explain table were as follows.

With field_name between '02'||x'00' and '24'||x'FF'
AvgRowSize="19"
EstimateCPU="0.00198692"
EstimateIO="0.0690509"
EstimateRebinds="0"
EstimateRewinds="0"
EstimateRows="1663.56"
LogicalOp="Clustered Index Seek"
NodeId="0"
PhysicalOp="Clustered Index Seek"
EstimatedTotalSubtreeCost="0.0710378"

IndexScan Ordered="true"
ScanDirection="FORWARD"
ForcedIndex="false"
NoExpandHint="false"

With Seek Predicates
Start Range: field_name >= Scalar Operator (convert...)
End Range: field_name <= Scalar Operator (convert...)

With left(field_name, 2) between '02' and '24'
AvgRowSize="23"
EstimateCPU="0.0204894"
EstimateIO="0.741644"
EstimateRebinds="0"
EstimateRewinds="0"
EstimateRows="1663.56"
LogicalOp="Clustered Index Scan"
NodeId="1"
Parallel="false"
PhysicalOp="Clustered Index Scan"
EstimatedTotalSubtreeCost="0.762133"

IndexScan Ordered="false"
ForcedIndex="false"
NoExpandHint="false"

With Predicate
substring(convert_implicit(field_name)) >= '02'
and
substring(convert_implicit(field_name)) <= '24'

As you can see by the results, one results on an index scan (slower), and the other on a index seek (faster)

Regards

Frederico Fonseca
SysSoft Integrated Ltd

http://www.syssoft-int.com

ESquared · Nov 6, 2007

idd,

Use aliases. Please. Even though the column names have the table names in them (ouch). The moment you have the same column name in the query twice (such as having the same table twice) you're going to have to use them anyway. It's better to write your query so it doesn't break just because a new table is added to the join list.

Code:

[s]Where pn_id not in (select Pn_id from person 
                            Inner join person_role on pn_id = prole_person_id
                            Inner join portal_user on prole_id = pu_prole_id)[/s]

left join (
   person_role pr2
   Inner join portal_user on pr2.prole_id = pu_prole_id
) on pn_id = pr2.prole_person_id

where
...
and prole_id is null

you'll need an alias for your other person_role table. Depending on your data structure you might not need a second join to person_role anyway.

fredericofonseco said:
mep_status between '02' + cast(0x00 as varchar) and '24' + cast(0xff as varchar)

Code:

mep_status between '02' and '24' + char(255)
--careful with unicode, though...

Code:

--better is (not only simpler but tolerant of unicode):
(mep_status >= '02' and mep_status < '25')

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.

_{The best part about anything that has cheese is the cheese.}[/color]

TheSQL · Nov 6, 2007

Along the lines of what Crowley16 posted, order your most restrictive joins at the top of the WHERE clause - think of it as an effort to reduce the number of rows you have to work on as quickly as possible. Without knowing anything about the data schema or size of the tables, BranchName is probably a good place to staert. You're passing BranchName into the sproc, so it's likely a key restrictor. To help determine which clauses are the most restrictive, try commenting out all but one of the WHERE clauses and going through the proc for each WHERE clause item in turn, seeing which one returns the least rows. That's probably a good place to start.

Be VERY hesitant to add an index. It won't necessarily speed up the query - indeed, and index can sometimes slow it down - and the column likely isn't a prime candidate for an index unless it's highly selective. Keep in mind that any time you add an index, you're also adding the overhead associated with maintaining that index - every time you do an INSERT into the table, or and UPDATE if you're updating the column referenced by the index, you're writing to the index now as well as the actual data table. I'm not saying that indexing is evil - just that it needs to be approached with due caution. You don't want to speed up your crystal sproc and slow down every transaction against the table!

Also, if you have access to changing the application itself, you might want to rethink columns like:

Code:

and pds_name = 'Continuing Professional Development'
and pd_status = '30-Current'
and mep_prod_id = '1-ish'

It's generally more efficient for an enumeration to store it's information in two different fields (ex. for pd_status, you would have pd_statusid and pd_statusdesc, containing 30 and 'Current' respectively). That's more of an overall data structure tip though. Hope this helps.

ESquared · Nov 6, 2007

Folks, table join order in the query doesn't necessarily have anything to do with the query engine's join order.

idd · Nov 7, 2007

Thanks All for your thoughts and comments,

I will take these into account and try and implement as many as possible,

I will let you know of any outcomes

Esquared

I usually use this method described by you

Code:

--better is (not only simpler but tolerant of unicode):
(mep_status >= '02' and mep_status < '25')

I thought it wasn't a definate match as the data is actually text and I was comparing numeric values with text. Don't get me wrong, it used to work for me, I just thought that the left (... function would be better.

That was just based on my assumption though.

Ther are only five status that I am interested in for the purpose of this query, so I could even add a line for each of them something like

Code:

(mep status = "10 - renewal created"
or mep status = "12 - reminder 1"
or mep status = "13 - reminder 2"
or mep status = "22 - current")

would that be a better alternative as it is matching against four definate values.

Idd

ESquared · Nov 7, 2007

Your statuses should not have the descriptions in them. You should have a separate lookup table that you join to when you want to know the descriptions. Otherwise, the status column in your data should be some kind of integer value.

There are so many reasons this is true.

1. Number of rows per page is negatively affected by putting the full description in the data rows.
2. You can't just say "WHERE Status = #" as you have found out. You have to go through weird shenanigans.
3. Your data is denormalized. What if you want to change the text for a status? Now you have to update it in all the rows in the table.

I'm sure there are more reasons, but those each should be enough individually to convince you to stop putting descriptions in your data rows. Then you wouldn't have this kind of problem.

idd · Nov 12, 2007

ESquared,

I totally agree with that last post,

however, the database has been created by an external supplier. I cannot change the statuses but still have to be able to report on them.

Idd

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Query Optimisation

idd

Programmer

philhege

Programmer

SQLSister

Programmer

Crowley16

Technical User

idd

Programmer

fredericofonseca

IS-IT--Management

idd

Programmer

fredericofonseca

IS-IT--Management

ESquared

Programmer

TheSQL

Programmer

ESquared

Programmer

idd

Programmer

ESquared

Programmer

idd

Programmer

Similar threads

Part and Inventory Search

Sponsor