Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations John Tel on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Slow stored procedure execution 2

Status
Not open for further replies.

jbradley

Programmer
Sep 7, 2001
248
US
I have a stored procedure in a SQL Server 7 database to find duplicate records in one of the tables that is taking a very long time (over a minute and a half) to execute. I'm hoping someone can show me a way to speed it up.
Code:
CREATE PROCEDURE [spDataInputDupsRange]

@StartDate smalldatetime,
@EndDate smalldatetime,
@CostCenter char(5)

AS

SELECT DISTINCT DataInput.CostCenter,
                DataInput.EmpID,
                EmployeeList.EmployeeName,
                DataInput.CategoryName,
                DataInput.Date,
                DataInput.AccountNumber,
                DataInput.TransID,
                DataInput.SupID,
                DataInput.MgrID,
                DataInput.Title,
                DataInput.WorkCategory,
                DataInput.CategoryType,
                DataInput.Standard,
                DataInput.Volume,
                DataInput.Time,
                DataInput.Amount,
                DataInput.InputDate
FROM DataInput INNER JOIN EmployeeList ON DataInput.EmpID = EmployeeList.EmpID
WHERE (DataInput.Date BETWEEN @StartDate AND @EndDate)
   AND DataInput.CostCenter=@CostCenter
   AND DataInput.AccountNumber IS NOT NULL
   AND LEN(DataInput.AccountNumber) = 16
   AND (DataInput.CostCenter IN
     (SELECT CostCenter
      FROM DataInput AS Tmp
      GROUP BY CostCenter, EmpID, CategoryName, Date, AccountNumber
      HAVING COUNT(*) > 1
	 AND EmpID = DataInput.EmpID
	 AND CategoryName = DataInput.CategoryName
	 AND AccountNumber = DataInput.AccountNumber
	 AND [Date] = DataInput.Date
	 AND CostCenter = DataInput.CostCenter))
      ORDER BY DataInput.EmpID, DataInput.CategoryName, DataInput.AccountNumber
I'm calling this from a VB6 SP5 program using ADO and OLEDB:
Code:
rs.Open "spDataInputDupsRange '" & dtStartDate & "', '" & _
				   dtEndDate & "', '" & _
                                   strCostCenter & "'", cn, adOpenForwardOnly, adLockReadOnly
The DataInput table contains about 1.5 million records. I've had to set the CommandTimeout property to 120 seconds to prevent a Timeout Expired error.

Are there any changes I can make to the stored procedure to get a better execution time?

Thanks in advance for any suggestions.
 
Is the DataInput table indexed at least? If not try to create indexes on some of the fields used in the WHERE clause of your query.
 
Given the large number of conditions and the complexity of your sql, statement, maybe the time to process 1.5 million records is not unreasonable. But a few observations anyway:
- You could try pre-selecting the records from DataInput into a cursor or work table before joining, especially if you're actually selecting only a small percentage of the 1.5 million records. That may or may not help.
- Maybe also pre-select the list of cost centers into a work table, especially if there are only a few of them.
- In general try to reduce the number of conditions - try not to use functions ( i.e. len() ) in the where clause
- You probably already thought of this, but make sure all fields montioned in the JOIN and WHERE clauses are indexed, and as simple indexes, not compound indexes, and without functions. Functions in indexes kill the servers ability to optimize the query.
- Noting the READONLY flag on the OPEN statement, it's not waiting for READONLY access, is it?

Lastly, is the server hardware underpowered? Adding RAM to the server may make it faster.

Hope this helps.



Mike Krausnick
Dublin, California
 
Thanks for the suggestions. The idea to pre-select from the DataInput table is especially appealing as what the customer is usually looking at is just the most recent data, from a single cost center at a time.

All the fields referenced in the WHERE clause are already indexed individually and the server is pretty beefy.

I'm opening th e recordset as READONLY because I was under the impression that it was the fastest option. Is this not the case?
 
And also use JOIN instead of IN clause:

DECLARE @StartDate datetime,
@EndDate datetime,
@CostCenter char(1)

SELECT DISTINCT DataInput.CostCenter,
DataInput.EmpID,
EmployeeList.EmployeeName,
DataInput.CategoryName,
DataInput.Date,
DataInput.AccountNumber,
DataInput.TransID,
DataInput.SupID,
DataInput.MgrID,
DataInput.Title,
DataInput.WorkCategory,
DataInput.CategoryType,
DataInput.Standard,
DataInput.Volume,
DataInput.Time,
DataInput.Amount,
DataInput.InputDate
FROM DataInput
INNER JOIN EmployeeList ON DataInput.EmpID = EmployeeList.EmpID
INNER JOIN ( SELECT CostCenter
FROM DataInput AS Tmp
GROUP BY CostCenter, EmpID, CategoryName, Date, AccountNumber
HAVING COUNT(*) > 1
) AS Cost_Center ON Cost_Center.CostCenter = DataInput.CostCenter
AND Cost_Center.EmpID = DataInput.EmpID
AND Cost_Center.CategoryName = DataInput.CategoryName
AND Cost_Center.AccountNumber = DataInput.AccountNumber
AND Cost_Center.[Date] = DataInput.Date
WHERE (DataInput.Date BETWEEN @StartDate AND @EndDate)
AND DataInput.CostCenter=@CostCenter
AND DataInput.AccountNumber IS NOT NULL
AND LEN(DataInput.AccountNumber) = 16
ORDER BY DataInput.EmpID, DataInput.CategoryName, DataInput.AccountNumber

----------------------------------------
With pre-selecting data to temp table as 'Mike Krausnick' suggested:

SELECT CostCenter, EmpID, CategoryName, AccountNumber, [Date]
INTO #temp_table
FROM DataInput AS Tmp
GROUP BY CostCenter, EmpID, CategoryName, Date, AccountNumber
/* filter just wanted records - this WHERE clause may speed up your query, but can be omitted */
WHERE CostCenter = @CostCenter
AND ([Date] BETWEEN @StartDate AND @EndDate)
AND LEN(AccountNumber) = 16
HAVING COUNT(*) > 1

SELECT DISTINCT DataInput.CostCenter,
DataInput.EmpID,
EmployeeList.EmployeeName,
DataInput.CategoryName,
DataInput.Date,
DataInput.AccountNumber,
DataInput.TransID,
DataInput.SupID,
DataInput.MgrID,
DataInput.Title,
DataInput.WorkCategory,
DataInput.CategoryType,
DataInput.Standard,
DataInput.Volume,
DataInput.Time,
DataInput.Amount,
DataInput.InputDate
FROM DataInput
INNER JOIN EmployeeList ON DataInput.EmpID = EmployeeList.EmpID
INNER JOIN #temp_table AS Cost_Center ON Cost_Center.CostCenter = DataInput.CostCenter
AND Cost_Center.EmpID = DataInput.EmpID
AND Cost_Center.CategoryName = DataInput.CategoryName
AND Cost_Center.AccountNumber = DataInput.AccountNumber
AND Cost_Center.[Date] = DataInput.Date
WHERE (DataInput.Date BETWEEN @StartDate AND @EndDate)
AND DataInput.CostCenter=@CostCenter
AND DataInput.AccountNumber IS NOT NULL
AND LEN(DataInput.AccountNumber) = 16
ORDER BY DataInput.EmpID, DataInput.CategoryName, DataInput.AccountNumber


-------------------------------
Also try to experiment with table variable ( instead of temp table ) such as:

DECLARE @table_variable table ( CostCenter char(5),
EmpID integer, -- just for example
Date datetime,
...
)

and
INSERT INTO @table_variable
SELECT CostCenter, EmpID, CategoryName, AccountNumber, [Date]
INTO #temp_table
FROM DataInput AS Tmp
GROUP BY CostCenter, EmpID, CategoryName, Date, AccountNumber
...

and use it in from clause : INNER JOIN @table_variable ON ...


Zhavic

---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
Thank you very much! Those are all great ideas. I'll learn much working with them.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top