Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Count of bad file number inserts. 1

Status
Not open for further replies.

sparkbyte

Technical User
Sep 20, 2002
879
US
I am trying to get a count of all file numbers that are invalid in our tracking table to show trending.

To see if programming changes improve or worsen when implimented.

this is the script I wrote to count bad records from a prorticulr date forward.

Code:
Select 
	Year(TrackingDate) As 'Year',
	Month(TrackingDate) As 'Month',
	count(Tracking_ID) As 'Number of Bad A file Numbers'
From dbo.tblTrackingTable
WHERE
	(dbo.IsFileNumberValid(FileNumber) = 0) 
	AND (TrackingDate > CONVERT(DATETIME, '2010-06-01 00:00:00', 102)) 
	AND (FileNumber <> '.BOX.END.')
Group By Year(TrackingDate), Month(TrackingDate)
ORDER BY Year(TrackingDate), Month(TrackingDate)

the dbo.IsFileNumberValid is a function that returns 0 or 1 depending on pass/fail.

One problem with this approach is that the script take a looong time to run. (6 minutes)

Results:
Code:
2010	6	204
2010	7	17
2010	8	9
2010	9	6
2010	10	6
2010	11	9
2010	12	1

any suggestions on how to speed this up?



Thanks

John Fuhrman
 
What do your indexes look like?

sp_helpindex 'tblTrackingTable'

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
the first index I added after running the tuning advisor.
I ended up stopping the querry after 22 minutes.(after adding the _dta_index_tblTrackingTable_12_1666104976__K6_K1_K5 index.

Code:
index_name                                               index_description                                     index_keys
-------------------------------------------------------- ----------------------------------------------------- ---------------------------------------
_dta_index_tblTrackingTable_12_1666104976__K6_K1_K5      nonclustered located on PRIMARY                       TrackingDate, Tracking_ID, FileNumber
Boxnumber                                                nonclustered located on PRIMARY                       BoxNumber
DateTime                                                 nonclustered located on PRIMARY                       TrackingDate
Filenumber                                               nonclustered located on PRIMARY                       FileNumber
PK_tblTrackingTable                                      clustered, unique, primary key located on PRIMARY     Tracking_ID

Thanks

John Fuhrman
 
I would hazard a guess that it is the function IsFileNumberValid that is slowing it down (easily tested: comment out the where line (dbo.IsFileNumberValid(FileNumber) = 0) and time it again). What does the function do?
 
I would suggest that you drop the index that was created by the tuning advisor since it doesn't appear to be helping. I would also suggest that you change the query slightly. You see, you are counting Tracking_ID, but you don't need to. You can count any field that cannot be null and actually get the same value. so....

Code:
Select 
    Year(TrackingDate) As 'Year',
    Month(TrackingDate) As 'Month',
    count([!]TrackingDate[/!]) As 'Number of Bad A file Numbers'
From dbo.tblTrackingTable
WHERE
    (dbo.IsFileNumberValid(FileNumber) = 0) 
    AND (TrackingDate > CONVERT(DATETIME, '2010-06-01 00:00:00', 102)) 
    AND (FileNumber <> '.BOX.END.')
Group By Year(TrackingDate), Month(TrackingDate)
ORDER BY Year(TrackingDate), Month(TrackingDate)

Now, looking at the code, we are only using one table, and only 2 columns from this table. So, we can create an index that includes both of these columns, which should speed up the query considerably.

Code:
Create Index idx_TrackingTable_TrackingDate_FileNumber On tblTrackingTable(TrackingDate,FileNumber)

I also agree with SimonSellick about the function. However, it could be that the optimizer is running the function on the entire table before filtering on the date. This would cause the function to run a lot more than it needs to. By changing the code as I suggest and adding the index, I am positive that the query will return in 6 minutes or less. If it's less, it will likely be considerably less.

I suggest you add the index and change the code and re-run. Let us know what the execution time is now. If it's not acceptable, then post the code for the function. There may be some optimizations for the function that would help speed things up.


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I am sure the dbo.IsFileNumberValid(FileNumber) = 0 is slowing it down a bit. But, I can run this on my workstation with SQL express and it runs in 27 seconds and I ended up killing the script on the production server after 27 minutes.

This is the File validation function
Code:
USE [MailroomTracking]
GO
/****** Object:  UserDefinedFunction [dbo].[IsFileNumberValid]    Script Date: 12/07/2010 10:18:59 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER Function [dbo].[IsFileNumberValid]
    (@FileNumber VarChar(15))
Returns Bit
As
Begin
    Declare @Prefix VarChar(15)
    Declare @IsValid Bit

    Select @IsValid = 0,
           @Prefix = Left(@FileNumber, PatIndex('%[^a-z.]%', @FileNumber + '1')-1)

    If (@FileNumber like '[a-zA-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' 
        OR @FileNumber like '[a-zA-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' 
        OR @FileNumber like '[a-zA-Z][a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' 
        OR @FileNumber like '[a-zA-Z][a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' 
        OR @FileNumber like '.BOX.END.')
        If Exists(Select 1 from tblFileNumPrefix Where FileNumPrefix = @Prefix)
            Set @IsValid = 1
    Return @IsValid
End

I have made the indexes on both SQL Express and the Pord server the Same.

Code:
index_name                     index_description                                        index_keys
------------------------------ -------------------------------------------------------- ----------------
BoxNumber, FileNumber          nonclustered located on PRIMARY                          BoxNumber
Date, BoxNumber, FileNumber    nonclustered located on PRIMARY                          TrackingDate
PK_tblTrackingTable            clustered, unique, primary key located on PRIMARY        Tracking_ID

And the Querry

Code:
USE MailroomTracking

SELECT 
	Year(TrackingDate)  AS 'Year',
	Month(TrackingDate) AS 'Month',
	count(TrackingDate)  AS 'Number of BAD file Numbers'
FROM dbo.tblTrackingTable
WHERE
	(dbo.IsFileNumberValid(FileNumber) = 0) 
	AND (TrackingDate > CONVERT(DATETIME, '2010-01-01 00:00:00', 102)) 
	AND (FileNumber <> '.BOX.END.')
GROUP BY Year(TrackingDate), Month(TrackingDate)
ORDER BY Year(TrackingDate), Month(TrackingDate)

Changed as suggested.

Also the SQL Express DB has RowGIUD and ModifiedDate columns added to it. Although I have not finished creating the triggers to keep the ModifiedDate column up to date.

The production server only has about 4 users plus myself connected and using it.

Prod server and SQL Express both have the DB, Indexes, and Logs on the same physical drive. (I know this is not ideal but it is what I am stuck with for the ime being.) I have a new server for Production but am waiting on approvals the the OS and DB Server versions before I can start to bring it on line. (5 months now..the wheels turn slowly.)

Eventualy the Express copy will have triggers added to all tables like the following.
Code:
USE [MailroomTracking]
GO
/****** Object:  Trigger [dbo].[utblFileNumPrefix]    Script Date: 12/07/2010 10:25:27 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER TRIGGER [dbo].[utblFileNumPrefix] 
   ON  [dbo].[tblFileNumPrefix]
   AFTER UPDATE
AS 
BEGIN
	SET NOCOUNT ON;

	Update dbo.tblFileNumPrefix
		Set ModifiedDate = getdate(),
			FileNumPrefix = Upper(a.FileNumPrefix)
	From dbo.tblFileNumPrefix a
		Inner Join Inserted As i
            On a.FileNumPrefixPK = i.FileNumPrefixPK
END

I may add an updated by column as well but haven't decided.

THANKS!!!!!!!!!!!!!!!!!!!!!

Thanks

John Fuhrman
 
Execution Plan


Index Seek (Cost 81%)
|
Filter (17%)
|
Computer Scalar (0%)
|
Hash Match(Agrgregate) (2%)
|
Compute Scalar (0%)
|
Sort (0%)
|
Select (0%)




Thanks

John Fuhrman
 
How many rows are in tblFileNumPrefix and what indexes do you have for it?

sp_helpindex 'tblFileNumPrefix'

Also, it looks like you didn't add the index I suggested:

Code:
Create Index idx_TrackingTable_TrackingDate_FileNumber On tblTrackingTable(TrackingDate,FileNumber)

Since the query only uses 2 columns, and both columns are included in the index, SQL should be able to use this index completely for the query without actually having to go to the table. This SHOULD speed things up considerably.

If this doesn't help, we may want to think about adding computed columns for the year and month columns and then indexing that.



-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George, this table only has 16 entries.

Computed columns for Date??





Thanks

John Fuhrman
 
Computed columns for Date??

Yeah, kinda.

In your query, you are grouping by year and month and then ordering by year and month. If you create a computed column for each of those and index it, you should get improved performance.

Something like this:

Code:
Alter Table tblTrackingTable Add TrackingYear As Year(TrackingDate)
GO
Alter Table tblTrackingTable Add TrackingMonth As Month(TrackingDate)
GO
Create Index idx_TrackingTable_Year_Month_FileNumber On tblTrackingTable(TrackingYear, TrackingMonth, FileNumber)

Then your query would be:

Code:
Select 
    TrackingYear As 'Year',
    TrackingMonth As 'Month',
    count(TrackingYear) As 'Number of Bad A file Numbers'
From dbo.tblTrackingTable
WHERE
    dbo.IsFileNumberValid(FileNumber) = 0
    AND TrackingYear >= 2010
    And TrackingMonth >= 6
    AND FileNumber <> '.BOX.END.'
Group By TrackingYear, TrackingMonth
ORDER BY TrackingYear, TrackingMonth



-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George, that seems to have made a marked improvement. It finished in 1 minute 3 seconds as apposed to having to kill the process after 27 minutes.

I will have to consider how i may be able to use calculated fields to improve other processes.

Thanks!!!!!!!!!!!!!!!!!!!!!

Thanks

John Fuhrman
 
Just for fun, can you add an index on that table that only has 16 rows and then tell me if it affects performance?

Code:
Create index idx_FileNumPrefix On tblFileNumPrefix(FileNumPrefix)

The reason I suggest this.... It seems like most of your time would be spent in the function because it calculates for every row in the table that matches your other criteria (for date and .Box.End). Anything you can do to speed up the function is likely to have a big impact on the performance. Eliminating the function entirely would probably cause the query to finish in a second.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
It went to 1:57 from 1:03.

Thanks

John Fuhrman
 
Cool. Kinda surprised, but cool. I didn't expect an index on a table with just 16 rows to make much difference.



-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top