You're making regular backups anyway, right? It should be 'relatively' simple to install a SQL Server Express instance on a spare computer, and then restore your full DB to the new instance. Enable full text indexing and then spend an hour playing around with it.
Of course, the computer you test this on will likely not be "as good as" the actual server, but that's not a bad thing either. If you can make the query acceptably fast on a slower computer, it'll be even faster on the server. You don't want to use a super old computer (think paperweight here), but something a little older is fine to test on.
Just curious here. If you have 0.5 million records changing every day, with only 3.3 million total, does this mean you have roughly one week's worth of data at any given point? Also, can you explain how you archive old data (daily, weekly, hourly, etc.) and how new data is added... is this done continuously throughout the day, or are the emails loaded during the night?
You see, the real problem here is with the way you are doing your searches. Let's take the subject line of the email as an example. Suppose the subject line is "Peter is on vacation next week". When you index this column, SQL creates an index that has a column with the contents of the data (the full text) and a pointer to the row in the table where the data is found. But, because it is an index, the column is sorted. Now, if you do a search starting at the beginning of the data, SQL can quickly jump to the data you are looking for. Imagine you were to look in a dictionary for words that start with ST. You would be able to quickly skip to the part of the book where the ST words begin, and then stop at the last ST entry. SQL does something similar, and it's called an index seek. Now suppose you wanted to find all the words in the dictionary where the letters ST appear ANYWHERE within the word. It would take you much longer because you would have to start with the first word in the dictionary and examine every word, up through the last one. In SQL, this is called an index scan (or table scan). Scans are MANY times slower than seeks.
Let's take a closer look at your query.
[tt][blue]
WHERE [!]EmailTo LIKE '%domain.com'[/!]
AND Deliver = 0
AND Expire = 0
AND (EmailFrom LIKE '%peter%' OR Subject LIKE '%peter%')
[/blue][/tt]
Your EmailTo search is a problem because you are looking at the END of the data. Again, imagine you wanted to find all the words in the dictionary that end in ST. You would need to scan all the words to pick out the ones you want, and this would be considerably slower.
[tt][blue]
WHERE EmailTo LIKE '%domain.com'
AND Deliver = 0
AND Expire = 0
AND [!](EmailFrom LIKE '%peter%' OR Subject LIKE '%peter%')[/!]
[/blue][/tt]
You also have two columns where you are searching ANYWHERE within the string. Again, you are causing an index/table scan with those searches.
Full Text indexing was designed to make these types of searches faster, which is exactly why I recommend that you research this a bit.
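If you do go the full text route, the setup itself is small. Here's a rough sketch (the catalog name is made up, and I'm assuming your table has a single-column primary key named PK_tblQuarantine... adjust to match your actual key):

[tt][blue]
Create FullText Catalog QuarantineCatalog;

Create FullText Index On tblQuarantine(Subject, EmailFrom)
Key Index PK_tblQuarantine On QuarantineCatalog;

-- The slow LIKE '%peter%' searches then become:
Select QuarID
From   tblQuarantine
Where  Contains((Subject, EmailFrom), 'peter');
[/blue][/tt]

Contains uses the full text index instead of scanning the table, which is the whole point of the exercise.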
Without full text indexing, there are other ways that you MAY be able to speed up this query, but it will take considerable work to do so.
For example, when you load the data into the table, you could store a "ReverseEmailTo" column. At the time you load the data, you use the Reverse function to store the reverse of the EmailTo column, so "email@domain.com" also gets stored (in another column) as "moc.niamod@liame". You could then index this column and change your query to:
[tt][blue]
WHERE [!]ReverseEmailTo LIKE Reverse('%domain.com')[/!]
AND Deliver = 0
AND Expire = 0
AND (EmailFrom LIKE '%peter%' OR Subject LIKE '%peter%')
[/blue][/tt]
Now, instead of an index scan, you'll get an index seek on that column.
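In fact, you may not even need to maintain that column in your load process. A persisted computed column is calculated by SQL Server whenever the row is inserted or updated, and you can index it. Something like this should work (treat it as a sketch since I'm guessing at your table structure):

[tt][blue]
Alter Table tblQuarantine Add ReverseEmailTo As Reverse(EmailTo) Persisted;

Create Index tblQuarantine_ReverseEmailTo On tblQuarantine(ReverseEmailTo);
[/blue][/tt]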
Another thing you could do is to create another table that stores individual words for your subject... something like this...
[tt]
tblEmailSubject
QuarID SubjectWord
------- -----------
9999 Peter
9999 is
9999 on
9999 vacation
9999 next
9999 week
[/tt]
Of course, you would want to have a multi-column index on this table where the Subject word appears first and the QuarID appears second.
[tt]Create Index tblEmailSubject_SubjectWord_QuarID On tblEmailSubject(SubjectWord, QuarID)[/tt]
Then you could search through this table and you would get really fast index seeks. Ex:
[tt][blue]
Select Distinct QuarID
From tblEmailSubject
Where SubjectWord = 'Peter'
[/blue][/tt]
Now, I recognize that each email probably has a subject with 10 words (or more) so this table would have 30 million rows instead of just 3 million. And I know this sounds like a lot, but... this is just for one column. Even though there will be more rows, each row will be smaller and the total size of the table will be smaller. Plus, you'll get index seeks on this table so the performance will be MANY times better than your current query.
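Populating the word table is the fiddly part. One common approach is a numbers (tally) table split. This is just a sketch, and it assumes you have a Numbers table with at least as many rows as your longest subject has characters:

[tt][blue]
Insert Into tblEmailSubject(QuarID, SubjectWord)
Select q.QuarID,
       SubString(q.Subject, n.Number,
                 CharIndex(' ', q.Subject + ' ', n.Number) - n.Number)
From   tblQuarantine q
       Inner Join Numbers n
         On n.Number <= Len(q.Subject)
Where  SubString(' ' + q.Subject, n.Number, 1) = ' ';
[/blue][/tt]

The Where clause keeps only the positions that immediately follow a space (the start of each word), and the SubString/CharIndex pair grabs everything up to the next space.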
Similarly... create another table for the EmailFrom column. To store peter@domain.com, the data would look like this:
[tt]
tblEmailFrom
QuarID EmailPart
------- ---------
9999 Peter
9999 domain
9999 com
[/tt]
Again, since there are several parts to an email address, there will be more rows in this table than in your original, but the table size will be smaller and the performance will be much better. With these two additional tables and your ReverseEmailTo column, your query would probably look something like this:
Code:
SELECT TOP 5 tblQuarantine.QuarID, EmailFrom, EmailTo,
       Subject, MsgDate, MsgID, RejectDesc,
       RejectDetails
FROM   tblQuarantine
       INNER JOIN tblRejectCodes
         ON tblQuarantine.RejectID = tblRejectCodes.RejectID
       INNER JOIN (
           SELECT QuarID
           FROM   tblEmailSubject
           WHERE  SubjectWord = 'peter'
           UNION
           SELECT QuarID
           FROM   tblEmailFrom
           WHERE  EmailPart = 'peter'
       ) AS SubjectEmailSearch
         ON tblQuarantine.QuarID = SubjectEmailSearch.QuarID
WHERE  ReverseEmailTo LIKE Reverse('%domain.com')
       AND Deliver = 0
       AND Expire = 0
ORDER BY MsgDate DESC, EmailFrom, Subject
I know this sounds like a lot of work, but if you are unable or unwilling to use full text indexing, this is probably the best way to get acceptable performance. All things considered, I would expect your query time to drop from 28 seconds to less than 1 second.
Sad but true... often times code needs to get more complicated in order to get better performance.
-
George
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom