markknowsley
Programmer
I want to write an algorithm that will search through a list of documents held in a database and pick out those documents that are duplicated once or more, and show the location of each copy of the document.
Here's the process I normally follow for such things with a few hundred rows:
1. Create a view of all the documents in the db.
2. Connect to the database and fill a DataSet with the list of documents using a SqlDataAdapter.
3. Use a pair of foreach loops, like this:
Code:
foreach (DataRow rowDocuments in dsDocuments.Tables[0].Rows)
{
    foreach (DataRow rowDuplicate in dsDocuments.Tables[0].Rows)
    {
        // Compare by value - the columns are typed as object, so ==
        // compares references and never matches. Also skip the row
        // being compared against itself.
        if (!ReferenceEquals(rowDuplicate, rowDocuments)
            && rowDuplicate[0].Equals(rowDocuments[0]))
        {
            // chuck the duplicate into a separate DataTable
        }
    }
}
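For comparison, one common alternative is to make a single pass over the rows and group them in a Dictionary keyed on the document identifier, which avoids the quadratic rescan entirely. This is only a sketch: it assumes the first column is the value being matched on, and the table, column names, and sample data below are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

class DuplicateFinder
{
    // Group rows by the value of their first column in one pass: O(n)
    // with a hash lookup, instead of O(n^2) nested loops.
    public static Dictionary<object, List<DataRow>> FindDuplicates(DataTable table)
    {
        var groups = new Dictionary<object, List<DataRow>>();
        foreach (DataRow row in table.Rows)
        {
            object key = row[0];
            if (!groups.TryGetValue(key, out var rows))
            {
                rows = new List<DataRow>();
                groups[key] = rows;
            }
            rows.Add(row);
        }

        // Keep only the keys that appeared more than once.
        var duplicates = new Dictionary<object, List<DataRow>>();
        foreach (var pair in groups)
            if (pair.Value.Count > 1)
                duplicates[pair.Key] = pair.Value;
        return duplicates;
    }

    static void Main()
    {
        // Hypothetical schema: a document name plus a Location column,
        // so every copy's whereabouts can be reported.
        var table = new DataTable();
        table.Columns.Add("DocName", typeof(string));
        table.Columns.Add("Location", typeof(string));
        table.Rows.Add("report.doc", @"\\srv1\docs");
        table.Rows.Add("budget.xls", @"\\srv1\docs");
        table.Rows.Add("report.doc", @"\\srv2\archive");

        foreach (var pair in FindDuplicates(table))
        {
            Console.WriteLine(pair.Key + ":");
            foreach (DataRow row in pair.Value)
                Console.WriteLine("  " + row["Location"]);
        }
    }
}
```

Each row is touched once when building the groups and once when filtering, so 600,000 rows means roughly 1.2 million cheap operations rather than 360 billion comparisons.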
I'm looking at a database that contains over 600,000 documents. If my maths is right, that's 600,000^2 comparisons, which works out to around 360 billion... err... (frantic button pushing) lots of searches.
Does anyone have any tips, or can anyone recommend any resources, on how to make this process more efficient?