darylbewise
Programmer
I have an MS SQL database table with around 500,000 rows. Each day, this table is updated from a CSV file.
Currently I loop through each row in the CSV file and check whether the matching row in the database has changed. If it has changed, I issue an UPDATE SQL query; if the row does not exist, I issue an INSERT SQL query to create it.
In this situation there are two database queries for each row in the CSV file, so updating the database from a CSV file with 500,000 rows requires 1,000,000 database queries.
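Roughly, the current per-row loop looks something like this (the table name, column names and CSV parsing are just placeholders for my real schema):
Code:
// using System.Data.SqlClient;
using (SqlConnection conn = new SqlConnection(connectionString))
{
    conn.Open();
    foreach (string[] fields in csvRows)   // csvRows = the parsed lines of the CSV file
    {
        // Query 1: fetch the current value to see whether the row exists / has changed
        SqlCommand check = new SqlCommand(
            "SELECT column2 FROM MyTable WHERE column1 = @key", conn);
        check.Parameters.AddWithValue("@key", fields[0]);
        object current = check.ExecuteScalar();

        SqlCommand write;
        if (current == null)
        {
            // Query 2a: the row does not exist, so create it
            write = new SqlCommand(
                "INSERT INTO MyTable (column1, column2) VALUES (@key, @val)", conn);
        }
        else if (!fields[1].Equals(current.ToString()))
        {
            // Query 2b: the row exists but has changed, so update it
            write = new SqlCommand(
                "UPDATE MyTable SET column2 = @val WHERE column1 = @key", conn);
        }
        else
        {
            continue;   // unchanged - nothing to write
        }

        write.Parameters.AddWithValue("@key", fields[0]);
        write.Parameters.AddWithValue("@val", fields[1]);
        write.ExecuteNonQuery();
    }
}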
I have been looking at creating a more efficient way of updating this database. What I was thinking of doing was loading the entire database table into either a datatable or an array/arraylist.
If I used a datatable:
1. Load the entire database table into the datatable (a rough sketch of steps 1 and 2 is below, after the bulk copy code)
2. For each row of the CSV file, loop through the datatable to find the relevant row
3. Either update the relevant datatable row using something like:
Code:
table.Rows[row].BeginEdit();
table.Rows[row]["column"] = "1";
table.Rows[row].EndEdit();
table.AcceptChanges();   // commits all pending changes and resets every row's RowState to Unchanged
OR add a new row to the datatable:
Code:
table.Rows.Add("column1", "column2", ..., "column7");
4. Once I had looped through the entire CSV file, I would then bulk upload the datatable back to the database:
Code:
SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationManager.ConnectionStrings["Database"].ConnectionString, SqlBulkCopyOptions.TableLock);
bulkCopy.DestinationTableName = tablename;
bulkCopy.WriteToServer(table);
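For steps 1 and 2 I was thinking of something along these lines (tablename, the column names and csvKeyValue are just placeholders; rather than looping the whole datatable for every CSV row, I could set a primary key and use Rows.Find):
Code:
// using System.Data; using System.Data.SqlClient; using System.Configuration;
DataTable table = new DataTable();

// Step 1: load the entire database table into the datatable
string connString = ConfigurationManager.ConnectionStrings["Database"].ConnectionString;
using (SqlDataAdapter adapter = new SqlDataAdapter("SELECT * FROM " + tablename, connString))
{
    adapter.Fill(table);
}

// Step 2: key the datatable so each CSV row can be matched without scanning all 500,000 rows
table.PrimaryKey = new DataColumn[] { table.Columns["column1"] };

DataRow row = table.Rows.Find(csvKeyValue);   // csvKeyValue = the key parsed from the CSV line
if (row != null)
{
    // existing row - edit it as in step 3
}
else
{
    // no matching row - add a new one as in step 3
}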
By making all required alterations to the datatable instead of directly to the database, I was hoping to save processing time. However, with such a large table loaded into the datatable, I am now starting to worry about the amount of memory it would consume.
Any suggestions on the best way to tackle this problem would be much appreciated. Should I update the database directly, or manipulate the data first in a datatable, an array or an arraylist?