Inserting records in bulk, but also checking for duplicates

LucL · Jul 8, 2006

Hi Guys,

I've got a database that contains about 2 million records.

I also have a form that allows the user to insert 500 new records at a time.

I'm trying to figure out the best way to add the new records in bulk but also check to make sure they are not already in the database (only one column matters).

my insert command looks something like.

"INSERT INTO db1 (col1,col2,col3) VALUES (v1,v2,v3),(v4,v5,v6),etc"

Right now I'm dumping all of the col1 (the only column that I want compared) into an array in perl then scanning that array for duplicates and insert the data which is not a duplicate but this method if a bit flawed sine I have to dump all the COL1 records into an array and these can take up quite a bit of RAM.

Would using a "SELECT" statement with the 500 new records (ex: (SELECT col1 FROM db1 WHERE col1=v1) OR (SELECT col1 FROM db1 WHERE col1=v2) OR etc...") to check for duplicates be better?

Thanks!
Luc L.

johnwm · Jul 8, 2006

Check out the DISTINCT keyword for your SELECT statement

http://dev.mysql.com/doc/refman/5.0/en/select.html

________________________________________________________________
If you want to get the best response to a question, please check out FAQ222-2244 first.
'If we're supposed to work in Hex, why have we only got A fingers?'
Drive a Steam Roller

r937 · Jul 8, 2006

have you declared a unique constraint on the target column?

this will allow you to avoid having to check for duplicates altogether (since the database engine will do it for you)

r937.com | rudy.ca

Kozusnik · Jul 9, 2006

Also be sure to leave off the "or die;" section in your script's insert sql. When it encounters a dup, it will simply write to sdtout (if in text mode) and skip. If you're running a web app and also using sendmail, keep an eye on the mailboxes. If updating within a script, the dups may show as errors in the mailbox of the user running the script. The messages could queue up and eventually fill your disk.

Later,
Mark

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Inserting records in bulk, but also checking for duplicates

LucL

Programmer

johnwm

Programmer

r937

Technical User

Kozusnik

Programmer

Similar threads

Part and Inventory Search

Sponsor