abrahamvionas
Technical User
My situation is that I have a text file (all_data.txt) which I have successfully import into my database table (overture_categorydl). After "delete * from overture_categorydl" and altering the table to make one of the columns a primary key (search_term) and then attempting to reload the file using "LOAD IGNORE DATA INFILE all_data.txt INTO overture_categorydl;" Using IGNORE because I want to strip out the duplicate "search_term" entries.
When I execute the query it comes back saying that 8900 records inserted, 9400 duplicates. My record size when first loaded was 18300.
But it gets more interesting. I create another table without a primary key and again load (successfully) the entire file again. I then use "SELECT *, count(*) cnt FROM overture_category2 GROUP BY search_term HAVING cnt > 1;"
The result of this query tells me that I have '12' duplicates.
'12' duplicates is much more what I expect to see, and what makes sense. Looking at my data it doesn't appear to have 9000+ duplicates in it. And yet, from what I've been told, and what I know, a Primary Key (or Unique Key) violation of a constraint is only when one field would match another already existing field. So, I'm confused. It seems to me that a lot of my data is being considered "duplicate" when it doesn't look to be duplicate to me...
If I have two fields: (1) containing 'travel back rest' and (2) containing 'travel back support' will they be considered duplicates because they share two out of three words in common?
It doesn't seem like such should be the case, but I can't imagine any other way that my problem could be occuring.
Any advice is MUCH appreciated!
Thanks, Abe
When I execute the query it comes back saying that 8900 records inserted, 9400 duplicates. My record size when first loaded was 18300.
But it gets more interesting. I create another table without a primary key and again load (successfully) the entire file again. I then use "SELECT *, count(*) cnt FROM overture_category2 GROUP BY search_term HAVING cnt > 1;"
The result of this query tells me that I have '12' duplicates.
'12' duplicates is much more what I expect to see, and what makes sense. Looking at my data it doesn't appear to have 9000+ duplicates in it. And yet, from what I've been told, and what I know, a Primary Key (or Unique Key) violation of a constraint is only when one field would match another already existing field. So, I'm confused. It seems to me that a lot of my data is being considered "duplicate" when it doesn't look to be duplicate to me...
If I have two fields: (1) containing 'travel back rest' and (2) containing 'travel back support' will they be considered duplicates because they share two out of three words in common?
It doesn't seem like such should be the case, but I can't imagine any other way that my problem could be occuring.
Any advice is MUCH appreciated!
Thanks, Abe