Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Eliminate duplicate records in a table 2

Status
Not open for further replies.

kcary

IS-IT--Management
Jun 10, 2003
11
0
0
US
I tried this starting at the query level and haven't had lotsa luck. What I'm using now is a maketable qry which leaves me with a table containing duplicate records.
I'd like to filter out the records where ALL fields match any other record in the table. Like this:

F1 F2 F3 F4 F5*
a b c d 1
a b c d 1
a b c d 2
a a a a 2

What I want is ONLY the last two lines. I can use SQL (with very specific instruction), VBA (with suggestions to get me started), or any functionality in Access 2000.
 
Try this:

SELECT DISTINCTROW tblA.*
FROM tblA;


 
Moxy1 -

It still returns every record. I ran across a note that DISTINCTROW requires at least two joined tables.

- kcary
 
I think DistinctRow means distinct rows from the input table looking at all of the fields. Because it is likely that only certain fields are selected for the make table, "Distinct" is better because it means unique rows based on the selected fields.

Another option is to remove the existing rows from your output table, put a unique index on the combination of fields, and then change your MakeTable query to an Append one. This has the benefit or preventing manual entry of duplicate rows but is less efficient with regard to query run time (only noticeable on large tables).

By the way, your example output looks like an unnormalized table. If you are going to use this table for further querying, you should reconsider the layout. If it is your final output then it is less important but may still be a good idea.
 
Thanks, Moxy1 and JonFer!

I was able to put everything together at last. The trick was to make two queries. The first, a MakeTable using DISTINCT, created a table which still included a single record for the the items I wanted to have removed. In the second query, I used MS's standard query for duplicates:

In (SELECT [F1] FROM [table1] As Tmp GROUP BY [F1] HAVING Count(*)>1 )

...which eliminated the extraneous records!
 
I spoke too soon, maybe. I went back, changed the MakeTable to a Select query, then used the second query to query the first... It was then that I ran into a dilemma. One of the fields I left out of the query was part of the PK for the main table... rendering my query "not updateable".
Is there someway to create an index at this point? I need to be able to use the datasheet from the final query to correct the main table.
 
It is not clear what you are trying to do. Do you have a table with duplicates and you are building another table with just the duplicate rows so you can go back to the main table and delete them? Or are you building a table with the unique rows so you can replace your main table?

Can you post the structure of your table with duplicates and your SQL for the queries? It would also help to clarify what your desired outpt is.

 
From the top:

I have an unnormalized table. The table contains 42 fields and >3000 records. This table contains items that our company sells, and is precisely the style of flat file which drove the creation of relational databases. For example, we may have 5 records for one item - each record listing the location where it's being sold, the packaging for the item, costs, etc.

What I'm trying to accomplish is to ensure that details such as dimensions and weight are constant over the entire table. For example, the same item should weigh the same amount, no matter where it goes. So I search the DB for all records with the same item, and verify that all applicable fields match on all of those records. If not, the 1 or 2 new records can be changed through the query without much fuss.

I can email a sample record or two, but I'm not sure how to get the structure of 42 fields to appear coherently in the forum.

As for desired output... the updateable query makes things easy and quick to update the records. The records that should be returned should be only those records where EVERY line is not identical as far as those 16 fields are concerned. All other fields will vary and should not cause records to appear in the output. And all updates must be manual, we'll automate later.

 
That helps a lot. Do you have control over the table structures? If so, I would add an Item table with the item characteristics and ignore the fields with the item characteristics in your unnormalized table. If there is a lot of processing tied to the table, you could rename the table and then create a select query joining the unnormalized table and the Item table and use the old table name for the view.

If you're stuck with the unnormalized table, I would do a Group By query off of it to make a new table. Group By the item code and the characteristics you are checking AND add a Count(*) field to see what is the most common set of characteristics. Add an index to the item code field and then you should be able to join it to the unnormalized table for updates. On the count(*) field put the condition:

(Select Max(CountField) from ItemTemp as a where a.Item=
MainTable.Item)

I didn't test the update part and the rules are not always clear for what join queries allow updates but it often works if your join fields are indexed and you are updating the many side of a one-to-many relationship.
 
You, sir, are a genius! Well, almost. One more step to go (I think). I've implented the query to add the Count field. From there, I've added another query to check the one above for count>1, which provides a list of items with multiple records and at least 1 which doesn't match the others.
It's perfect... except that it's not updateable. Trying to join it back to the original table results in more duplicate records appearing (8 records become 22, actually).
I tried to use Select Distinct in the second query (the one I added)... it's still running after 25 minutes. My goal was to remove the dups, then join the query to the original table to retrieve an updateable recordset. I've accomplished it in the past using append queries and a template with a PK for the item number, but this db is getting a little crowded.
There are already 30 tables, 50 queries, 15 reports, 15+ users, a page or two of macros, and a partridge in a pear tree. I don't know how much more Access can handle before we need to move this to it's own server.... but I digress.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top