Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

dedupe help needed 1

Status
Not open for further replies.

mginty

IS-IT--Management
Feb 2, 2005
6
GB
I have a db where I have been given data comprised of 3 different ordering systems.
2 of these are old and 1 is our present system and I will be receiving updated data to enter into the database.

I intend to have a 1:Many relationship between tblCustomers and tblOrders. Then have a 1:many relationship between tblOrders and tblOrderlines. My structure will be as at the bottom of this post.

There are some duplications of customers and orders that I need to remove.
Normally, I would copy the structure only and append however, as an example, one duplicated record has extra fields and data that I wish to keep yet there are fields and data in the other duplicate record.
I am a little stuck on how best to proceed.
Here is an example:
(formatting will be out I think)
Step5 Find duplicates for tblOrders

OrderURN orderValue orderstatus
Record1 128
Record2 2001117 4.99

So here, I have orderurn and ordervalue in one record and orderstatus in another. I want to deduped but keep all the data in the fields. Haven’t got a clue how to do it unfortunately!

So, if anyone cal help I would be very happy indeed!
Thanks


PK= Primary Key/ FK= Foreign Key

* tblCustomers
g_user_id
source_customer_id
PK new_user_id


* tblOrders
customer_guid
source_customer_id
fk new_user_id
order_group_id
PK new_order_id

* tblOrderlines
order_number
order_id
fk new_order_id






 
mginty

deduped = De-duplicate = consolidate ???
New word for me.

Sounds like a mess, and it will take effort. How many order lines are we talking about?

I have done this type of thing for contact databases, never seen the problem with order systems, but I imagine the process is the same.

This one approach...
One tool - One of the queries in the query wizard is to find duplicates. The other tool, since you want to preserve / consilidate your data is to use VBA programming.

You may want to export to a spreadsheet to become familiar with your data. You can use queries for this too. Sort the data on different fields. Reviewing the customer info should be fairly easy. Reviewing the orders and determine what is new, duplicated or supplements the data will be much tougher. You really need to develop a feel for the data.

Develop a plan on how to review your data. Are there simple things that can be done first to reduce the number of records? By-the-way, I see you have "new_order_id" which suggests you have already worked on addressing the primary key stuff. Good work.

You may want to work in steps. Perhaps something like..
Go through the customer table
- Move all orders from a duplicate customer to the original customer by changing the new_user_id on the order.
- Delete the duplicate customers with no orders

How do you tell if you have a duplicate customer. This is where being familiar with the data helps. Cusomter name, or similar name, same phone number, same address or similar address.

Go through the orders. Look for things like the same customer and same order date or the same purchase or requisition or customer reference number; if they match, then look at the line items...
- if the same delete one set
- if new items, then add the record
- if the item is similar but different -- ??? tough one
- once the order has been consolidated, delete the duplicate order line items and then delete the duplicate order.

So that is the flow, but the details are a bit tougher...

If you do not have a lot of data, options include...
- exporting to a spreadsheet, massage the data and import back (break the relationships first, and recreate later)
- create a form with subforms to display matching records. use an even such as double click to transfer a record from one to the other, or over-write the other.

Coding would depend greatly on the data. Assuming the duplicate record query did not work, then an approach may be...

Code:
Private Sub WalkCustomerTable ()
Dim rst as DAO.Recordset
Dim strSQL as String, lngID as Long, strQ as String

strQ = Chr$(13)
strSQL = "SELECT * FROM tblCusomters Order By CustomerName"

Set rst = CurrentDB.OpenRecordset(strSQL)

With rst
   .MoveFirst

   Do While Not .EOF

      lngID = !New_User_ID
      If [COLOR=blue]FoundDupCustomer (lngID)[/color] Then
         .Edit
         CustomerName = "DeleteMe"
         .Update
         [COLOR=blue]TransferCustomer (lngID)[/color]
      End If
      .MoveNext
   Loop
   .Close
End With
Set rst = Nothing

strSQL = "DELETE * FROM tblCustomers WHERE CustomerName = " _
& strQ & "DeleteMe" & strQ

DoCmd.RunSQL strSQL

End Sub

This subroutine which you can use the OnClick event from a command button, will
- "walk" through the customer table
- call a function to check for duplicate
- if the record is a duplicate, flag for deletion
- call another function to transfer any new data
- delete the flagged records

Note that I am assuming you use field names CustomerName, and the primary key is a long integer.

Code:
Function FoundDupCustomer (lngID as Long) as Boolean

Dim rst as DAO.Recordset
Dim strSQL1 as String, strSQL2 as String, strQ as String
Dim intCount as Interger, booNoDup as Boolean
Dim strCustomerName as String, strPhoneNo as String
Dim strAddress as String, PostZipCode as String

booNoDup = True

'Load compare fields
strSQL1 = "SELECT * FROM tblCustomers WHERE new_user_id = " & lngID

Set rst = CurrentDB.OpenRecordset(strSQL1)

With rst
   .MoveFirst
   strCustomerName = !CustomerName
   strPhoneNo = !PhoneNo
   strAddress = !Address
   strPostZipCode = !PostZipCode
   .Close
End With

If strCustomerName <> "DeleteMe" Then
   strSQL1 = "SELECT * FROM tblCustomers WHERE new_user_id <>" & lngID

   'Check name
   strSQL2 = strSQL1 & " AND CustomerName = " & strQ &   strCustomerName & strQ
   set rst = CurrentDB.Openrecordset(strSQL2)

   If rst.RecordCount Then
      booNoDup = False
   End If

   rst.Close

   'Check phone number
   strSQL2 = strSQL1 & " AND PhoneNo = " & strQ & strPhoneNo & strQ
   set rst = CurrentDB.Openrecordset(strSQL2)

   If rst.RecordCount Then
      booNoDup = False
   End If

   rst.Close

   'Check Address
   strSQL2 = strSQL1 & " AND Address = " & strQ & strAddress & strQ
   set rst = CurrentDB.Openrecordset(strSQL2)

   If rst.RecordCount Then
      booNoDup = False
   End If

   rst.Close

   'Check postal / zip code
   strSQL2 = strSQL1 & " AND PostZipCode = " & strQ & strPostZipCode & strQ
   set rst = CurrentDB.Openrecordset(strSQL2)

   If rst.RecordCount Then
      booNoDup = False
   End If

   rst.Close

End If

Set rst = Nothing
FoundDupCustomer = Not booNoDup

End Function

Pretty simple function. Look for matches based on name, phone number, address and postal / zip code. Any match will return True which means the record is a duplicate.

Obviously, you would have to adjust this for your field names and requirements.

The TransferCustomer function would
- Find a/the matching record
- Look at each field and ...
-- copy over missing info
-- ignore the same / duplicate info
-- perhaps prompt the end user if the values for each field differed.

Tedious, but it would work for most of the data.

Hope this gives you some ideas on how to approach the problem.

Richard
 
Thank you ANdrew for this post. Incredibly informative and a great starting point for me.
It will take me some time to digest what you've written.

At present I am at the point where I have combined the 2 databases together and the correct relationships are there. As stated, I now need to de-duplicate (yes dedupe is a shortened version!) the orders. I'll post further info as I work thru and encounter more problems!
Just wanted to say thanks first.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top