Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

a question regarding data integrity 7

Status
Not open for further replies.

NuJoizey

MIS
Aug 16, 2006
450
US
I'm sure what I'm about to pose here is far from a-typical. What I am looking for is how this situation is commonly addressed in a professional db environment.

The issue:

You have a table with CompanyID(PK) and CompanyName.
One user enters ABC Co Inc. for the company name. A couple of days later, another user enters in Associated Business Contractors, and on another day, a different user enters Smith Enterprises, dba Assoc. Biz Inc.

All of these entries refer to the same company, but since there are thousands of Company records, no one users seems to notice that it has been already entered under a different name. Consequenty, database integrity gets marred because then different users update related records for different ABC companies.

What is a good strategy to deal with this?
 
Restrict that type of input to a position or department which takes data integrity seriously.

If that's not possible, hopefully your applications provide a bit of intelligence to help the user select an existing record rather than creating a new one.

If that's not the case, perform routine data cleansing operations.
 
i did write a routine that does a basic check, but it's really not enough.

Restrict that type of input to a position or department which takes data integrity seriously.

yeah, i wish!

Thing is that either we give these people some authority, and they mess up the integrity, or we restrict it, but then I or my ops manager ends up wasting time doing simple entry every day.

By data cleansing, I suppose you mean something like doing a data dump on the table every so often and looking for duplicates and update the database accordingly, right?
 
For data cleansing, you could write a routine or create a report which someone looks at to find duplicates, and do updates on your database from that.

I've also seen systems (and written similar logic myself) which provide "merge" capabilities so that an aware user can pick two entities and merge them into one, including all of the transactional data as well. There are several ways to do that, but the best way includes leaving a historical trail so that you are aware of what has been merged and when. These may require multiple database operations, so it would work best in an ACID compliant DBMS.
 
I had thought of designing screens to allo people to the merge idea, leaving the fix to the the "aware" user, with the historical trial as you describe. Of course that could probably get a little complex, but i can see how that would be doable.

So you sound like you're basically confirming my hunch that for this kind of thing, there is really no way to automate fully - it's more of a quesiton of management of the amount human intervention required - is that about right?
 
Yes that's exactly right.

However, one thing I forgot to mention is that there is software and services out there for purchase and fees that may be able to help you find duplicates, but the cost may outweigh the benefits. The only time I've used such a service was for address cleansing, and if your records contain the addresses of the businesses, then it may be worth a look. Whether or not there is anything out there that would be able to map ABC to Associated Business Contractors I have no idea. Someone else may have more insight into the algorithms these systems use, but regardless, I would be hesitant to automate updates based on these types of software.
 
Would it be possible to include a strictly regulated abbreviation field that would take the first letter of each element of the company name? It might then be possible to show the user other entries with the same abbreviation. Soundex might also be helpful.
 
Do your users enter addresses for the new companies? If so you might do a check on the address with existing companies and ask them if they meant (ABC company) when they type American Business Company. Then have a process that either purges the new record or dedups them depending on what other work happens.

You could also set up a report for management of all companies that are located in the same place (or have the same name)and ask them to identify and get rid of duplicates.

You should set up a formal dedup process. In ours we select the profile we want to keep and it adds any records that don't match in tables that are a one to many relationship (all the orders for both profiles would be kept for instance, but the id would be changed to the one you were keeping) and in other tables asks the user to choose the correct information by comparing them side by side (Such as is it ABC or American Business Company). Depending on the data associations this may be something that has to be done one at time with manual decision making or a group of records can be identified and the dedupping happen automatically based on a defined set of business rules.

"NOTHING is more important in a database than integrity." ESquared
 
great discussion. I'm grateful for your interest. Soundex is interesting.

we do track addresses, but they are not required, but even if we did, typos and other mistakes could still derail this, don't you think? However on the other hand, checks like this could at least close the gap and make it incrementally harder to enter potential dups.

Also seems like writing a little mini-app that does all this stuff could be quite time consuming, which I guess leads me to the this rhetorical question that only I and the firm can answer: - how painful of a problem is this to management? Where does working on an app(let) that makes data entry considerably more air-tight fall with the myriad of other priorities they have for their data management system?
 
Do your users enter addresses for the new companies? If so you might do a check on the address with existing companies and ask them if they meant (ABC company) when they type American Business Company. Then have a process that either purges the new record or dedups them depending on what other work happens.
Taking this even a step further, you could enter the duplicate, rejected company name into a stop-list to prevent people from repetitively entering duplicate records. The stop-list should also link to the appropriate company record.
 
I feel a bit queasy here, maybe I'm the one who's barking mad, but isn't this all twaddle?

<rant on>
Allowing duplicates and then cleaning them, whatever next?

I suspect that this is a fundamental cock-up in business processes. Each company can exist only once (that's a legal obligation in the U.K., and I suspect also in the U.S.) so why are you even contemplating duplicates?

Users must only be allowed to select from a list of already-known companies, or be allowed to request the addition of a new potential business partner. Each company must have a unique identifier (again, it's a legal obligation) so why don't you get the users to do enough work to know with whom they are dealing? Surely striking a multi-billion dollar deal with the Acme Widget company is a disaster, when actually you dealt with the Acme Wodget company.

If users don't know who they're dealing with, why oh why are they being allowed to add reference data to a database?

Are your punters so dappy that they can't be bothered to contact Companies House (or the U.S. equivalent thereof) and download a definitive list of all registered companies. After that, users may only select from legitimate companies, not post code (zip code) plonkers!

<rant off>

<hastily dons asbestos-lined trousers>



Regards

T
 
well you do have a point and it is well taken despite the rant. but let me ask you a simple question. Do you want to be the guy that all the account people call every day to add new accounts to the system? Is your time best spent doing rote data entry - or should there be some way to automate that, which is the basis of the attempt at automation.

actually, i think the single point of entry is the way we're going to deal with it. Someone (not me) is going to just have to suck it up and do it.
 
Sadly, I fear you're correct.

However, I do think there might be some mileage in downloading a master list from Companies House. Users could then select, and where the "company doesn't exist" manual intervention would be necessary. At least that way you would only be fettling the system when essential, not every time someone makes a new business contact.

Also, by assigning a unique company identifier within your system, you immediately simplify the merging problem later on, as you can just transfer the "meat" of the contacts, and ignore the already-known demographic stuff, such as address.

Regards

T
 
Star for Tharg - good to see a commonsense approach. I have always found it worrying when a company allows IT guys to make decisions that fall outside their areas of knowledge and make work for others which IT should have been preventing.

___________________________________________________________
If you want the best response to a question, please check out FAQ222-2244 first.
'If we're supposed to work in Hex, why have we only got A fingers?'
Drive a Steam Roller
Steam Engine Prints
 
I would think the account people could enter the data as they know it and it be immediately available until verification and validation is performed and the record is finalized?

< M!ke >
[small]Where are we going and why am I in this handbasket?[/small]
 
company allows IT guys to make decisions that fall outside their areas of knowledge

this was not an IT decision, and it is hardly something that falls outside of our knowledge. It was a mangement decision, which IT implemented. I will grant you that this was an IT oversight, and IT should have recognized that this was not a good idea an advised management accordingly, since it is not management's responsibility to be able to forsee such issues.

 
Users must only be allowed to select from a list of already-known companies, or be allowed to request the addition of a new potential business partner.
This is so naive that I would subtract stars if I could.

First of all, your "list of already-known companies" is always outdated. Always.

You (and your applauders) have obviously never dealt with sales or marketing. Do you think you sell to IBM, period? That's the company, isn't it?

The truth is that this is much more complicated than you represent. Very often a "customer" is an independently operating part of a larger business: they order things on their own and they pay invoices on their own. If those invoices were directed to "Official Co. HQ, Inc." HQ wouldn't know what to do with them.

This type of data deconfliction takes human intelligence to be able to tell whether the business unit "Official Co., Western Region" is the same as "Official Co., Denver, CO." What's your guess?

Getting back to sales, you must enable your staff to take orders, period, without the need to guess at a selection of an existing business or (horrors!) wait for approval to enter a business.

M!ke above offers a reasonable method of dealing with "sloppy" data entry.
 
wow i like the debate here.

the message I'm taking away from all this is there seems to be no simple or de-facto way to get around the human intervtion aspect, but if you try, you're probably into a lot of creative engineering which may or may not be worth the trouble, when compared with just having a single point of entry.

harebrain hit it on the head, I support all marketing and account management types, and information changes frequently and rapidly, causing a lot of outdated information which is hard to stay on top of no matter how you slice it.



 
Also I know of no US equivalent list that lists all of the millions of companies in this country. Further to put all of them on a drop-down list would be impractical.

"NOTHING is more important in a database than integrity." ESquared
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top