Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

a question regarding data integrity 7

Status
Not open for further replies.

NuJoizey

MIS
Aug 16, 2006
450
US
I'm sure what I'm about to pose here is far from a-typical. What I am looking for is how this situation is commonly addressed in a professional db environment.

The issue:

You have a table with CompanyID(PK) and CompanyName.
One user enters ABC Co Inc. for the company name. A couple of days later, another user enters in Associated Business Contractors, and on another day, a different user enters Smith Enterprises, dba Assoc. Biz Inc.

All of these entries refer to the same company, but since there are thousands of Company records, no one users seems to notice that it has been already entered under a different name. Consequenty, database integrity gets marred because then different users update related records for different ABC companies.

What is a good strategy to deal with this?
 
Yeah, at best you would have to aggregate all 50 states lists of tax ID (which I'm not sure are public info anyway.) Not to mention, I believe it's legal in some circumstances to be a "business" without a tax ID. Then you have 'ABC, Inc.' doing business as 'XYZ Assoc.', 'ABC, Inc' dba 'JKL, Services', etc. All different entities.

As harebrain mentioned, you have different offices of 1 company that may be separate billing entities.

And just for more fun as an example, go to Google and search for "Northstar Consulting". Many different companies with the same name.

There's no real way to automate this.

_____
Jeff
[small][purple]It's never too early to begin preparing for [/purple]International Talk Like a Pirate Day
"The software I buy sucks, The software I write sucks. It's time to give up and have a beer..." - Me[/small]
 
Folks,

my rant was about the fundamental idiocy of deliberately allowing dross into a db, only to have the anti-dross workers come around later to clean it up - the thread's title mentions that dreaded word "Integrity" and sql sister's signature implies that she too thinks it's kind of important every now and then.

So, to take issue with the esteemed protagonists involved in this 'ere debate, I proceed in order of response:-

Harebrain, you are quite correct, in that so many companies have different offices around the country, etc. This only goes to demonstrate the well known phenomenon of "hidden complexity". As your post elegantly points out, there's more to it than meets the eye. My response is "so bleeding what?". Dealing with hidden complexity in business processes is why software developers get paid. If it was easy, we'd all be doing it. If the system designers can't handle the complexity, get yourself some different designers.

Boiling down all the complex business interactions involved, and producing a clear cut requirements specification would be no trivial exercise. Nonetheless, if the business wants excellence, it will just have to cough up for it. A master list might not be feasible in the end, but a free-for-all is far worse. If human intervention is unavoidable, then it should be carefully controlled and audited.

As regards customer recognition, isn't it bad to ring up a customer and discuss something, only to have them respond "but I talked to your colleague xyz only two days ago, don't you guys talk to each other?". Resolving differences is (IMHO) really important to have joined-up interaction with customers.

Second, the illustrious SQL Sister. I must say that I am amazed that companies can trade with no legal obligation to have an address of incorporation. So in the U.S. of A anyone can set up a company, start trading, open business bank accounts, not bother to pay taxes etc., all with no governance - amazing.

Second, the notion of a drop down list with a million vaues is risible. Any developer worth their salt would craft a form offering appropriate filtering, selection criteria, recent favourites etc. I don't believe that's a genuine issue.

As for the issue of billing addresses etc., any general purchase ledger software that can't cope with a business incorportated as address A, having goods delivered to its premises at addressed b,c,d, and with a billing address at e is dross, and should be replaced immediately. This is old hat for GPL software, so why should it be any different for other suites attempting (and apparently failing) to offer similar functions.

Regards

T
 
Tharg, in the US a "corporation" is simply one of many ways to setup a business. Simple companies can be setup where there is no separation between your personal assets and company assets. You can have many different legal entities using the same business address or one corporation using many billing addresses. You can have one entity "doing business as" another entity with legal separation between them. In general, it's a free for all.

To open a business bank account you would need a tax ID for the business, but that may be federal or may be state issued. There is no master list.

_____
Jeff
[small][purple]It's never too early to begin preparing for [/purple]International Talk Like a Pirate Day
"The software I buy sucks, The software I write sucks. It's time to give up and have a beer..." - Me[/small]
 
That explains some of the problems. Here, every limited company (even those run by one man and his dog) are obliged to have on public display a certificate of incorporation at the company's registered address. Each company has a unique number, assingned by Comopanies House, who liaise with H.M. Customs and Excise to make sure that all duties and taxes are paid.

You do seem to have a bit of a "snowflake in a blizzard" problem your side of the pond. I still believe that there is no excuse for poor software though. If business in the U.S. is like that, then shouldn't the major software houses at least make a fist at dealing with the situation? I must say that I'd want Messrs. Clinton and Obama to do something about such a situation though - it seems ripe for fraudsters etc.

Maybe we could set up a Nigerian Trading company based in the U.S., and see how many punters we can fleece before the old bill come calling?

Sorry, I just realised that I've wandered off thread.

Regards

T
 
[ ]

In the U.S. all you need to open a business bank account is a Federal Tax ID for the business. If the business has no employees, then a Federal ID is not required and one can use their personal Social Security Number instead.

Businesses here can be Corportations, Limited Liability Companies (LLC), Partnerships, Limited Parterships, Sole Proprietors, etc all with different Federal, State, County, and local requirements for registration or non-registration as the case may be.

Nor does is a business required to have a distinct business name. Anyone can use their personal name for the business name so long as no one else is using the same name in the counties that it is registered in. In reality, there can be over 3000 totally unrelated businesses with exactly the same name, one in each county. However, in practice, most businesses register in all or most counties in one or more states. If it is a Corporation or some other state regulated business, then registering with the state automatically registers the business in all of the counties in the state.

As for how to handle the data integrity problem, I use Zip Codes as the primary entry with a drop-down list of available customers in that zip.

If perchance the zip is unknown, the second choice is using phone numbers for lookup, HOWEVER, new customers are not allowed to be entered just because a phone number cannot be found since most companies have several or even thousands of different phone numbers.

Finally, lookups are allowed using company names. This uses a modified Soundex where names like 'Eden Truffles,' Dental Supplies,' and 'Teen Eating House' would all come up on the same list. This does not work very well when the same company name (AAMCO for example) shows up hundreds of times along with unrelated variations (AMCO, AMOCO, etc).

After all of these have been checked, then and only then, can a new customer be added. This does not totally solve the problem, but it eliminates most duplicates up front.

Since I generally only need to worry about a maximum of 20,000 business entities this seems to work well for me.



mmerlinn


"Political correctness is the BADGE of a COWARD!"
 
my rant was about the fundamental idiocy of deliberately allowing dross into a db
Picture yourself and the owner of a business, standing behind a salesman taking an order over the phone and entering it into the computer. Now ask the owner, "What do you want, this sale or data integrity?" The business owner knows what "sale" means.

My point was that operational reality often clashes with CS dogma. Welcome to the real world, where we must reconcile ugly realities with pretty models.
 
harebrain said:
Picture yourself and the owner of a business, standing behind a salesman taking an order over the phone and entering it into the computer. Now ask the owner, "What do you want, this sale or data integrity?" The business owner knows what "sale" means.

My point was that operational reality often clashes with CS dogma. Welcome to the real world, where we must reconcile ugly realities with pretty models.

Very good point. So taking away the possibility of a user to add a company would be no good idea. Nevertheless having two records for the same company messes up all statistics on that business contact etc.

First it seems a little too less information stored into the companies record, only a PK and the company name.

If companies are customers, why not use the wide spread practice of adding your own identiefier, eg a customer number you may ask of recurring customers to reidentify them exactly or some other data which suits as an identifier as telefone number or mail adress, as they are by their nature existing only once. Or add your own 3-5 letter acronym to the company name, which would less likely differ each time someone enters it.

I understand that a company can have many phone numbers and mail adresses, but that may reflect the fact you have to do with different departments with their independant accounting office.

The idea for an algorithm to make suggestions for what company the name entry could be using Soundex is a start. But Soundex does not help with eg abbreviations. Having a full text index you could search for parts of the company names entered combined with an acronym search would perhaps help more.

Bye, Olaf.
 
thargtheslayer,

So, what about all the sole traders who don't need to register with Companies House?

NuJoizey,

If you wish to go down the controlled route, Dun and Bradstreet are the leaders in solutions for this problem.
Craig
 
They are required to declare themselves as sole traders. Again, that's mandatory, and has subtly different legal connotations. I believe that a sole trader may have his personal assets seized against any debts, whereas a limited company (as its name suggest) has liability limited to its assets, and not the personal assets of its directors.

Regards

T
 
No, sole traders do not need to register with Companies House. They are unincorporated and therefore not a company by definition.

And then, what would you do with either an individual who wanted to trade with you or an international company? Would you turn them away because 'you don't have a company number'? I'd love to see you explain to the boss that the reason the business has to turn down work is because 'Computer says no'.

C
 
To add to the mess of using a TaxID, if the business entity is an individual and not a Inc, then you would be storing that Individual's SSN. Now you get into SOX, and privacy issues of insuring that the SSN and all other "privacy Information" is encrypted and secured and all employees that look at that account with the SSN have signed Confidentiality notices.

So to use the SSN/FEID as the unique look up value, just snowballed into data security. Phone numbers and addresses tend to be a good starting place for comparing against existing parties. Sure you're going to have the occasional typo, but it does weed out many of the potential dupes you will encounter. It's a fairly simple check to see if phone/address all ready exists, especially if you control the exact format that the phone number is entered in.

Not perfect, but the cheapest and easiest way to help. A weekly/monthly report of all new clients could also cut into the dupe count, greatly. And with a small company that cannot see the ROI on buying "look up" data, it's about all you have.

"If I were to wake up with my head sewn to the carpet, I wouldn't be more surprised than I am right now.
 
Hmmmmm...

In real-life databases, duplication happens. There's stuff you can do to make it less likely, but it's bound to happen in a database of any size. A few random thoughts:

One approach I've used to reduce duplicate records is to only allow users to enter a new record after they've searched existing ones. For example:
[ol]
[li]The user populates the customer details by clicking a [Customer...] button.[/li]
[li]The button pops up a dialog asking for name, address and other search parameters. The user enters what (s)he knows and clicks [Search].[/li]
[li]A list of matching entries is shown. The user may pick one of these, or click a [New...] button[/li]
[li]Only at this point may they add a new customer[/li]
[/ol]
This approach works best when there's a good chance that the record is already in the database (I've used it for address entry where we've got a table of postal addresses). If a large percentage of customers are new ones, this approach may not be for you.

Since companies can and do operate under multiple names, your database should/could be designed to cope with this. An extra table can be used to hold aliases for the companies you deal with, and factored in to any search/duplicate spotting routines you have.

Nonetheless, duplicates will still occur from time to time, so I suggest you should have a process in place to merge them together. Reports that identify potential duplicates are a good idea too.

Having said that, you also need to determine how serious a problem a few duplicate records is to your business. I'm working in a Social Services department at the moment, and it's really important to us to know that the John Smith whose father is occasionally violent is the same John Smith that's turning up in the hospital twice a week. Does it really matter that you think you've got 2053 customers, but you've actually got 2050 and 3 duplicates? That's a cost-benefit question for the business to answer.

-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
I too work with a database which must allow duplicates. It's a medical application for hospitals. For reasons of either vanity or embarassment, some folk lie about who they are when being treated. They say that this is the "first time I've been treated."

Despite the fact that their medical history may contain information which is invaluable to the clinician attempting diagnosis and treatment, this happens often. Some repeat offenders are recognised by staff who quietly enter the correct demographic data. The NHS Number is used as the unique identifier in this case, and we have to merge duplicate records together sometimes, when tests reveal that two supposedly different people are one and the same. Duplicate elimination is a major hassle and fraught with risk. What if you wrongly merged Joe Smith with Joseph Smith, only to find out later that they are different, and one is allergic to penicillin and the other isn't?

There's nowt so queer as folk!

Regards

T
 
Therg,

That sort of problem was dealt with years ago by Martin Fowler et al by the Cosmos project for the NHS. In his book, Analysis Patterns, he describes merge/unmerge strategies pretty clearly and how to deal with the risks. It was canonical for me when working on NPfIT.

C
 
tharg said:
some folk lie about who they are when being treated.
I've worked around clinical settings (ED) too. Some folks are confused. Some folks are unconscious. Some folks are otherwise incapacitated or uncommunicative. How about not tarring them all as "liars?
 
No tarring implied hare, just stating facts. The software does allow for unknown persons to be treated. For example, the A&E department might get a call from an ambulance crew saying "We've got three casualties from the pile-up, we're on the way now, ETA ten minutes, we'll need 5 units of blood on arrival."

Our system duly allows A&E staff to book out units of blood for an unknown person of unknown age and/or gender, as it's recognised that the patient may be unknown (because right now they're in the back of an ambulance doing 70 m.p.h.) or unable to speak for him or herself. The patient may be so bad that immediate transfusion of o negative is required to save their life - the lab will do a full cross match when things have stabilised.

That's a different functional requirement, which is duly catered for.

We do get liars though - bizarre really, they jeopardise their own treatment for some strange reason. In this instance, the person is fully functional (otherwise how could they lie?) and deliberately providing false data. Obviously this is not intended to cast aspersions on those folk who are mentally ill, and may be incapable of telling wrong from right.

Regards

T
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top