Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations biv343 on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Data Quality Audit - Done one?? 1

Status
Not open for further replies.

ad2

Technical User
Dec 31, 2002
186
US
Hi,

Has anyone done an audit on the quality of the data in your system(s)? Was the result what you thought it would be? Were errors cofined to one area (customer name/address data) or accross the system? What audit method was used.

Was the effort worth the time and cost? What tips or insights can you give on your experience?

My company is thinking of doing an audit.
 
ad2,

What kind and volume of data do you have? This may help folks give you better tips.

First, let me say I have always performed checks on my data for logic, nullness/non-nullness, etc., BUT - I'm pretty much a rank amateur who works with small databases (the largest had ~65,000 records). However, I'm going to offer my two cents worth, anyway.

I check for dates that are in the future when they should not be (Mr. Smith resigned on 9/15/2025), or too distant in the past (Mr. Bill Gates founded his company in May 1911). I check for null fields that should have data, or fields that have data when they should not. An example of the latter might be looking for data in the "Severence Pay" field when there is no data in the "Date Laid Off" field.

Which checks you need depends on your situation. For example, I live in North Carolina where every ZIP Code begins with 27 or 28. This makes for an easy check to see if any NC ZIP's are wrong. Reversing the query can find ZIP's beginnign with 27 or 28 where something other than NC is recorded as the state. You get the picture.

Basically, I run various queries looking for logical and complete data, which usually involves comparing one or more fields (e.g., Date Fired should be AFTER Date Hired...) I also sort by various columns to check for Max and Min values to see if they are within appropriate ranges.

If you wait a bit, someone who knows much more than me is likely to post.

Best of luck!
Tim

[blue]_____________________________________________________
If you need immediate assistance, please raise your hand.
If you are outside of Raleigh, raise your hand and say
[/blue] [red]Ooh! Ooh![/red]
 
Thanks Tim,

I posted to get others perspective on the issue and you've been helpful. Sounds like you have on going procedures in place to cleanse data, very good idea. We do likewise, but my merger two databases in the near future and were thinking that an audit on each before hand would be wise.
 
Hello,
I'm sure you're going to get a ton of opinions, so I'll start with a couple of mine.
1. The quality of your data is usually controlled by how much you application "edits" the data before allowing insertion or change -- and the whims of your user base. No edits in a free-form field will result in words being misspelled exponentially based on the number of characters!
2. Quality should not be a phase we decide to undertake on Wednesdays or the first of the month. It needs to be a constant battle.
3. Standards are good if they make sense and can be implemented. Great standards that can never be implemented are great wastes of storage space.
4. I learned long ago that I am basically a lazy person when it comes to some tasks. If you're going to perform an audit, I'll bet you the importance of doing it again will be proportional to how much junk you find this time. I usually write routines to allow me to repeat the process. Even if you plug all of the holes in the code that allows bad data, it would be easier to confirm next time (thus prop your feet up while your code runs).
5. I assume your data is like most peoples and it flows in various forms from one application to another. Some applications may not lend themselves to too many edits (third-party that you can't control), but let's hope those are at the end of the life cycle.
6. The cost is a gray area -- it largely depends on the impact of bad data. If you are a marketing firm and your brochures are always going out with typos, lost business opportunities = lost revenues. Consider the audience for your data. If it's only for your personal use, who cares? If it's a report to your boss, he just may think your pay should be proportional to the quality. Heaven forbid if your data was submitted to your boss and then he sent it to his boss without looking it over. You probably know of some things that flow downhill.

In summary (IMHO), yes quality is very important and deserves the attention of all workers -- not just a few.
Good Luck!
Wayne


35+ years of 'progress' -- can't we all just go wire boards again?
 
Thanks Wayne, very good points, especially "write routines to allow me to repeat the process", yes, yes.
 
While "Audits" arfe reasonable and (sigh ... moan ... gumble .. snarl ... gnash teeth .. et al) they mostly can and should be replaced with ye good olde V&V (VALIDATION AND VERIFICATION!!!!!!!!!!!!!!!!!!) prior to the adoption acceptance and integration of the 'sttuuuffff'.

If you know the rule to audit the data, why the )(*&#)$(*&#)(*$& DIDN'T YOU APPLY IT TO THE sttuuuffff BEFORE it got into the db?

First of all, the Audit is just going to fix the damage that might get done in the future, but doesn't help the poor manager who relied opn the bogus stttuuuufffffff you fed him for the period between audits.

Secondly, doing the V&V, the rules are applied to a record at a time, so there is no snooze alarm program running for the (entire?) afternoon, possibly tying up the system and almost certainly at least slowing dow the entire network.

Thirdly, V&V items can (and -for me- usually are slipped into the processing as a series of minor updates to the code as situations are noted, along with the batch update of the historical info. In many instances these are not even noticed by data entry and seldom noticed by even most middle tier users. I do take the time to detail the change, including the number of records affected and at least a cursory impact report to department heads. Using this approach, I seldom get negative feedback and have had to 're-state' reports only once in the last decade or so.

I will not have a 'finally', as I'm sure that there are (or certainly should be) many other points which can be made in support of this approach (vs the barn door after ... )





MichaelRed


 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top