Wednesday, August 31, 2011

Top Ten Root Causes of Data Quality Problems: Part Five

Part 5 of 5: People Issues
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them. Companies rely on data to make significant decisions that can affect customer service, regulatory compliance, supply chain and many other areas. As you collect more and more information about customers, products, suppliers, transactions and billing, you must attack the root causes of data quality problems.

Root Cause Number Nine: Defining Data Quality

More and more companies recognize the need for data quality, but there are different ways to clean data and improve data quality. You can:
  • Write some code and cleanse manually
  • Handle data quality within the source application
  • Buy tools to cleanse data
However, consider what happens when you have two or more of these types of data quality processes adjusting and massaging the data. Sales has one definition of customer, while billing has another. Because their processes differ, they don’t agree on whether two records are duplicates.
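
To make the disagreement concrete, here is a minimal Python sketch. The two matching rules, the field names (name, company, billing_account) and the sample records are all hypothetical, invented for illustration; real departmental rules will differ.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def sales_is_duplicate(a, b):
    # Hypothetical sales rule: same contact name at the same company.
    return (normalize(a["name"]) == normalize(b["name"])
            and normalize(a["company"]) == normalize(b["company"]))


def billing_is_duplicate(a, b):
    # Hypothetical billing rule: same billing account number.
    return a["billing_account"] == b["billing_account"]


# Two made-up records for the same person, billed under two accounts.
rec1 = {"name": "Pat Jones", "company": "Acme Corp", "billing_account": "A-100"}
rec2 = {"name": "pat  jones", "company": "ACME Corp", "billing_account": "A-200"}

print(sales_is_duplicate(rec1, rec2))    # True: sales sees one customer
print(billing_is_duplicate(rec1, rec2))  # False: billing sees two
```

Neither rule is wrong on its own terms. The problem is that the two departments never agreed on a single definition.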

Root Cause Attack Plan
  • Standardize Tools – Whenever possible, choose tools that aren’t tied to a particular solution. Having data quality only in SAP, for example, won’t help your Oracle, Salesforce and MySQL data sets.  When picking a solution, select one that is capable of accessing any data, anywhere, at any time.  It shouldn't cost you a bundle to leverage a common solution across multiple platforms and solutions.
  • Data Governance – By setting up a cross-functional data governance team, you will have the people in place to define a common data model.

Root Cause Number Ten: Loss of Expertise

On almost every data-intensive project, there is one person whose legacy data expertise is outstanding. These are the folks who understand why some employee date of hire information is stored in the date of birth field and why some of the name attributes also contain tax ID numbers.
Data might be a kind of historical record for an organization. It might have come from legacy systems. In some cases, the same value in the same field will mean a totally different thing in different records. Knowledge of these anomalies allows experts to use the data properly.
If you encounter this situation, there are some business processes you can follow.

Root Cause Attack Plan
  • Profile and Monitor – Profiling the data will help you identify most of these types of issues.  For example, if you have a tax ID number embedded in the name field, analysis will let you quickly spot it. Monitoring will prevent a recurrence.
  • Document – Make sure experts document all of the anomalies and transformations that need to happen every time the data is moved, even though they may be reluctant to do so for fear of losing job security.
  • Use Consultants – Expert employees may be so valuable and busy that there is no time to document the legacy anomalies. Outside consulting firms are usually very good at documenting issues and providing continuity between legacy and new employees.
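
As a rough illustration of the profiling step, the Python sketch below flags the two anomalies mentioned earlier: tax ID numbers embedded in a name field, and dates of birth that look more like dates of hire. It rests on assumptions that are not from the white paper: the tax IDs are taken to follow US-style formats, the minimum plausible employee age is set to 16, and the field names and sample rows are invented. A commercial profiling tool would do far more.

```python
import re
from datetime import date

# Assumption: tax IDs look like US SSNs (123-45-6789), EINs (12-3456789),
# or nine bare digits. Adjust the pattern for your own data.
TAX_ID_PATTERN = re.compile(r"\b(?:\d{3}-\d{2}-\d{4}|\d{2}-\d{7}|\d{9})\b")


def profile_name_field(records, field="name"):
    """Return (row index, value) pairs where the field seems to hold a tax ID."""
    return [(i, rec[field]) for i, rec in enumerate(records)
            if TAX_ID_PATTERN.search(rec.get(field) or "")]


def looks_like_hire_date(dob, as_of, min_age=16):
    """Flag a birth date implying an employee younger than min_age (assumed)."""
    age = as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))
    return age < min_age


# Made-up sample rows for illustration only.
sample = [
    {"name": "Maria Lopez"},
    {"name": "John Smith 123-45-6789"},
    {"name": "Acme Supply 12-3456789"},
    {"name": None},
]

suspects = profile_name_field(sample)
print(suspects)
# [(1, 'John Smith 123-45-6789'), (2, 'Acme Supply 12-3456789')]

print(looks_like_hire_date(date(2005, 3, 1), as_of=date(2011, 8, 31)))   # True
print(looks_like_hire_date(date(1970, 5, 20), as_of=date(2011, 8, 31)))  # False
```

Running checks like these on every load, and alerting when the list of suspects grows, is one simple way to approach the monitoring half of the bullet. The flagged rows also give the experts a concrete starting list of anomalies to document.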

This post is an excerpt from a white paper available here. More to come on this subject in the days ahead.


Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.