Friday, April 29, 2011

Open Source and Data Quality

My latest video on the Talend Channel about data quality and open source.


This was filmed in the Paris office in January. I can get excited in any time zone when it comes to data quality.

Monday, April 25, 2011

Data Quality Scorecard: Making Data Quality Relevant

Most data governance practitioners agree that a data quality scorecard is an important tool in any data governance program. It provides comprehensive information about the quality of data in a database and, perhaps even more importantly, allows business users and technical users to collaborate on the quality issue.

However, there are multiple levels of metrics that you should consider:

LEVEL | METRIC CLASSIFICATION | EXAMPLES
1 | Metrics that the technologists use to fix data quality problems | 7% of the e-mail attribute is blank. 12% of the e-mail attribute does not follow the standard e-mail syntax. 13% of our US mail addresses fail address validation.
2 | Metrics business people use to make decisions about the data | 9% of my contacts have invalid e-mails. 3% have both invalid e-mails and invalid addresses.
3 | Metrics managers use to get a big picture | This customer data is good enough to use for a campaign.

All levels are important for the various members of the data governance team. Level one shows the steps you need to take to fix the data. Level two gives context for the task at hand. Level three tells the uninformed about the business issue without their having to dig into the details.

So, when you’re building your DQ metrics, remember to roll up the attribute-level data into progressively higher-level metrics. Design the scorecards to meet the needs and interests of the different audiences, from technical through business and up to executive. At the base of a data quality scorecard is information about the data quality of individual data attributes. This is the default information that most profilers deliver out of the box. In the middle are various score sets that allow your company to analyze and summarize data quality from different perspectives. As you aggregate scores, the high-level measures of data quality become more meaningful. If you define the objective of a data quality assessment project as calculating these different aggregations, you will have a much easier time maturing your data governance program. The business users and C-level executives will begin to pay attention.
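To make the roll-up concrete, here is a minimal Python sketch of the three levels. Everything in it is an assumption made for illustration: the sample records, the field names (`email`, `address_valid`), the deliberately simple e-mail pattern, and the 10% threshold behind the level-three verdict. A real profiler and a real business rule would look different.

```python
import re

# Hypothetical sample records; field names are assumptions for illustration.
CONTACTS = [
    {"email": "ann@example.com", "address_valid": True},
    {"email": "", "address_valid": True},
    {"email": "bob-at-example.com", "address_valid": False},
    {"email": "cy@example.org", "address_valid": False},
]

# Deliberately simple syntax check, not a full e-mail validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def pct(count, total):
    """Return count as a percentage of total (0.0 for an empty set)."""
    return 100.0 * count / total if total else 0.0


def email_is_invalid(record):
    """An e-mail counts as invalid if it is blank or fails the syntax check."""
    return not record["email"] or not EMAIL_RE.match(record["email"])


def attribute_metrics(records):
    """Level 1: per-attribute measures that technologists use to fix data."""
    total = len(records)
    blank = sum(1 for r in records if not r["email"])
    bad_syntax = sum(
        1 for r in records if r["email"] and not EMAIL_RE.match(r["email"])
    )
    bad_address = sum(1 for r in records if not r["address_valid"])
    return {
        "email_blank_pct": pct(blank, total),
        "email_bad_syntax_pct": pct(bad_syntax, total),
        "address_invalid_pct": pct(bad_address, total),
    }


def record_metrics(records):
    """Level 2: per-contact measures that business people use for decisions."""
    total = len(records)
    invalid_email = sum(1 for r in records if email_is_invalid(r))
    invalid_both = sum(
        1 for r in records if email_is_invalid(r) and not r["address_valid"]
    )
    return {
        "contacts_invalid_email_pct": pct(invalid_email, total),
        "contacts_invalid_both_pct": pct(invalid_both, total),
    }


def campaign_verdict(level2, max_invalid_email_pct=10.0):
    """Level 3: a single big-picture statement for managers.

    The threshold is an assumed business rule, not a recommendation.
    """
    if level2["contacts_invalid_email_pct"] <= max_invalid_email_pct:
        return "This customer data is good enough to use for a campaign."
    return "This customer data is not yet good enough to use for a campaign."


level1 = attribute_metrics(CONTACTS)
level2 = record_metrics(CONTACTS)
level3 = campaign_verdict(level2)
```

Each level is computed from the one beneath it or from the same records, so the manager's one-line verdict can always be traced back to the attribute-level numbers that the technologists act on.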

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.