Tuesday, July 21, 2009

Data Quality – Technology’s Prune

Prunes. When most of us think of prunes, we tend to think of a cure for older people suffering from constipation. In reality, prunes are not only sweet but also highly nutritious. They are a good source of both potassium and dietary fiber. Prunes suffer from a stigma that’s just not there for dried apricots, figs and raisins, which have similar nutritional and medicinal benefits. Prunes suffer from bad marketing.

I have no doubt that data quality is considered technology’s prune by some. We know that information quality is good for us, having many benefits to the corporation. It also can be quite tasty in its ability to deliver benefit, yet most of our corporations think of it as a cure for business intelligence constipation – something we need to “take” to cure the ills of the corporation. Like the lowly prune, data quality also suffers from bad marketing.

In recent years, prune marketers in the United States have begun marketing their product as “dried plums” in an attempt to get us to change the way we think about them. Commercials show the younger, soccer mom crowd eating the fruit and being surprised at its delicious flavor. It may take some time for us to change our minds about prunes. I suppose if Lady Gaga or Zac Efron were spokespersons, prunes might have a better chance.

The biggest problem in making data quality beloved by the business world is that it’s, well… hard to explain. When we talk about it, we get crazy with metadata models and profiling metrics. That’s great when we’re communicating among data professionals, but that talk tends to plug up business users.

In my recent presentations and in recent blog posts, I’ve made it clear that it’s up to us, the data quality champions, to market data quality, not as a BI laxative, but as a real business initiative with real benefits. For example:

  • Take a baseline measurement and track ROI, even if you think you don’t have to
  • If the project has no ROI, you should not be doing it. Find the ROI by asking the business users of the data what they use it for.
  • Aggregate and roll up our geeky metrics of nulls, accuracy, conformity, etc. into metrics that a business user would understand, like “According to our evaluation, 86.4% of our customers are fully reachable by mail.”
  • Create and use aggregated scores, similar to the Dow Jones Industrial Average. Publish them at regular intervals. To raise awareness of data quality, talk about why the score is up and why it has gone down.
  • Have a business-focused elevator pitch ready when someone asks you what you do. “My team is saving the company millions by ensuring that the ERP system accurately reflects inventory levels.”
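
To make the roll-up idea concrete, here is a minimal Python sketch. It is only an illustration, not a description of any particular tool: the Customer fields, the rule that "fully reachable by mail" means name, street, city and postal code are all filled in, and the weighted-average index are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Customer:
    # Hypothetical record layout, invented for this example.
    name: Optional[str]
    street: Optional[str]
    city: Optional[str]
    postal_code: Optional[str]


def is_populated(value: Optional[str]) -> bool:
    """A field counts as populated if it is not null and not blank."""
    return value is not None and value.strip() != ""


def is_reachable_by_mail(c: Customer) -> bool:
    """Assumed business rule: every mailing field must be populated."""
    return all(is_populated(v) for v in (c.name, c.street, c.city, c.postal_code))


def percent_reachable(customers: List[Customer]) -> float:
    """Roll field-level null checks up into one business-facing percentage."""
    if not customers:
        return 0.0
    reachable = sum(1 for c in customers if is_reachable_by_mail(c))
    return round(100.0 * reachable / len(customers), 1)


def quality_index(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine several 0-100 scores into one weighted index to publish regularly."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return round(weighted_sum / total_weight, 1)
```

Calling percent_reachable on the customer table gives the one number a business user cares about, and quality_index turns several such numbers into a single score you can publish on a schedule and explain when it moves. The weights are a judgment call to agree on with the business users, not something the code can decide.
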
Of course, there's more: in my previous blog posts, in posts yet to come, and in my book The Data Governance Imperative. Marketing the value of data quality is just something we all need to do more of. Not selling the business importance of data quality... it’s just plum-crazy!

Monday, July 13, 2009

Data Quality Project Selection

What if you have five data intensive projects that are all in need of your very valuable resources for improving data quality? How do you decide where to focus? The choice is not always clear. Management may be interested in accurate reporting from your data warehouse, but revenue may be at stake in other projects. So, just how do you decide where to start?

To aid in a choice between projects, it may help to plot your projects on a “Project Selection Quadrant” as I’ve shown here. The quadrant chart plots the difficulty of completing a project versus the value it brings to the organization.

[Figure: Project Selection Quadrant, with project difficulty on the X axis and project value on the Y axis]

Project Difficulty
To place a project on the X axis, you must understand how your existing system is being used, how various departments use it differently, and whether there are special programs or procedures that affect the use of the data. To predict project length, you have to rely heavily on your understanding of your organization's goals and business drivers.

Some of the things that will affect project difficulty:
• Access to the data – do you have permission to get the data?
• Window of opportunity – how much time do you have between updates to work on the data?
• Number of databases – more databases will increase complexity
• Languages and code pages – is it English or Kanji? Is it ASCII or EBCDIC? If you have mixed languages and code pages, you may have more work ahead of you
• Current state of data quality – the more non-standard your data is to begin with, the harder the task
• Volume of data – data standardization takes time, and the more data you have, the longer it’ll take
• Governance, Risk and Compliance mandates – is your access to the data restricted by regulation?

Project Value
For assessing project value (the Y axis), there is really one thing that you want to look at – money. It comes from your discussions with the business users around their ability to accomplish things like:
• being able to effectively reach/support customers
• call center performance
• inventory and holding costs
• exposure to risk such as being out of compliance with any regulations in your industry
• any business process that is inefficient because of data quality

The Quadrants
Now that you’ve assessed your projects, they will naturally fall into the following quadrants:

Lower left: The difficult and low value targets. If management is trying to get you to work on these, resist. You’ll never get anywhere with your enterprise-wide appeal by starting here.
Lower right: These may be easy to complete, but if they have limited value, you should hold off until you have complete corporate buy-in for an enterprise-wide data quality initiative.
Upper left: Working on high value targets that are hard to complete will likely only give your company sticker shock when you show them the project plan. Or, the projects may run into major delays and be cancelled altogether. Again, proceed with caution. Make sure you have a few wins under your belt before you attempt these.
Upper right: Ah, low-hanging fruit. Projects that are easier to complete with high value are the best places to begin. As long as you document and promote the increase in value that you’ve delivered to the company, you should be able to leverage these wins into more responsibility and more access to great projects.
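
If you want to sort a list of candidate projects into these quadrants, a small Python sketch like the one below may help. The 0 to 10 scoring scale, the midpoint of 5, and the names Project, quadrant and where_to_start are all assumptions made up for this example; how you arrive at the ease and value scores is still the hard part, and it comes from the factors discussed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    name: str
    ease: float   # X axis: 0 = very difficult, 10 = very easy (assumed scale)
    value: float  # Y axis: 0 = low value, 10 = high value (assumed scale)


def quadrant(p: Project, midpoint: float = 5.0) -> str:
    """Return the quadrant a project falls into on the selection chart."""
    vertical = "upper" if p.value >= midpoint else "lower"
    horizontal = "right" if p.ease >= midpoint else "left"
    return f"{vertical} {horizontal}"


def where_to_start(projects: List[Project]) -> List[Project]:
    """Pick the easy, high-value projects, most valuable first."""
    candidates = [p for p in projects if quadrant(p) == "upper right"]
    return sorted(candidates, key=lambda p: p.value, reverse=True)
```

Scoring your five projects and passing them to where_to_start gives you a short list of low-hanging fruit; everything else waits until you have a few wins to point to.
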

Keeping an eye on both the business value of the data and the technical difficulty of standardizing it will help you decide where to go and how to make your business stronger. It will also ensure that you and your business co-workers understand the business value of improving data quality within your projects.

Monday, July 6, 2009

June’s “El Festival del IDQ Bloggers”


A Blog Carnival for Information/Data Quality Bloggers

June of 2009 is gone, so it’s time to look back at the month and recognize some of the very best data quality blog entries. Like other blog carnivals, this one is a collection of posts from different blogs on a specific theme.

If you’re a blogger and you missed out on this month’s data quality carnival, don’t worry. You can always submit your brilliant entries next month. So, here they are, in no particular order.


  • Newcomer Jeremy Benson brings the unique perspective of an actuary – someone who deals with the financial impact of risk and uncertainty on a business. We know that improving data quality will certainly produce more accurate assessments when it comes to crunching numbers and calculating risk. This month’s blog entry describes how data quality is important to predictive modeling. More actuaries should understand the importance of data quality, so this is a positive step.

  • Irish information quality expert Daragh O Brien was talking about his marriage problems this month – well, at least the data quality problems with his recording of his marriage. In this post he discusses a recent experience and how it made him think yet again about the influence of organizational culture and leadership attributes on information quality success and change management.


  • Western Australian blogger Vince McBurney contributes his excellent analysis of the new Gartner Magic Quadrant for data quality tools. Vince’s analysis of the LAST Magic Quadrant (two years ago) was perhaps my biggest inspiration for getting involved in blogging, so it makes me happy to include his blog. “Tooling Around on the IBM InfoSphere” is focused on data integration topics from the perspective of an expert in the IBM suite of software tools.

  • Jim Harris takes us into “The Data-Information Continuum” to remind us that data quality is usually both objective and subjective, making reaching the “single version of truth” more mystical. The post made it clear to me that our description of the data quality problem is evolving, and the language we must use to promote our successes must evolve, too.


  • Dalton Cervo is the Customer Data Quality Lead at Sun Microsystems and a member of the Customer Data Governance team at Sun. Dalton takes us on a journey of deduplicating a customer database using a popular data quality tool. It’s great to see the details of a project like this so that we can better understand the challenges and benefits of using data quality tools.


Thanks to all the outstanding data quality bloggers this month!

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.