Friday, August 14, 2009

My Interview with Ajay Ohri

Ajay asks me some great questions over at DecisionStats.

Tuesday, July 21, 2009

Data Quality – Technology’s Prune

Prunes. When most of us think of prunes, we tend to think of a cure for older people suffering from constipation. In reality, prunes are not only sweet but are also highly nutritious. Prunes are a good source of potassium and a good source of dietary fiber. Prunes suffer from a stigma that’s just not there for dried apricots, figs and raisins, which have a similar nutritional benefit and medicinal benefit. Prunes suffer from bad marketing.

I have no doubt that data quality is considered technology’s prune by some. We know that information quality is good for us, having many benefits to the corporation. It also can be quite tasty in its ability to deliver benefit, yet most of our corporations think of it as a cure for business intelligence constipation – something we need to “take” to cure the ills of the corporation. Like the lowly prune, data quality also suffers from bad marketing.

In recent years, prune marketers in the United States have begun marketing their product as "dried plums” in an attempt to get us to change the way we think about them. Commercials show the younger, soccer Mom crowd eating the fruit and being surprised at its delicious flavor. It may take some time for us to change our minds about prunes. I suppose if Lady Gaga or Zac Efron would be spokespersons, prunes might have a better chance.

The biggest problem in making data quality beloved by the business world is that it’s well… hard to explain. When we talk about it, we get crazy with metadata models and profiling metrics. It’s great when we’re communicating among data professionals, but that talk tends to plug-up business users.

In my recent presentations and in recent blog posts, I’ve made it clear that it’s up to us, the data quality champions, to market data quality, not as a BI laxative, but as a real business initiative with real benefits. For example:

  • Take a baseline measurement and track ROI, even if you think you don’t have to
  • If the project has no ROI, you should not be doing it. Find the ROI by asking the business users of the data what they use it for.
  • Aggregate and roll-up our geeky metrics of nulls, accuracy, conformity, etc into metrics that a business user would understand – like according to our evaluation, 86.4% of our customers are fully reachable by mail.
  • Create and use the aggregated scores similar to the Dow Jones Industrial Average. Publish them at regular intervals. To raise awareness of the data quality, talk about why it’s up and talk about why it has gone down.
  • Have a business-focused elevator pitch ready when someone asks you what you do. “My team is saving the company millions by ensuring that the ERP system accurately reflects inventory levels.”
Of course there's more. There’s more to this in my previous blog posts, yet to come in my future blog posts, and in my book The Data Governance Imperative. Marketing the value of data quality is just something we all need to do more of. Not selling the business importance of data quality... it’s just plum-crazy!

Monday, July 13, 2009

Data Quality Project Selection

What if you have five data intensive projects that are all in need of your very valuable resources for improving data quality? How do you decide where to focus? The choice is not always clear. Management may be interested in accurate reporting from your data warehouse, but revenue may be at stake in other projects. So, just how do you decide where to start?

To aid in a choice between projects, it may help to plot your projects on a “Project Selection Quadrant” as I’ve shown here. The quadrant chart plots the difficulty of completing a project versus the value it brings to the organization.




















Project Difficulty
To find the project on the X axis, you must understand how your existing system is being used; how various departments use it differently; and if there are special programs or procedures that impact the use of the data. To predict project length, you have to rely heavily on your understanding your organization's goals and business drivers.

Some of the things that will affect project difficulty:
Access to the data – do you have permission to get the data?
Window of opportunity – how much time do you have between updates to work on the data
Number of databases – more databases will increase complexity
Languages and code pages – is it English or Kanji? Is it ASCII or EBCDIC? If you have mixed languages and code pages, you may have more work ahead of you
Current state of data quality – The more non-standard your data is to begin with, the harder the task
Volume of data – data standardization takes time and the more you have, the longer it’ll take
Governance, Risk and Compliance mandates – is your access to the data stopped by regulation?

Project Value
For assessing project value (the Y axis), there is really one thing that you want to look at – money. It comes from your discussions with the business users around their ability to accomplish things like:
• being able to effectively reach/support customers
• call center performance
• inventory and holding costs
• exposure to risk such as being out of compliance with any regulations in your industry
• any business process that is inefficient because of data quality

The Quadrants
Now that you’ve assessed your projects, they will naturally fall into the following quadrants:

Lower left: The difficult and low value targets. If management is trying to get you to work on these, resist. You’ll never get anywhere with your enterprise-wide appeal by starting here.
Lower right: These may be easy to complete, but if they have limited value, you should hold off until you have complete corporate buy-in for an enterprise-wide data quality initiative.
Upper left: Working on high value targets that are hard complete will likely only give your company sticker shock when you show them the project plan. Or, they may run into major delays and be cancelled altogether. Again, proceed with caution. Make sure you have a few wins under your belt before you attempt.
Upper right: Ah, low-hanging fruit. Projects that are easier to complete with high value are the best places to begin. As long as you document and promote the increase in value that you’ve delivered to the company, you should be able to leverage these wins into more responsibility and more access to great projects.

Keeping an eye on both the business aspect of the data, its value, and the technical difficulty in standardizing the data will help you decide where to go and how to make your business stronger. It will also ensure that you and your business co-workers to understand the business value of improving data quality within your projects.

Monday, July 6, 2009

June’s "El Festival del IDQ Bloggers”


A Blog Carnival for Information/Data Quality Bloggers

June of 2009 is gone, so it’s time to look back at the month and recognized some of the very best data quality blog entries. Like other blog carnivals, this one is a collection of posts from different blogs on a specific theme.

If you’re a blogger and you missed out on this month’s data quality carnival, don’t worry. You can always submit your brilliant entries next month. So, here they are, in no particular order.


  • Newcomer Jeremy Benson has a unique perspective of being an actuary – someone who deals with the financial impact of risk and uncertainty to a business. We know that improving data quality will certainly produce more accurate assessments when it comes to crunching numbers and calculating risk. This month’s blog entry describes how data quality is important to predictive modeling. More actuaries should understand the importance of data quality, so this is a positive step.

  • Irish information quality expert Daragh O Brien was talking about his marriage problems this month – well, at least the data quality problems with his recording of his marriage. In this post he discusses a recent experience and how it made him think yet again about the influence of organizational culture and leadership attributes on information quality success and change management.


  • Western Australian blogger Vince McBurney contributes his excellent analysis of the new Gartner Magic Quadrant for data quality tools. Vince’s analysis of the LAST Magic Quadrant (two years ago) was perhaps my biggest inspiration for getting involved in blogging, so it makes me happy to include his blog. “Tooling Around on the IBM InfoSphere” is focused on data integration topics from the perspective of an expert in the IBM suite of software tools.

  • Jim Harris takes us into “The Data-Information Continuum” to remind us that data quality is usually both objective and subjective, making reaching the “single version of truth” more mystical. The post made it clear to me that our description of the data quality problem is evolving, and the language we must use to promote our successes must evolve, too.


  • Dalton Cervo is the Customer Data Quality Lead at Sun Microsystems and a member of the Customer Data Governance team at Sun. Dalton takes us on a journey of depuplicating a customer database using a popular data quality tool. It’s great to see the detail of project like this so that we can better understand the challenges and benefits of using data quality tools.


Thanks to all the outstanding data quality bloggers this month!

Thursday, June 25, 2009

Evil Dictators: You Can’t Rule the World without Data Governance

Buried in the lyrics of one of my favorite heavy metal songs are these beautiful words:

Now, what do you own the world? How do you own disorder, disorder? – System of the Down, Toxicity


System of the Down’s screamingly poetic lyrics reminds us of a very important lesson that we can take into the business. After all, it is the goal of many companies to “own their world”. If you’re Coke, you want to dominate over Pepsi. If you’re MacDonald’s, you want to crush Burger King. Yet to own competitive markets, you have to run your business with the utmost efficiency. Without data governance, or at least enterprise data quality initiatives, you won’t have that efficiency.

Your quest for world domination will be in jeopardy in many ways without data governance. If your evil world domination plan is to buy up companies, poor data quality and lack of continuity will prevent you from creating a unified environment after the merge. On the day of a merger, you may be asked to produce, one list of products, one list of customers, one list of employees, and one accurate financial report. Where is that data going to come from if it is not clean all over your company? How will the data get clean without data governance?

Data governance brings order to the business units. With order comes the ability to own the information of your business. The ownership brings the ability to make effective and timely decisions. In large companies, whose business units may be warring against each other for sales and control of the information, it’s impossible to own the chaos. It’s difficult to make good decisions and bring order to your people. If you want to own your market, you must have order.

Those companies succeeding in this data-centric world are treating their data assets just as they would treat cold, hard cash. With data governance, companies strive to protect their vast ecosystem of data like it is a monetary system. It can't be the data center's problem alone; it has to be everyone's responsibility throughout the entire company.

Data governance is the choice of CEOs and benevolent dictators, too. The choice about data governance is one about hearing the voices of your people. It's only when you harmonize the voices of technologists, executives and business teams that allow you produce a beautiful song; one that can bring your company teamwork, strategic direction and profit. When you choose data governance, you choose order, communication and hope for your world.

So megalomaniacs, benevolent dictators and CEOs pay heed. You can’t own the world without data governance.

Friday, June 19, 2009

Get your Submissions in for the June Blog Carnival for Information/Data Quality Bloggers

I’m pleased to be hosting the June edition of "El Festival del IDQ Bloggers – A Blog Carnival for Information/Data Quality Bloggers". If you are a data quality blogger, please feel free to submit your best blog entries today.

This blog carnival is simply a collection of posts from different data quality blogs. Anyone can submit a data quality blog post and get the benefits of extra traffic, networking with other bloggers and discovering interesting posts. The only requirement is that the submitted post has a data quality theme.

This will be the JUNE issue of the carnival, so your submissions must have been posted in June. To qualify, you should e-mail your submission to: blogcarnival@iaidq.org – your email should include:
• URL of the blog post being submitted
• Brief description of the blog (not the post, the blog)
• Brief description of the author
• Optional – URL of an author profile (e.g. LinkedIn, Twitter)

Not all entries will make it into the issue, but don’t be discouraged. Keep submitting to future issues and we’ll get you next month.
For more information: see the IAIDQ web page

Friday, June 12, 2009

Interview on Data Quality Pro.com

From Data Quality Pro.com

If you are active within the data quality and data governance community then chances are you will have come across Steve Sarsfield and his Data Governance and Data Quality Insider blog.
Steve has also recently published an excellent book, aptly titled "The Data Governance Imperative" so we recently caught up with him to find out more about some of the topics in the book and to pose some of the many questions organisations face when launching data governance initiatives.


Read the interview>>


Plus, at the end of the interview we provide details of how to win a copy of "The Data Governance Imperative".


Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.