Monday, July 13, 2009

Data Quality Project Selection

What if you have five data-intensive projects that all need your very valuable data quality resources? How do you decide where to focus? The choice is not always clear. Management may be interested in accurate reporting from your data warehouse, but revenue may be at stake in other projects. So, just how do you decide where to start?

To aid in choosing between projects, it may help to plot them on a “Project Selection Quadrant” as I’ve shown here. The quadrant chart plots the difficulty of completing a project against the value it brings to the organization.

[Project Selection Quadrant chart: project difficulty (X axis) versus business value (Y axis)]
Project Difficulty
To place a project on the X axis, you must understand how your existing system is being used, how various departments use it differently, and whether there are special programs or procedures that affect the use of the data. To predict project length, you have to rely heavily on your understanding of your organization’s goals and business drivers.

Some of the things that will affect project difficulty (a rough scoring sketch follows this list):
• Access to the data – do you have permission to get the data?
• Window of opportunity – how much time do you have between updates to work on the data?
• Number of databases – more databases will increase complexity
• Languages and code pages – is it English or Kanji? Is it ASCII or EBCDIC? If you have mixed languages and code pages, you may have more work ahead of you
• Current state of data quality – the more non-standard your data is to begin with, the harder the task
• Volume of data – data standardization takes time, and the more you have, the longer it’ll take
• Governance, Risk and Compliance mandates – is your access to the data restricted by regulation?
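
If it helps to make the assessment concrete, here is a minimal sketch in Python of how you might turn those factors into a rough difficulty score. The factor names, weights and the 1-to-5 rating scale are my own assumptions for illustration, not a standard formula:

```python
# Hypothetical weighting of the difficulty factors discussed above.
# Each factor is rated 1 (easy) to 5 (hard); the weights are illustrative only.
DIFFICULTY_WEIGHTS = {
    "data_access": 1.5,           # do you have permission to get the data?
    "update_window": 1.0,         # time available between updates
    "database_count": 1.0,        # more databases, more complexity
    "languages_code_pages": 1.0,  # mixed languages and code pages add work
    "current_quality": 2.0,       # non-standard data is harder to fix
    "data_volume": 1.0,           # more data, longer standardization
    "grc_constraints": 1.5,       # regulatory limits on access
}

def difficulty_score(ratings):
    """Weighted average of 1-5 factor ratings; unrated factors default to 3."""
    total_weight = sum(DIFFICULTY_WEIGHTS.values())
    weighted = sum(w * ratings.get(factor, 3) for factor, w in DIFFICULTY_WEIGHTS.items())
    return weighted / total_weight

# Example: clean access to the data, but messy, high-volume content
print(difficulty_score({"data_access": 1, "current_quality": 5, "data_volume": 4}))
```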

Project Value
For assessing project value (the Y axis), there is really only one thing that you want to look at – money. It comes from your discussions with business users about areas like:
• being able to effectively reach and support customers
• call center performance
• inventory and holding costs
• exposure to risk, such as being out of compliance with regulations in your industry
• any business process that is inefficient because of poor data quality

The Quadrants
Now that you’ve assessed your projects, they will naturally fall into the following quadrants:

Lower left: The difficult and low value targets. If management is trying to get you to work on these, resist. You’ll never get anywhere with your enterprise-wide appeal by starting here.
Lower right: These may be easy to complete, but if they have limited value, you should hold off until you have complete corporate buy-in for an enterprise-wide data quality initiative.
Upper left: Working on high-value targets that are hard to complete will likely only give your company sticker shock when you show them the project plan. Or, they may run into major delays and be cancelled altogether. Again, proceed with caution. Make sure you have a few wins under your belt before you attempt these.
Upper right: Ah, low-hanging fruit. Projects that are easier to complete with high value are the best places to begin. As long as you document and promote the increase in value that you’ve delivered to the company, you should be able to leverage these wins into more responsibility and more access to great projects.
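
To tie the two axes together, here is a small, hypothetical sketch of how the quadrant assignment could look in code. The 1-to-5 scales, the cut-off of 3 and the example projects are assumptions for illustration only, not a prescribed method:

```python
from typing import NamedTuple

class Project(NamedTuple):
    name: str
    difficulty: float  # 1 (easy) to 5 (hard), e.g. from a scoring sketch like the one above
    value: float       # 1 (low) to 5 (high), based on the money discussion above

def quadrant(project, cutoff=3.0):
    """Place a project in one of the four quadrants described above."""
    hard = project.difficulty >= cutoff
    high_value = project.value >= cutoff
    if hard and high_value:
        return "upper left: valuable but hard - proceed with caution"
    if hard:
        return "lower left: difficult, low value - resist"
    if high_value:
        return "upper right: low-hanging fruit - start here"
    return "lower right: easy but low value - hold off"

projects = [
    Project("Warehouse reporting cleanup", difficulty=2.0, value=4.5),
    Project("Global customer consolidation", difficulty=4.5, value=4.0),
]
for p in projects:
    print(p.name, "->", quadrant(p))
```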

Keeping an eye on both the business value of the data and the technical difficulty of standardizing it will help you decide where to go and how to make your business stronger. It will also ensure that you and your business co-workers understand the business value of improving data quality within your projects.

Monday, July 6, 2009

June’s “El Festival del IDQ Bloggers”


A Blog Carnival for Information/Data Quality Bloggers

June of 2009 is gone, so it’s time to look back at the month and recognize some of the very best data quality blog entries. Like other blog carnivals, this one is a collection of posts from different blogs on a specific theme.

If you’re a blogger and you missed out on this month’s data quality carnival, don’t worry. You can always submit your brilliant entries next month. So, here they are, in no particular order.


  • Newcomer Jeremy Benson has the unique perspective of an actuary – someone who deals with the financial impact of risk and uncertainty on a business. We know that improving data quality will certainly produce more accurate assessments when it comes to crunching numbers and calculating risk. This month’s blog entry describes how data quality is important to predictive modeling. More actuaries should understand the importance of data quality, so this is a positive step.

  • Irish information quality expert Daragh O Brien was talking about his marriage problems this month – well, at least the data quality problems with the recording of his marriage. In this post he discusses a recent experience and how it made him think yet again about the influence of organizational culture and leadership attributes on information quality success and change management.


  • Western Australian blogger Vince McBurney contributes his excellent analysis of the new Gartner Magic Quadrant for data quality tools. Vince’s analysis of the LAST Magic Quadrant (two years ago) was perhaps my biggest inspiration for getting involved in blogging, so it makes me happy to include his blog. “Tooling Around on the IBM InfoSphere” is focused on data integration topics from the perspective of an expert in the IBM suite of software tools.

  • Jim Harris takes us into “The Data-Information Continuum” to remind us that data quality is usually both objective and subjective, which makes reaching the “single version of truth” all the more mystical. The post made it clear to me that our description of the data quality problem is evolving, and the language we use to promote our successes must evolve, too.


  • Dalton Cervo is the Customer Data Quality Lead at Sun Microsystems and a member of the Customer Data Governance team at Sun. Dalton takes us on a journey of deduplicating a customer database using a popular data quality tool. It’s great to see the details of a project like this so that we can better understand the challenges and benefits of using data quality tools.


Thanks to all the outstanding data quality bloggers this month!

Thursday, June 25, 2009

Evil Dictators: You Can’t Rule the World without Data Governance

Buried in the lyrics of one of my favorite heavy metal songs are these beautiful words:

Now, what do you own the world? How do you own disorder, disorder? – System of a Down, Toxicity


System of a Down’s screamingly poetic lyrics remind us of a very important lesson that we can take into the business world. After all, it is the goal of many companies to “own their world”. If you’re Coke, you want to dominate Pepsi. If you’re McDonald’s, you want to crush Burger King. Yet to own competitive markets, you have to run your business with the utmost efficiency. Without data governance, or at least enterprise data quality initiatives, you won’t have that efficiency.

Your quest for world domination will be in jeopardy in many ways without data governance. If your evil world domination plan is to buy up companies, poor data quality and lack of continuity will prevent you from creating a unified environment after the merger. On the day of a merger, you may be asked to produce one list of products, one list of customers, one list of employees, and one accurate financial report. Where is that data going to come from if it is not clean across your company? How will the data get clean without data governance?

Data governance brings order to the business units. With order comes the ability to own the information of your business. The ownership brings the ability to make effective and timely decisions. In large companies, whose business units may be warring against each other for sales and control of the information, it’s impossible to own the chaos. It’s difficult to make good decisions and bring order to your people. If you want to own your market, you must have order.

Those companies succeeding in this data-centric world are treating their data assets just as they would treat cold, hard cash. With data governance, companies strive to protect their vast ecosystem of data like it is a monetary system. It can't be the data center's problem alone; it has to be everyone's responsibility throughout the entire company.

Data governance is the choice of CEOs and benevolent dictators, too. The choice about data governance is one about hearing the voices of your people. It’s only when you harmonize the voices of technologists, executives and business teams that you can produce a beautiful song; one that can bring your company teamwork, strategic direction and profit. When you choose data governance, you choose order, communication and hope for your world.

So, megalomaniacs, benevolent dictators and CEOs, pay heed. You can’t own the world without data governance.

Friday, June 19, 2009

Get your Submissions in for the June Blog Carnival for Information/Data Quality Bloggers

I’m pleased to be hosting the June edition of "El Festival del IDQ Bloggers – A Blog Carnival for Information/Data Quality Bloggers". If you are a data quality blogger, please feel free to submit your best blog entries today.

This blog carnival is simply a collection of posts from different data quality blogs. Anyone can submit a data quality blog post and get the benefits of extra traffic, networking with other bloggers and discovering interesting posts. The only requirement is that the submitted post has a data quality theme.

This will be the JUNE issue of the carnival, so your submissions must have been posted in June. To qualify, you should e-mail your submission to: blogcarnival@iaidq.org – your email should include:
• URL of the blog post being submitted
• Brief description of the blog (not the post, the blog)
• Brief description of the author
• Optional – URL of an author profile (e.g. LinkedIn, Twitter)

Not all entries will make it into the issue, but don’t be discouraged. Keep submitting to future issues and we’ll get you next month.
For more information: see the IAIDQ web page

Friday, June 12, 2009

Interview on Data Quality Pro.com

From Data Quality Pro.com

If you are active within the data quality and data governance community then chances are you will have come across Steve Sarsfield and his Data Governance and Data Quality Insider blog.
Steve has also recently published an excellent book, aptly titled "The Data Governance Imperative" so we recently caught up with him to find out more about some of the topics in the book and to pose some of the many questions organisations face when launching data governance initiatives.


Read the interview>>


Plus, at the end of the interview we provide details of how to win a copy of "The Data Governance Imperative".


Tuesday, June 9, 2009

MIT's Information Quality Industry Symposium

This year, I am honored to be part of MIT's Information Quality Industry Symposium in Cambridge, MA. In past years I have attended this conference and have been pleased with the quality of the speakers and how informed the industry is getting about data quality. This year, my company is sponsoring the event and I will be co-presenting with my colleague Nelson Ruiz.

The speaker's list is impressive! Some of the featured speakers include very experienced practitioners like Larry English, Bill Inmon, Danette McGilvray and Gwen Thomas. Attendees will be sure to gain some insight on information quality with such a full line-up of experts.

In true MIT form, this forum has a lot of theoretical content in addition to the practical sessions. This is one of the more academic venues for researching data quality, and therefore less commercial. The presentations are interesting in that they often give you another perspective on the problem of data quality. Some of them are clearly cutting edge.

My session, entitled Using Data Quality Scorecards to Sell IQ Value, will be more practical. When it comes to convincing your boss that you need to invest in DQ, how can you create metrics that will ignite their imagination? How do you get the funding... and how do you take information quality enterprise-wide?

If you have some travel budget open, please come to Boston this summer and check out this small and friendly event. As a reader of this blog, feel free to use the Harte-Hanks Trillium Software $100 discount pass when registering.

Wednesday, June 3, 2009

Informatica Acquires AddressDoctor

Global Data is Hard to Do

Yesterday, Informatica announced their intent to acquire AddressDoctor. This acquisition is all about being able to handle global data quality in today’s market, but it has a surprising potential twist. Data quality vendors have been striving for a better global solution because so many of the large data quality projects contain global data. If your solution doesn’t handle global data, it often just won’t make the cut.

The interesting twist here is that both IBM and Dataflux leverage AddressDoctor for their handling of global address data. Several other smaller vendors do as well - MelissaData, QAS, and Datanomic. Trillium Software technology is not impacted by this acquisition. They have been building in-house technology for years to support the parsing of global data and have leveraged their parent company’s acquisition of Global Address to beef up the geocoding capability of the Trillium Software System.

Informatica has dealt the competition a strong blow here. Where will these vendors go to get their global data quality? In the months to come, there will be challenges to face. Informatica, still busy with integrating the disparate parts of Evoke, Similarity and Identity Systems, will now have to integrate AddressDoctor. Other vendors like IBM, Dataflux, MelissaData, QAS and Datanomic may now have to figure out what to do for global data if Informatica decides not to renew partner agreements.

For more analysis on this topic, you can read Rob Karel’s blog, in which the Forrester analyst explains why he thinks the move is aimed at limiting choices on MDM platforms.

To be on the safe side, I’d like to restate that the opinions in this blog are my own. Even though I work for Harte-Hanks Trillium Software, my comments are my independent thoughts and not necessarily those of my employer.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.