Saturday, October 16, 2010

Is 99.8% data accuracy enough?

This story, ripped from recent headlines, shows how even a 0.2% failure rate can have a big impact.

WASHINGTON (AP) ― More than 89,000 stimulus payments of $250 each went to people who were either dead or in prison, a government investigator says in a new report.

Let’s take a good, hard look at this story. It begins with the US economy slumping. The president proposes, and Congress passes, one of the biggest stimulus packages ever. The idea seems sound to many: get America working by offering jobs in green energy and shovel-ready infrastructure projects. Among other actions, the plan gives lower-income people some government money so they can stimulate the economy.

I’m not really here to praise or zing the wisdom of this. I’m just here to give the facts. In hindsight, it appears that the package hasn’t stimulated the economy as much as many had hoped, but that’s beside the point.

Continuing on, the government issues a $250 check to each of 52 million people on Social Security. It turns out that more than 89,000 of those recipients were in prison or dead, roughly 0.2% of the checks. Some checks are returned; some are cashed. Ultimately, the 0.2% error accounts for $22.3 million in payments.
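
A quick back-of-the-envelope check of those figures, as a small Python sketch. It uses the AP's count of 89,000 checks and the 52 million total from this post, and it ignores returned checks, so it measures dollars paid out in error rather than dollars actually lost.

```python
# Figures quoted in this post; returned checks are ignored.
TOTAL_CHECKS = 52_000_000  # $250 payments to people on Social Security
BAD_CHECKS = 89_000        # payments to people who were dead or in prison
CHECK_AMOUNT = 250         # dollars per check


def error_summary(total: int, bad: int, amount: int) -> tuple[float, float, int]:
    """Return (error rate in %, accuracy in %, dollars paid out in error)."""
    error_rate = bad / total * 100
    return error_rate, 100 - error_rate, bad * amount


rate, accuracy, dollars = error_summary(TOTAL_CHECKS, BAD_CHECKS, CHECK_AMOUNT)
print(f"error rate: {rate:.2f}%  accuracy: {accuracy:.2f}%  paid in error: ${dollars:,}")
# → error rate: 0.17%  accuracy: 99.83%  paid in error: $22,250,000
```

Both headline numbers are rounded: 0.17% becomes "0.2%" (so 99.83% becomes "99.8%"), and $22.25 million becomes "$22.3 million".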

While $22.3 million is a HUGE number, 0.2% is a tiny one. It strikes at the heart of why data quality is so important. Social Security spokesman Mark Lassiter said, "…Each year we make payments to a small number of deceased recipients usually because we have not yet received reports of their deaths."

There is strong evidence that the SSA is hooked up to the right commercial data feeds and has the processes in place to use them. The Social Security Administration seems quite proactive in its search for the dead and imprisoned, but people die and go to prison all the time. They also move, get married and become independent of their parents.

If we try to imagine what it would take to get closer to 100% accuracy, the answer is up-to-the-minute reference data. It seems that the only real solution is legislation that requires reporting any of these life-changing events to the federal government. Should we require the bereaved, or perhaps funeral directors, to report a death immediately to a central database? Even with such a law, a small percentage of checks would still be issued while the recipient was alive and delivered after the recipient died. We’d have better accuracy for this issue, but not 100%.
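
To make the timing argument concrete, here is a minimal Python sketch. It assumes each recipient record carries a date of death and the date that death reached the reference data; both fields, the recipient IDs and all dates are hypothetical. Faster reporting shrinks the "missed" bucket, but nothing shrinks the "unavoidable" one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Recipient:
    recipient_id: str                      # hypothetical identifier
    death_date: Optional[date] = None      # when the person actually died
    death_reported: Optional[date] = None  # when the death reached the reference data


def screen(r: Recipient, issue_date: date) -> str:
    """Classify one payment against what the reference data knew on issue_date."""
    if r.death_date is None:
        return "valid"        # no death on record
    if r.death_date > issue_date:
        return "unavoidable"  # alive when the check was issued, died afterwards
    if r.death_reported is not None and r.death_reported <= issue_date:
        return "blocked"      # the reference data caught it in time
    return "missed"           # already dead, but the report had not arrived


issue = date(2009, 5, 1)  # hypothetical payment run
people = [
    Recipient("A"),
    Recipient("B", date(2009, 3, 1), date(2009, 3, 15)),
    Recipient("C", date(2009, 4, 20), date(2009, 6, 1)),
    Recipient("D", date(2009, 5, 3), date(2009, 5, 3)),
]
for p in people:
    print(p.recipient_id, screen(p, issue))
```

This prints A valid, B blocked, C missed and D unavoidable. If C's death had been reported on the day it happened, C would move to "blocked", but D stays "unavoidable" no matter how fast the reporting is.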

While this story takes a poke at the SSA for sending checks to dead people, I have to applaud its achievement of 99.8% accuracy. It could be a lot worse, America. A lot worse.

Saturday, August 28, 2010

ERP and SCM Data Profiling Techniques

In this YouTube tutorial for Talend, I walk through some techniques for profiling ERP, SCM and materials master data using Talend Open Profiler. In addition to basic profiling, the correlation analysis feature can be used to identify relationships between part numbers and descriptions.
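
The correlation analysis in Talend Open Profiler is a point-and-click feature, and the sketch below is not its implementation. It is a hand-rolled Python analogue of the same idea: checking whether part-number prefixes and description keywords line up. The three-character prefix, the first-word rule, the 75% threshold and the sample rows are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def profile_prefix_vs_description(rows, prefix_len=3):
    """Count the first description word seen for each part-number prefix."""
    table = defaultdict(Counter)
    for part_no, description in rows:
        prefix = part_no[:prefix_len].upper()
        words = description.split()
        first_word = words[0].upper() if words else "<BLANK>"
        table[prefix][first_word] += 1
    return table


def outliers(table, min_share=0.75):
    """Yield (prefix, word, count) for descriptions that break a strong pattern.

    A prefix only has a "pattern" if its most common first word covers at
    least min_share of its rows; the minority words are then suspects.
    """
    for prefix, counts in table.items():
        dominant, top = counts.most_common(1)[0]
        if top / sum(counts.values()) >= min_share:
            for word, n in counts.items():
                if word != dominant:
                    yield prefix, word, n


rows = [  # hypothetical material master extract: (part number, description)
    ("BLT-1001", "Bolt hex M8"),
    ("BLT-1002", "Bolt hex M10"),
    ("BLT-1003", "Bolt carriage M6"),
    ("BLT-1004", "Bolt hex M12"),
    ("BLT-1005", "Washer flat M8"),
    ("WSH-2001", "Washer flat M8"),
    ("WSH-2002", "Washer lock M10"),
]
print(list(outliers(profile_prefix_vs_description(rows))))
# → [('BLT', 'WASHER', 1)]
```

Here the washer filed under a bolt prefix stands out because every other "BLT" part is described as a bolt.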

Monday, August 16, 2010

Data Governance and Data Quality Insider 100th

I have reached my 100th post milestone.  I hope you won't mind if I get a little introspective here and tell you a little about my social media journey over these past three years.

How did I get started?  One day back in 2007, I disagreed with Vince McBurney’s post (topic unimportant now).  I responded and Vince politely told me to shut up and if I really wanted to have an opinion to write my own blog.  I did.  Thanks for the kick in the pants, Vince.

Some of my most popular posts over these past three years have been:

  • Probabilistic Matching: Sounds like a good idea, but…
    Here, I take a swipe at the sanctity of probabilistic matching. I have probably received more hate mail for this post than for any other. My stance is still that a hybrid approach to matching, using both probabilistic and deterministic techniques, is key to getting good match results. Probabilistic matching alone is not the solution.
  • Data Governance and the Coke Machine Syndrome
    I recount a parable about meetings that a well-respected former boss once told me. Meetings can take unexpected turns: huge issues can be settled in minutes, while insignificant ones can eat up your company’s resources. I probably wrote it just after a meeting.
  • Data Quality Project Selection
    A posting about picking the right data quality projects to work on.
  • The “Do Nothing” Option
    A posting that recounts a lesson I learned about selling the power of data quality to management.
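
The hybrid approach from the probabilistic matching post can be sketched in a few lines of Python. This is a minimal illustration, not the method from that post or from any product: the field list, the m/u probabilities, the tax_id rule and the threshold of 8.0 are all hypothetical. A deterministic rule settles the certain cases, and a Fellegi-Sunter style score handles the fuzzy remainder.

```python
import math

# Hypothetical per-field probabilities (m, u):
#   m = P(field agrees | the records are the same entity)
#   u = P(field agrees | the records are different entities)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "zip": (0.85, 0.02),
}


def normalize(value):
    """Lower-case and collapse whitespace; missing values become ''."""
    return " ".join(str(value).lower().split()) if value else ""


def probabilistic_score(a, b):
    """Fellegi-Sunter style match weight: sum of log2 agreement/disagreement weights."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        va, vb = normalize(a.get(field)), normalize(b.get(field))
        if va and va == vb:
            score += math.log2(m / u)              # agreement raises the score
        else:
            # disagreement (or a missing value, in this sketch) lowers it
            score += math.log2((1 - m) / (1 - u))
    return score


def hybrid_match(a, b, threshold=8.0):
    """Deterministic rule first, probabilistic score as the fallback."""
    tax_a, tax_b = normalize(a.get("tax_id")), normalize(b.get("tax_id"))
    if tax_a and tax_a == tax_b:
        return True  # an exact match on a strong identifier decides outright
    return probabilistic_score(a, b) >= threshold


a = {"first_name": "Steve", "last_name": "Smith", "zip": "12345"}
b = {"first_name": "Steven", "last_name": "Smith", "zip": "12345"}
print(round(probabilistic_score(a, b), 2), hybrid_match(a, b))  # → 8.73 True
```

The design choice is the point of that post: the deterministic rule never has to be "learned" or tuned, while the weights cover the records where no strong identifier is shared.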
Somewhere around my 50th post, I was contacted by a small publishing firm in the UK about publishing a book on data governance. They liked what they saw in the blog. I published The Data Governance Imperative in 2009. I drew on my experiences with some of the people I met while working in the industry. It's thanks to some of you that the book is a reality.

Blogging has not always been easy. I’ve met some opposition along the way. There were times when my blogging was perceived as somehow threatening to corporate. At the time, blogging was new and corporations didn't know how to handle it. More companies now have clear blogging policies and recognize the positive impact blogging can have.

What about the people I’ve met? I’ve made a lot of friends along the way, including people I’ve yet to meet face-to-face. We’re able to build a community here in cyberspace, a data geek community that I am very fond of. I’m hesitant to write a list because I don’t want to leave anyone out, but you know who you are.

If you're thinking of blogging, please find something you’re passionate about and write. You’ll have a great time!

Thursday, August 12, 2010

Change Management and Data Governance

Years ago, I worked for a large company that spent time and effort on change management. Change management has been popular with corporations that plan significant changes as they grow or downsize. Companies, particularly high-tech companies, use it to become more agile and respond to rapid changes in the market.

As I read through the large amount of information on change management, I’m struck by the parallels between change management and data governance. The focus is on processes. Change management ensures that no matter what changes happen in a corporation, whether it’s downsizing or rapid growth, significant changes are implemented in an orderly fashion and make everyone more effective.

On the other hand, humans are resistant to change. Change management aims to gain buy-in from management to achieve the organization's goal of an orderly and effective transformation. Sound familiar? Data governance speaks to this ability to manage data properly, no matter what growth spurts, mergers or downsizing occur. It is about changing the hearts and minds of individuals to better manage data and achieve more success while doing so.

Change Management Models
As you examine data governance models, look toward change management models that have been developed by vendors and analysts in the change management space. One that caught my attention was the ADKAR model developed by a company called Prosci. In this model, there are five specific stages that must be realized in order for an organization to successfully change. They include:
  • Awareness - An organization must know why a specific change is necessary.
  • Desire - The organization must have the motivation and desire to participate in the call for change.
  • Knowledge - The organization must know how to change. Knowing why you must change is not enough.
  • Ability - Every individual in the company must implement new skills and processes to make the necessary changes happen.
  • Reinforcement - Individuals must sustain the changes, making them the new behavior and averting the tendency to revert to their old processes.
These same factors can be applied when assessing how to change our own teams to manage data more effectively.  Positive change will only come if you work on all of these factors.

I often talk about business users and IT working together to solve the data governance problem. By looking at the extensive information available on change management, you can learn a lot about making changes for data governance.

Monday, August 9, 2010

Data Quality Pro Discussion

Last week I sat down with Dylan Jones of DataQualityPro.com to talk about data governance. Here is the replay. We discussed a range of topics including organic governance approaches, challenges of defining data governance, industry adoption trends, policy enforcement vs. legislation and much more.


Friday, July 30, 2010

Deterministic and Probabilistic Matching White Paper

I’ve been busy this summer working on a white paper on record matching; the result is available on the Talend web site.

The white paper is sort of a primer on the elementary principles of record matching. As the description says, it outlines the basic theories and strategies of record matching. It describes the nuances of deterministic and probabilistic matching and the algorithms used to identify relationships within records. It covers the processes to employ in conjunction with matching technology to transform raw data into powerful information that drives success in enterprise applications like CRM, data warehouse and ERP.
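
One field-comparison algorithm commonly used in record matching, under both deterministic rules and probabilistic scoring, is Levenshtein edit distance. The sketch below is a generic Python version and is not taken from the white paper; scaling the distance by the longer string's length to get a 0 to 1 similarity is one common choice, assumed here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest single-character inserts, deletes or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,            # insert cb
                previous[j] + 1,               # delete ca
                previous[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))         # → 3
print(round(similarity("STEVE", "STEVEN"), 3))  # → 0.833
```

Keeping only two rows of the dynamic-programming table holds memory to the length of one string, which matters when comparing millions of candidate pairs.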

Wednesday, July 28, 2010

DGDQI Viewer Mail

From time to time, people read my blog or book and contact me to chat about data governance and data quality. I welcome it. It’s great to talk to people in the industry and hear their concerns.

Occasionally, though, I see things in my in-box that bother me. Here is one item that I’ll address in this post. The names have been changed to protect the innocent.

A public relations firm asked:

Hi Steve,
I wonder if you could answer these questions for me.
- What are the key business drivers for the advent of data governance software solutions?
- What industries can best take advantage of data governance software solutions?
- Do you see cloud computing-based data governance solutions developing?

I couldn’t answer these questions, because they all presupposed that data governance is a software solution. It made me wonder whether I have made it clear enough that data governance is mostly about changing the hearts and minds of your colleagues so they re-think their opinion of data and its importance. Data governance is a company’s mindful decision that information is important and that it is going to start leveraging it. Yes, technology can help, but a complete data governance software solution would have more features than a WorkChamp XL Swiss Army Knife. It would have to include data profiling, data quality, data integration, business process management, master data management, wikis, a messaging platform, a toothpick and a nail file in order to be complete.

Can you put all this on the cloud?  Yes.  Can you put the hearts and minds of your company on a cloud?  If only it were that easy...

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.