Friday, January 2, 2009

Building a More Powerful Data Quality Scorecard

Most data governance practitioners agree that a data quality scorecard is an important tool in any data governance program. It provides comprehensive information about the quality of the data in a database and, perhaps even more importantly, gives business users and technical users a common basis for collaborating on quality issues.

However, if we simply report that 7% of all tables have data quality issues, the number is useless - there is no context. You can't say whether it is good or bad, and you can't make any decisions based on this information. There is no business value associated with the score.

In an effort to improve processes, data governance teams should roll the raw measurements up into slightly higher-level metrics. In their book "Journey to Data Quality", authors Lee, Pipino, Funk and Wang correctly suggest that making the measurements quantifiable and traceable provides the next level of transparency to the business. The measurements may be rolled up into a completeness rating, for example: if your database contains 100,000 name and address records and 3,500 of them have missing or incomplete postal codes, then 3.5% of your postal codes fail and 96.5% pass. Similar simple formulas exist for accuracy, correctness, currency and relevance, too. However, this first aggregation still doesn't fully support data governance, because business users aren't thinking that way. They have processes that are supported by data, and it's still a stretch for them to figure out why any of this matters.
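To make that roll-up concrete, here is a minimal Python sketch of the completeness calculation described above. The function name and the hard-coded example numbers are mine, for illustration only, not from the book or any particular tool.

```python
# Minimal sketch of a completeness rating roll-up (illustrative only).

def completeness_rating(total_records: int, failed_records: int) -> float:
    """Return the percentage of records that pass the completeness check."""
    if total_records == 0:
        return 0.0
    return 100.0 * (total_records - failed_records) / total_records

# The postal code example from the text: 3,500 failures out of 100,000 records.
print(completeness_rating(100_000, 3_500))  # 96.5
```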

Views of the Data Quality Scorecard
Your plan should be to build data quality scorecards for different internal audiences - marketing, IT, the C-level, and so on.

You must design the scorecards to meet the needs and interests of those different audiences, from technical through business and up to the executive level. At the base of a data quality scorecard is information about the quality of individual data records. This is the default information that most profilers deliver out of the box. As you aggregate the scores, the high-level measures of data quality become more meaningful. In the middle are various score sets that allow your company to analyze and summarize data quality from different perspectives, as in the sketch below. If you define the objective of a data quality assessment project as calculating these different aggregations, you will have a much easier time maturing your data governance program. The business users and the C-level will begin to pay attention.
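As a rough illustration of those middle-layer score sets, here is a small, hypothetical Python sketch that rolls record-level check results up two different ways: by quality dimension for the technical audience and by business process for the business audience. The dimension and process names are assumptions made up for the example.

```python
from collections import defaultdict

# Each record-level check result: (quality_dimension, business_process, passed)
record_checks = [
    ("completeness", "billing",   True),
    ("completeness", "billing",   False),
    ("accuracy",     "marketing", True),
    ("currency",     "marketing", True),
]

def roll_up(checks, key_index):
    """Aggregate pass rates by the chosen key (0 = dimension, 1 = process)."""
    totals, passes = defaultdict(int), defaultdict(int)
    for check in checks:
        key = check[key_index]
        totals[key] += 1
        passes[key] += int(check[2])
    return {key: 100.0 * passes[key] / totals[key] for key in totals}

print(roll_up(record_checks, 0))  # technical view: score per quality dimension
print(roll_up(record_checks, 1))  # business view: score per business process
```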

Business users are looking for whether the data supports the business process. They want to know whether the data is helping them comply with laws and regulations. They want to decide whether their programs are "Go", "Caution" or "Stop", like a traffic light. They want to know whether the current processes are giving them good data so they can change them if necessary. You can only answer these questions by aggregating the information quality results and aligning those results with the business.
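A traffic-light view can be as simple as mapping an aggregated score to a status. In the sketch below, the 95% and 85% thresholds are assumptions for illustration; in practice they should be service levels agreed with the business.

```python
def traffic_light(score_pct: float) -> str:
    """Map an aggregated data quality score to a Go/Caution/Stop status."""
    if score_pct >= 95.0:
        return "Go"
    if score_pct >= 85.0:
        return "Caution"
    return "Stop"

print(traffic_light(96.5))  # "Go" for the postal code completeness example
```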

Tuesday, December 9, 2008

2009 MIT Information Quality Industry Symposium

This time of year, we’re all looking at our budgets and planning for 2009. I’d like to recommend an event that I’ve been participating in for the past several years – the MIT IQ symposium. It’s in my travel budget and I’m looking forward to going to this event again this year.

The symposium is a July event in Boston that is a discussion and exchange of ideas about data quality between practitioners and academics. The tone is less commercial than what you would find at a typical industry conference; this MIT event is more about the mission and philosophy of information quality.

Day one focuses on education, with highly qualified and very interesting speakers teaching you about enterprise architecture, data governance, business intelligence, data warehousing, and data quality. The latest methodologies, frameworks, and best-practice cases are the topics. On day two, the sessions dig into industry-specific topics, with a government track, a healthcare track and a business track. The last day, a half day, looks more toward the future of information quality.

I’ve grown to really enjoy the presentations, information quality theory and hallway chat that you find here. If you have some travel budget, please consider earmarking some of it for this event.

Friday, December 5, 2008

Short Ham Rule and Data Governance

One of my old bosses, a long time IBM VP who was trained in the traditional Big Blue executive training program, used to refer to the “short ham” rule quite often. With my apologies for its lack of political correctness, the story goes something like this:

Sarah is recently married and for the first time decides to cook the Easter ham for her new extended family. Her spouse’s sisters, mother and grandmother are all coming to dinner and as a new bride, she is nervous. As the family arrives, she begins preparing it for dinner.

Sarah’s sister-in-law Debbie helps with the preparation.
As Sarah begins to put the ham into the oven, Debbie stops her. "You must cut off the back half of the ham before it goes into the oven," she says.

Sarah is nervous, but somehow musters the courage to ask a simple question - why?
Debbie is shaken for a moment at the nerve of her new sister-in-law. How dare she question the family tradition?

Debbie pauses then says, “Well, I’m not sure. My Mom always does it. Let’s ask her why.”


When asked, Mom also hesitates. “Well, my Mom always cut off that part of the ham. I’m not sure why.”


Finally, the group turns to Grandma, who is sitting in her rocking chair listening to the discussion. By now, the entire party has heard about Sarah's outrageous boldness. The party turns silent as the elder slowly begins to whisper her answer. "Well, I grew up in the Depression and we didn't have a pan big enough to fit the whole ham. So, we'd cut off part of it and save it for another meal."


Three factors in the short ham story caused change. First, Sarah's courage to take on the project of cooking the ham started the change. Second, Sarah's willingness to listen to and learn the processes of others in the family gave her credibility in their eyes. Finally, it was Sarah's question - why - that created the change. It was only with that bit of audacity that Sarah was able to educate the family and make the holiday feast more enjoyable.

The same can be said about leading your company toward data governance. You have to have the courage to take on new projects, understand the business processes, and ask why in order to become an agent for change in your organization. A leader has to get past resistance and convince others to embrace new ways of doing things.

Building credibility is the key to overcoming that resistance. If you were to sit down and work for a day in the billing center, the call center or the purchasing department, for example, the people there would see that you understand them and care about their processes. At the very least, you could invite a business person to lunch to understand their challenges. The hearts and minds of the people can be won if you walk a mile in their shoes.

Monday, December 1, 2008

Information Quality Success at Nectar

It’s great when you see data quality programs work. Such is the case in Europe, where Loyalty Management Group (LMG) has improved efficiency and information quality in a very large, retail-based, customer loyalty program. I hadn’t heard of Nectar all that much here in the USA, but the Nectar card is very well-known in the UK. About half of all UK households use it to earn points from everyday purchases and later redeem those points for gifts and prizes. Recently, Groupe Aeroplan purchased LMG and Nectar is now their brand.

Using the databases generated by Nectar, the company also provides database marketing and consulting services to retailers, service providers and consumer packaged goods companies worldwide. Data is really the company’s primary asset.

Nectar data
The data management effort needed to handle half the population of the UK and a good portion of Europe could be perilous. To make matters worse, data enters the Nectar system through many channels: paper-based forms available in stores or received through mailings, online registration, and the call center. All of these sources can produce poor data if left unchecked.

To gain closer business control, the company made business management responsible for data integrity rather than IT. The company also embedded the Trillium Software System in its own systems, including in real-time for online and call center applications.

At first, LMG used just the basic capabilities of the tool to ensure that, at enrollment, addresses matched the UK Postcode Address File (PAF). Later, the company engaged a business-oriented data quality steward to review existing processes and propose new policies. For example, the steward set up checks using Trillium Software for mandatory information at the point of registration, as sketched below. A process is now in place in which the data collector is notified of missing information.
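For illustration only, a mandatory-information check of the kind described might look something like the Python sketch below. This is not the Trillium Software API; the field names and the notification step are assumptions.

```python
# Hypothetical mandatory-field check at the point of registration.
MANDATORY_FIELDS = ["name", "address_line1", "postcode"]

def missing_fields(registration: dict) -> list:
    """Return the mandatory fields that are blank or absent."""
    return [field for field in MANDATORY_FIELDS
            if not str(registration.get(field, "")).strip()]

form = {"name": "J. Smith", "address_line1": "", "postcode": "SW1A 1AA"}
gaps = missing_fields(form)
if gaps:
    # In the process described above, the collector would be notified.
    print("Missing mandatory information: " + ", ".join(gaps))
```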

Information quality often lands and expands within an organization once folks see how powerful it can be. In LMG's case, the Trillium Software System is also used to help partners match their own customer databases with the Nectar collector database. For certain campaigns, Nectar partners might want to know which individuals appear on both their own customer database and the Nectar database. The Trillium Software System allows for this, including pre-processing the partner's data where necessary to bring it up to a sufficient standard for accurate matching.
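Conceptually, the overlap question boils down to intersecting two customer sets on a normalized match key, as in the hedged sketch below. Real matching in a tool like the Trillium Software System is far more sophisticated (standardization, parsing, fuzzy rules); the crude key here is purely an assumption for illustration.

```python
def match_key(name: str, postcode: str) -> str:
    """Build a crude normalized key from name and postcode."""
    return name.strip().upper() + "|" + postcode.replace(" ", "").upper()

nectar_collectors = {match_key("Jane Doe", "SW1A 1AA"),
                     match_key("A Patel", "M1 1AE")}
partner_customers = {match_key("JANE DOE", "sw1a1aa"),
                     match_key("B Jones", "LS1 4AP")}

common = nectar_collectors & partner_customers  # customers on both databases
print(len(common))  # 1
```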

You can download the whole story on LMG here.

Sunday, November 23, 2008

Picking the Boardwalk and Park Place DQ Projects

This weekend, I was playing a game of Monopoly with my kids. Monopoly is the ultimate game of capitalism. It's a great way to teach a young one about money. (Given the length of the game, a single game can be a weekend-long lesson.) The companies that we work for are also playing the capitalism game. So, it's not a stretch to say that there are lessons to be learned while playing this game.

As I took in hefty rents from Pacific Ave, I could see that my daughter was beginning to realize that it's really tough to win if you buy low-end properties like Baltic and Mediterranean, or any of the properties on that side of the board. Even with hotels, Baltic will only get you $450. It's only with the yellow, green and blue properties that you can really make an impression on your fellow players. She got excited when she finally got hold of Boardwalk and Park Place.

Likewise, it's difficult to win at the data governance game if you pick projects that have limited upside. The tendency might be to fix the data of the business users who are complaining the most, or of those the CEO tells you to fix. The key is to keep capitalism and the game of Monopoly in mind when you pick projects.

When you begin picking high-value targets with huge upside potential, you'll begin to win at the data governance game. People will stand up and take notice when you begin to bring in the high-end returns that Boardwalk and Park Place can deliver. You'll get better traction in the organization. You'll be able to expand your domain across Ventnor and St. James Place, gathering up other clean-data monopolies.

This is the tactic that I've seen so many successful data governance initiatives take at Trillium Software. The most successful project managers are also good marketers, promoting their success inside the company. And if no one will listen inside the company, they promote it to trade journals, analysts and industry awards. There's nothing like a little press to make the company sit up and notice.

So take the $200 you get from passing GO and focus on high value, high impact projects. When you land on Baltic, pass it by, at least at first. By focusing on the high impact data properties, you’ll get a better payoff in the end.

To hear a few more tips, I recommend the webinar by my friend Jim Orr at Trillium Software. You can listen to his webinar here.

Wednesday, November 19, 2008

What is DIG?

In case you haven't heard, financial services companies are in crunch time right now. Some say the current stormy conditions are unprecedented. Some say it's a rocky time, but certainly manageable. Either way, financial services companies have to be smarter than ever in managing risk.

That’s what DIG is all about, helping financial services companies manage risk from their data. It's a new solution set from Trillium Software.

In Europe, Basel II is standard operating procedure at many financial services companies, and the US is starting to come on board. Basel II is complex, but it includes mandates for increased transparency of key risk parameters, such as probability of default (PD), loss given default (LGD) and exposure at default (EAD). Strict rules on capital risk reserve provisions penalize those institutions highly exposed to risk and those unable to provide 'provably correct' analysis of their risk position.

Clearly, the lack of sound risk calculations had something to do with the situation that banks are in today. Consider all the data it takes to make a risk compliance calculation: customer credit quality measurements, agency debt ratings, accounts receivable, and current market exposures. When this type of data is spread out over multiple systems, it introduces risk that can shake the financial world.

To comply with Basel II, financial services companies and those who issue credit have to be smarter than ever in managing data. Data drives decision-making and risk calculation models. For example, let's say you're a bank and you're calculating the risk of your debtors. You enrich your data with Standard & Poor's ratings to understand the risk. But if the data is non-standardized, you may have a hard time matching the Standard & Poor's data to your customers. If no match is found, a company with an AA- bond rating might default to BB- in the database. After all, it is prudent to be conservative if you don't know the risk. But that error can cause thousands, even millions, to be set aside unnecessarily. These additional capital reserves can be a major drag on the company.
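To put a rough number on that, here is a back-of-the-envelope Python sketch. The per-rating default probabilities and the 45% loss-given-default figure are illustrative assumptions only, not S&P or Basel II values; the point is simply how much extra expected loss a conservative BB- fallback carries compared with the true AA- rating.

```python
# Illustrative expected-loss comparison: EL = PD x LGD x EAD.
ILLUSTRATIVE_PD = {"AA-": 0.0003, "BB-": 0.03}  # assumed annual default probabilities

def expected_loss(exposure: float, rating: str, lgd: float = 0.45) -> float:
    """Expected loss for a given exposure at default (EAD) and rating."""
    return ILLUSTRATIVE_PD[rating] * lgd * exposure

exposure = 10_000_000.0
true_el = expected_loss(exposure, "AA-")      # counterparty's actual rating
fallback_el = expected_loss(exposure, "BB-")  # conservative default when no match is found
print("Extra expected loss carried: {:,.0f}".format(fallback_el - true_el))  # ~133,650
```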

With the Data Intelligence and Governance (DIG) announcement from Trillium Software, we're starting to leverage our enterprise technology platform to fix the risk rating process and to become proactive participants in the validation, measurement, and management of all data fed into risk models. The key is to establish a framework for the context of data and best practices for enterprise governance. When we apply our software and services to key data attributes and set up rules to ensure the accuracy of the data, it can save financial services companies a great deal of money.

To support DIG, we’ve brought on board some additional financial services expertise. We’ve revamped our professional services and are working closely with some of our partners on the DIG initiative. We’ve also been updating our software, like our data quality dashboard, TS Insight, to help meet financial services challenges. For more information, see the DIG page on the Trillium Software web site.

Wednesday, November 12, 2008

The Data Governance Insider - Year in Review

Today is the one year anniversary of this blog. We’ve covered some interesting ground this year. It’s great to look back and to see if the thoughts I had in my 48 blog entries made any sense at all. For the most part, I’m proud of what I said this year.


Probably the most controversial entries this year were the ones on probabilistic matching, where I pointed out some of the shortcomings of the probabilistic approach to matching data. Some people read and agreed. Others added their dissent.


Visitors seemed to like the entry on approaching data intensive projects with data quality in mind. This is a popular white paper on Trilliumsoftware.com, too. We'll have to do more of those nuts and bolts articles in the year ahead.


As a data guy, I like reviewing the stats from Google Analytics. In terms of traffic, it was very slow going at first, but as traffic started to build, we were able to eke out 3,506 visits, 2,327 of them from unique visitors. That means that either one person came back 1,179 times or 1,179 people came back once... or some combination of the two. Maybe my mother just loves reading my stuff.


The visitors came from the places you'd expect. The top ten were the United States, United Kingdom, Canada, Australia, India, Germany, France, Netherlands, Belgium, and Israel. We had a few visitors from unexpected places - one visitor from Kazakhstan apparently liked my entry on the Trillium Software integration with Oracle, but not enough to come back. A visitor from the Cayman Islands took a break from SCUBA diving to read my story on the successes Trillium Software has had with SAP implementations; there's a nice webinar that we recorded available there. A visitor from Croatia took time to read my story about data quality on the mainframe. Even outside Croatia, the mainframe is still a viable platform for data management.


I’m looking forward to another year of writing about data governance and data quality. Thanks for all your visits!

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.