Monday, December 21, 2009

The World is Addicted to Data (and that's good for us)

In the famous book “The Transparent Society”, we are asked to consider some of the privacy ills we will face as technology improves and our society gains access to more data sets. The book was groundbreaking when it was written in 1999. It imagines the emergence of groups who are more powerful because they own the data. However, as we sit here ten years later with 20/20 hindsight, it’s clear that the existence of, and access to, specialized data sets makes our lives better, not worse.

There are countless examples of this daily improvement in our lives; here are a few personal ones:
  • I was in the supermarket recently and per usual, there was a long line at the deli. On the other hand, there was no line at the “deli kiosk” so I gave it a try. Based on my frequent shopper card number and underlying database, the deli kiosk already knew my preferred brand and type of cheese and delicious deli meats. Ordering was a snap thanks to a database, and I didn’t even have to mispronounce “Deutschmacher” to the deli man, like I usually do.
  • For Thanksgiving, I visited some relatives that I don’t often see. My GPS led me there thanks to a geospatial database. It told me how long the trip was going to take based on traffic data, which is often aggregated from several sources, including road sensors and car and taxi fleets. I was also informed about all the coffee shops along the way, thanks to the data set provided by Dunkin’ Donuts. Before I left, I used Google Street View and Microsoft Bing’s Bird’s Eye view to see what the destination looked like. Ten years ago, all of this was pretty much unheard of, but thanks to the coming together of geospatial data, real-time traffic data, satellite and airplane imagery, street view imagery, Dunkin’ Donuts franchise data, and small, cheap processors, my trip was fantastic.
  • Fantasy Football is a new phenomenon, made possible by our addiction to data. We know exactly where we stand on any given Sunday as player stats are made available instantly during the games. When Wes Welker scores, I see the six points reflected in my score instantly. Companies like STATS cover not only football but, according to their web site, 234 sports.
  • For iPhone users, there are tons of data-centric applications. For example, Wait Watchers is an app that uses user submissions to generate and display a table of current ride wait times at major theme parks throughout the world. As this information is updated, users at Disney can decide whether to go to Space Mountain or It’s a Small World, for example.

In the corporate world, it’s much the same, and even more important to our society. Marketing teams are addicted to information from web analytics and use marketing automation tools to track the success of their programs. Operations teams track assets like computers, buildings, trucks and people with data. Sales teams have tracked, and will continue to track, customers with data. Finance relies on credit score data, invoice and payment data, and on making sure there is enough money in reserve to meet regulations. Executives will continue to rely on business intelligence and data. In fact, it’s hard to find anyone in the business world who doesn’t rely on data.

Of course, much of this is anecdotal. I haven’t found any specific study on the increase in database use, but we do know from an old IDC study that the number of servers in use worldwide, presumably some of them used for databases, roughly doubled from 2000 to 2005. A doubling of servers, combined with typically bigger hard drive capacities, points to higher database use.

It was difficult to imagine us here ten years ago, and it’s even more difficult to imagine where we’ll be at the beginning of 2020.  It seems to me that we'll have more opportunity to create and use information with applications on our mobile devices. The collision of iPhone/Droid devices with increasing bandwidths of 3G and 4G networks on the major mobile phone carriers tells me that data in the future will let us do things we can only imagine today.

The world is addicted to data and that bodes well for anyone who helps the world manage it. In 2010, no matter if the economy turns up or down, our industry will continue to feed the addiction to good, clean data.

Tuesday, November 10, 2009

Overcoming Objections to a Data Governance Program

You’ve created a wonderful proposal for a comprehensive data governance program. You’ve brought it up to management, but the chiefs tell you there’s just no budget for data governance. Now what?

The best thing you can do is to keep at it. It often takes time to win the hearts and minds of your company. You know that any money spent on data governance will usually come back with multipliers. It just may take some time for others to get on board. Be patient and continue to promote your quest.

Here are some ideas for thinking about your next steps for your data governance program:

Corporate Revenue
Today, companies manage spending tightly, looking at the expenses and revenue each fiscal quarter and each month to optimize the all-important operating income (revenue minus expenses equals operating income). If sales and revenue are weak, management gets miserly. On the other hand, if revenue is high and expenses are low, your high-ROI proposal will have a better chance for approval.

For many people, this corporate reality is hard to accept. Logical thinkers would suggest that if something is broken, it should be fixed, no matter how well the sales team is performing. But the people who run your business put stockholder value first. You too should pay attention to your company’s sales figures as they are announced each quarter. If your company has a quarterly revenue call, use it to strike when the environment for spending is right.

Cheap Wins
If there is no money to spend on information quality, there may still be information quality wins for you to exploit. For example, let’s say you profile or run some SQL queries against your company’s supply chain database and find a part with a near duplicate: part number “21-998 Condenser” and part number “2-1-998 Cndsr” both exist in your supply chain.

After verifying the fairly obvious duplicate, you can ask your friend on the procurement side how much it costs to store and hold these condensers in inventory. Then use some guerrilla marketing techniques to extol the virtues of data governance. After all, if you could find this with just SQL queries, consider how much you could find with a data discovery/profiling tool. Better yet, consider how much you could find with a company-wide initiative. In a previous blog post, I referred to this as the low-hanging fruit.
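As a rough illustration of the kind of near-duplicate hunt described above, here is a minimal Python sketch. The table contents and the normalization rule are hypothetical; a real supply chain system would need far more robust matching, but even this crude approach can surface obvious duplicates:

```python
import re
from collections import defaultdict

# Hypothetical sample rows from a supply-chain parts table.
parts = [
    ("21-998", "Condenser"),
    ("2-1-998", "Cndsr"),
    ("33-110", "Gasket"),
]

def normalize(part_number: str) -> str:
    """Strip punctuation so '2-1-998' and '21-998' compare equal."""
    return re.sub(r"[^0-9A-Za-z]", "", part_number).upper()

# Group rows by their normalized part number.
groups = defaultdict(list)
for number, description in parts:
    groups[normalize(number)].append((number, description))

# Any group with more than one row is a candidate duplicate for human review.
candidates = {key: rows for key, rows in groups.items() if len(rows) > 1}
print(candidates)
```

The same grouping can be done directly in SQL with a `GROUP BY` on a normalized key; the point is only that a trivially cheap query can justify a much larger initiative.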

Case Studies
Case studies are a great way to spread the word about data governance. They usually contain real-world examples, often of your competitors, who are finding gold with better attention to information quality. Vendors in the data governance space will have case studies on their websites, or you can get unpublished studies by asking your sales representative.

Consider your company’s built-in desire to be competitive, and keep your Google searches and alerts tuned to the data management projects underway at your competitors.

Analysts are another valuable source for proving your point about the virtues of data governance. Your boss may have installed his own custom spam filter against your cajoling on data governance. But he doesn’t have to take your word for it; he can listen to an industry expert.

If you own a subscription to an analyst firm, use it to sell the power of data governance. Analysts offer telephone consultations, reports and webinars to clients. These offerings may be useful to sway your team.  If you are not a client of these firms, go to the vendors. If there is a crucial report, they will often license it to offer on their website for download, particularly if it speaks well about their solution.

Data Governance Expert Sessions
This technique also falls within the category of “don’t just take my word for it.” Many vendors offer data governance workshops to assist your organization with developing its data quality strategies. The session leader interacts with a group of your choosing and presents the potential for improving the efficiency of your business with data governance. As the meeting leader, you would invite both technologists and business users. Include those who are skeptical of the value a data quality program will bring to the company; a third-party opinion may sway them. The cost is usually reasonable and it can help the group understand and share key concepts of data governance.

Guerrilla Marketing
Why not start your own personal crusade, your own marketing initiative to drive home the power of information quality? In a previous installment of this blog, I offered graphics for use in your signature file to drive home the importance of IQ to your organization. Use the power of a newsletter, blog, or e-mail signature to get your message across.

Excerpt from Steve Sarsfield's book "The Data Governance Imperative"

Thursday, October 22, 2009

Book Review: Data Modeling for Business

A couple of weeks ago, I book-swapped with author Donna Burbank. She has a new book entitled Data Modeling for Business. Donna, an experienced consultant by trade, has teamed up with Steve Hoberman, a previously published author and technologist, and Chris Bradley, also a consultant, for an excellent exploration of the process of creating a data model. With a subtitle like “A Handbook for Aligning the Business with IT using a High-Level Data Model”, I knew I was going to find some value in the swap.

The book describes in plain English the proper way to create a data model, but that simple description doesn’t do it justice. The book is designed for those who are learning from scratch – those who only vaguely understand what a data model is. It uses commonly understood concepts to describe data modeling concepts. The book describes the impact of the data model on the project’s success and digs into setting up data definitions and the levels of detail necessary for them to be effective. All of this is accomplished in a plain-talk, straightforward tone without the pretentiousness you sometimes get in books about data modeling.

We often talk about the need for business and IT to work together to build a data governance initiative. But many, including myself, have pointed to the communication gap that can exist in a cross-functional team. In order to bridge the gap, a couple of things need to happen. First, IT teams need to expand their knowledge of business processes, budgets and corporate politics. Second, business team members need to expand their knowledge of metadata and data modeling. This book provides an insightful education for the latter. In my book, the Data Governance Imperative, the goal was the former.

The book is well-written and complete. It’s a perfect companion for those who are trying to build a knowledgeable, cross-functional team for data warehouse, MDM or data governance projects. Therefore, I’ve added it to my recommended reading list on my blog.

Monday, October 12, 2009

Data May Require Unique Data Quality Processes

A few things in life have the same appearance, but the details can vary widely.  For example, planets and stars look the same in the night sky, but traveling to them and surviving once you get there are two completely different problems. It’s only when you get close to your destination that you can see the difference.

All data quality projects can appear the same from afar but ultimately can be as different as stars and planets. One of the biggest ways they vary is in the data itself and whether it is chiefly made up of name and address data or some other type of data.

Name and Address Data
A customer database or CRM system contains data that we know much about. We know that letters will be transposed, names will be comma reversed, postal codes will be missing and more.  There are millions of things that good data quality tools know about broken name and address data since so many name and address records have been processed over the years. Over time, business rules and processes are fine-tuned for name and address data.  Methods of matching up names and addresses become more and more powerful.

Data quality solutions also understand what names and addresses are supposed to look like, since the postal authorities provide them with correct formatting. If you’re somewhat precise about following the rules of the postal authorities, most mail makes it to its destination. If you’re very precise, the postal services can offer discounts. The rules are clear in most parts of the world. Everyone follows the same rules for name and address data because it makes for better efficiency.

So, if we know what the broken item looks like and we know what the fixed item is supposed to look like, we can design and develop processes that involve trained, knowledgeable workers and automated solutions to solve real business problems. There’s knowledge inherent in the system, and you don’t have to start from scratch every time you want to cleanse it.

ERP, Supply Chain Data
However, when we take a look at other types of data domains, the picture is very different. There isn’t a clear body of knowledge about what is typically input and what is typically output, so you must set up processes for determining both. In supply chain or ERP data, we can’t immediately see why the data is broken or what we need to do to fix it. ERP data is likely to be a sort of history lesson of your company’s origins, the acquisitions that were made, and the partnership changes throughout the years. We don’t immediately have an idea of how the data should ultimately look. The data in this world is specific to one client or a single use scenario and cannot be handled by existing out-of-the-box rules.

With this type of data, you may find the need to collaborate more with the business users of the data, whose expertise in determining the correct context for the information comes more quickly, enabling you to effect change more rapidly. Because of the inherent unknowns about the data, few of the steps for fixing the data are done for you ahead of time. It then becomes critical to establish a methodology for:
  • Data profiling, in order to understand the issues and challenges in the data.
  • Discussions with the users of the data to understand context, how it’s used and the most desired representation.  Since there are few governing bodies for ERP and supply chain data, the corporation and its partners must often come up with an agreed-upon standard.
  • Setting up business rules, usually from scratch, to transform the data
  • Testing the data in the new systems
I write about this because I’ve read so much about this topic lately. As practitioners you should be aware that the problem is not the same across all domains. While you can generally solve name and address data problems with a technology focus, you will often rely more on collaboration with subject matter experts to solve issues in other data domains.
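A first profiling pass of the kind the methodology above starts with might look like the following sketch. The column names and sample rows are hypothetical; in practice the rows would be pulled from the source ERP or supply chain system:

```python
import re
from collections import Counter

# Hypothetical ERP rows; in practice these would come from the source system.
rows = [
    {"part_no": "21-998", "uom": "EA", "weight": "1.2"},
    {"part_no": "2-1-998", "uom": "each", "weight": None},
    {"part_no": "33-110", "uom": "EA", "weight": "0.4"},
]

def profile(rows, column):
    """Summarize null rate, distinct values, and character patterns for one column."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    # Map letters to 'A' and digits to '9' so structurally similar values group together.
    patterns = Counter(
        re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", v)) for v in non_null
    )
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "distinct": len(set(non_null)),
        "patterns": patterns,
    }

print(profile(rows, "uom"))  # inconsistent unit codes ('EA' vs 'each') surface here
```

A summary like this gives the business users something concrete to react to in the discussions that follow: which codes are legitimate, which patterns are errors, and what the agreed-upon standard should be.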

Monday, August 24, 2009

9 Questions CEOs Should Ask About Data Governance

When it comes to data governance, the most influential power in an organization is the executive team (presidents, vice presidents, managing directors, and CxOs). Sure, business users control certain aspects of the initiative and may even want to hold it back to maintain data ownership. It’s also true that the technology team is influential, but it may be short on staff, short on budget and busy with projects like software upgrades. So, it sometimes falls to executives to push data governance as a strategic initiative when the vision doesn’t come from elsewhere.

It makes sense. Executives have the most to gain from a data governance program. Data governance brings order to the business, offering the ability to make effective and timely decisions. By implementing a data governance program, you can make fewer decisions based on ‘gut’ and better decisions based on knowledge. It’s an executive’s job to strive for greater control and lower risk, and that can’t be achieved without some form of data governance.

Rather than issuing edicts, a tactic many smart executives employ is to ask questions. Questioning your IT and business teams is a form of fact-checking your decisions, understanding shortcomings in skills and resources, and empowering your people. It ultimately allows your people to come to the same decision at which you may have already arrived. It is a very gracious way to manage.

Therefore asking questions about data governance is an important job of a CEO. Some of the questions you should be asking your technology leaders are as follows:



Do we have a data management strategy?

Ask this question to understand whether your people have considered data governance. If you have a strategy, you should know who the people are and how they are organized around providing information to the corporation. What are the processes for information in the organization?

Are we ahead or behind our competitors with regard to business intelligence and data governance?

Case studies on managing data are widely available on vendor web sites. It’s important to understand if any of your competitors are outflanking you on the efficiencies gained from data governance.

What is poor information quality costing us?

Has your technology team even considered the business impact of information quality on the bottom line, or are they just accepting these costs as standard operating procedure?

What confidence level do you have in my revenue reports?

Has your team considered the impact of information on the business intelligence and therefore the reports they are handing you?

Are we in compliance with all laws regarding our governance of data?

Executives are often culpable for non-compliance, so you should be concerned about any laws that govern the company’s industry. This holds especially true in banking and healthcare, but even in unregulated industries, organizations must comply with spam laws and “do not mail” laws for marketing, for example.

Are you working across business units to work towards data governance, or is data quality done in silos?

To provide the utmost efficiency, information quality processes should be reusable and implemented in a similar manner across business units. This is done for exactly the same reason you might standardize on a type of desktop computer or software package for your business – it’s more efficient to share training resources and support to work better as a team. Taking successful processes from one business unit and extending them to others is the best strategy.

Do you have the access to data you need?

The CEO should understand if any office politics are getting in the way of ensuring that the business has the information it needs. This question opens the door to that discussion.

How many people in your business unit are managing data?

To really understand if you need a unified process for managing data, it often helps to look at the organizational chart and try to figure out how many people already manage it. A centralized strategy for data governance may actually prove more efficient.

Who owns the information in your business unit? If something goes right, who should I praise, and if something is wrong, who should I reprimand?

The business should understand who is culpable for adverse events with regard to information. If, for example, you lose revenue by sending the wrong type of customer discount offers, or if you can’t deliver your product because of problems with inventory data, there should be someone responsible. Take action if the answer cannot easily be given.

By asking these questions, you’ll open up the door to some great discussions about data governance. It should allow you to be a maverick for all of your company’s data needs. Thanks to Ajay Ohri for posing this question to me in last week’s interview; it’s something every executive should consider.

Friday, August 14, 2009

My Interview with Ajay Ohri

Ajay asks me some great questions over at DecisionStats.

Tuesday, July 21, 2009

Data Quality – Technology’s Prune

Prunes. When most of us think of prunes, we tend to think of a cure for older people suffering from constipation. In reality, prunes are not only sweet but also highly nutritious – a good source of potassium and dietary fiber. Prunes suffer from a stigma that’s just not there for dried apricots, figs and raisins, which have similar nutritional and medicinal benefits. Prunes suffer from bad marketing.

I have no doubt that data quality is considered technology’s prune by some. We know that information quality is good for us, having many benefits to the corporation. It also can be quite tasty in its ability to deliver benefit, yet most of our corporations think of it as a cure for business intelligence constipation – something we need to “take” to cure the ills of the corporation. Like the lowly prune, data quality also suffers from bad marketing.

In recent years, prune marketers in the United States have begun marketing their product as “dried plums” in an attempt to get us to change the way we think about them. Commercials show the younger, soccer-mom crowd eating the fruit and being surprised at its delicious flavor. It may take some time for us to change our minds about prunes. I suppose if Lady Gaga or Zac Efron were spokespersons, prunes might have a better chance.

The biggest problem in making data quality beloved by the business world is that it’s well… hard to explain. When we talk about it, we get crazy with metadata models and profiling metrics. It’s great when we’re communicating among data professionals, but that talk tends to plug-up business users.

In my recent presentations and in recent blog posts, I’ve made it clear that it’s up to us, the data quality champions, to market data quality, not as a BI laxative, but as a real business initiative with real benefits. For example:

  • Take a baseline measurement and track ROI, even if you think you don’t have to
  • If the project has no ROI, you should not be doing it. Find the ROI by asking the business users of the data what they use it for.
  • Aggregate and roll up our geeky metrics of nulls, accuracy, conformity, etc. into metrics that a business user would understand – for example, “according to our evaluation, 86.4% of our customers are fully reachable by mail.”
  • Create and use aggregated scores similar to the Dow Jones Industrial Average. Publish them at regular intervals. To raise awareness of data quality, talk about why the score is up and why it has gone down.
  • Have a business-focused elevator pitch ready when someone asks you what you do. “My team is saving the company millions by ensuring that the ERP system accurately reflects inventory levels.”
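The roll-up in the bullets above can be sketched in a few lines. The metric names, values and weights here are hypothetical; the idea is simply a weighted average of low-level checks, re-expressed as a single index a business user can watch over time:

```python
# Hypothetical column-level metrics (fraction of records passing each check)
# and business-assigned weights.
metrics = {
    "completeness": 0.92,  # non-null rate
    "validity": 0.88,      # values conform to expected patterns
    "accuracy": 0.81,      # sampled values match a trusted source
}
weights = {"completeness": 0.3, "validity": 0.3, "accuracy": 0.4}

def rollup(metrics, weights):
    """Weighted average expressed as a 0-100 'index' business users can track."""
    total = sum(weights.values())
    return round(100 * sum(metrics[k] * weights[k] for k in metrics) / total, 1)

score = rollup(metrics, weights)
print(f"Customer-reachability index: {score}")
```

Published monthly alongside a one-line explanation of why it moved, a score like this does the same job for data quality that a stock index does for the market.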
Of course there’s more – in my previous blog posts, in posts yet to come, and in my book The Data Governance Imperative. Marketing the value of data quality is just something we all need to do more of. Not selling the business importance of data quality... it’s just plum-crazy!

Monday, July 13, 2009

Data Quality Project Selection

What if you have five data intensive projects that are all in need of your very valuable resources for improving data quality? How do you decide where to focus? The choice is not always clear. Management may be interested in accurate reporting from your data warehouse, but revenue may be at stake in other projects. So, just how do you decide where to start?

To aid in a choice between projects, it may help to plot your projects on a “Project Selection Quadrant” as I’ve shown here. The quadrant chart plots the difficulty of completing a project versus the value it brings to the organization.

Project Difficulty
To place a project on the X axis, you must understand how your existing system is being used, how various departments use it differently, and whether there are special programs or procedures that impact the use of the data. To predict project length, you have to rely heavily on your understanding of your organization’s goals and business drivers.

Some of the things that will affect project difficulty:
• Access to the data – do you have permission to get the data?
• Window of opportunity – how much time do you have between updates to work on the data?
• Number of databases – more databases will increase complexity
• Languages and code pages – is it English or Kanji? Is it ASCII or EBCDIC? If you have mixed languages and code pages, you may have more work ahead of you
• Current state of data quality – the more non-standard your data is to begin with, the harder the task
• Volume of data – data standardization takes time, and the more you have, the longer it’ll take
• Governance, Risk and Compliance mandates – is your access to the data stopped by regulation?

Project Value
For assessing project value (the Y axis), there is really one thing that you want to look at – money. It comes from your discussions with the business users around their ability to accomplish things like:
• being able to effectively reach/support customers
• call center performance
• inventory and holding costs
• exposure to risk such as being out of compliance with any regulations in your industry
• any business process that is inefficient because of data quality

The Quadrants
Now that you’ve assessed your projects, they will naturally fall into the following quadrants:

Lower left: The difficult and low value targets. If management is trying to get you to work on these, resist. You’ll never get anywhere with your enterprise-wide appeal by starting here.
Lower right: These may be easy to complete, but if they have limited value, you should hold off until you have complete corporate buy-in for an enterprise-wide data quality initiative.
Upper left: Working on high value targets that are hard to complete will likely only give your company sticker shock when you show them the project plan. Or, the projects may run into major delays and be cancelled altogether. Again, proceed with caution. Make sure you have a few wins under your belt before you attempt them.
Upper right: Ah, low-hanging fruit. Projects that are easier to complete with high value are the best places to begin. As long as you document and promote the increase in value that you’ve delivered to the company, you should be able to leverage these wins into more responsibility and more access to great projects.
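The quadrant logic above reduces to two thresholds. Here is a minimal sketch; the project names, 1-10 scores and cutoff are all hypothetical, and the labels follow the advice in the four quadrant descriptions:

```python
# Hypothetical projects scored 1-10 for difficulty and business value.
projects = {
    "CRM cleanup": (3, 8),       # easy, high value
    "ERP part dedup": (7, 9),    # hard, high value
    "Legacy archive": (8, 2),    # hard, low value
    "Mail-list scrub": (2, 3),   # easy, low value
}

def quadrant(difficulty, value, cutoff=5):
    """Map a (difficulty, value) pair to the recommended action for its quadrant."""
    if value >= cutoff:
        return "low-hanging fruit" if difficulty < cutoff else "proceed with caution"
    return "hold off" if difficulty < cutoff else "resist"

for name, (difficulty, value) in projects.items():
    print(f"{name}: {quadrant(difficulty, value)}")
```

Even a crude scoring like this forces the conversation that matters: agreeing with the business on what "difficult" and "valuable" mean before committing resources.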

Keeping an eye on both the business aspect of the data – its value – and the technical difficulty in standardizing it will help you decide where to go and how to make your business stronger. It will also ensure that you and your business co-workers understand the business value of improving data quality within your projects.

Monday, July 6, 2009

June’s "El Festival del IDQ Bloggers”

A Blog Carnival for Information/Data Quality Bloggers

June of 2009 is gone, so it’s time to look back at the month and recognize some of the very best data quality blog entries. Like other blog carnivals, this one is a collection of posts from different blogs on a specific theme.

If you’re a blogger and you missed out on this month’s data quality carnival, don’t worry. You can always submit your brilliant entries next month. So, here they are, in no particular order.

  • Newcomer Jeremy Benson has a unique perspective of being an actuary – someone who deals with the financial impact of risk and uncertainty to a business. We know that improving data quality will certainly produce more accurate assessments when it comes to crunching numbers and calculating risk. This month’s blog entry describes how data quality is important to predictive modeling. More actuaries should understand the importance of data quality, so this is a positive step.

  • Irish information quality expert Daragh O Brien was talking about his marriage problems this month – well, at least the data quality problems with how his marriage was recorded. In this post he discusses a recent experience and how it made him think yet again about the influence of organizational culture and leadership attributes on information quality success and change management.

  • Western Australian blogger Vince McBurney contributes his excellent analysis of the new Gartner Magic Quadrant for data quality tools. Vince’s analysis of the LAST Magic Quadrant (two years ago) was perhaps my biggest inspiration for getting involved in blogging, so it makes me happy to include his blog. “Tooling Around on the IBM InfoSphere” is focused on data integration topics from the perspective of an expert in the IBM suite of software tools.

  • Jim Harris takes us into “The Data-Information Continuum” to remind us that data quality is usually both objective and subjective, making reaching the “single version of truth” more mystical. The post made it clear to me that our description of the data quality problem is evolving, and the language we must use to promote our successes must evolve, too.

  • Dalton Cervo is the Customer Data Quality Lead at Sun Microsystems and a member of the Customer Data Governance team at Sun. Dalton takes us on a journey of deduplicating a customer database using a popular data quality tool. It’s great to see the details of a project like this so that we can better understand the challenges and benefits of using data quality tools.

Thanks to all the outstanding data quality bloggers this month!

Thursday, June 25, 2009

Evil Dictators: You Can’t Rule the World without Data Governance

Buried in the lyrics of one of my favorite heavy metal songs are these beautiful words:

Now, what do you own the world? How do you own disorder, disorder? – System of a Down, Toxicity

System of a Down’s screamingly poetic lyrics remind us of a very important lesson that we can take into business. After all, it is the goal of many companies to “own their world”. If you’re Coke, you want to dominate Pepsi. If you’re McDonald’s, you want to crush Burger King. Yet to own competitive markets, you have to run your business with the utmost efficiency. Without data governance, or at least enterprise data quality initiatives, you won’t have that efficiency.

Your quest for world domination will be in jeopardy in many ways without data governance. If your evil world domination plan is to buy up companies, poor data quality and lack of continuity will prevent you from creating a unified environment after the merger. On the day of a merger, you may be asked to produce one list of products, one list of customers, one list of employees, and one accurate financial report. Where is that data going to come from if it is not clean all over your company? How will the data get clean without data governance?

Data governance brings order to the business units. With order comes the ability to own the information of your business. The ownership brings the ability to make effective and timely decisions. In large companies, whose business units may be warring against each other for sales and control of the information, it’s impossible to own the chaos. It’s difficult to make good decisions and bring order to your people. If you want to own your market, you must have order.

Those companies succeeding in this data-centric world are treating their data assets just as they would treat cold, hard cash. With data governance, companies strive to protect their vast ecosystem of data like it is a monetary system. It can't be the data center's problem alone; it has to be everyone's responsibility throughout the entire company.

Data governance is the choice of CEOs and benevolent dictators, too. Choosing data governance means choosing to hear the voices of your people. It's only when you harmonize the voices of technologists, executives and business teams that you can produce a beautiful song, one that can bring your company teamwork, strategic direction and profit. When you choose data governance, you choose order, communication and hope for your world.

So, megalomaniacs, benevolent dictators and CEOs: pay heed. You can’t own the world without data governance.

Friday, June 19, 2009

Get your Submissions in for the June Blog Carnival for Information/Data Quality Bloggers

I’m pleased to be hosting the June edition of "El Festival del IDQ Bloggers – A Blog Carnival for Information/Data Quality Bloggers". If you are a data quality blogger, please feel free to submit your best blog entries today.

This blog carnival is simply a collection of posts from different data quality blogs. Anyone can submit a data quality blog post and get the benefits of extra traffic, networking with other bloggers and discovering interesting posts. The only requirement is that the submitted post has a data quality theme.

This will be the JUNE issue of the carnival, so your submissions must have been posted in June. To qualify, you should e-mail your submission to: – your email should include:
• URL of the blog post being submitted
• Brief description of the blog (not the post, the blog)
• Brief description of the author
• Optional – URL of an author profile (e.g. LinkedIn, Twitter)

Not all entries will make it into the issue, but don’t be discouraged. Keep submitting to future issues and we’ll get you next month.
For more information: see the IAIDQ web page

Friday, June 12, 2009

Interview on Data Quality

From Data Quality

If you are active within the data quality and data governance community then chances are you will have come across Steve Sarsfield and his Data Governance and Data Quality Insider blog.
Steve has also recently published an excellent book, aptly titled "The Data Governance Imperative" so we recently caught up with him to find out more about some of the topics in the book and to pose some of the many questions organisations face when launching data governance initiatives.

Read the interview>>

Plus, at the end of the interview we provide details of how to win a copy of "The Data Governance Imperative".

Tuesday, June 9, 2009

MIT's Information Quality Industry Symposium

This year, I am honored to be part of MIT's Information Quality Industry Symposium in Cambridge, MA. In past years I have attended this conference and have been pleased with the quality of the speakers and how informed the industry is getting about data quality. This year, my company is sponsoring the event and I will be co-presenting with my colleague Nelson Ruiz.

The speaker's list is impressive! Some of the featured speakers include very experienced practitioners like Larry English, Bill Inmon, Danette McGilvray and Gwen Thomas. Attendees will be sure to gain some insight on information quality with such a full line-up of experts.

In true MIT form, this forum has a lot of theoretical content in addition to the practical sessions. This is one of the more academic venues for researching data quality, and therefore less commercial. The presentations are interesting in that they often give you another perspective on the problem of data quality. Some of them are clearly cutting edge.

My session, entitled Using Data Quality Scorecards to Sell IQ Value, will be more practical. When it comes to convincing your boss to invest in DQ, how can you create metrics that will ignite their imagination? How do you get the funding... and how do you take information quality enterprise-wide?
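As a taste of what such a scorecard can measure, here is a minimal sketch computing per-column completeness and validity percentages over made-up customer data. The field names, sample rows and validation rules are all hypothetical, invented purely for illustration:

```python
import re

# Made-up customer records; a real scorecard would read from the warehouse.
customers = [
    {"email": "ann@example.com", "zip": "02116"},
    {"email": "",                "zip": "0211"},
    {"email": "bob@example",     "zip": "01730"},
]

# Hypothetical validity rules, one per column.
rules = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$").match,
    "zip":   re.compile(r"^\d{5}$").match,
}

def scorecard(rows):
    """Return completeness and validity percentages for each governed column."""
    card = {}
    for col, is_valid in rules.items():
        values = [r[col] for r in rows]
        complete = sum(1 for v in values if v)              # non-empty
        valid = sum(1 for v in values if v and is_valid(v)) # passes the rule
        card[col] = {"complete_pct": round(100 * complete / len(values)),
                     "valid_pct": round(100 * valid / len(values))}
    return card

print(scorecard(customers))
# {'email': {'complete_pct': 67, 'valid_pct': 33}, 'zip': {'complete_pct': 100, 'valid_pct': 67}}
```

Even two numbers per column, tracked month over month, give executives a trend line they can grasp in seconds.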

If you have some travel budget open, please come to Boston this summer and check out this small and friendly event. As a reader of this blog, feel free to use the Harte-Hanks Trillium Software $100 discount pass when registering.

Wednesday, June 3, 2009

Informatica Acquires AddressDoctor

Global Data is Hard to Do

Yesterday, Informatica announced their intent to acquire AddressDoctor. This acquisition is all about being able to handle global data quality in today’s market, but it has a surprising potential twist. Data quality vendors have been striving for a better global solution because so many of the large data quality projects contain global data. If your solution doesn’t handle global data, it often just won’t make the cut.

The interesting twist here is that both IBM and Dataflux leverage AddressDoctor for their handling of global address data. Several smaller vendors do as well - MelissaData, QAS, and Datanomic. Trillium Software technology is not impacted by this acquisition. They have been building in-house technology for years to support the parsing of global data and have leveraged their parent company’s acquisition of Global Address to beef up the geocoding capability of the Trillium Software System.

Informatica has dealt the competition a strong blow here. Where will these vendors go to get their global data quality? In the months to come, there will be challenges to face. Informatica, still busy integrating the disparate parts of Evoke, Similarity and Identity Systems, will now have to integrate AddressDoctor. Other vendors like IBM, Dataflux, MelissaData, QAS and Datanomic may now have to figure out what to do for global data if Informatica decides not to renew partner agreements.

For more analysis on this topic, you can read Rob Karel's blog, where the Forrester analyst argues that the move is meant to limit the choices on MDM platforms.

To be on the safe side, I’d like to restate my opinions in this blog are my own. Even though I work for Harte-Hanks Trillium Software, my comments are my independent thoughts and not necessarily those of my employer.

Thursday, May 21, 2009

Guiding Call Center Workers to Data Quality

Data Governance and data quality are often the domain of data quality vendors, but any technology that can help your quest to achieve better data is worth exploring. Rather than fixing up data after it has been corrupted, it’s a good idea to use preventative technologies to stop poor data quality in the first place.

I recently met with some folks from Panviva Software to talk about how the company’s technologies do just that. Panviva is considered the leader in Business Process Guidance, an emerging set of technologies that could help your company improve data quality and lower training costs in your call centers.

The technology is powerful, particularly in situations where the call center environment is complex – multiple environments mixed together. IT departments in the banking, insurance, telecommunication and high-tech industries have been particularly rattled by mergers and acquisitions. Call center workers at those companies must be trained on where to navigate and which application to use to complete a customer service process. On top of that, processes may change often due to a change in regulation, a change in corporate policy, or the next corporate merger.

To use a metaphor, business process guidance is a GPS for your complicated call center apps.

If you think about it, the way we drive our cars has really improved over the years because of the GPS. We no longer need to buy a current road map at Texaco and follow the map as far as it’ll take us. Instead, GPS technology knows where we are and what potential construction and traffic issues we may face – we simply need to tell it where we want to go. Business Process Guidance provides that same paradigm improvement for enterprise applications. Rather than forcing training on your Customer Service Representatives (CSRs) with unabridged training manuals, business process guidance provides a GPS-like function that sits on top of those systems, providing context-sensitive information on where you need to go. When a customer calls into the call center, the technology combines the context of the CSR’s screens with knowledge of the company’s business processes to guide the CSR to much faster call times and lower error rates.

A case study at BT shows Panviva technology reducing the error rate in BT's order entry system from 30% to 6%, an 80% reduction. That’s powerful technology on the front-end of your data stream.

Sunday, May 10, 2009

Data Governance – the Movie

To really drive home the challenge of data governance in your company, you have to believe that it’s a movie, not a photo. A snapshot is taken once and done, but that’s not what happens when you embark on a data governance initiative.

In a movie, you start with a hero – that’s you the data governance champion. You have a good heart and want to fight for justice in the cruel data management world.

Next, there needs to be conflict, a dark cloud that overshadows our hero. In most cases, the conflict goes back to the beginning, when your company was just starting out. Back then, your first customers may have been from your local area, but slowly the circle began to grow - first locally, then regionally, then nationwide, then worldwide. As new offices opened and new systems were born, the silos formed. The hero warned the company that it needed a data management strategy, but no one listened. Almost no small or medium-sized company thinks about data management when it’s growing up, despite the best efforts of our heroes.

When it comes time to fix it all, you can’t think of it as taking a snapshot of the data and fixing it up with Photoshop. The hero must embark on a long journey of battle and self-sacrifice to defeat evil. Corporate change, like rapid growth, mergers, downsizing, and new laws governing the corporation, happens frequently in business. The battle for corporate data management requires small steps to mature the corporation into a better way of doing business. It’s Neo from The Matrix fighting Agent Smith and evolving into ‘the One’. It’s John McClane slowly taking out the bad guys in Nakatomi Plaza.

What’s missing in many people’s minds when it comes to data governance is that concept of time. It took a long time to mess up the data in your big corporation, and it takes time to reverse it. When you select your tools and your people and your processes for data governance, you always want to keep that enterprise vision in mind. The vision has a timeline, throughout which the data champion will have unexpected issues thrown at them. It’s not about the free data cleansing software that you get with your enterprise application. That stuff won’t hold up once you try to use it outside its native environment. It’s about making sure the process, the team, and the tools stand up over time, across projects, across business units and across data types. There are fewer and fewer vendors standing who can offer that kind of enterprise vision.

Monday, May 4, 2009

Don’t Sweat the Small Stuff, Except in Data Quality

April was a busy month. I was the project manager on a new web application, nearly completed my first German web site (also as project manager) and released the book “The Data Governance Imperative”. All this real work has taken me away from something I truly love – blogging.

I did want to share something that affected my project this month, however. Data issues can come in the smallest of places and can have a huge effect on your time line.

For the web project I completed this month, the goal was to replace a custom-coded application with a similar application built within a content management system. We had to migrate login data for users of the application, all with various access levels, to the new system.

During go live, we were on a tight deadline to migrate the data, do final testing of the new application and seamlessly switch everyone over. That all had to happen on the weekend. No one would be the wiser come Monday morning. If you’ve ever done an enterprise application upgrade, you may have followed a similar plan.

We had done our profiling and knew that there were no data issues. However, when the migration actually took place, lo and behold – the old system allowed # as a character in usernames and passwords while the new system didn’t. It forced us to stop the migration and write a rule to handle the issue. Even with this simple issue, the timeline came close to missing its Monday morning deadline.

Should we have spotted that issue? Yes, in hindsight we could have better understood the system restrictions on the username and password and set up a custom business rule in the data profiler to test it. We might have even forced the users to change the # before the switch while they were still using the old application.
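In hindsight, the rule we needed was trivial to express. A minimal sketch of such a pre-migration profiling check, with a hypothetical allowed-character set and made-up records (the real target system's restrictions would drive the actual pattern):

```python
import re

# Characters the (hypothetical) target system accepts in usernames and passwords.
ALLOWED = re.compile(r"^[A-Za-z0-9_.\-]+$")

# Made-up legacy login records.
legacy_users = [
    {"username": "jsmith", "password": "s3cret"},
    {"username": "bob#77", "password": "pass#word"},  # legal in the old system only
]

# Flag every value the new system would reject, before the migration weekend.
violations = [
    (u["username"], field)
    for u in legacy_users
    for field in ("username", "password")
    if not ALLOWED.match(u[field])
]
print(violations)  # [('bob#77', 'username'), ('bob#77', 'password')]
```

Run as a custom business rule in the profiler, a check like this turns a go-live surprise into a routine pre-migration finding.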

The experience reminds me that data quality is not just about making the data right, it’s about making the data fit for business purpose – fit for the target application. When data is correct for one legacy application, it can be unfit for others. It reminds me that you can plan and test all you want, but you have to be ready for hiccups during the go-live phase of the project. The tools, like profiling, are there to help you limit the damage. We were lucky in that this database was relatively small and reload was relatively simple once we figured it all out. For bigger projects, more complete staging – making a dry run before the go-live phase – would have been more effective.

Sunday, April 19, 2009

New Book - The Data Governance Imperative

My new book entitled The Data Governance Imperative is making its way to Amazon, Barnes and Noble, and other outlets this week. I’m very proud of this and happy to see it finally hit the streets. It was a lot of work and dedication to get it done.

I decided to write this book because of the recurring questions that arose during discussions about data governance. How do I get my boss to believe that data governance is important? How do I work with my colleagues to build better information and a better company? How do I break through the barriers to data governance maturity, like getting the money, resources and expertise to accomplish the task? When it comes to justifying the costs of data governance to their organization, building organizational processes, learning how to staff initiatives, understanding the role and importance of technologies, and dealing with corporate politics, there is little information available.

In my years working at Trillium Software, I have been exposed to many great projects in Fortune 1000 companies worldwide. Over the years, I’ve made note of the success factors that contribute to strong data governance. I’ve seen successful strategies for data governance and the common threads to success within and across the industry.

I’ve written the Data Governance Imperative to help readers pioneer data governance initiatives, breaking through political barriers by shining a light on the benefits of corporate information quality. This book is designed to give data governance team members insight into the art of starting data governance. It could be helpful to:

  • Data governance teams – those looking for direction/validation in starting a corporate data governance initiative.
  • Business stakeholders – those working in marketing, sales, finance and other business roles who need to understand the goals and functions of a data governance team.
  • C-level executives – those looking to learn about the benefits of data governance without having to read excessive technical jargon, or even those who need to be convinced that data governance is the right thing to do.
  • IT executives – those who believe in the power of information quality but have faced challenges in convincing others in their corporation of its value.
This book does not focus on the technical aspects of data governance, although technologies are discussed. There are some great books on the technology of data governance in the market today. Some are listed on the left side of this blog in the carousel.

Thursday, April 2, 2009

Next Week’s Can’t-Miss Webinars

Presenters can either make or break a webinar. Simply put, good webinars are given by people who are passionate and knowledgeable about their topic. In order to give up an hour of a busy day, I have to believe that it will impart some knowledge beyond product demos and brochure-ware. In looking ahead to next week, I see a couple of high points:

Data Governance: Strategies for Building Business Value
Date: Tuesday, April 14, 2009 at 11 a.m. Eastern
Trillium Software will host a Web seminar that includes featured guest speaker Rob Karel of Forrester Research presenting a discussion titled: Data Governance: Strategies for Building Business Value. If you’ve never seen Rob Karel speak, I can tell you from experience that it’s a real treat. I played emcee to a 2008 webinar with Rob on data governance. It was very well attended and very positively reviewed. At that time, the webinar concluded with a lot of great questions on selling the business case for data governance. In this session, Rob plans to tackle that topic a bit more - outlining the best practices and skills needed to obtain executive buy-in for data governance projects.

How to Boost Service, Cut Costs and Deliver Great Customer Experiences - Even in an Economic Downturn
Date: Thursday, April 16, 2009 at 11 a.m. Eastern
Teradata and the SmartData Collective will co-sponsor a webinar on dealing with a down economy. We’ve seen a couple of companies cover this topic, but the panel looks very strong. Judging from the panel and the description, this webinar looks to have a CRM focus - how technology can help you a) provide an experience that customers will love, and b) cut costs and differentiate your communications strategies from your competition. Curtis Rapp from Air2Web will be in on the discussion, so I’m guessing there will be some talk about Teradata Relationship Manager Mobile and using text messaging in your Teradata apps.

The panel of experts will include:

  • Dave Schrader, Teradata - published author and long time Teradata employee
  • Lisa Loftis, CRM and BI Expert - author on CRM topics
  • Curtis Rapp, Air2Web – the partner responsible for some of Teradata’s mobile solution (CRM on your cell phone)
  • Rebecca Bucnis, Teradata - another long-time and experienced Teradata employee
For attending, you’ll also get a white paper by Lisa Loftis called Ringing in the Customers: Harnessing the power of Mobile Marketing.

Wednesday, March 25, 2009

A Brief History of Data Quality

Believe it or not, the concept of data quality has been touted as important since the beginning of the relational database. The original concept of a relational database came from Dr. Edgar Codd, who worked for IBM in the 1960s and 70s. Dr. Codd’s ideas about relational databases, storing data in cross-referenced tables, were groundbreaking, but largely ignored at IBM. It was only when Larry Ellison grabbed onto the idea and began to have success with a little company named Oracle that IBM finally paid attention. Today, relational databases are everywhere.

Even then, Dr. Codd advised on data integrity. He wrote about:

  • Entity integrity – every table must have a primary key and the column or columns chosen to be the primary key should be unique and not null.
  • Referential integrity – consistency between coupled tables. With certain values, there are obvious relationships between tables. The same ZIP code should always refer to the same town, for example.
  • Domain integrity – defining the possible values of a field stored in a database, including data type and length. So if the domain is a telephone number, the value shouldn’t be an address.

He put everything else into something he called 'business rules' to define specific standards for your company. An example of a business rule would be for companies who store part numbers. The part number field would have a certain length and data shape – domain integrity – but also have certain character combinations to designate the category and type of part – business rules.
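These integrity rules map directly onto modern SQL constraints. A minimal sketch using SQLite from Python; the table names, the ZIP/town example and the two-letters-plus-four-digits part-number rule are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Entity integrity: a unique, non-null primary key.
conn.execute("""
    CREATE TABLE towns (
        zip_code TEXT PRIMARY KEY,   -- entity integrity
        town     TEXT NOT NULL
    )""")

conn.execute("""
    CREATE TABLE parts (
        part_no  TEXT PRIMARY KEY,
        -- Referential integrity: every part's ZIP must exist in towns.
        zip_code TEXT REFERENCES towns(zip_code),
        -- Domain integrity plus a business rule: a part number is two
        -- category letters followed by four digits (hypothetical rule).
        CHECK (part_no GLOB '[A-Z][A-Z][0-9][0-9][0-9][0-9]')
    )""")

conn.execute("INSERT INTO towns VALUES ('02116', 'Boston')")
conn.execute("INSERT INTO parts VALUES ('AB1234', '02116')")  # passes all checks

try:
    conn.execute("INSERT INTO parts VALUES ('bad!', '02116')")  # violates the CHECK
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The database engine enforces the first three rules for free; the business rule is the part you must define yourself, which is exactly where data governance teams earn their keep.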

The point is, information quality is not something new. The database pioneers knew about it, at least theoretically, in the 1970s. In the old days, when systems were inflexible, you may have been forced to break these rules.

For example, a programmer who worked for you in the past may have used 99/99/9999 in a date field to designate an inactive account. It all works fine when the data is used within the single application. However, these sorts of shortcuts cause huge headaches for the data governance team as they try to consolidate and move data from silos to the enterprise.

To solve these legacy issues, you have to:
  • Profile data to realize that some dates contain all 9s – one of the advantages of using data profiling tools in the beginning of the process.
  • Figure out what the 9s mean by collaborating with members of the business community.
  • Plan what to do to migrate that data over to a data model that makes more sense, like having an active/inactive account table.
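The steps above can be sketched in a few lines of Python. The records, field names and target model here are invented for illustration:

```python
from collections import Counter

# Hypothetical legacy records; in practice these come from the source system.
accounts = [
    {"id": 1, "account_date": "03/15/2001"},
    {"id": 2, "account_date": "99/99/9999"},  # the programmer's "inactive" sentinel
    {"id": 3, "account_date": "99/99/9999"},
    {"id": 4, "account_date": "07/01/1998"},
]

def is_plausible_date(text):
    """Flag values that cannot be real MM/DD/YYYY dates."""
    month, day, _year = (int(part) for part in text.split("/"))
    return 1 <= month <= 12 and 1 <= day <= 31

# Step 1: profile the column; the all-9s value stands out immediately.
profile = Counter(rec["account_date"] for rec in accounts)
suspects = {value: count for value, count in profile.items()
            if not is_plausible_date(value)}
print(suspects)  # {'99/99/9999': 2}

# Step 3: migrate to a model that makes sense - an explicit inactive flag.
migrated = [
    {"id": rec["id"],
     "inactive": rec["account_date"] == "99/99/9999",
     "account_date": (None if rec["account_date"] == "99/99/9999"
                      else rec["account_date"])}
    for rec in accounts
]
```

Step 2, of course, cannot be coded: only the business community can confirm what the 9s actually meant before you dare rewrite them.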

If you take that one example and amplify it across thousands of tables in your company, you’ll begin to understand one of the many challenges that data stewards face as they work on migrating legacy data into MDM and data governance programs.

Friday, March 20, 2009

The Down Economy and Data Integration

Vendors, writers and analysts are generating a lot of buzz about the poor economic growth conditions in the world. It’s true that in tough times, large, well-managed companies tend to put off IT purchases until the picture gets a bit rosier. Some speculate that the poor economy will affect data integration vendors and their ability to advance big projects with customers. Yet, I don’t think it will have a deep or lasting impact. Here are just some of the signs that still point to a strong data integration economy.

Stephen Swoyer at TDWI wrote a very interesting article that attempts to prove that data integration and BI projects are going full-steam ahead, despite a lock-down on spending in other areas.

Research from Forrester seems to suggest that IT job cuts in 2009 won’t be as steep as they were in the 2001/2002 dot com bubble burst. Forrester says that the US market for jobs in information technology will not escape the recession, with total jobs in IT occupations down by 1.2% in 2009, but the pain will be relatively mild compared with past recessions. (You have to be a Forrester customer to get this report.)

You can read the article by Doug Henschen from Intelligent Enterprise for further proof on the impact of BI and real time analytics. The article contains success stories from Wal-Mart, Kimberly-Clark and Goodyear, too.

On this topic, SAP BusinessObjects recently asked me if I’d blog about their upcoming webinar entitled Defy the Times: Business Growth in a Weak Economy. The concept of the webinar is that you can use business intelligence and analytics to cut operating expenses and discretionary spending and improve efficiencies. It might be a helpful webinar if you’re on a data warehouse team and trying to prove your importance to management during this economic downturn. Use vendors to help you provide third-party confirmation of your value.

So, is the poor economy threatening the data integration economy? I don’t think so. When you look at the problems of growing data volumes and the value of data integration, I don’t see how these positive stories can change any time soon. You can run out of money, but the world will never run out of data.

Sunday, March 15, 2009

Data Governance and the Coke Machine Syndrome

I was in a meeting last week and recognized the Coke Machine Syndrome, an important business parable that I learned from an old boss. All meetings can fall victim to it, not just data governance meetings. Since meeting management is so crucial to the success of a data governance initiative, you should learn to recognize it and nip it in the bud as quickly as possible.

Data Governance and the Coke Machine Syndrome
The scene is your company’s conference room. You have just presented your new plan outlining the data governance projects for the entire year. The plan outlines where you’re going to spend this year to improve data quality. Each department argues persuasively for support from the data governance team. With significant growth goals for the coming year, marketing and sales claim they can’t make it without better data for promotions. Manufacturing obviously can’t reach new goals for efficiency without improving the data within the ERP system. And administration simply must have better data for better metrics in the data warehouse to understand the business.

After limited discussion, the budget is approved and 95% of your team’s expenses have been committed for the current year. This part of the meeting allocates millions of dollars and takes place in about 60 minutes.

The Coke Machine
At this point, the meeting leader mentions that the company has been considering the installation of a Coke machine in this section of the building. With a few minutes left in the meeting, he asks what drinks people want in the machine.

For the next 45 minutes, the debate rages with a heightened level of intensity. Should it be placed near the stairway or in the employee cafeteria? Should it contain Pepsi products instead of Coke? Should it contain Red Bull? Should the bottles be recyclable, and how will the recyclable materials be handled?

By the time the meeting adjourns, nearly as much time has been spent on the Coke machine as has been spent on the entire data governance budget for the year. The Coke machine discussion is an incredible waste of management time and effort.

Why does it Happen
Coke machine syndromes happen because everyone knows about Coke machines and everyone has a stake in the decision. Knowledge about the issue makes it easier to speak up about the Coke machine than it would be to speak up about a complicated issue like the budget.

Managing it
To manage the Coke machine syndrome, you must recognize it when it occurs. You can identify this syndrome whenever a small, easily understood issue begins to consume more time than it should. There is usually a full range of logical, well-supported, and totally divergent opinions about what must be done, too.

Make sure you call it what it is. In other words, label it with the term: Coke Machine Syndrome and define it for your team. When it happens, you have a short-hand term that you can use to describe what’s happening.

Before each meeting, think about what items on your meeting agenda might turn into a Coke machine syndrome. If you can recognize it, that can be a big help. Many find it helpful to conduct pre-meetings with certain team members to prepare them for simple decisions without having to vet ideas in a meeting.

Finally, if calling it the Coke machine syndrome doesn't work, just use the phrase “let’s take it off-line” and move on.

Monday, March 2, 2009

Top Six Traits of a Data Champion

Data champions play a crucial role in making data governance successful. They are enthusiastic about the power of data, and in just about every company that has successfully implemented data governance, they lead the way.

Let's take a look at what you must do in order to lead your organization to data governance. Here are the top six characteristics:

1. Passion. Champions are passionate about data governance and promote its benefit to all whom they meet. They carry the vision of data governance, developing new, efficient processes and working through any issues of non-cooperation that arise. If data champions find themselves losing their passion for data management, it’s time for a regime change.

2. Respect. A data champion is someone who is the glue between executives, business, IT and third-party providers. The data champion role requires someone who has both technology and business knowledge – someone who can communicate with others and build relationships as needed. In a way, a data champion is a translator, translating the technologist's jargon of schemas and metadata into business value, and vice versa. To do that, you really need to understand what makes all sides tick and have the respect of the team.

3. Maven-dom. A ‘maven’ is someone who wants to solve other people's problems, generally by solving his own, according to Malcolm Gladwell, author of The Tipping Point (another good book for data champions to read). A maven’s social skills and ability to communicate are powerful tools in evangelizing data governance. A data champion needs to be socially connected and willing to reach out and share what is known about data governance. It is not easy for some to create and maintain relationships. If you’re the type of person who prefers closing the office door to avoid others, you may not be an effective data champion.

4. Persuasiveness. One of the success traits of a good data champion is that they have vision and they can sell it. Working with others within your organization to develop a vision is important, but the data champion is the primary marketer of the vision. Successful data champions understand the power of the elevator pitch and are willing to use it to promote the data governance vision to all who will listen. The term elevator pitch describes a sales message that can be delivered in the time span of an elevator ride. The pitch should have a clear, consistent message and reflect your goals to make the company more efficient through data governance. The more effective the speech, the more interested your colleagues will become.

5. Positive Attitude. A data champion must smile and train themselves to think positively. Why? Positive thinking is contagious, and your optimism will build positive energy for your project. Data champions smile and speak optimistically to give others the confidence to agree with them. As a champion, you will encounter negative people who will attempt to set up road blocks in front of you. But as long as you’re optimistic and respond positively, you will inspire team members to join your quest and share in your success.

6. Leadership. A data champion is a leader above all, so studying the qualities of successful leaders will serve you well. This is a catch-all category because leadership has many faces and traits. Before you begin to champion the cause of data governance, read books like The 21 Indispensable Qualities of a Leader: Becoming the Person Others Will Want to Follow, where author John Maxwell identifies areas for you to work on.

Those are my top six qualities of a data champion. You’ll notice that I didn’t particularly put anything about technical expertise, although it is implied in number two. That’s because being a data champion is as much about managing people and resources as it is about technical know-how.

Thursday, February 19, 2009

Syncsort and Trillium Software Partnership

When you think of Syncsort, you think of, well… sorting. Syncsort offers a flagship product - a high-performance sort utility - that has been used for years to decrease processing time for large volumes of data. In the case of multiple customer databases, for example, you may want to sort the files in different ways and compare them on many different keys. Sorting on multiple keys is a very resource-intensive data processing function, so maximizing sorting speed and efficiency is crucial.
SyncSort’s sheer performance is made possible by a fast, but proprietary sorting algorithm. Because of that performance boost, many Trillium Software customers use Syncsort sorting as part of their batch data quality processes.
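Multi-key sorting is easy to express; the challenge a tool like Syncsort addresses is doing it quickly over massive files. A tiny in-memory sketch of a match-key sort, with made-up records, just to show the concept:

```python
from operator import itemgetter

# Made-up customer records from two merged databases.
customers = [
    {"last": "Smith", "first": "Alice", "zip": "02116"},
    {"last": "Jones", "first": "Carol", "zip": "02116"},
    {"last": "Smith", "first": "Bob",   "zip": "01730"},
]

# Match-key sort: order by ZIP, then last name, then first name, so that
# candidate duplicate records land next to each other for comparison.
by_match_key = sorted(customers, key=itemgetter("zip", "last", "first"))
for rec in by_match_key:
    print(rec["zip"], rec["last"], rec["first"])
```

Resorting the same file on a different key combination (say, last name first) is how matching engines generate new candidate pairs, which is why raw sort speed matters so much in batch data quality runs.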
On the other hand, when your company is named after what you do, it’s hard to change what you do. Syncsort's DMExpress has little to do with sorting, but instead is the company's low cost ETL tool. Trillium Software recently announced connectivity between Syncsort and the Trillium Software System. Trillium Software’s fast, scalable data cleansing combined with Syncsort’s fast scalable ETL makes for a great pairing.
I’m fascinated by some of the metrics that Syncsort has posted on their web site. An independent benchmark claims the fastest ETL result on record: DMExpress extracted, transformed, cleansed and loaded 5.4 TB of raw data into the Vertica Analytic Database in 57 minutes 21.51 seconds, using an HP BladeSystem c-Class running Red Hat. In other words, low-cost hardware and record performance. It beats the big boys of ETL on many levels.
Many of the case studies I read on Syncsort’s web site are from companies who can finally afford to get rid of slow, hand-coded ETL processes. When you reduce extraction time by over 80%, as many of them did, you gain the ability to provide business intelligence that’s a lot more current, and that’s a big deal. For quick, low-cost ETL, DMExpress makes perfect sense.

Wednesday, February 11, 2009

Using Data Quality Tools to Look for Bad Guys

Most companies do not want to do business with bad guys - those on the FBI’s most wanted list or international terrorists. Here in Boston, we’re always on the lookout for James “Whitey” Bulger, a notorious mobster who has been on the FBI’s most wanted list for years. But how do you really know if you’re doing business with bad guys if you don’t pay attention to data quality?
If you work for a financial organization, you may be mandated by your country’s government to avoid doing business with the bad guys. The mandates have to do with the lists of terrorists published by the European Union, Australia, Canada and the United States. For example, in the U.S., the Treasury Department publishes a list of terrorists and narcotics traffickers. These individuals and companies are called “Specially Designated Nationals,” or “SDNs.” Their assets are blocked, and companies in the U.S. are discouraged from dealing with them by the Office of Foreign Assets Control (OFAC). In the U.K., the Bank of England maintains a separate list with similar restrictions.
If your company fails to identify and block a bad guy (like Whitey here), there could be real-world consequences such as an enforcement action against your bank or company, and negative publicity. On the other hand, many cases may be a “false positive,” where the name is similar to a bad guy’s name, but the rest of the information provided by the applicant does not match the SDN list. False positives can make for poor customer relationships.
If you have to chase bad guys in your data, you need to make data quality a prerequisite. Data quality tools can help you both correctly identify foreign nationals on the SDN list and lower the number of false positives. If the data coming into your system is standardized and has all of the information required by your governance program, matching technologies can more easily and more automatically identify SDNs and avoid those false positives.
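As a sketch of the idea (not any vendor’s actual matching engine), here is how standardization plus fuzzy name matching against a watch list might cut false positives. The tiny SDN list, the 0.85 threshold and the country check are all illustrative assumptions of mine; real lists carry aliases, birth dates and addresses:

```python
import difflib

# A tiny, made-up stand-in for the SDN list (illustration only).
sdn_list = [
    {"name": "JAMES BULGER", "country": "US"},
]

def standardize(name: str) -> str:
    """Uppercase, trim, and collapse whitespace -- the kind of cleanup
    a data quality tool performs before matching."""
    return " ".join(name.upper().split())

def screen(applicant_name: str, applicant_country: str, threshold: float = 0.85):
    """Return possible SDN hits. Matching on the standardized name alone
    produces false positives; comparing a secondary field (here, country)
    helps filter them out."""
    hits = []
    clean = standardize(applicant_name)
    for sdn in sdn_list:
        score = difflib.SequenceMatcher(None, clean, sdn["name"]).ratio()
        if score >= threshold and applicant_country == sdn["country"]:
            hits.append((sdn["name"], round(score, 2)))
    return hits

print(screen("james  bulger ", "US"))  # exact after cleanup: a hit
print(screen("James Bolger", "UK"))    # similar name, wrong country: no hit
```

The second lookup is the interesting one: the name alone scores above the threshold, but the secondary field keeps it out of the hit list.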

Saturday, January 31, 2009

Improving Communication on Data Governance Teams

If data governance is about enabling people to improve processes, your team should consider some tools to help people communicate. Particularly if your data governance team is global, communication software can improve efficiency by working through some of the challenges of a diverse team. If team members are in different time zones, it will be difficult to hold status meetings at a time that's convenient for everyone. The good news is that there are some fantastic software tools, including Web 2.0 tools, that can support communication on a data governance team.

I'm sure you've heard of, and used, most of these technologies. But have you considered using them on your data governance project?

Blogs
Blogs are great ways to provide commentary or news on your data governance project. The writer may use text, images, and links to other blogs written by team members to inform and foster teamwork. A blog presents one person's perspective on the data governance project, but readers can leave comments and links to their own blogs. Blogs can educate and inform data governance groups, which can use them to debate unresolved issues or to continue discussions between meetings.
Data governance teams could designate certain team members to blog about the problems they are trying to solve and the projects they are working on. Over time, this type of blog would help keep a record of the processes used - what works and what doesn't. It can also be used to inform data stewards, data governance constituents and other readers about how the company is working to solve data quality issues.

RSS Feeds
The problem with blogs is that you have to revisit them frequently to keep up on the latest news. RSS feeds solve that by pushing crucial data governance information to the team, improving communication without requiring everyone to check each blog by hand.

Wikis
Wikis can hold the latest corporate data policies, and they can be opened up to the whole corporation to provide communication across the enterprise.
There are a lot of wikis to choose from. Your best bet is to check out the matrix at

Let’s not forget workflow tools. Workflow software is a genre of powerful collaboration tools and should be considered for improving the efficiency of your data governance process. With workflow tools, teams can manage the processes and coordination of the data governance team. The processes managed with workflow tools might include any of the following:

  • work progress of a person or group
  • business approval processes
  • challenges of specific data governance technical processes like ETL or data profiling
  • financial approval processes
Much of the work involved in data governance is meeting and discussing status. Workflow software can save some of the time and human capital that goes into holding status meetings by covering status and progress in an application. Employees update their status on specific tasks while managers see what is on schedule and what is behind.
Some examples of workflow tools include AtTask, Basecamp, Clarizen, and SharePoint.

Friday, January 9, 2009

Starting Your Own Personal Data Quality Crusade

As I talk to people in the industry, many folks comment on their organization's lack of interest when it comes to information quality. People have the tendency to think that responsibility for information quality starts with someone else, not themselves. In truth, we all know that information quality is the responsibility of everyone in the organization, from the call center operators to the sales force to IT and beyond.
So why not start your own personal crusade, your own marketing initiative to drive home the power of information quality? Use the power of the e-mail signature to get your message across.

Use these graphics in your signature file to drive home the importance of IQ to your organization.

I may knock out a few more banners this weekend, but if you have your own ideas for a custom "Information Quality" banner, let me know and I'll post it.

Friday, January 2, 2009

Building a More Powerful Data Quality Scorecard

Most data governance practitioners agree that a data quality scorecard is an important tool in any data governance program. It provides comprehensive information about the quality of data in a database and, perhaps even more importantly, allows business users and technical users to collaborate on quality issues.

However, if we show that 7% of all tables have data quality issues, the number is useless - there is no context. You can’t say whether it is good or bad, and you can’t make any decisions based on this information. There is no value associated with the score.

In an effort to improve processes, data governance teams should roll the raw metrics up into slightly higher-level formulations. In their book “Journey to Data Quality”, authors Lee, Pipino, Funk and Wang correctly suggest that making the measurements quantifiable and traceable provides the next level of transparency to the business. The metrics may be rolled up into a completeness rating, for example: if your database contains 100,000 name-and-address postal codes and 3,500 records are incomplete, then 3.5% of your postal codes fail and 96.5% pass. Similar simple formulas exist for accuracy, correctness, currency and relevance, too. However, this first aggregation still doesn’t fully support data governance, because business users aren’t thinking that way. They have processes that are supported by data, and it’s still a stretch for them to see why this all matters.
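The completeness arithmetic in that example is simple enough to express as a one-function sketch:

```python
def completeness(total_records: int, incomplete_records: int) -> float:
    """Percentage of records that pass a completeness check."""
    return 100.0 * (total_records - incomplete_records) / total_records

# The numbers from the example: 100,000 postal codes, 3,500 incomplete.
print(f"{completeness(100_000, 3_500):.1f}% pass")  # 96.5% pass
```

The same shape of formula (passing records over total records) works for the other dimensions; only the rule that decides what counts as a failure changes.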

Views of Data Quality Scorecard
You should plan to build data quality scorecards for different internal audiences - marketing, IT, the C-level, etc.

You must design the scorecards to meet the interests of the different audiences, from technical through business and up to the executive level. At the base of a data quality scorecard is information about the quality of individual data records. This is the default information that most profilers deliver out of the box. As you aggregate scores, the high-level measures of data quality become more meaningful. In the middle are various score sets that allow your company to analyze and summarize data quality from different perspectives. If you define the objective of a data quality assessment project as calculating these different aggregations, you will have a much easier time maturing your data governance program. The business users and the C-level will begin to pay attention.

Business users want to know whether the data supports the business process. They want to know if the data is facilitating compliance with laws. They want to decide whether their programs are “Go”, “Caution” or “Stop”, like a traffic light. They want to know whether current processes are giving them good data so they can change them if necessary. You can only deliver this by aggregating information quality results and aligning them with the business.
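One way to picture that traffic-light roll-up in code. The 98%/90% cutoffs, the sample scores, and the worst-dimension-wins rule are my illustrative assumptions, not a standard; each business process would set its own:

```python
def traffic_light(pass_rate: float, go: float = 98.0, caution: float = 90.0) -> str:
    """Map an aggregated data quality pass rate to a business-facing status.
    Thresholds are illustrative; each process defines its own."""
    if pass_rate >= go:
        return "Go"
    if pass_rate >= caution:
        return "Caution"
    return "Stop"

# Aggregated per-dimension scores for one business process (made-up numbers).
scores = {"completeness": 96.5, "accuracy": 99.1, "currency": 88.0}

# A conservative roll-up: the worst dimension determines the status.
overall = min(scores.values())
print(traffic_light(overall))  # Stop -- currency drags the process down
```

The point of the exercise is that a business user never sees the 88.0; they see “Stop” next to a process they own, which is something they can act on.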

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.