Thursday, October 22, 2009

Book Review: Data Modeling for Business


A couple of weeks ago, I book-swapped with author Donna Burbank. She has a new book entitled Data Modeling for Business. Donna, an experienced consultant by trade, has teamed up with Steve Hoberman, a previously published author and technologist, and Chris Bradley, also a consultant, for an excellent exploration of the process of creating a data model. With a subtitle like “A Handbook for Aligning the Business with IT using a High-Level Data Model,” I knew I was going to find some value in the swap.

The book describes in plain English the proper way to create a data model, but that simple description doesn’t do it justice. The book is designed for those who are learning from scratch – those who only vaguely understand what a data model is. It uses commonly understood, everyday concepts to explain data modeling. The book describes the impact of the data model on the project’s success and digs into setting up data definitions and the levels of detail necessary for them to be effective. All of this is accomplished in a very plain-talk, straightforward tone without the pretentiousness you sometimes get in books about data modeling.

We often talk about the need for business and IT to work together to build a data governance initiative. But many, including myself, have pointed to the communication gap that can exist in a cross-functional team. In order to bridge the gap, a couple of things need to happen. First, IT teams need to expand their knowledge of business processes, budgets and corporate politics. Second, business team members need to expand their knowledge of metadata and data modeling. This book provides an insightful education for the latter. In my book, The Data Governance Imperative, the goal was the former.

The book is well-written and complete. It’s a perfect companion for those who are trying to build a knowledgeable, cross-functional team for data warehouse, MDM or data governance projects. Therefore, I’ve added it to my recommended reading list on my blog.

Monday, October 12, 2009

Data May Require Unique Data Quality Processes


A few things in life have the same appearance, but the details can vary widely.  For example, planets and stars look the same in the night sky, but traveling to them and surviving once you get there are two completely different problems. It’s only when you get close to your destination that you can see the difference.

All data quality projects can appear the same from afar but ultimately can be as different as stars and planets. One of the biggest ways they vary is in the data itself and whether it is chiefly made up of name and address data or some other type of data.

Name and Address Data
A customer database or CRM system contains data that we know much about. We know that letters will be transposed, names will be comma reversed, postal codes will be missing and more.  There are millions of things that good data quality tools know about broken name and address data since so many name and address records have been processed over the years. Over time, business rules and processes are fine-tuned for name and address data.  Methods of matching up names and addresses become more and more powerful.

Data quality solutions also understand what names and addresses are supposed to look like since the postal authorities provide them with correct formatting. If you’re somewhat precise about following the rules of the postal authorities, most mail makes it to its destination.  If you’re very precise, the postal services can offer discounts. The rules are clear in most parts of the world. Everyone follows the same rules for name and address data because it makes for better efficiency.

So, if we know what the broken item looks like and we know what the fixed item is supposed to look like, we can design and develop processes that involve trained, knowledgeable workers and automated solutions to solve real business problems. There’s knowledge inherent in the system, and you don’t have to start from scratch every time you want to cleanse the data.
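As a toy illustration of how much of this knowledge can be encoded ahead of time, here is a minimal sketch of rules-based name cleanup in Python. The rules and sample data are my own invented examples, not any vendor’s actual logic:

```python
import re

def standardize_name(raw: str) -> str:
    """Apply a few common fixes that name-cleansing tools encode:
    comma-reversed names, stray whitespace, and casing."""
    name = raw.strip()
    # Fix comma reversal: "Smith, John" -> "John Smith"
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first.strip()} {last.strip()}"
    # Collapse repeated whitespace and normalize casing
    name = re.sub(r"\s+", " ", name)
    return name.title()

print(standardize_name("SMITH,  JOHN"))   # John Smith
print(standardize_name("jane   doe"))     # Jane Doe
```

Real tools encode millions of such rules, plus postal reference data; the point is that the knowledge is reusable across projects because name and address data looks broken in predictable ways.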

ERP, Supply Chain Data
However, when we take a look at other types of data domains, the picture is very different.  There isn’t a clear body of knowledge about what the input typically looks like or what the output should be, so you must set up those processes yourself. In supply chain or ERP data, we can’t immediately see why the data is broken or what we need to do to fix it.  ERP data is likely to be a sort of history lesson of your company’s origins, the acquisitions that were made, and the partnership changes throughout the years. We don’t immediately have an idea of how the data should ultimately look. The data in this world is specific to one client or a single use scenario, and it cannot be handled by existing out-of-the-box rules.

With this type of data, you may find the need to collaborate more with the business users of the data, whose expertise in determining the correct context for the information can help you effect change more rapidly. Because of the inherent unknowns about the data, few of the steps for fixing the data are done for you ahead of time. It then becomes critical to establish a methodology for:
  • Data profiling to understand the issues and challenges in the data.
  • Discussions with the users of the data to understand context, how it’s used and the most desired representation.  Since there are few governing bodies for ERP and supply chain data, the corporation and its partners must often come up with an agreed-upon standard.
  • Setting up business rules, usually from scratch, to transform the data.
  • Testing the data in the new systems.
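The profiling step above can be sketched simply. Here is a minimal, hypothetical example of profiling a handful of ERP-style records to surface nulls and inconsistent values; the column names and data are invented for illustration:

```python
from collections import Counter

# Invented sample of ERP part records with typical problems
records = [
    {"part_no": "A-100", "uom": "EA",   "plant": "US01"},
    {"part_no": "A-100", "uom": "each", "plant": "US01"},  # inconsistent unit of measure
    {"part_no": "B-200", "uom": None,   "plant": "DE02"},  # missing value
]

def profile(rows):
    """Count nulls and distinct non-null values per column."""
    report = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        report[col] = {
            "nulls": sum(v is None for v in values),
            "distinct": Counter(v for v in values if v is not None),
        }
    return report

for col, stats in profile(records).items():
    print(col, "nulls:", stats["nulls"], "values:", dict(stats["distinct"]))
```

Even this crude report surfaces the questions you’d take to the business users: is “EA” or “each” the agreed standard, and why is a unit of measure missing?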
I write about this because I’ve read so much about this topic lately. As practitioners you should be aware that the problem is not the same across all domains. While you can generally solve name and address data problems with a technology focus, you will often rely more on collaboration with subject matter experts to solve issues in other data domains.

Monday, August 24, 2009

9 Questions CEOs Should Ask About Data Governance

When it comes to data governance, the most influential power in an organization is the executive team (presidents, vice presidents, managing directors, and CxOs). Sure, business users control certain aspects of the initiative and may even want to hold it back to maintain data ownership. It’s also true that the technology team is influential, but it may be short on staff, short on budget and busy with projects like software upgrades. So, it sometimes falls to executives to push data governance as a strategic initiative when the vision doesn’t come from elsewhere.

It makes sense. Executives have the most to gain from a data governance program. Data governance brings order to the business, offering the ability to make effective and timely decisions. By implementing a data governance program, you can make fewer decisions based on ‘gut’ and better decisions based on knowledge. It’s an executive’s job to strive for greater control and lower risk, and that can’t be achieved without some form of data governance.

Rather than issuing edicts, a tactic many smart executives employ is to ask questions. Questioning your IT and business teams is a form of fact-checking your decisions, understanding shortcomings in skills and resources and empowering your people. It ultimately allows your people to come to the same decision at which you may have already arrived. It is a very gracious way to manage.

Therefore asking questions about data governance is an important job of a CEO. Some of the questions you should be asking your technology leaders are as follows:

Question: Do we have a data management strategy?

Impact: Ask the question to understand if your people have considered data governance. If you have a strategy, you should know who the people are and how they are organized around providing information to the corporation. What are the processes for managing information in the organization?

Question: Are we ahead of or behind our competitors with regard to business intelligence and data governance?

Impact: Case studies on managing data are widely available on vendor web sites. It’s important to understand if any of your competitors are outflanking you on the efficiencies gained from data governance.

Question: What is poor information quality costing us?

Impact: Has your technology team even considered the business impact of information quality on the bottom line, or are they just accepting these costs as standard operating procedure?

Question: What confidence level do you have in my revenue reports?

Impact: Has your team considered the impact of information quality on business intelligence and therefore the reports they are handing you?

Question: Are we in compliance with all laws regarding our governance of data?

Impact: Executives are often culpable for non-compliance, so you should be concerned about any laws that govern the company’s industry. This holds especially true in banking and healthcare, but even in unregulated industries, organizations must comply with spam laws and “do not mail” laws for marketing, for example.

Question: Are you working across business units toward data governance, or is data quality done in silos?

Impact: To provide the utmost efficiency, information quality processes should be reusable and implemented in a similar manner across business units. This is done for exactly the same reason you might standardize on a type of desktop computer or software package for your business – it’s more efficient to share training resources and support to work better as a team. Taking successful processes from one business unit and extending them to others is the best strategy.

Question: Do you have access to the data you need?

Impact: The CEO should understand if any office politics are getting in the way of ensuring that the business has the information it needs. This question opens the door to that discussion.

Question: How many people in your business unit are managing data?

Impact: To really understand if you need a unified process for managing data, it often helps to look at the organizational chart and try to figure out how many people already manage it. A centralized strategy for data governance may actually prove more efficient.

Question: Who owns the information in your business unit? If something goes right, who should I praise, and if something is wrong, who should I reprimand?

Impact: The business should understand who is culpable for adverse events with regard to information. If, for example, you lose revenue by sending the wrong type of customer discount offers, or if you can’t deliver your product because of problems with inventory data, there should be someone responsible. Take action if the answer cannot easily be given.




By asking these questions, you’ll open up the door to some great discussions about data governance. It should allow you to become a champion for all of your company’s data needs. Thanks to Ajay Ohri for posing this question to me in last week’s interview; it’s something every executive should consider.

Friday, August 14, 2009

My Interview with Ajay Ohri

Ajay asks me some great questions over at DecisionStats.

Tuesday, July 21, 2009

Data Quality – Technology’s Prune

Prunes. When most of us think of prunes, we tend to think of a cure for older people suffering from constipation. In reality, prunes are not only sweet but also highly nutritious: a good source of potassium and a good source of dietary fiber. Prunes suffer from a stigma that’s just not there for dried apricots, figs and raisins, which have similar nutritional and medicinal benefits. Prunes suffer from bad marketing.

I have no doubt that data quality is considered technology’s prune by some. We know that information quality is good for us, having many benefits to the corporation. It also can be quite tasty in its ability to deliver benefit, yet most of our corporations think of it as a cure for business intelligence constipation – something we need to “take” to cure the ills of the corporation. Like the lowly prune, data quality also suffers from bad marketing.

In recent years, prune marketers in the United States have begun marketing their product as "dried plums” in an attempt to get us to change the way we think about them. Commercials show the younger, soccer Mom crowd eating the fruit and being surprised at its delicious flavor. It may take some time for us to change our minds about prunes. I suppose if Lady Gaga or Zac Efron would be spokespersons, prunes might have a better chance.

The biggest problem in making data quality beloved by the business world is that it’s well… hard to explain. When we talk about it, we get crazy with metadata models and profiling metrics. It’s great when we’re communicating among data professionals, but that talk tends to plug-up business users.

In my recent presentations and in recent blog posts, I’ve made it clear that it’s up to us, the data quality champions, to market data quality, not as a BI laxative, but as a real business initiative with real benefits. For example:

  • Take a baseline measurement and track ROI, even if you think you don’t have to
  • If the project has no ROI, you should not be doing it. Find the ROI by asking the business users of the data what they use it for.
  • Aggregate and roll up our geeky metrics of nulls, accuracy, conformity, etc., into metrics that a business user would understand – for example, “according to our evaluation, 86.4% of our customers are fully reachable by mail.”
  • Create and use aggregated scores similar to the Dow Jones Industrial Average. Publish them at regular intervals. To raise awareness of data quality, talk about why the score is up and why it has gone down.
  • Have a business-focused elevator pitch ready when someone asks you what you do. “My team is saving the company millions by ensuring that the ERP system accurately reflects inventory levels.”
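As a sketch of that roll-up idea, here is one hypothetical way to collapse column-level quality metrics into a single business-facing number. The metric names and weights are my own assumptions, not a standard; a real team would choose them with the business users:

```python
# Hypothetical column-level quality metrics (fraction of rows passing each check)
metrics = {
    "address_complete": 0.92,
    "postal_code_valid": 0.88,
    "no_duplicates": 0.95,
}

# Business-chosen weights reflecting what matters for mail reachability
weights = {
    "address_complete": 0.5,
    "postal_code_valid": 0.3,
    "no_duplicates": 0.2,
}

reachability = sum(metrics[m] * weights[m] for m in metrics)
print(f"{reachability:.1%} of customers are fully reachable by mail")
```

Published at regular intervals, a score like this behaves like the Dow Jones example above: the absolute number matters less than whether it’s moving up or down, and why.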
Of course, there’s more – in my previous blog posts, in posts yet to come, and in my book, The Data Governance Imperative. Marketing the value of data quality is just something we all need to do more of. Not selling the business importance of data quality... it’s just plum-crazy!

Monday, July 13, 2009

Data Quality Project Selection

What if you have five data intensive projects that are all in need of your very valuable resources for improving data quality? How do you decide where to focus? The choice is not always clear. Management may be interested in accurate reporting from your data warehouse, but revenue may be at stake in other projects. So, just how do you decide where to start?

To aid in a choice between projects, it may help to plot your projects on a “Project Selection Quadrant” as I’ve shown here. The quadrant chart plots the difficulty of completing a project versus the value it brings to the organization.

[Project Selection Quadrant chart: project difficulty on the X axis, project value on the Y axis]
Project Difficulty
To place a project on the X axis, you must understand how your existing system is being used, how various departments use it differently, and whether there are special programs or procedures that impact the use of the data. To predict project length, you have to rely heavily on your understanding of your organization's goals and business drivers.

Some of the things that will affect project difficulty:
  • Access to the data – do you have permission to get the data?
  • Window of opportunity – how much time do you have between updates to work on the data?
  • Number of databases – more databases will increase complexity.
  • Languages and code pages – is it English or Kanji? Is it ASCII or EBCDIC? If you have mixed languages and code pages, you may have more work ahead of you.
  • Current state of data quality – the more non-standard your data is to begin with, the harder the task.
  • Volume of data – data standardization takes time, and the more you have, the longer it’ll take.
  • Governance, risk and compliance mandates – is your access to the data blocked by regulation?

Project Value
For assessing project value (the Y axis), there is really one thing you want to look at – money. It comes from your discussions with business users about areas such as:
  • the ability to effectively reach and support customers
  • call center performance
  • inventory and holding costs
  • exposure to risk, such as being out of compliance with regulations in your industry
  • any business process that is inefficient because of poor data quality

The Quadrants
Now that you’ve assessed your projects, they will naturally fall into the following quadrants:

Lower left: The difficult and low value targets. If management is trying to get you to work on these, resist. You’ll never get anywhere with your enterprise-wide appeal by starting here.
Lower right: These may be easy to complete, but if they have limited value, you should hold off until you have complete corporate buy-in for an enterprise-wide data quality initiative.
Upper left: Working on high-value targets that are hard to complete will likely only give your company sticker shock when you show them the project plan. Or, they may run into major delays and be cancelled altogether. Again, proceed with caution. Make sure you have a few wins under your belt before you attempt these.
Upper right: Ah, low-hanging fruit. Projects that are easier to complete with high value are the best places to begin. As long as you document and promote the increase in value that you’ve delivered to the company, you should be able to leverage these wins into more responsibility and more access to great projects.
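The quadrant logic above can be sketched as a simple classifier. The scoring scale, thresholds and project names here are invented for illustration; in practice you’d score projects with the business using the difficulty and value factors listed earlier:

```python
def quadrant(difficulty: float, value: float, threshold: float = 5.0) -> str:
    """Place a project on the selection quadrant.
    Difficulty and value are scored 0-10; higher difficulty means harder."""
    hard = difficulty >= threshold
    high_value = value >= threshold
    if high_value and not hard:
        return "start here (low-hanging fruit)"   # upper right
    if high_value and hard:
        return "proceed with caution"             # upper left
    if not high_value and not hard:
        return "hold until enterprise buy-in"     # lower right
    return "resist"                               # lower left

# Invented example projects: (difficulty, value)
projects = {"CRM cleanup": (3, 8), "Legacy ERP merge": (9, 9), "Archive tidy-up": (8, 2)}
for name, (d, v) in projects.items():
    print(name, "->", quadrant(d, v))
```

Even a rough scoring exercise like this forces the conversation about which projects are genuinely low-hanging fruit and which only look that way.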

Keeping an eye on both the business aspect of the data (its value) and the technical difficulty of standardizing it will help you decide where to go and how to make your business stronger. It will also ensure that you and your business co-workers understand the business value of improving data quality within your projects.

Monday, July 6, 2009

June’s "El Festival del IDQ Bloggers”


A Blog Carnival for Information/Data Quality Bloggers

June of 2009 is gone, so it’s time to look back at the month and recognize some of the very best data quality blog entries. Like other blog carnivals, this one is a collection of posts from different blogs on a specific theme.

If you’re a blogger and you missed out on this month’s data quality carnival, don’t worry. You can always submit your brilliant entries next month. So, here they are, in no particular order.


  • Newcomer Jeremy Benson has a unique perspective of being an actuary – someone who deals with the financial impact of risk and uncertainty to a business. We know that improving data quality will certainly produce more accurate assessments when it comes to crunching numbers and calculating risk. This month’s blog entry describes how data quality is important to predictive modeling. More actuaries should understand the importance of data quality, so this is a positive step.

  • Irish information quality expert Daragh O Brien was talking about his marriage problems this month – well, at least the data quality problems with his recording of his marriage. In this post he discusses a recent experience and how it made him think yet again about the influence of organizational culture and leadership attributes on information quality success and change management.


  • Western Australian blogger Vince McBurney contributes his excellent analysis of the new Gartner Magic Quadrant for data quality tools. Vince’s analysis of the LAST Magic Quadrant (two years ago) was perhaps my biggest inspiration for getting involved in blogging, so it makes me happy to include his blog. “Tooling Around on the IBM InfoSphere” is focused on data integration topics from the perspective of an expert in the IBM suite of software tools.

  • Jim Harris takes us into “The Data-Information Continuum” to remind us that data quality is usually both objective and subjective, making reaching the “single version of truth” more mystical. The post made it clear to me that our description of the data quality problem is evolving, and the language we must use to promote our successes must evolve, too.


  • Dalton Cervo is the Customer Data Quality Lead at Sun Microsystems and a member of the Customer Data Governance team at Sun. Dalton takes us on a journey of deduplicating a customer database using a popular data quality tool. It’s great to see the details of a project like this so that we can better understand the challenges and benefits of using data quality tools.


Thanks to all the outstanding data quality bloggers this month!

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.