Monday, December 21, 2009

The World is Addicted to Data (and that's good for us)


In the famous book “The Transparent Society”, we are asked to consider some of the privacy ills we will face as technology improves and our society gains access to more data sets. The book was groundbreaking when it was written in 1998. It imagines the emergence of groups who become more powerful because they own the data. However, as we sit here more than a decade later with 20/20 hindsight, it’s clear that the existence of, and access to, specialized data sets make our lives better, not worse.

There are countless examples of this daily improvement in our lives, but here are a few personal ones:
  • I was in the supermarket recently and, as usual, there was a long line at the deli. There was no line at the “deli kiosk,” however, so I gave it a try. Using my frequent shopper card number and the database behind it, the kiosk already knew my preferred brand and type of cheese and delicious deli meats. Ordering was a snap thanks to a database, and I didn’t even have to mispronounce “Deutschmacher” to the deli man, like I usually do.
  • For Thanksgiving, I visited some relatives I don’t often see. My GPS led me there thanks to a geospatial database. It told me how long the trip would take based on traffic data, which is often aggregated from several sources, including road sensors and car and taxi fleets. It also told me about all the coffee shops along the way, thanks to a data set provided by Dunkin’ Donuts. Before I left, I used Google Street View and Microsoft Bing’s Bird’s Eye view to see what the destination looked like. Ten years ago, all of this was pretty much unheard of, but thanks to the coming together of geospatial data, real-time traffic data, satellite and airplane imagery, street view imagery, Dunkin’ Donuts franchise data, and small, cheap processors, my trip was fantastic.
  • Fantasy Football is a new phenomenon, made possible by our addiction to data. We know exactly where we stand on any given Sunday because player stats are made available instantly during the games. When Wes Welker scores, I see the six points reflected in my score instantly. Companies like STATS cover not only football but, according to their web site, 234 sports.
  • For iPhone users, there are tons of data-centric applications. For example, Wait Watchers is an app that uses user submissions to generate and display a table of current ride wait times at major theme parks throughout the world. As users update this information, other users at Disney can decide whether to go to Space Mountain or It’s a Small World.

In the corporate world, it’s much the same, and even more important to our society. Marketing teams are addicted to information from web analytics and use marketing automation tools to track the success of their programs. Operations teams use data to track assets like computers, buildings, trucks and people. Sales has tracked, and will continue to track, customers with data. Finance relies on the collision of credit score data with invoice and payment data, and on making sure the company holds enough money in reserve to meet regulations. Executives will continue to rely on business intelligence and data. In fact, it’s hard to find anyone in the business world who doesn’t rely on data.

Of course, much of this is anecdotal. I haven’t found any specific study on the increase in database use, but we do know from an old IDC study that the number of servers in use worldwide, presumably some of them running databases, roughly doubled from 2000 to 2005. A doubling of servers, combined with typically larger hard drive capacities, points to higher database use.

Ten years ago, it was difficult to imagine where we are today, and it’s even more difficult to imagine where we’ll be at the beginning of 2020. It seems to me that we’ll have more opportunities to create and use information with applications on our mobile devices. The collision of iPhone and Droid devices with the increasing bandwidth of the major carriers’ 3G and 4G networks tells me that data in the future will let us do things we can only imagine today.

The world is addicted to data and that bodes well for anyone who helps the world manage it. In 2010, no matter if the economy turns up or down, our industry will continue to feed the addiction to good, clean data.

Tuesday, November 10, 2009

Overcoming Objections to a Data Governance Program


You’ve created a wonderful proposal for a comprehensive data governance program. You’ve brought it up to management, but the chiefs tell you there’s just no budget for data governance. Now what?

The best thing you can do is to keep at it. It often takes time to win the hearts and minds of your company. You know that any money spent on data governance will usually come back with multipliers; it may just take some time for others to get on board. Be patient and continue to promote your quest.

Here are some ideas for thinking about your next steps for your data governance program:

Corporate Revenue
Today, companies manage spending tightly, looking at the expenses and revenue each fiscal quarter and each month to optimize the all-important operating income (revenue minus expenses equals operating income). If sales and revenue are weak, management gets miserly. On the other hand, if revenue is high and expenses are low, your high-ROI proposal will have a better chance for approval.

For many people, this corporate reality is hard to deal with. Logical thinkers would suggest that if something is broken, it should be fixed, no matter how well the sales team is performing. But the people who run your business put stockholder value first. You, too, should pay attention to your company’s sales figures as they are announced each quarter. If your company holds a quarterly revenue call, use it to time your proposal for when the environment for spending is right.

Cheap Wins
If there is no money to spend on information quality, there may still be information quality wins for you to exploit. For example, let’s say you profile or run a few SQL queries against your company’s supply chain system database and find a part that has a near duplicate: part number “21-998 Condenser” and part number “2-1-998 Cndsr” exist in your supply chain as two separate records for the same part.

After verifying the fairly obvious duplicate, ask your friend on the procurement side how much it costs to store and hold these condensers in inventory. Then use some guerrilla marketing techniques to extol the virtues of data governance. After all, if you could find this with just SQL queries, consider how much you could find with a data discovery/profiling tool. Better yet, consider how much you could find with a company-wide initiative. In a previous blog post, I referred to this as the low-hanging fruit.
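To make the “just SQL queries” idea concrete, here is a minimal sketch, assuming a hypothetical `parts` table with `part_number` and `description` columns (the table, the rows and the `find_part_duplicates` helper are illustrative, not from any real system). It uses Python’s built-in sqlite3 module, strips hyphens and spaces from part numbers, and lists any normalized key that appears more than once.

```python
import sqlite3

# Hypothetical supply chain table with illustrative rows (not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_number TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    [("21-998", "Condenser"), ("2-1-998", "Cndsr"), ("34-100", "Gasket")],
)

# Strip hyphens and spaces so formatting differences don't hide duplicates,
# then keep only normalized part keys that occur more than once.
DUPLICATE_QUERY = """
    SELECT REPLACE(REPLACE(part_number, '-', ''), ' ', '') AS part_key,
           COUNT(*) AS n,
           GROUP_CONCAT(part_number || ' ' || description, ' | ') AS variants
    FROM parts
    GROUP BY part_key
    HAVING COUNT(*) > 1
"""


def find_part_duplicates(connection):
    """Return (part_key, count, variants) rows for candidate duplicates."""
    return connection.execute(DUPLICATE_QUERY).fetchall()


for part_key, n, variants in find_part_duplicates(conn):
    print(part_key, n, variants)
```

The query only surfaces candidates. Whether “Cndsr” really is the same condenser still needs the human verification step described above before you take the numbers to procurement.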

Case Studies
Case studies are a great way to spread the word about data governance. They usually contain real-world examples, often of your competitors, who are finding gold with better attention to information quality. Vendors in the data governance space will have case studies on their websites, or you can get unpublished studies by asking your sales representative.

Appeal to your company’s built-in desire to be competitive, and keep your Google searches and alerts tuned to the data management projects underway at your competitors.

Analysts
Analysts are another valuable source for proving your point about the virtues of data governance. Your boss may have installed his own custom spam filter against your cajoling on data governance. But he doesn’t have to take your word for it; he can listen to an industry expert.

If your company subscribes to an analyst firm, use that subscription to sell the power of data governance. Analysts offer telephone consultations, reports and webinars to clients, and these offerings may be useful in swaying your team. If you are not a client of these firms, go to the vendors. If there is a crucial report, they will often license it and offer it on their website for download, particularly if it speaks well of their solution.

Data Governance Expert Sessions
This technique also falls within the category of “don’t just take my word for it.” Many vendors offer data governance workshops to help your organization develop its data quality strategies. In these sessions, usually conducted for a group, the session leader interacts with attendees of your choosing and presents the potential for improving the efficiency of your business with data governance. As the organizer, you would invite both technologists and business users. Include those who are skeptical of the value a data quality program will bring to their company; a third-party opinion may sway them. The cost is usually reasonable, and the session can help the group understand and share key concepts of data governance.

Guerrilla Marketing
Why not start your own personal crusade, your own marketing initiative to drive home the power of information quality? In my previous installment of the data governance blog, I offered graphics you can use in your signature file to drive home the importance of IQ to your organization. Use the power of a newsletter, blog, or e-mail signature to get your message across.


Excerpt from Steve Sarsfield's book "The Data Governance Imperative"

Thursday, October 22, 2009

Book Review: Data Modeling for Business


A couple of weeks ago, I book-swapped with author Donna Burbank. She has a new book entitled Data Modeling for Business. Donna, an experienced consultant by trade, has teamed up with Steve Hoberman, a previously published author and technologist, and Chris Bradley, also a consultant, for an excellent exploration of the process of creating a data model. With a subtitle like “A Handbook for Aligning the Business with IT using a High-Level Data Model,” I knew I was going to find some value in the swap.

The book describes in plain English the proper way to create a data model, but that simple description doesn’t do it justice. The book is designed for those who are learning from scratch, those who only vaguely understand what a data model is. It uses commonly understood, everyday ideas to explain data modeling concepts. The book describes the impact of the data model on the project’s success and digs into setting up data definitions and the levels of detail necessary for them to be effective. All of this is accomplished in a plain-talk, straightforward tone without the pretentiousness you sometimes get in books about data modeling.

We often talk about the need for business and IT to work together to build a data governance initiative. But many, including myself, have pointed to the communication gap that can exist in a cross-functional team. To bridge the gap, a couple of things need to happen. First, IT team members need to expand their knowledge of business processes, budgets and corporate politics. Second, business team members need to expand their knowledge of metadata and data modeling. This book provides an insightful education for the latter. In my book, The Data Governance Imperative, the goal was the former.

The book is well-written and complete. It’s a perfect companion for those who are trying to build a knowledgeable, cross-functional team for data warehouse, MDM or data governance projects. I’ve added it to the recommended reading list on my blog.

Monday, October 12, 2009

Data May Require Unique Data Quality Processes


Some things in life look the same, but the details can vary widely. For example, planets and stars look the same in the night sky, but traveling to them and surviving once you get there are two completely different problems. It’s only when you get close to your destination that you can see the difference.

All data quality projects can appear the same from afar but ultimately can be as different as stars and planets. One of the biggest ways they vary is in the data itself and whether it is chiefly made up of name and address data or some other type of data.

Name and Address Data
A customer database or CRM system contains data that we know a lot about. We know that letters will be transposed, names will be comma-reversed, postal codes will be missing, and more. Because so many name and address records have been processed over the years, good data quality tools know millions of things about broken name and address data. Over time, business rules and processes have been fine-tuned for name and address data, and methods of matching up names and addresses have become more and more powerful.

Data quality solutions also understand what names and addresses are supposed to look like, since the postal authorities provide the correct formatting. If you’re somewhat precise about following the postal authorities’ rules, most mail makes it to its destination. If you’re very precise, the postal services can offer discounts. The rules are clear in most parts of the world, and everyone follows the same rules for name and address data because it makes for better efficiency.

So, if you know what the broken item looks like and what the fixed item is supposed to look like, you can design and develop processes that combine trained, knowledgeable workers and automated solutions to solve real business problems. There’s knowledge inherent in the system, and you don’t have to start from scratch every time you want to cleanse the data.
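Because the broken and fixed forms of name and address data are known, the fixes can be written down as rules. Here is a minimal sketch of two such rules, assuming hypothetical records with `name` and `postal_code` fields and US ZIP code formatting; the helper names are illustrative, and real data quality tools ship far richer rule sets than this.

```python
import re

# Assumption: US ZIP or ZIP+4 formatting; other countries need other patterns.
US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")


def fix_comma_reversed(name: str) -> str:
    """Turn 'Smith, John' into 'John Smith'; leave other names unchanged."""
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        if last and first:
            return f"{first} {last}"
    return name.strip()


def check_record(record: dict) -> dict:
    """Return a cleaned copy of the record plus a list of issues found."""
    cleaned = dict(record)
    cleaned["name"] = fix_comma_reversed(record.get("name") or "")
    issues = []
    postal = (record.get("postal_code") or "").strip()
    if not postal:
        issues.append("missing postal code")
    elif not US_ZIP.match(postal):
        issues.append("malformed postal code")
    cleaned["issues"] = issues
    return cleaned


print(check_record({"name": "Smith, John", "postal_code": ""}))
```

Note that the naive comma rule would turn “Smith, Jr.” into “Jr. Smith”; that is exactly the kind of accumulated knowledge (suffix lists, titles, and so on) that years of processing name and address data have built into commercial tools.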

ERP, Supply Chain Data
However, when we look at other data domains, the picture is very different. There is no established body of knowledge about what the input typically looks like or what the output should be, so you must set up processes to work that out. In supply chain data or ERP data, we can’t immediately see why the data is broken or what we need to do to fix it. ERP data is likely to be a sort of history lesson in your company’s origins, the acquisitions that were made, and the partnership changes throughout the years. We don’t immediately have an idea of how the data should ultimately look. The data in this world is specific to one client or a single use scenario, which existing out-of-the-box rules cannot handle.

With this type of data, you may find the need to collaborate more with the business users of the data, whose expertise lets them determine the correct context for the information more quickly and therefore lets you effect change more rapidly. Because of the inherent unknowns about the data, few of the steps for fixing it are done for you ahead of time. It then becomes critical to establish a methodology for:
  • Data profiling, to understand the issues and challenges in the data.
  • Discussions with the users of the data to understand context, how it’s used and the most desired representation. Since there are few governing bodies for ERP and supply chain data, the corporation and its partners must often come up with an agreed-upon standard.
  • Setting up business rules, usually from scratch, to transform the data.
  • Testing the data in the new systems.
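The profiling step above can start very simply. Here is a minimal sketch, assuming rows arrive as Python dictionaries and using a hypothetical `profile_column` helper; it counts nulls and distinct values and reduces each value to a character pattern (digits become 9, letters become A). Seeing that one field follows several patterns is often the first clue for the business-rule discussion with users.

```python
from collections import Counter


def pattern(value: str) -> str:
    """Map each character to a class: digits -> 9, letters -> A, others kept."""
    return "".join(
        "9" if c.isdigit() else "A" if c.isalpha() else c for c in value
    )


def profile_column(rows, column):
    """Summarize one column: row count, nulls, distinct values, patterns."""
    values = [row.get(column) for row in rows]
    present = [str(v) for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "patterns": Counter(pattern(v) for v in present).most_common(),
    }


# Illustrative rows only.
rows = [
    {"part_number": "21-998"},
    {"part_number": "2-1-998"},
    {"part_number": "34-100"},
    {"part_number": None},
]
print(profile_column(rows, "part_number"))
```

A profiling tool does far more (cross-column dependencies, value distributions, referential checks), but even this level of summary gives business users something concrete to react to.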
I write about this because I’ve read so much about this topic lately. As practitioners you should be aware that the problem is not the same across all domains. While you can generally solve name and address data problems with a technology focus, you will often rely more on collaboration with subject matter experts to solve issues in other data domains.

Monday, August 24, 2009

9 Questions CEOs Should Ask About Data Governance

When it comes to data governance, the most influential power in an organization is the executive team (presidents, vice presidents, managing directors, and CxOs). Sure, business users control certain aspects of the initiative and may even want to hold it back to maintain data ownership. It’s also true that the technology team is influential, but it may be short on staff, short on budget and busy with projects like software upgrades. So it sometimes falls to executives to push data governance as a strategic initiative when the vision doesn’t come from elsewhere.

It makes sense. Executives have the most to gain from a data governance program. Data governance brings order to the business, offering the ability to make effective and timely decisions. By implementing a data governance program, you can make fewer decisions based on ‘gut’ and better decisions based on knowledge. It’s an executive’s job to strive for greater control and lower risk, and that can’t be achieved without some form of data governance.

Rather than issuing edicts, many smart executives ask questions. Questioning your IT and business teams is a way of fact-checking your decisions, understanding shortcomings in skills and resources, and empowering your people. It ultimately allows your people to come to the same decision at which you may have already arrived. It is a very gracious way to manage.

Therefore, asking questions about data governance is an important job of a CEO. Some of the questions you should be asking your technology leaders are as follows:

Question

Impact

Do we have a data management strategy?

Ask the question to understand whether your people have considered data governance. If you have a strategy, you should know who the people are and how they are organized around providing information to the corporation. What are the processes for information in the organization?

Are we ahead or behind our competitors with regard to business intelligence and data governance?

Case studies on managing data are widely available on vendor web sites. It’s important to understand if any of your competitors are outflanking you on the efficiencies gained from data governance.

What is poor information quality costing us?

Has your technology team even considered the business impact of information quality on the bottom line, or are they just accepting these costs as standard operating procedure?

What confidence level do you have in my revenue reports?

Has your team considered the impact of information quality on business intelligence, and therefore on the reports they are handing you?

Are we in compliance with all laws regarding our governance of data?

Executives are often culpable for non-compliance, so you should be concerned about any laws that govern the company’s industry. This holds especially true in banking and healthcare, but even in unregulated industries, organizations must comply with spam laws and “do not mail” laws for marketing, for example.

Are you working across business units toward data governance, or is data quality done in silos?

To provide the utmost efficiency, information quality processes should be reusable and implemented in a similar manner across business units. This is done for exactly the same reason you might standardize on a type of desktop computer or software package for your business: it’s more efficient to share training resources and support, and it helps people work better as a team. Taking successful processes from one business unit and extending them to others is the best strategy.

Do you have the access to data you need?

The CEO should understand whether any office politics are getting in the way of ensuring that the business has the information it needs. This question opens the door to that discussion.

How many people in your business unit are managing data?

To really understand whether you need a unified process for managing data, it often helps to look at the organizational chart and try to figure out how many people already manage it. A centralized strategy for data governance may prove more efficient.

Who owns the information in your business unit? If something goes right, who should I praise, and if something is wrong, who should I reprimand?

The business should understand who is culpable for adverse events with regard to information. If, for example, you lose revenue by sending the wrong type of customer discount offers, or if you can’t deliver your product because of problems with inventory data, there should be someone responsible. Take action if the answer cannot easily be given.

By asking these questions, you’ll open up the door to some great discussions about data governance. It should allow you to be a maverick for all of your company’s data needs. Thanks to Ajay Ohri for posing this question to me in last week’s interview; it’s something every executive should consider.

Friday, August 14, 2009

My Interview with Ajay Ohri

Ajay asks me some great questions over at DecisionStats.

Tuesday, July 21, 2009

Data Quality – Technology’s Prune

Prunes. When most of us think of prunes, we tend to think of a cure for older people suffering from constipation. In reality, prunes are not only sweet but also highly nutritious. Prunes are a good source of potassium and dietary fiber. Yet prunes suffer from a stigma that’s just not there for dried apricots, figs and raisins, which have similar nutritional and medicinal benefits. Prunes suffer from bad marketing.

I have no doubt that data quality is considered technology’s prune by some. We know that information quality is good for us, having many benefits to the corporation. It also can be quite tasty in its ability to deliver benefit, yet most of our corporations think of it as a cure for business intelligence constipation – something we need to “take” to cure the ills of the corporation. Like the lowly prune, data quality also suffers from bad marketing.

In recent years, prune marketers in the United States have begun marketing their product as “dried plums” in an attempt to get us to change the way we think about them. Commercials show the younger, soccer-mom crowd eating the fruit and being surprised at its delicious flavor. It may take some time for us to change our minds about prunes. I suppose if Lady Gaga or Zac Efron were spokespeople, prunes might have a better chance.

The biggest problem in making data quality beloved by the business world is that it’s, well… hard to explain. When we talk about it, we get crazy with metadata models and profiling metrics. That’s great when we’re communicating among data professionals, but that talk tends to plug up business users.

In my recent presentations and in recent blog posts, I’ve made it clear that it’s up to us, the data quality champions, to market data quality, not as a BI laxative, but as a real business initiative with real benefits. For example:

  • Take a baseline measurement and track ROI, even if you think you don’t have to
  • If the project has no ROI, you should not be doing it. Find the ROI by asking the business users of the data what they use it for.
  • Aggregate and roll up our geeky metrics of nulls, accuracy, conformity, etc. into metrics that a business user would understand, like “according to our evaluation, 86.4% of our customers are fully reachable by mail.”
  • Create and use aggregated scores, similar to the Dow Jones Industrial Average, and publish them at regular intervals. To raise awareness of data quality, talk about why the score is up and why it has gone down.
  • Have a business-focused elevator pitch ready when someone asks you what you do. “My team is saving the company millions by ensuring that the ERP system accurately reflects inventory levels.”
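The roll-up and index ideas in the list above come down to simple arithmetic. Here is a minimal sketch, assuming hypothetical customer records with `name`, `street`, `city` and `postal_code` string fields, and treating “fully reachable” as “every required address field is filled in” (a simplifying assumption; a real check would also validate against postal data). The dimension names and weights for the composite score are illustrative assumptions, not a standard.

```python
# Assumption: a record is "fully reachable" when all of these are non-blank.
REQUIRED = ("name", "street", "city", "postal_code")


def fully_reachable(record: dict) -> bool:
    """True when every required address field holds a non-blank string."""
    return all((record.get(field) or "").strip() for field in REQUIRED)


def reachable_pct(records) -> float:
    """Percentage of records that are fully reachable, to one decimal place."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if fully_reachable(r))
    return round(100.0 * ok / len(records), 1)


def composite_score(metrics: dict, weights: dict) -> float:
    """Weighted average of 0-100 dimension scores (weights are assumptions)."""
    total = sum(weights.values())
    return round(sum(metrics[k] * w for k, w in weights.items()) / total, 1)


# Illustrative inputs only: completeness counts double in this example.
print(composite_score(
    {"completeness": 90.0, "conformity": 80.0, "accuracy": 70.0},
    {"completeness": 2, "conformity": 1, "accuracy": 1},
))
```

Computing the same score each month or quarter and publishing it alongside the previous value gives you the “why did it go up, why did it go down” conversation the Dow Jones comparison is after.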
Of course, there’s more: in my previous blog posts, in posts yet to come, and in my book The Data Governance Imperative. Marketing the value of data quality is just something we all need to do more of. Not selling the business importance of data quality... it’s just plum-crazy!

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.