Friday, December 28, 2007

Data Governance and Data Quality Predictions for 2008

Prognostication is a game of gathering information, digesting it, recognizing the trends, and using them to predict the future. Predictions only fail when you don’t have all the information, and completely accurate predictions would require omniscience, which, it’s safe to say, I do not have. Still, it’s fun to peer into the crystal ball and see the future. Here are my predictions for the world of data governance in 2008:

  • Business acumen will become more important than technical acumen in the IT world - This prediction is just another way of looking at the fact that business users are getting more and more involved in the technology process, and that meeting the demands of those business users will be paramount. To survive, technologists will need to communicate in ways that business people understand. In 2008, what matters won’t be how many certifications you have, but your ability to understand return on investment and get the message across.
  • Business Process Management will emerge – In a related prediction, applications that manage business processes will emerge as important to most organizations. Business users and IT users will work together in GUIs that can quickly change an organization’s processes without a lengthy IT development cycle. It will start with call center applications, where companies strive to lower call times and improve customer satisfaction. It will then move to other areas of the business, including logistics. Of course, data quality vendors who embrace BPM will thrive.
  • The term “Data Quality” will be used less and less in the industry – The term data quality has been corrupted in a sense by MDM, CRM, ETL, and end-to-end data management vendors who claim to have data quality functionality, but sometimes have very weak solutions. New terminology will be defined by the industry that more precisely describes the solutions and processes behind data governance.
  • Specialty data quality vendors will expand the data domains they serve to provide increased value – The main reason for the survival (and growth) of independent data quality vendors in 2008 and beyond will be the data domains they serve. Large vendors offering end-to-end data management solutions simply won’t be interested in expanding rule sets to cover data domains like supply chain, ERP, financial data, and other industry-specific domains. Nor will they invest in fine-tuning their business rules engines to deal with new data anomalies. Yet the biggest projects in 2008 will rely on the data quality engine’s ability to cleanse beyond US name and address data. The big projects will need advanced matching techniques offered only by the specialty vendors.
  • Solutions – Customers will be looking for traditional data quality vendors to provide solutions, not just technology. Data governance is about people, process, and technology. Who better to provide expertise than those who have successfully implemented solutions? Successful data quality vendors will strive to deliver process-centric solutions for their customers.

Thursday, December 20, 2007

MDM Readiness Kit

I'm excited that Trillium Software is now offering a Master Data Management Readiness Kit. It represents some of the best thought leadership pieces that Trillium Software has produced to date, plus a smattering of industry knowledge about master data management. The kit is certainly worth a download if you're implementing, or thinking about implementing, a master data management strategy at your company.

The kit includes the Gartner Magic Quadrant for Data Quality Tools 2007, a Data Quality Project Planning Checklist for MDM, a UMB Bank Case Study for Data Governance, and a section on how to build a Business Case for Data Quality and MDM.

Sunday, December 16, 2007

Data Governance or Magic

Today, I wanted to report on what I have discovered - an extremely large data governance project. The project is shrouded in secrecy, but bits and pieces have come out that point to the largest data governance project in the world. I hesitate to give you the details. This quasi-governmental, cross-secular organization is one of the foundational organizations of our society. Having said that, not everyone recognizes it as an authority.

Some statistics: the database contains over 40 million names in the US alone. In Canada, Mexico, South America, and many countries in Europe, the names and addresses of up to 15 percent of the population are stored in this data warehouse. Along with geospatial information, used to optimize product delivery, there’s a huge amount of transactional data. Customers in the data warehouse are served for up to 12 years, after which the trends show that most customers move on and eventually pass their memberships on to their children. Because of the nature of their work, there is sleep pattern information on each individual, as well as a transaction each time they do something “nice” for society or pursue more “naughty” actions. For example, when an individual exhibits emotional outbursts, such as pouting or crying, this kicks off a series of events that affect a massive manufacturing facility and supply chain, staffed by thousands of specialty workers who adjust as the clients’ disposition reports come into the system. Many of the clients are simply delivered coal, but other customers receive the toy, game, or new sled of their dreams. Complicating matters even more, the supply chain must deliver all products on a single day each year, December 25th.

I am of course talking about the implementation managed by Kris Kringle at the North Pole. I tried to find out more about the people, processes and products in place, but apparently a custom application is in use. According to Mr. Kringle, “Our elves use ‘magic’ to understand our customers and manage our supply chain, so there is no need for Teradata, SAP, Oracle, Trillium Software, or any other enterprise application in this case. Our magic solution has served us well for many years, and we plan to continue with this strategy for years to come.” If only we could productize some of that Christmas magic.

Tuesday, December 11, 2007

Data Governance Success in the Financial Services Sector - UMB

We all know by now that data governance is made up of people, process and technology. Without all of these factors working together in harmony, data governance can’t succeed.
Among the webcasts we’ve recently produced at Trillium Software is the story of UMB Bank. It’s a very interesting story about people, process and technology in the financial services world and how they came together for success.
The team started with a mission statement: to know customers, anticipate needs, advocate and advise, innovate and surprise. The initiative used technology to build a solid foundation of high quality, integrated customer data. The technology is built on Oracle and Trillium Software to deliver high quality customer data to all arms of the business. Finally, the webcast covers the process and people in starting out with smaller projects and building alignment within the data governance team for ongoing success.
If you have about 45 minutes, please use them to view this webcast, now available for replay on the Trillium Software web site. It’s a great use of your time!

Friday, December 7, 2007

Probabilistic Matching: Sounds like a good idea, but...

I've been thinking about the whole concept of probabilistic matching and how flawed it is to assume that this matching technique is the best there is. Even in concept, it isn't.

To summarize, decisions for matching records together with probabilistic matchers are based on three things: 1) statistical analysis of the data; 2) a complicated mathematical formula; and 3) a “loose” or “tight” control setting. Statistical analysis is important because under probabilistic matching, data that is more unique in your data set carries more weight in determining a pass/fail on the match. In other words, if you have a lot of Smiths in your database, Smith becomes a less important matching criterion for those records. If a record has a unique last name like ‘Afinogenova’, that’ll carry more weight in determining the match.
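As a rough illustration (a minimal sketch, not any vendor’s actual algorithm), here’s how frequency-derived weights and a single threshold might work. The field names, the weight formula, and the threshold value are all hypothetical:

```python
import math
from collections import Counter

# Hypothetical illustration of frequency-derived match weights.
# Rarer values (like 'Afinogenova') contribute more to the score
# than common ones (like 'Smith'). Field names and the threshold
# are made up for this example.

records = [
    {"last": "Smith",       "city": "Boston"},
    {"last": "Smith",       "city": "Chicago"},
    {"last": "Afinogenova", "city": "Boston"},
    # ...a real data set would have millions of rows
]

def field_weights(records, field):
    """Weight each value by how rare it is in the data set."""
    counts = Counter(r[field] for r in records)
    total = len(records)
    return {value: -math.log(count / total) for value, count in counts.items()}

weights = {f: field_weights(records, f) for f in ("last", "city")}

def match_score(a, b):
    """Sum the weights of the fields on which two records agree."""
    score = 0.0
    for field, w in weights.items():
        if a[field] == b[field]:
            score += w.get(a[field], 0.0)
    return score

# The only real control you have: one "loose" or "tight" threshold
# applied to every pair in the system.
THRESHOLD = 1.0

print(match_score(records[0], records[1]))               # two Smiths, different cities: low score
print(match_score(records[0], records[1]) >= THRESHOLD)  # likely not a match
```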

So the only control you really have is the loose or tight setting. Imagine for a moment that you had a volume control for the entire world. This device allows you to control the volume of every living thing and every device on the planet. The device uses a strange and mystical algorithm of sound dynamics and statistics that only the most knowledgeable scientists can understand. So, if construction noise gets too much outside your window, you could turn the knob down. The man in the seat next to you on the airplane is snoring too loud? Turn down the volume.

Unfortunately, the knob does control EVERY sound on the planet, so when you turn down the volume, the ornithologist in Massachusetts can’t hear the rare yellow-bellied sapsucker she’s just spotted. A mother in Chicago may be having a hard time hearing her child coo, so she and a thousand other people call you to ask you to turn up the volume.

Initially, the idea of a world volume control sounds really cool, but after you think about the practical applications, it’s useless. By making one adjustment to the knob, the whole system must readjust.

That’s exactly why most companies don’t use probabilistic matching. To bring records together, probabilistic matching uses statistics and algorithms to determine a match. If you don’t like the way it’s matching, your only recourse is to adjust the volume control. However, the correct and subtle matches that probabilistic matching found on the previous run will be affected by your adjustment. It just makes more sense for companies to have the individual volume controls that deterministic and rules-based matching provides to find duplicates and households.
Perhaps more importantly, certain types of companies can't use probabilistic matching because of transparency. If you're changing the data at financial institutions, for example, you need to be able to explain exactly why you did it. An auditor may ask why you matched two customer records. That's something that's easy to explain with a rules-based system, and much less transparent with probabilistic matching.
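For contrast, here’s a minimal, hypothetical sketch of a rules-based matcher. None of this is a real product’s code, and the rules and field names are invented; the point is that every match decision can be traced back to a named rule, which is exactly what an auditor wants to see:

```python
# Hypothetical sketch of a deterministic, rules-based matcher.
# Each rule is explicit, so every merge decision records which
# rule caused it.

RULES = [
    ("exact name and exact address",
     lambda a, b: a["name"] == b["name"] and a["address"] == b["address"]),
    ("same last name and same phone",
     lambda a, b: a["name"].split()[-1] == b["name"].split()[-1]
                  and a["phone"] == b["phone"]),
]

def match(a, b):
    """Return (matched, rule_name) so the decision is fully explainable."""
    for rule_name, rule in RULES:
        if rule(a, b):
            return True, rule_name
    return False, None

a = {"name": "John Smith", "address": "12 Main St",     "phone": "555-0100"}
b = {"name": "Jon Smith",  "address": "12 Main Street", "phone": "555-0100"}

print(match(a, b))  # (True, 'same last name and same phone')
```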

I have yet to talk to a company that actually uses 100% probabilistic matching in their data quality production systems. Like the master volume control, it sounds like a good idea when the sales guy pitches it, but once implemented, the practical applications are few.
Read more on probabilistic matching.

Saturday, December 1, 2007

SAP in the Big D

I'm headed down to Dallas this week for a meet-and-greet event with the SAP CRM community. Some of our successful customers will be there, including Okidata, Moen, and the folks from Sita who are representing our Shred-It implementation of Trillium Software with SAP CRM.
However, I've been thinking about the SAP acquisition of Business Objects, strictly from the information quality tools perspective. When SAP announced that they were buying BO, the press release covered the synergies in business intelligence, yet there was barely a mention of the data quality tools.
Prior to the announcement, BO had been buying up vendors like Inxight, Fuzzy Informatik, and FirstLogic. Over its long history, FirstLogic had solved its lack of global data support with an OEM partnership with Identex. So, if you wanted a global implementation from FirstLogic, they sold you both solutions. But with BO's acquisition of Fuzzy Informatik, word was that the Identex solution was beginning to lose traction. Global data could be handled by either the Identex solution or the in-house, revenue-generating Fuzzy Informatik solution. When revenue is involved, the partner usually loses.
So there are challenges, primarily the cornucopia of solutions. Strictly from a data quality solution perspective, there will be a wide assortment of customers of FirstLogic, FirstLogic/Identex, FirstLogic/Fuzzy Informatik, and Fuzzy Informatik data quality technology.
I'm not the only one thinking about this; Andy Bitterer and Ted Friedman from Gartner have been thinking about it too, but I believe the situation is even more convoluted than they describe.
I have faith that SAP can address these challenges, but it's going to take a big effort. It will take a decision by SAP to keep key developers and experts on staff to fully integrate a data quality solution for the future, and quick action to keep this technology moving forward. It may even take a couple of years to sort it all out.
Meanwhile, folks who have chosen Trillium Software as their data quality solution look pretty good right now. It's one platform that supports both the global aspects of data quality and the platform aspects, offering support for SAP CRM, ERP, and SAP NetWeaver MDM, to name just a few.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.