Friday, December 28, 2007

Data Governance and Data Quality Predictions for 2008

Prognostication is a game of gathering information, digesting it, recognizing the trends and using them to predict the future. Predictions only fail when you don’t have all the information, and completely accurate predictions require omniscience, which, it’s safe to say, I do not have. Yet it’s fun to peer into the crystal ball and see the future. Here are my predictions for the world of data governance in 2008:

  • Business acumen will become more important than technical acumen in the IT world - This prediction is just another way to look at the fact that business users are getting more and more involved in the technology process, and that meeting the demands of the business users will be paramount. In order for technologists to survive, they will need to communicate in ways that business people understand. In 2008, it won’t be about how many certifications you have, but rather about your ability to understand return on investment and get the message across.
  • Business Process Management will emerge – In a related prediction, applications that manage business processes will emerge as important to most organizations. Business users and IT users will work together in GUIs that can quickly change processes in an organization without a lengthy IT development cycle. It will start with call center applications, where companies strive to lower call times and improve customer satisfaction. It will then move to other areas of the business, including logistics. Of course, data quality vendors who embrace BPM will thrive.
  • The term “Data Quality” will be used less and less in the industry – The term data quality has been corrupted in a sense by MDM, CRM, ETL, and end-to-end data management vendors who claim to have data quality functionality, but sometimes have very weak solutions. New terminology will be defined by the industry that more precisely describes the solutions and processes behind data governance.
  • Specialty data quality vendors will expand data domains served to provide increased value – The main reason for the survival (and growth) of independent data quality vendors in 2008 and beyond will be the data domains they serve. Large vendors offering end-to-end data management solutions simply won’t be interested in rule set expansion to cover data domains like supply chain, ERP, financial data, and other industry-specific domains. Nor will they invest in fine-tuning their business rules engines to deal with new data anomalies. Yet the biggest projects in 2008 will rely on the data quality engine’s ability to cleanse beyond US name and address data. The big projects will need advanced matching techniques offered only by the specialty vendors.
  • Solutions – Customers will be looking for traditional data quality vendors to provide solutions, not just technology. Data governance is about the people, process and technology. Who better to provide expertise than those who have successfully implemented solutions? Successful data quality vendors will strive to deliver process-centric solutions for their customers.

Thursday, December 20, 2007

MDM Readiness Kit

I'm excited that Trillium Software is now offering a Master Data Management Readiness Kit. It represents some of the best thought leadership pieces that Trillium Software has produced yet, plus a smattering of industry knowledge about master data management. Certainly this kit is worth a download if you're implementing, or thinking about implementing, a master data management strategy at your company.

The kit includes the Gartner Magic Quadrant for Data Quality Tools 2007, a Data Quality Project Planning Checklist for MDM, a UMB Bank Case Study for Data Governance, and a section on how to build a Business Case for Data Quality and MDM.

Sunday, December 16, 2007

Data Governance or Magic

Today, I wanted to report on what I have discovered - an extremely large data governance project. The project is shrouded in secrecy, but bits and pieces have come out that point to the largest data governance project in the world. I hesitate to give you the details. This quasi-governmental, cross-secular organization is one of the foundational organizations of our society. Having said that, not everyone recognizes it as an authority.

Some statistics: the database contains over 40 million names in the US alone. In Canada, Mexico, South America, and many countries in Europe, the names and addresses of up to 15 percent of the population are stored in this data warehouse. Along with geospatial information, used to optimize product delivery, there’s a huge amount of transactional data. Customers in the data warehouse are served for up to 12 years, when the trends show that most customers move on and eventually pass their memberships on to their children. Because of the nature of the work, there is sleep pattern information on each individual, as well as a transaction recorded when they do something “nice” for society or pursue more “naughty” actions. For example, when an individual exhibits emotional outbursts, such as pouting or crying, this kicks off a series of events that affect a massive manufacturing facility and supply chain, staffed by thousands of specialty workers who adjust as the clients’ disposition reports come into the system. Many of the clients are simply delivered coal, but other customers receive the toy, game, or new sled of their dreams. Complicating matters even more, the supply chain must deliver all products on a single day each year, December 25th.

I am of course talking about the implementation managed by Kris Kringle at the North Pole. I tried to find out more about the people, processes and products in place, but apparently there is a custom application involved. According to Mr. Kringle, “Our elves use ‘magic’ to understand our customers and manage our supply chain, so there is no need for Teradata, SAP, Oracle, Trillium Software, or any other enterprise application in this case. Our magic solution has served us well for many years, and we plan to continue with this strategy for years to come.” If only we could productize some of that Christmas magic.

Tuesday, December 11, 2007

Data Governance Success in the Financial Services Sector - UMB

We all know by now that data governance comprises people, process and technology. Without all of these factors working together in harmony, data governance can’t succeed.
Among the webcasts we’ve recently done at Trillium Software is the story of UMB Bank. This is a very interesting story about people, process and technology in the financial services world and how they came together for success.
The team started with a mission statement: to know customers, anticipate needs, advocate and advise, innovate and surprise. The initiative used technology, built on Oracle and Trillium Software, to create a solid foundation of high-quality, integrated customer data and deliver it to all arms of the business. Finally, the webcast covers the people and process side: starting out with smaller projects and building alignment within the data governance team for ongoing success.
If you have about 45 minutes, please use them to view this webcast, now available for replay on the Trillium Software web site. It’s a great use of your time!

Friday, December 7, 2007

Probabilistic Matching: Sounds like a good idea, but...

I've been thinking about the whole concept of probabilistic matching and how flawed it is to assume that this matching technique is the best there is. Even in concept, it isn't.

To summarize, decisions for matching records together with probabilistic matchers are based on three things: 1) statistical analysis of the data; 2) a complicated mathematical formula; and 3) a “loose” or “tight” control setting. Statistical analysis is important because under probabilistic matching, data that is more unique in your data set has more weight in determining a pass/fail on the match. In other words, if you have a lot of Smiths in your database, Smith becomes a less important matching criterion for those records. If a record has a unique last name like ‘Afinogenova’, that’ll carry more weight in determining the match.
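
To make that weighting idea concrete, here's a minimal sketch in Python of frequency-based scoring. The field weights, records, and threshold are all made up for illustration; this is not any vendor's actual algorithm.

```python
from collections import Counter
from math import log2

def rarity_weights(values):
    """Weight each value by how rare it is in the data set:
    common values (e.g. 'SMITH') score low, rare ones score high."""
    counts = Counter(values)
    total = len(values)
    return {value: log2(total / count) for value, count in counts.items()}

# Toy last-name column: lots of Smiths, very few Afinogenovas
last_names = ["SMITH"] * 500 + ["JONES"] * 200 + ["AFINOGENOVA"] * 2
weights = rarity_weights(last_names)

def match_score(rec_a, rec_b, weights):
    """Sum the rarity weight of every field the two records agree on.
    Fields without a computed weight get a small default weight."""
    score = 0.0
    for field, value in rec_a.items():
        if value == rec_b.get(field):
            score += weights.get(value, 1.0)
    return score

THRESHOLD = 8.0  # the single "loose vs. tight" knob

a = {"last": "AFINOGENOVA", "zip": "01821"}
b = {"last": "AFINOGENOVA", "zip": "01821"}
c = {"last": "SMITH", "zip": "01821"}
d = {"last": "SMITH", "zip": "01821"}

print(match_score(a, b, weights) >= THRESHOLD)  # True: the rare name carries the match
print(match_score(c, d, weights) >= THRESHOLD)  # False: the common name isn't enough
```

Turn THRESHOLD down and you pick up more of the Smith pairs, but you also change how every other pair in the file is judged. That's the master volume knob.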

So the only control you really have is the loose or tight setting. Imagine for a moment that you had a volume control for the entire world. This device allows you to control the volume of every living thing and every device on the planet. The device uses a strange and mystical algorithm of sound dynamics and statistics that only the most knowledgeable scientists can understand. So, if construction noise gets too much outside your window, you could turn the knob down. The man in the seat next to you on the airplane is snoring too loud? Turn down the volume.

Unfortunately, the knob does control EVERY sound on the planet, so when you turn down the volume, the ornithologist in Massachusetts can’t hear the rare yellow-bellied sapsucker she’s just spotted. A mother in Chicago may be having a hard time hearing her child coo, so she and a thousand other people call you to ask you to turn up the volume.

Initially, the idea of a world volume control sounds really cool, but after you think about the practical applications, it’s useless. By making one adjustment to the knob, the whole system must readjust.

That’s exactly why most companies don’t use probabilistic matching. To bring records together, probabilistic matching uses statistics and algorithms to determine a match. If you don’t like the way it’s matching, your only recourse is to adjust the volume control. However, the correct and subtle matches that probabilistic matching found on the previous run will be affected by your adjustment. It just makes more sense for companies to have the individual volume controls that deterministic and rules-based matching provide to find duplicates and households.
Perhaps more importantly, certain types of companies can't use probabilistic matching because of transparency. If you're changing the data at a financial institution, for example, you need to be able to explain exactly why you did it. An auditor may ask why you matched two customer records. That's easy to explain with a rules-based system, and much less transparent with probabilistic matching.
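
For contrast, here's an equally rough sketch of a rules-based match. The rules, field names, and records are hypothetical, but the point is that every decision comes with a reason an auditor can read:

```python
def rules_based_match(rec_a, rec_b):
    """Apply explicit, individually tunable rules in order.
    Returns (matched, reason) so every decision can be explained later."""
    if rec_a.get("account_id") and rec_a["account_id"] == rec_b.get("account_id"):
        return True, "Rule 1: exact match on account ID"
    if (rec_a.get("last") == rec_b.get("last")
            and rec_a.get("street") == rec_b.get("street")
            and rec_a.get("zip") == rec_b.get("zip")):
        return True, "Rule 2: same last name, street, and ZIP"
    return False, "No rule fired"

a = {"last": "SARSFIELD", "street": "123 MAIN ST", "zip": "01821", "account_id": None}
b = {"last": "SARSFIELD", "street": "123 MAIN ST", "zip": "01821", "account_id": None}
matched, reason = rules_based_match(a, b)
print(matched, "-", reason)  # True - Rule 2: same last name, street, and ZIP
```

Tightening the street/ZIP rule doesn't touch the account ID rule; that's the "individual volume control" a single probabilistic knob can't give you.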

I have yet to talk to a company that actually uses 100% probabilistic matching in their data quality production systems. Like the master volume control, it sounds like a good idea when the sales guy pitches it, but once implemented, the practical applications are few.
Read more on probabilistic matching.

Saturday, December 1, 2007

SAP in the Big D

I'm headed down to Dallas this week for a meet-and-greet event with the SAP CRM community. Some of our successful customers will be there, including Okidata, Moen, and the folks from Sita who are representing our Shred-It implementation of Trillium Software with SAP CRM.
However, I've been thinking about the SAP acquisition of Business Objects, strictly from the information quality tools perspective. When SAP announced that they were buying BO, the press release covered the synergies in business intelligence, yet there was barely a mention of the data quality tools.
Prior to the announcement, BO had been buying up vendors like Inxight, Fuzzy Informatik, and FirstLogic. Over its long history, FirstLogic solved their lack of global data support with an OEM partnership with Identex. So, if you wanted a global implementation from FirstLogic, they sold you both solutions. But with BO's acquisition of Fuzzy Informatik, word was that the Identex solution was beginning to lose traction. Global data could be handled by either the Identex solution or the in-house, revenue-generating Fuzzy Informatik solution. When revenue is involved, the partner usually loses.
So there are challenges, primarily the cornucopia of solutions. Strictly from a data quality solution perspective, there will be a wide assortment of customers of FirstLogic, FirstLogic/Identex, FirstLogic/Fuzzy Informatik, and Fuzzy Informatik data quality technology.
I'm not the only one thinking about this; Andy Bitterer and Ted Friedman from Gartner have been thinking about it too, but I think the situation is even more convoluted than they describe.
I have faith that SAP can address these challenges, but it's going to take a big effort to get it done. It's going to take a decision by SAP to keep key developers and experts on staff to fully integrate a data quality solution for the future, and it's going to take quick action to keep this technology moving forward. It may even take a couple of years to sort it all out.
Meanwhile, folks who have chosen Trillium Software as their data quality solution look pretty good right now. They have one platform that supports both the global aspects of data quality and the platform aspects, offering support for SAP CRM, ERP, and SAP NetWeaver MDM, to name just a few.

Thursday, November 29, 2007

Trillium Software Customer Conference

The field marketing team told me today that the dates for the Trillium Software Customer Conference are now confirmed for May 20-23, 2008. You heard it here first. The official announcement comes out from our PR team in a week or so.

I’m excited about it this year, since the location will be here in Boston at the Marriott Boston Cambridge. Being so close to home, it will give everyone working in our Billerica office an opportunity to interact with customers.

The event is specifically aimed at customers. If there’s a feature in the software that you’ve been dying to see, or if you think Trillium Software needs to lead the way in a new direction, you can tell it to the VP of development and even the head of the Trillium division of Harte-Hanks. There are no guarantees your suggestions will make it, but the interaction is valuable to both customer and company.

We’ll have learning sessions, demonstrations and industry experts all contributing to the event. In looking over the mini-site, I can see that the final agenda hasn’t been announced yet, but they have been known to do a great job getting guest speakers, customer presentations and a few key Trillium Software employee presentations.

I had the privilege of attending and presenting at last year’s event in Las Vegas. I heard very positive feedback. Field marketing usually throws in a couple of nice surprises to make the event fun. Last year, one of the highlights was a “motivational speaker” who was anything BUT motivational. There were some Vegas shows that the whole group enjoyed, including Blue Man Group. I’m looking forward to a great time in Boston, too.

Wednesday, November 28, 2007

Wanted: Information Engineer

CNN Money has released a list of the hottest up-and-coming jobs. Number 3 is the "information engineer", a new title with a $70K to $120K salary potential. It's not surprising. I would estimate that salary range to be low, as corporations compete for resources to build data governance teams.
If you think about it, information engineering as a career move is relatively outsource-safe. Corporations will have a hard time outsourcing data governance, for the following reasons:


  • Data needs to be secure - with many laws on the books about data security, corporations know that it's hard to maintain security if the data is sent offshore.

  • Accessing data is half the problem - data tends to be locked up in silos, often controlled by someone who doesn't want to give it up, or someone who has left the company. Corporations know they need inside resources to work through the politics and opposition and gain access to the data.

  • No one knows the ins and outs of the company data like a company employee - an outsourced data quality team won't understand the industry-specific and corporate-specific data challenges and how to solve them. It'll be tough enough for someone who has complete access to the business users of the data.


Let’s face it, as companies grow and compete, we’re not going to run out of data that needs to be cleansed, standardized, and matched. Building your career around data governance may just be the best move you can make.

Monday, November 26, 2007

Winners and Losers of Data Quality: Nominations

When companies have poor information quality practices, it’s hard to miss. The issues mostly manifest themselves in customer service (CRM) type interactions. As a customer, you may get unwanted or unnecessary contact, too little contact, or you’ll be struck with the feeling that the company doesn’t know you at all during a transaction. It’s more difficult to notice companies with good data quality practices. It’s more a good feeling you get when you do business with them. Good customer service, powered by proper data governance, is becoming, more and more, an expected modus operandi.

Starting with this week’s blog entry, I’m going to nominate companies that either exemplify, to me, good data governance, earning a thumbs up, or show signs of data governance inefficiencies, earning a thumbs down.


Thumbs Up: MGM Mirage Resorts and Casinos

I traveled often to Las Vegas in 2007 for industry trade shows, staying at hotels in the MGM system like Mandalay Bay and Monte Carlo, and those outside the MGM system like Caesars Palace. Depending upon who made the reservations for the trip, there were slight variations in my name and/or e-mail address that naturally occur.

Still, I am quite satisfied with the apparent knowledge that MGM has about me. They seem to understand that, as I am not exactly a high roller, discounted rooms and free buffets are appropriate offers for me. I don’t get duplicate e-mails and direct mail pieces from them, even though there was ample opportunity for those types of things to happen this year. The marketing materials that I get from them seem like a service, not at all a bother.

Nicely done, MGM. I’ll be back to bet the daily double at Monmouth some time soon.

Thumbs Down: Hewlett-Packard

Conversely, from my perspective, HP seems to have a data governance problem. Every quarter or so, HP sends me a catalog, listing home office store products. But every catalog comes in duplicate. Sometimes two, sometimes more. This time, one of the catalogs says that I'm "Steve Sarsfield" and the other calls me "Steven Sarsfield". For some reason, the one addressed to Steven also has a line that identifies the street that intersects my address. Not sure why, since that doesn't have anything to do with my official postal address.

So, I'm going to throw some numbers at you HP. Let's say conservatively, you send out 500,000 catalogs each quarter. Let's estimate that 20% of your database contains duplicates - a fairly conservative number. Finally, I'm going to estimate the catalog postage and printing costs at $1 ea.

That would be 20% of 500,000 = 100,000 catalogs that you send unnecessarily. At $1 each, that's $100,000 per mailing wasted. You mail it every quarter? 4 x $100,000 = $400,000 per year.
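
If you want to plug in your own numbers, the same back-of-the-envelope math looks like this in a few lines of Python (the figures are my assumptions from above, not HP's actual costs):

```python
def duplicate_mailing_waste(catalogs_per_mailing, duplicate_rate,
                            cost_per_catalog, mailings_per_year):
    """Estimate the annual cost of mailing catalogs to duplicate records."""
    wasted_per_mailing = catalogs_per_mailing * duplicate_rate * cost_per_catalog
    return wasted_per_mailing * mailings_per_year

# Assumed figures from above: 500,000 catalogs, 20% duplicates, $1 each, quarterly
print(duplicate_mailing_waste(500_000, 0.20, 1.00, 4))  # 400000.0
```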

I know margins on computer hardware are very slim these days. How many computers do you have to sell to add $400k directly to the bottom line? In this one division of this very large company, there seem to be inefficiencies.

Besides that, it’s mildly irksome to receive duplicate catalogs – just more paper that I have to make sure goes into the recycle bin and not into a landfill.
Now, don’t get me wrong. I’ve been very, very happy with my HP printer and PC. Heed this, HP. Despite making some of the best products in the world, I believe that poor data governance is stealing away your goodwill.

Tuesday, November 20, 2007

Creating Structure from Unstructured Data


A lot of focus in the data quality industry has turned to cleansing and standardizing unstructured data. An example of this is shown above.

At Trillium Software, we continue to teach the engine more and more about supply chain and ERP data as well. We can take what would otherwise be very difficult-to-use description data and put it into buckets. The Trillium Software System understands the distinction between an item name, a size, and packaging, and is able to standardize that information into proper fields.

Of course, the benefit of this is that if you want to understand how much polypropylene you have in inventory, you can't easily do it with the data at the top of the diagram, but you can get a complete understanding after it has been put into its proper buckets (the data on the lower part of the diagram). It comes in handy for that meeting with the polypropylene sales rep, since now you can fully understand the volume of your purchases.

One of the first customers for whom Trillium accomplished such a task was a major food manufacturing company, back in 1996. The company had descriptions of ingredients such as “Frozen Carrots”, “Carrots, Frozen”, and “Frz Car” in their supply chain systems, and Trillium was able to sort it out.
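
As a toy illustration of the bucketing idea, here are a few lines of Python with a hand-built synonym table. It's nowhere near what a commercial engine like the Trillium Software System does, but it shows the principle of splitting free-text descriptions into standardized fields:

```python
# Hand-built lookup tables -- a real engine ships far richer rules than this.
ITEM_SYNONYMS = {"CAR": "CARROTS", "CARROT": "CARROTS", "CARROTS": "CARROTS"}
STATE_SYNONYMS = {"FRZ": "FROZEN", "FROZEN": "FROZEN", "FRSH": "FRESH"}

def standardize_description(raw):
    """Split a free-text ingredient description into standardized buckets."""
    tokens = raw.replace(",", " ").upper().split()
    record = {"item": None, "state": None, "unparsed": []}
    for token in tokens:
        if token in ITEM_SYNONYMS:
            record["item"] = ITEM_SYNONYMS[token]
        elif token in STATE_SYNONYMS:
            record["state"] = STATE_SYNONYMS[token]
        else:
            record["unparsed"].append(token)
    return record

for description in ["Frozen Carrots", "Carrots, Frozen", "Frz Car"]:
    print(standardize_description(description))
# All three normalize to item=CARROTS, state=FROZEN
```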

More recently, there was Bombardier, which is available as a case study. It took only three months for a small team of engineers to design, develop, and implement a new process for standardizing 2.9 million inventory items using the Trillium Software System. Now, reports that once took months to generate are created weekly, providing high-quality information for streamlining procurement, reducing inventory, increasing on-time delivery, and boosting sales.

This is my last web log entry before the Thanksgiving break. Have a happy and safe holiday!


Sunday, November 18, 2007

Postal Validation for the Australia Post



One of the very basic functions you can offer as a data quality vendor is to validate data against the local postal services. With this validation, the postal service is saying that it has tested your software and agrees that your product can effectively cleanse local data. Customers of said products then become eligible for postal discounts and save money when they mail to their customers. The US, Canada, and Australia each have their own way of testing software to ensure results.
I took a look at the Australia Post web site to see who was on the latest AMAS (Address Matching Approval System) list and who was missing. It's interesting to note that, as of this posting, only two of the major enterprise software vendors (those in the Gartner Magic Quadrant 'leaders' section) now support AMAS.
According to the AMAS list, only Trillium Software and Business Objects (FirstLogic) support the Australian postal system with software certified by Australia Post.
Sure, a good data quality solution should have connectivity - it should integrate well with your systems. It should be fast, and it should support the business user as well as the technologist. It should have many other features that meet the needs of a global company. However, postal validation for global name and address data is basic. It helps the marketing department hit their targets, it helps the billing department's invoices reach the customer, and it keeps revenue flowing into an organization.

Saturday, November 17, 2007

Weekend Edition: Unique Christmas Gifts

On the weekend edition, I'm apt to take a departure from normal data governance and data quality topics and talk about... well, anything. This weekend, I found two very unique holiday gifts that I wanted to share with you. I'm not making money off this, just passing an idea along.
The first is shown here. My friend Joe Unni is offering a line of unique, handcrafted clocks. In his custom furniture business, Joe is always on the lookout for interesting and exotic pieces of wood. He then meticulously sands and finishes them, and produces some very cool clocks. I had a beer with Joe a few nights ago, and he told me that he was able to get his hands on some 50,000-year-old Kauri wood that was previously buried in New Zealand. The wood isn't petrified, but has some unique qualities that you can't find anywhere else. He also showed me some other exotic woods that naturally come in oranges and purples. I think it's just a great gift idea. If you want one, it's best to give him a call on the number listed on his Web site.

Winter "Gardening"
I'll tell you my second unique gift idea, as long as you don't tell my sister. My family are avid gardeners, and it doesn't make sense to give gardening items in the winter time. If you've ever tried to shop for a gardener, you know that gardening items are hard to find in November and December. However, I think my sister will really enjoy the home mushroom kit offered by Mushroom Adventures. The kits have everything you need to grow a couple of crops of mushrooms. The mushroom soil comes inoculated with the mushroom mycelium (the fungus). You just add water and wait for the portabellas. Yes, it is a little "out there" in terms of gifts, but perfect for my family. It beats a boring pair of gloves.

Thursday, November 15, 2007

Siebel UCM at Schneider National


A case study on Schneider National came out today on TechTarget. This appears to be a case study that Oracle/Siebel drove, but Trillium Software is a big part of the success here. It's a classic case of a small company that grew rapidly without regard to information quality processes. Now, smartly, the company is looking to clean up the 50+ databases of customer information to get a better handle on its business.
I was intrigued by this:

"Schneider's source data comes from a monolithic mainframe-based system that houses customer information such as billing and customer orders. The UCM also ties into a Lawson ERP system on the back end."
That's why it's so important for data quality vendors to support many platforms and offer many ways to integrate data quality processes. Here's an example of a customer that may need: 1) mainframe support; 2) Siebel UCM support; and 3) a way to hook up Lawson ERP to the DQ processes. Frankly, the vendors that have folded into ETL tools or BI tools can't support all that. Only an enterprise DQ platform can do it all, plus any future integration that may come up.

Wednesday, November 14, 2007

Rapid Business Growth and Data Quality

My colleagues and I just completed a webinar for Trillium Software that discusses rapid business growth and its negative effects on data quality. It was called "Improving Data Quality in SAP Netweaver Environments with Trillium Software". In hindsight, it was probably a poor name. The event focused more on how rapid growth sets up a company for huge data quality problems and how you can use re-usable data quality processes to tackle the problem. We had only about half of the attendees that we usually get.
But, wow! Where there were shortcomings in picking a name, we must have struck a chord with the content.
I had some great calls directly after the event from attendees following up and asking questions. A pharma company called and wanted to chat about doctor data. You know, doctor data can be quite challenging in that doctors tend to work in multiple facilities. So, in order to match up duplicate doctors, you have to go beyond name and address components and compare things like tax ID number or provider number.
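Here's a rough sketch of that idea, with made-up field names and identifiers: when name and address differ by facility, you key the comparison on stable identifiers like tax ID or provider number instead.

```python
def same_doctor(rec_a, rec_b):
    """Match provider records on stable identifiers rather than name and address,
    since the same doctor often appears under several facilities."""
    for key in ("tax_id", "provider_number"):  # illustrative field names
        if rec_a.get(key) and rec_a.get(key) == rec_b.get(key):
            return True
    return False

a = {"name": "J. Smith MD", "facility": "Mercy Hospital",   "provider_number": "1234567"}
b = {"name": "John Smith",  "facility": "Northside Clinic", "provider_number": "1234567"}
print(same_doctor(a, b))  # True, despite different names and facilities
```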
Another company, which I'll describe as a retailer, called to talk about some of the professional services that I mentioned in my webinar. More and more, staffing a data quality initiative is a problem, but we can usually help out with our professional services team. The company wanted to go back and clean up some legacy data for re-use in a marketing campaign. They were asking for advice on how to get started. Since Trillium SW now has strategic planning services, run by some talented folks like my colleague Jim Orr, we were able to help.
That was just two of the eight or so follow-up calls I got. Great projects from some great companies.
As for the webinar, if you get a chance, please check it out and let me know what you think. The idea is that when companies grow rapidly, often with little regard to information quality, major problems can build up. Not only do silos of data crop up, but silos of information quality processes appear as well. Everyone begins to put their own unique spin on what the data should look like, and before you know it, the information quality problem gets worse.

Tuesday, November 13, 2007

Data Quality in Japan

Data quality problems are global! This is a poignant video on data quality in Japan as it relates to the country's governmental retirement program. It shows a company called Agrex, which has used Trillium Software to cleanse data and find fraud in the Japanese system. Even if you don't speak Japanese, you can still follow the plot, so give it a try:

Monday, November 12, 2007

MIT's Information Quality Program

I just attended MIT's 12th International Conference on Information Quality in Cambridge this weekend. I have to say that I am amazed and delighted at how informed the industry is getting about data quality. As I sat through each of the 20-minute talks, it became clear who was more theoretical than practical: which presenters were throwing ideas out there and trying to sell them, and which had actually done the work. Based on the Q&A sessions, the attendees understood this as well. The industry is getting smarter about how to get funding for and implement data quality.
You want an example? I sat in on a BT (British Telecom) case study delivered by Nigel Turner and Dave Evans. Good guys, and funny as they peppered in insults to each other as part of their shtick. Anyway, talk about amazing returns on investment. Back in 2003, BT realized that much of their investment in new enterprise applications was being hurt by poor data quality. Nigel and Dave were part of a team that was able to sell the impact of information quality by selling the ROI. The key seems to be to talk about what will happen if you institute good data quality practices, but also include, as they say, the "do nothing" option. What will happen to the data if we continue down a path and do nothing about information quality on any given project? How much will it cost in 1 year, 3 years, or 5 years? That's a good strategy to take when delivering your data quality investment presentation to corporate. Fear, uncertainty and doubt always take you far.
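
One simple way to frame the "do nothing" option is to project the cost of bad data over one, three, and five years and put it next to the cost of the initiative. The numbers below are purely illustrative placeholders, not BT's figures:

```python
def do_nothing_cost(annual_cost_of_bad_data, growth_rate, years):
    """Cumulative cost of bad data if nothing is done, assuming it grows each year."""
    total, cost = 0.0, annual_cost_of_bad_data
    for _ in range(years):
        total += cost
        cost *= 1 + growth_rate
    return total

# Purely illustrative assumptions: $2M/year of bad-data cost, growing 10% a year
for horizon in (1, 3, 5):
    print(f"Do nothing for {horizon} year(s): ${do_nothing_cost(2_000_000, 0.10, horizon):,.0f}")
```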

BT tracked ROI on their data governance initiative and was able to show hundreds of millions of dollars (or British pounds) worth of savings and benefit. Wow! Fantastic stuff.

This is one of the more academic venues for researching data quality, and therefore less commercial. The presentations were interesting in that they often gave you another perspective on the problem of data quality. Some of the information was clearly cutting edge, but I sat in on a couple of sessions that flashed up theoretical formulas for calculating data quality. Yup, that's nice, but when it comes to convincing your boss to invest in DQ, bosses don't generally want to relive calculus class. Rather, they want to know how it'll affect them and their company. That's how you'll get the funding you need for the task at hand. BT had it right.
I talked briefly to Rich Wang, who is the founder and face of the MIT IQ program. He mentioned the possibility of speaking at the July event in Cambridge, and I'm excited by the prospect of it all. I have many ideas... many ideas...

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.