Thursday, July 24, 2008

Forget the Data. Eat the Ice Cream.

It’s summer and time for vacations. Even so, it’s difficult for a data-centric guy like me to shut off thoughts of information quality, even during times of rest and relaxation.
Case in point: my family and I just took a road trip from Boston to Burlington, VT, to visit the shores of Lake Champlain. We loaded up the mini-van and headed north. Along the way, you drive beautiful Route 89, which winds its way through the Green Mountains and past the capital, Montpelier.
No trip to western Vermont is complete without a trip to the Ben and Jerry’s ice cream manufacturing plant in Waterbury. They offer a tour of the plant and serve up a sample of the freshly made flavor of the day at the end. The kids were very excited.
However, when I see a manufacturing process, my mind immediately turns to data. As the tour guide spouted off statistics about how much of any given ingredient they use, and which flavor was the most popular (Cherry Garcia), my thoughts turned to the trustworthiness of the data behind it. I wanted him to back it up by telling me what ERP system they used and what data quality processes were in place to ensure the utmost accuracy in the manufacturing process. Inside, I wondered if they had the data to negotiate properly with the ingredients vendors and if they really knew how many Heath bars, for example, they were buying across all of their manufacturing plants. Just having clean data and accurate metrics around their purchasing processes could save them thousands and thousands of dollars.
The tour guide talked about a Jack Daniel’s flavored ice cream that was now in the “flavor graveyard,” mostly because the main ingredient kept disappearing from the production floor. I thought about inventory controls and processes that could be put in place to stop employee pilfering.
It went on and on. The psychosis continued until my daughter exclaimed, “Dad. This is the coolest thing ever! That’s how they make Chunky Monkey!” She was right. It was perhaps the coolest thing ever to see how they made something we use nearly every day. It was cool to take a peek inside the corporate culture of Ben and Jerry’s. It popped me back into reality.
Take your vacation this year, but remember that life isn’t only about the data. Remember to eat the ice cream and enjoy.

Tuesday, July 1, 2008

The Soft Costs of Information Quality

Choosing data quality technology simply on price could mean that you end up paying far more than you need to, thanks to the huge differences in how the products solve the problems. While your instinct may tell you to focus solely on the price of your data quality tool, your big costs come in less visible areas – like time to implement, re-usability, time spent preprocessing data so that it reads into the tool, performance and overall learning curve.

As if it wasn’t confusing enough for the technology buyer having to choose between a desktop and enterprise-class technology, local and global solutions, or built-in solution vs. universal architecture, now you have to work out soft costs too. But you need to know that there are some huge differences in the way the technologies are implemented and work day-to-day, and those differences will impact your soft costs.

So just what should you look for to limit soft costs when selecting an information quality solution? Here are a few suggestions:

  • Does the data quality solution understand data at the field level only or can it see the big picture? For example, can you pass it an address that’s a blob of text, or do you need to pass it individual first name, last name, address, city, state, postal code lines? Importance: If the data is misfielded, you’ll have a LOT of work to do to get it ready for the field level solution.
  • On a similar note, what is the approach to multi-country data? Is there an easy way to pre-process mixed global data or is it a manual process? Importance: If the data has mixed country of origin, again you’ll have a lot of preprocessing work to do to get it ready.
  • What is the solution’s approach to complex records like “John and Diane Cougar Mellencamp DBA John Cougar”? Does the solution have the intelligence to understand all of those people in a record or do you have to post-process this name?
  • Despite the look of the user interface, is the product a real application or is it a development environment? Importance: In a real application, an error will be indicated if you pass in some wild and crazy data. In a development environment, even slight data quirks will cause nothing to run and just getting the application to run can be very time consuming and wasteful.
  • How hard is it to build a process? As a user you’ll need to know how to build an entire end-to-end process with the product. During proof of concept, the data quality vendor may hide that from you. Importance: Whether you’re using it on one project, or across many projects, you’re eventually going to want to build or modify a process. You should know up-front how hard this is. It shouldn’t be a mystery, and you need to follow this during the proof-of-concept.
  • Are web services the only real-time implementation strategy? Importance: Compared to a scalable application server, web services can be slow and actually add costs to the implementation.
  • Does the application actually use its own address correction worldwide or a third party solution? Importance: Understanding how the application solves certain problems will let you understand how much support you’ll get from the company. If something breaks, it’s easier for the program’s originator to fix it. A company using a lot of third party applications may have challenges with this.
  • Does the application have different ways to find duplicates? Importance: During a complex clean-up, you may want to dedupe your records based on, say, e-mail and name for the first pass. But what about the records where the e-mail isn’t populated? For those records, you’ll need to go back and use other attributes to match. The ability to multi-match allows you to achieve cleaner, more efficient data by using whatever attributes are best in your specific data.
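
To make the multi-match idea in the last bullet concrete, here is a minimal Python sketch. It is not how any particular vendor's product works. The field names (`email`, `name`, `postal`), the `normalize` helper and the `multi_pass_dedupe` function are all hypothetical, and the matching is exact-after-normalization rather than the fuzzy matching a real tool would offer. Each record is matched on the first set of attributes that is fully populated, so records with no e-mail fall through to a second pass on name and postal code.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace; missing values become ''."""
    return " ".join((value or "").lower().split())


def multi_pass_dedupe(records, passes):
    """Cluster records using several match keys, tried in order.

    `passes` is a list of tuples of field names (hypothetical names here).
    A record joins a cluster in the first pass whose fields are all
    populated; records that never qualify are returned as unmatched.
    """
    clusters = defaultdict(list)
    unmatched = []
    for rec in records:
        for fields in passes:
            key = tuple(normalize(rec.get(f)) for f in fields)
            if all(key):  # every field in this pass is populated
                clusters[(fields, key)].append(rec)
                break
        else:
            unmatched.append(rec)
    return list(clusters.values()), unmatched


# Illustrative data only, invented for this sketch.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "postal": "02134"},
    {"name": "ann  lee", "email": "ANN@example.com", "postal": "02134"},
    {"name": "Bob Ray", "email": "", "postal": "05401"},
    {"name": "Bob Ray", "email": None, "postal": "05401"},
    {"name": "", "email": "", "postal": ""},
]

# Pass 1: e-mail and name. Pass 2: name and postal code.
clusters, unmatched = multi_pass_dedupe(
    records, [("email", "name"), ("name", "postal")]
)
for group in clusters:
    print(len(group), "record(s):", [r["name"] for r in group])
print("unmatched:", len(unmatched))
```

One limitation of this sketch: a record with an e-mail and a record without one for the same person land in different passes and are not merged. Handling that case is exactly the kind of thing to ask the vendor about during the proof-of-concept.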

I could go on. The point is – there are many technical, in-the-weeds differences between vendors, and those differences have a BIG impact on your ability to deliver information quality. The best way to understand a data quality vendor’s solution is to look over their shoulder during the proof-of-concept. Ask questions. Challenge the steps needed to cleanse your data. Diligence today will save you from having to buy Excedrin tomorrow.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.