Tuesday, July 1, 2008

The Soft Costs of Information Quality

Choosing data quality technology on price alone could mean that you end up paying far more than you need to, because the products differ hugely in how they solve the problems. While your instinct may tell you to focus solely on the price of your data quality tool, your big costs come in less visible areas, like time to implement, re-usability, time spent preprocessing data so that it reads into the tool, performance and overall learning curve.

As if it wasn’t confusing enough for the technology buyer having to choose between a desktop and enterprise-class technology, local and global solutions, or built-in solution vs. universal architecture, now you have to work out soft costs too. But you need to know that there are some huge differences in the way the technologies are implemented and work day-to-day, and those differences will impact your soft costs.

So just what should you look for to limit soft costs when selecting an information quality solution? Here are a few suggestions:

  • Does the data quality solution understand data at the field level only or can it see the big picture? For example, can you pass it an address that’s a blob of text, or do you need to pass it individual first name, last name, address, city, state and postal code fields? Importance: If the data is misfielded, you’ll have a LOT of work to do to get it ready for the field-level solution.
  • On a similar note, what is the approach to multi-country data? Is there an easy way to pre-process mixed global data or is it a manual process? Importance: If the data has mixed countries of origin, again you’ll have a lot of preprocessing work to do to get it ready.
  • What is the solution’s approach to complex records like “John and Diane Cougar Mellencamp DBA John Cougar”? Does the solution have the intelligence to understand all of the people in that record, or do you have to post-process the name yourself?
  • Despite the look of the user interface, is the product a real application or is it a development environment? Importance: In a real application, an error will be indicated if you pass in some wild and crazy data. In a development environment, even slight data quirks can stop anything from running, and just getting the application to run can be very time-consuming and wasteful.
  • How hard is it to build a process? As a user you’ll need to know how to build an entire end-to-end process with the product. During proof of concept, the data quality vendor may hide that from you. Importance: Whether you’re using it on one project, or across many projects, you’re eventually going to want to build or modify a process. You should know up-front how hard this is. It shouldn’t be a mystery, and you need to follow this during the proof-of-concept.
  • Are web services the only real-time implementation strategy? Importance: Compared to a scalable application server, web services can be slow and actually add costs to the implementation.
  • Does the application actually use its own address correction worldwide or a third-party solution? Importance: Understanding how the application solves certain problems will let you understand how much support you’ll get from the company. If something breaks, it’s easier for the program’s originator to fix it. A company using a lot of third-party applications may have challenges with this.
  • Does the application have different ways to find duplicates? Importance: During a complex clean-up, you may want to dedupe your records based on, say, e-mail and name for the first pass. But what about the records where the e-mail isn’t populated? For those records, you’ll need to go back and use other attributes to match. The ability to multi-match allows you to achieve cleaner data more efficiently by using whatever attributes are best in your specific data.
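
To make the multi-match idea in the last suggestion concrete, here is a minimal Python sketch. It is not how any particular vendor does it. The field names (name, email, postal_code), the sample records and the multi_pass_dedupe helper are all made up for illustration, and it uses exact matching on normalized values where a real tool would typically use fuzzy matching.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace; return '' for missing values."""
    return " ".join((value or "").lower().split())


def multi_pass_dedupe(records, passes):
    """Group records using the first pass whose key fields are all populated.

    `passes` is an ordered list of field-name tuples, most trusted first.
    Returns a list of groups; each group is a list of record indexes.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        for fields in passes:
            key = tuple(normalize(record.get(f)) for f in fields)
            if all(key):
                groups[(fields, key)].append(index)
                break
        else:
            # No pass had complete data, so the record stays on its own.
            groups[("unmatched", index)].append(index)
    return list(groups.values())


# Hypothetical sample data: some records have no e-mail populated.
records = [
    {"name": "John Smith", "email": "js@example.com", "postal_code": "02134"},
    {"name": "john  smith", "email": "JS@example.com", "postal_code": ""},
    {"name": "Diane Jones", "email": "", "postal_code": "10001"},
    {"name": "diane jones", "email": None, "postal_code": "10001"},
    {"name": "Pat Lee", "email": "", "postal_code": ""},
]

# First pass: e-mail and name. Fallback pass: name and postal code.
passes = [("email", "name"), ("name", "postal_code")]
duplicate_groups = multi_pass_dedupe(records, passes)
print(duplicate_groups)
```

In this sketch a record that matched on e-mail is never compared against records that fell through to the second pass, so it only shows the fallback idea, not a complete matching strategy.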

I could go on. The point is – there are many technical, in-the-weeds differences between vendors, and those differences have a BIG impact on your ability to deliver information quality. The best way to understand a data quality vendor’s solution is to look over their shoulder during the proof-of-concept. Ask questions. Challenge the steps needed to cleanse your data. Diligence today will save you from having to buy Excedrin tomorrow.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.