
Monday, April 2, 2012

Why Code Base is Important in Vendor Selection


The horticulture of software

Spring has sprung here in the northern hemisphere, and minds turn to the plant life that will be sprouting all across our home towns. The new growth has me thinking about the similarities between horticulture and the code base of our data management solutions.

Reviewing software solutions before you buy is a major effort for users and vendor selection committees. Much time is spent looking at whether the features of the product will meet team needs. Features are so important that companies will spend time producing RFPs with extensive feature lists. They may even require a proof of concept, where the vendor must install and test the solution in the purchaser’s work environment. This goes for applications used to manage data, but also for many other applications.

However, I believe that buyers should also look carefully at the growth pattern of the code base. In the data management field, we have undergone decades of technology change combined with decades of market consolidation. The code base for the application you’re about to buy may have grown from one of the following horticultural strategies:

  • Grafting – A large software company sees potential in the data management field and begins acquiring companies and grafting them together to create a solution. Sometimes the acquisition isn’t driven by technologists, but by upper management seeking to fill holes in the product line. Sometimes they even buy competing technologies, leaving everyone trying to figure out which will win. Sometimes the graft doesn’t take.
  • Old Growth – Companies have an existing technology that has worked for decades. However, back in 1990 when they released version 1.0, Java was experimental and not the dominant force it is today. FORTRAN was the preferred programming language and COBOL copybooks were the data model. I know some companies in the data management market that have spent millions updating old-growth code to stay competitive, and others that have not. This becomes a dilemma for every vendor at some point: when do you prune out the dead wood?
  • Sapling – Companies that are just breaking into the data management marketplace and have a good-looking start. However, the sapling doesn’t yet have all the branches you want on it. Will the sapling survive among the other deciduous solutions in the market?

When you’re selecting a vendor, you ideally want a code base that is mature, but not too mature. You want limited grafting. The growth of the code and the grafting affect:

  • Speed of innovation for the vendor
  • Customization for you
  • Future expansion for both of you
  • The age and experience of the technologists necessary to operate it
  • Consulting requirements
  • Ability to cross-train personnel (e.g., DI people running DQ and vice versa)

So, when you’re selecting a data management solution, or any technology solution, don’t just compare the features, but take a look at how the product grew to where it is today.  Look for the solution in the optimal stage of growth that will meet your needs today and those for the future.


Tuesday, November 30, 2010

Match Mitigation: When Algorithms Aren’t Enough

I’d like to get a little technical in this post. I try to keep my posts business-friendly, but sometimes the detail matters. If none of this post makes sense to you, I wrote a short primer on how matching works in many data quality tools, which you can get here.

Matching Algorithms
When you use a data quality tool, you’re often using matching algorithms and rules to decide whether records match. You might be using deterministic algorithms like Jaro, Soundex and Metaphone. You might also be using probabilistic matching algorithms.
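
To give a flavor of the deterministic approach, here’s a compact Soundex sketch in Python. This is my own simplified rendering of the classic algorithm, not any tool’s implementation, and the names in it are invented examples. Soundex encodes a name by how it sounds, so common spelling variants produce the same code:

```python
def soundex(name: str) -> str:
    """Simplified Soundex: encode a name as its first letter plus three digits."""
    if not name:
        return ""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    result = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:   # skip runs of the same sound
            result += code
        if ch not in "hw":          # h and w don't break a run of duplicates
            prev = code
    return (result + "000")[:4]     # pad or truncate to 4 characters

# Names that sound alike encode identically, so they match.
print(soundex("Sarsfield"), soundex("Sarsfeld"))  # S621 S621 (invented variants)
```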

In many tools, you can set the rules to be tight, where the software uses tougher criteria to determine a match, or loose, where the software is not so particular. Tight and loose matches matter because you may have strict rules for putting records together, as with customers of a bank, or less strict rules, as when you’re assembling a customer list for marketing purposes.
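
To make the tight-versus-loose idea concrete, here’s a minimal sketch in Python. It isn’t code from any particular tool; the similarity score is just the standard library’s sequence matcher, and the threshold values are illustrative assumptions you would tune per match rule.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Score two normalized strings from 0.0 (no match) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# Illustrative thresholds only -- real tools let you tune these per rule.
TIGHT = 0.95  # strict criteria, e.g., consolidating bank customer records
LOOSE = 0.80  # relaxed criteria, e.g., de-duping a marketing mailing list

def is_match(a: str, b: str, threshold: float) -> bool:
    return similarity(a, b) >= threshold

print(is_match("Stephen Sarsfield", "Steven Sarsfield", TIGHT))  # False: too strict
print(is_match("Stephen Sarsfield", "Steven Sarsfield", LOOSE))  # True: loose enough
```

The same pair of records can match under a loose rule and fail under a tight one, which is exactly why the business purpose of the match should drive the threshold.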

What to do with Matches
Once data has been processed through the matcher, there are several possible outcomes. Between any two given records, the matcher may find:

  • No relationship
  • Match – the matcher found a definite match based on the criteria given
  • Suspect – the matcher thinks it found a match but is not confident. The results should be manually reviewed.

It’s that last category that’s the tough one. Mitigating the suspect matches is the most time-consuming follow-up task after the matching is complete. Envision a million-record database where you have 20,000 suspect matches. That’s still going to take you some time to review.
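
One common way to implement this triage, shown below as my own sketch rather than any vendor’s code, is with two thresholds: scores above the upper cutoff become automatic matches, scores below the lower cutoff are treated as no relationship, and everything in between lands in the suspect queue for manual review.

```python
from difflib import SequenceMatcher

# Illustrative cutoffs; in practice you tune these to shrink the suspect band.
AUTO_MATCH = 0.92
NO_MATCH = 0.75

def classify(a: str, b: str) -> str:
    """Triage a pair of records into match / suspect / no relationship."""
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if score >= AUTO_MATCH:
        return "match"
    if score < NO_MATCH:
        return "no relationship"
    return "suspect"  # confidence is in the gray zone; route to manual review

pairs = [
    ("Jon Smith, 12 Oak St", "Jon Smith, 12 Oak St"),    # identical -> match
    ("Jon Smith, 12 Oak St", "John Smyth, 12 Oak Str"),  # close -> suspect
    ("Jon Smith, 12 Oak St", "Mary Jones, 9 Elm Ave"),   # far -> no relationship
]

suspects = [p for p in pairs if classify(*p) == "suspect"]
print(f"{len(suspects)} of {len(pairs)} pairs need manual review")
```

The closer you can push the two cutoffs together without creating false matches, the smaller the pile of suspects your stewards have to work through.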

Some of the newer (and cooler) tools offer strategies for dealing with suspect matches. They present the suspect matches in a graphical user interface and let users pick which relationships are accurate and which are not. For example, Talend now offers a data stewardship console that lets you pick and choose the records and attributes that will make up a best-of-breed record.

The goal, of course, is to have no suspect matches, so tuning the matcher to limit them is the ultimate aim. The newest tools make this easy; some of the legacy tools make it hard.

Match mitigation is perhaps the most overlooked process in data quality. Don’t neglect it in your planning and processes.

Thursday, January 21, 2010

ETL, Data Quality and MDM for Mid-sized Business


Is data quality a luxury that only large companies can afford? Of course the answer is no. Your company should be paying attention to data quality whether you are a Fortune 1000 company or a startup. Like a toothache, poor data quality will never get better on its own.

As a company grows, the effects of poor data quality multiply. When a small company expands, it naturally develops new IT systems, and mergers often bring in new systems, too. The impact of poor data quality slowly invades and hinders the company’s ability to service customers, keep the supply chain efficient and understand its own business. Paying attention to data quality early and often is a winning strategy even for the small and medium-sized enterprise (SME).

However, SMEs face challenges with the investment needed for enterprise-level software. While it’s true that the benefits often outweigh the costs, it is difficult for the typical SME to invest in the licenses, maintenance and services needed to implement a major data integration, data quality or MDM solution.

At the beginning of this year, I started with a new employer, Talend. I became interested in them because they were offering something completely different in our world – open source data integration, data quality and MDM.  If you go to the Talend Web site, you can download some amazing free software, like:
  • a fully functional, very cool data integration package (ETL) called Talend Open Studio
  • a data profiling tool, called Talend Open Profiler, providing charts and graphs and some very useful analytics on your data
The two packages sit on top of a database, typically MySQL – also an open source success.
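
As a rough illustration of what a profiler computes (my own sketch, not Talend’s implementation), here are a few of the basic column statistics a profiling tool typically reports: row counts, null rates, distinct values and top frequencies.

```python
from collections import Counter

def profile_column(values):
    """Report basic profiling statistics for a single column of data."""
    total = len(values)
    blank = [v for v in values if v is None or str(v).strip() == ""]
    filled = [v for v in values if v is not None and str(v).strip() != ""]
    return {
        "rows": total,
        "null_rate": len(blank) / total if total else 0.0,
        "distinct": len(set(filled)),
        "top_values": Counter(filled).most_common(3),
    }

# A toy 'state' column showing the kinds of problems profiling surfaces:
# mixed casing, a spelled-out variant, and missing values.
states = ["MA", "MA", "ma", "Mass.", None, "MA", "", "NH"]
print(profile_column(states))
```

Even these simple counts reveal standardization problems (‘MA’, ‘ma’ and ‘Mass.’ all meaning Massachusetts) that a data quality tool would then go on to fix.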

For these solutions, Talend uses a business model similar to what my friend Jim Harris has just blogged about: Freemium. Under this model, free open source content is made available to everyone, providing the opportunity to “up-sell” premium content to a percentage of the audience. Talend works like this. You can enhance your experience with Talend Open Studio by purchasing Talend Integration Suite (in various flavors). You can take your data quality initiative to the next level by upgrading Talend Open Profiler to Talend Data Quality.

If you want to take the combined data integration and data quality to an even higher level, Talend just announced a complete Master Data Management (MDM) solution, which you can use in a more enterprise-wide approach to data governance. There’s a very inexpensive place to start and an evolutionary path your company can take as it matures its data management strategy.

The solutions have been made possible by the combined efforts of the open source community and Talend, the corporation. If you’d like, you can take a peek at some source code, use the basic software and try your hand at coding an enhancement. Sharing that enhancement with the community will only lead to a world full of better data, and that’s a very good thing.

Saturday, September 20, 2008

New Data Governance Books

A couple of new, important books hit the streets this month. I’m adding these books to my recommended reading list.

Data Driven: Profiting from Your Most Important Business Asset is Tom Redman’s new book about making the most of your data to sharpen your company's competitive edge and enhance its profitability. I like how Tom uses real-life metaphors in this book to simplify the concepts of governing your data.

Master Data Management is David Loshin’s new book that provides help for both business and technology managers as they strive to improve data quality. Among the topics covered are strategic planning, managing organizational change and the integration of systems and business processes to achieve better data.

Both Tom and David have written several books on data quality and master data management, and I think their material gets stronger and stronger as they plug in new experiences and reference new strategies.

EDIT: In April of 2009, I also released my own book on data governance called "The Data Governance Imperative". Check it out.

Thursday, December 20, 2007

MDM Readiness Kit

I'm excited that Trillium Software is now offering a Master Data Management Readiness Kit. It represents some of the best thought leadership pieces Trillium Software has produced yet, plus a smattering of industry knowledge about master data management. This kit is certainly worth a download if you're implementing, or thinking about implementing, a master data management strategy at your company.

The kit includes the Gartner Magic Quadrant for Data Quality Tools 2007, a Data Quality Project Planning Checklist for MDM, a UMB Bank Case Study for Data Governance, and a section on how to build a Business Case for Data Quality and MDM.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.