
Tuesday, March 15, 2011

Open Source Data Management or Do-it-Yourself

With the tough economy, companies are still cutting back on corporate spending. There is a sense of urgency to just get things done, and sometimes that leads to hand-coding your own data integration, data quality or MDM functions. When you develop your plan and strategy for data management, you have to think about all the hidden costs of building your own solution versus getting one out of the box.

Reusability is one key consideration. Data management technology that only plugs into one system just doesn’t make sense. It’s difficult to get that reusability with custom code, unless your programmers have high visibility into other projects. Tool vendors, on the other hand, even open source ones, are under pressure from their clients to support multiple databases and business solutions. Open source solutions are built to work in a wide variety of architectures. You can move your data management processes between JD Edwards, SAP and SalesForce, for example, with relative ease.

Indemnity is another consideration. What if something goes wrong with your home-grown solution after the chief architect leaves his job? Who are you going to call? If something goes wrong with your open source solution, you can turn to the community or call the vendor for support.

Long-term costs are yet another issue. Home-grown solutions tend to start cheap and get more expensive as time goes on. It’s difficult to maintain custom code, especially if it is poorly documented, so you end up hiring consultants to manage it. Eventually you have to rip and replace, and that can be costly.

You should consider your human resources, too. Does it make sense to have a team hand-code database extractions and transformations, or would the total cost/benefit be better with an open source data integration tool? It might just free up some of your programmers to pursue more important, ROI-centric ventures.

If you’re thinking of cooking up your own technical solutions for data management, hoping to just get it done, think again. Your most economical solution might just be to leverage the community of experts and go with open source.

Tuesday, November 30, 2010

Match Mitigation: When Algorithms Aren’t Enough

I’d like to get a little technical in this post. I try to keep my posts business-friendly, but sometimes the detail matters. If none of this post makes any sense to you, I wrote a short primer on how matching works in many data quality tools, which you can get here.

Matching Algorithms
When you use a data quality tool, you’re often using matching algorithms and rules to decide whether records match. You might be using deterministic algorithms like Jaro, SoundEx and Metaphone. You might also be using probabilistic matching algorithms.
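
To make that concrete, here is a rough sketch of the kind of comparisons these algorithms perform. It uses a simplified Soundex and an edit-distance-style ratio from Python’s standard library as stand-ins; Jaro, Metaphone and any vendor’s probabilistic scoring work on the same principle but are not shown here.

```python
import difflib

def simple_soundex(name: str) -> str:
    """A simplified Soundex: first letter plus up to three digit codes."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    result, prev = name[0], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":           # H and W do not break a run of equal codes
            prev = code
    return (result + "000")[:4]

def string_similarity(a: str, b: str) -> float:
    """Edit-distance-style similarity in [0, 1] (not Jaro, but analogous)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(simple_soundex("Sarsfield"), simple_soundex("Sarsfeld"))  # both S621 -> phonetic match
print(round(string_similarity("Jon Smith", "John Smyth"), 2))   # fuzzy score, e.g. 0.84
```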

In many tools, you can set the rules to be tight, where the software uses tougher criteria to declare a match, or loose, where the software is not so particular. Tight and loose matches matter because you may have strict rules for putting records together, like customers of a bank, or not-so-strict rules, like when you’re assembling a customer list for marketing purposes.

What to do with Matches
Once data has been processed through the matcher, there are several possible outcomes. Between any two given records, the matcher may find:

  • No relationship
  • Match – the matcher found a definite match based on the criteria given
  • Suspect – the matcher thinks it found a match but is not confident. The results should be manually reviewed.

It’s that last category that’s the tough one. Mitigating the suspect matches is the most time-consuming follow-up task after the matching is complete. Envision a million-record database where you have 20,000 suspect matches. That’s still going to take you some time to review.
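
As a rough illustration of how the tight and loose settings above produce these three buckets, here is a two-threshold classifier; the cutoff values are made up and would come from tuning against your own data, not from any particular tool.

```python
def classify_pair(score: float, tight: float = 0.92, loose: float = 0.80) -> str:
    """Bucket a candidate pair of records by its match score.

    Scores at or above `tight` are automatic matches, scores between
    `loose` and `tight` become suspects for a data steward to review,
    and anything below `loose` is treated as no relationship.
    """
    if score >= tight:
        return "match"
    if score >= loose:
        return "suspect"
    return "no relationship"

for score in (0.97, 0.85, 0.40):
    print(score, "->", classify_pair(score))
```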

Some of the newer (and cooler) tools offer strategies for dealing with suspect matches. They present the suspect matches in a graphical user interface and let users decide which relationships are accurate and which are not. For example, Talend now offers a data stewardship console that lets you pick and choose the records and attributes that will make up a best-of-breed record.
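
For a sense of what building a best-of-breed record involves under the hood, here is a minimal survivorship sketch (not Talend’s console; the source ranking and field names are invented). It prefers the most trusted source per attribute, which a steward could then override in a UI.

```python
# Lower rank = more trusted source; the ranking here is an assumption.
SOURCE_RANK = {"crm": 0, "billing": 1, "web_signup": 2}

def best_of_breed(cluster):
    """Merge a cluster of duplicate records, taking each attribute
    from the most trusted source that has a non-empty value."""
    ordered = sorted(cluster, key=lambda r: SOURCE_RANK.get(r.get("source"), 99))
    golden = {}
    for record in ordered:
        for field, value in record.items():
            if field != "source" and value and field not in golden:
                golden[field] = value
    return golden

cluster = [
    {"source": "web_signup", "name": "J. Smith", "phone": "", "email": "js@example.com"},
    {"source": "crm", "name": "John Smith", "phone": "555-0100", "email": ""},
]
print(best_of_breed(cluster))
# -> {'name': 'John Smith', 'phone': '555-0100', 'email': 'js@example.com'}
```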

The goal, of course, is to have no suspect matches at all, so tuning the match rules to limit suspects is the ultimate aim. The newest tools make this easy; some of the legacy tools make it hard.

Match mitigation is perhaps one of the most often overlooked processes of data quality. Don’t overlook it in your planning and processes.

Tuesday, February 16, 2010

The Secret Ingredient in Major IT Initiatives

One of my first jobs was that of assistant cook at a summer camp. (In this case, the term ‘cook’ was loosely applied; it mostly meant scrubbing pots and pans for the head cook.) It was there I learned that most cooks have ingredients they tend to use more often. The cook at Camp Marlin tended to use honey where applicable. Food TV star Emeril likes to use garlic and pork fat. Some cooks add a little hot pepper to their chocolate recipes – it is said to bring out the flavor of the chocolate. Definitely a secret ingredient.

For head chefs taking on major IT initiatives, the secret ingredient is always data quality technology. Attention to data quality doesn’t make the recipe on its own so much as it makes the whole IT initiative better. Let’s take a look at how this happens.

Profiling
No matter what the project, data profiling provides a complete understanding of the data before the project team attempts to migrate it, which helps the team create a more accurate plan for integration. Migrating data to your new solution as-is, by contrast, is ill-advised; it can lead to major cost overruns and project delays as you load and reload it.
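
As a small illustration of what a profiling pass looks like outside of any particular tool, here is a sketch using pandas; the file and column names are hypothetical.

```python
import pandas as pd

df = pd.read_csv("customers.csv", dtype=str)   # hypothetical source extract

# Column-level profile: null rate, distinct count, and a sample value
profile = pd.DataFrame({
    "null_pct": df.isna().mean().round(3),
    "distinct_values": df.nunique(),
    "sample_value": df.apply(
        lambda col: col.dropna().iloc[0] if col.notna().any() else None),
})
print(profile)

# Pattern analysis on one column: digits -> 9, letters -> A
patterns = (df["postal_code"]
            .fillna("")
            .str.replace(r"\d", "9", regex=True)
            .str.replace(r"[A-Za-z]", "A", regex=True)
            .value_counts())
print(patterns.head())
```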

Customer Relationship Management (CRM)
By using data quality technology in CRM, the organization benefits from a cleaner customer list with fewer duplicate records. Data quality technology can work as a real-time process, limiting the typos and duplicates that get into the system and thus improving call center efficiency. Data profiling can also help an organization understand and monitor the quality of a purchased list before integration, avoiding issues with third-party data.

Enterprise Resource Planning (ERP) and Supply Chain Management (SCM)

If data is accurate, you will have a more complete picture of the supply chain. Data quality technology can be used to report inventory levels more accurately, lowering inventory costs. When you make it part of your ERP project, you may also be able to improve bargaining power with suppliers by gaining better intelligence about your organization’s overall buying power.

Data Warehouse and Business Intelligence
Data quality helps disparate data sources act as one when they are migrated to a data warehouse; by standardizing that disparate data, it makes the warehouse workable. You will be able to generate more accurate reports when trying to understand sales patterns, revenue, customer demographics and more.
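
As one small example of what standardizing disparate data means in practice, this sketch maps the conflicting country values from two source systems to a single canonical code before the warehouse load; the crosswalk and values are invented for illustration.

```python
# Illustrative crosswalk; a real one would be maintained as reference data.
COUNTRY_MAP = {
    "USA": "US", "U.S.": "US", "UNITED STATES": "US",
    "UK": "GB", "GREAT BRITAIN": "GB",
}

def standardize_country(raw: str) -> str:
    key = raw.strip().upper()
    return COUNTRY_MAP.get(key, key)   # fall back to the cleaned value

# Records from different systems now conform before loading the warehouse dimension
for value in ("United States", "U.S.", "uk"):
    print(value, "->", standardize_country(value))
```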

Master Data Management (MDM)
Data quality is a key component of master data management. An integral part of making applications communicate and share data is having standardized data. MDM enhances the basic premise of data quality with additional features like persistent keys, a graphical user interface for mitigating matches, the ability to publish and subscribe to enterprise applications, and more.

So keep in mind that when you decide to improve data quality, it is often because you need to make a major IT initiative even stronger. In most projects, data quality is the secret ingredient that makes your IT project extraordinary. Share the recipe.

Friday, June 12, 2009

Interview on Data Quality Pro.com

From Data Quality Pro.com

If you are active within the data quality and data governance community then chances are you will have come across Steve Sarsfield and his Data Governance and Data Quality Insider blog.
Steve has also recently published an excellent book, aptly titled "The Data Governance Imperative" so we recently caught up with him to find out more about some of the topics in the book and to pose some of the many questions organisations face when launching data governance initiatives.


Read the interview>>


Plus, at the end of the interview we provide details of how to win a copy of "The Data Governance Imperative".


Sunday, May 4, 2008

Data Governance Structure and Organization Webinar

My colleague Jim Orr just did a great job delivering a webinar on data governance. You can see a replay of the webinar in case you missed it. Jim is our Data Quality Practice Leader and he has a very positive point of view when it comes to developing a successful data governance strategy.
In this webinar, Jim talks exclusively about the structure and the organization behind data governance. If you believe that data governance is people, process and technology, this webinar covers the "people" side of the equation.

Wednesday, April 9, 2008

Must-read Analyst Reports on Data Governance

If you’re thinking of implementing a data governance strategy at your company, here are some key analyst reports I believe are a must-read.

Data Governance: What Works And What Doesn't
by Rob Karel, Forrester
A high-level overview of data governance strategies. It’s a great report to hand to a C-level executive in your company who may need some nudging.

Data Governance Strategies
by Philip Russom and TDWI
A comprehensive overview of data governance, including extensive research and case studies. This one is hot off the presses from TDWI. Sponsored by many of the top information quality vendors.

The Forrester Wave™: Information Quality Software
by J. Paul Kirby, Forrester
This report covers the strengths and weaknesses of top information quality software vendors. Many of the vendors covered here have been gobbled up by other companies, but the report is still worth a read. $$

Best Practices for Data Stewardship
Magic Quadrant for Data Quality Tools
by Ted Friedman, Gartner
I have included two of Ted’s reports on this list, but Ted offers insight in many forms; he has written and spoken often on the topic. (When you get to the Gartner web site, you’ll have to search on the titles above, as Gartner makes it difficult to link directly.) $$
Ed Note: The latest quadrant (2008) is now available here.

The case for a data quality platform
by Philip Howard, Bloor Research
Andy Hayler and Philip Howard are prolific writers on information quality at Bloor Research. They bring an international flair to the subject that you won’t find in the others on this list.

Thursday, January 24, 2008

The Rise of the Business-focused Data Steward


In a December 2007 research note entitled “Best Practices for Data Stewardship”, Gartner gives some very practical and accurate advice on starting and executing a data steward program, advice the firm reiterated in a press release issued this month. The new advice is to have business people become your data stewards. So, in marketing, someone is assigned as a data steward to work with IT. The business person knows the meaning of the data as well as where they want to go with it. They become responsible for the data, and owners of it.

It’s a great concept, and one that I expect will become more and more of a reality this year. However, some growth needs to happen in the software industry. Very few tools serve a business-focused data steward; most offerings on the market are features tacked onto IT-focused tools. Sure, a data profiler can show some cool charts and graphs, but not many business users want to learn how to use them. Should a business user really have to learn about metadata, entities, and attributes to find out whether the data meets the needs of the organization?

Rather, a marketing person wants to know if (s)he can do an offer mailing without getting most of it back. A CIO wants to know whether a customer database they just acquired as part of a merger has complete and current information. Accounting wants to know that they have valid tax ID numbers (social security numbers) for customers to whom they extend credit, and the compliance team wants to know that they are stopping those on the OFAC list from opening accounts. Metadata? They don’t care. They just need the metrics to track the business problem.
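
The metrics a business user wants can often be expressed as a handful of pass rates. Here is a hedged sketch of two such business-rule metrics computed with pandas (not TS Insight or any specific product); the file and column names are invented, and a real OFAC screen would of course check the official list.

```python
import pandas as pd

df = pd.read_csv("customers.csv", dtype=str)   # hypothetical customer extract

# Rule 1: tax ID (SSN) present and in the expected 999-99-9999 format
valid_tax_id = df["tax_id"].fillna("").str.fullmatch(r"\d{3}-\d{2}-\d{4}")

# Rule 2: mailing address complete enough for an offer mailing
address_complete = df[["street", "city", "postal_code"]].notna().all(axis=1)

metrics = {
    "valid_tax_id_rate": round(valid_tax_id.mean(), 3),
    "mailing_address_complete_rate": round(address_complete.mean(), 3),
}
print(metrics)
```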

This was really the concept that Trillium Software had when we designed TS Insight, our data quality reporting tool. The tool uses business rules and analysis from our profiler and presents them in a very friendly way – via a web browser. The more technical users can set up regular updates that display compliance with the business rules. The less technical users can open their web browsers to their customized page and metrics that are important to them. The business rules can track pretty much anything about the data without being too technical.

TS Insight is still in ramp-up for us. We came out with version 1.0 last year and we’re about to release version 2.5 this quarter. Still, we have a big head start on anyone else in the industry with this tool in serving the needs of the business-focused data steward. If this is something you’d like to see, please send me an e-mail and I’ll set up a demo.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.