Tuesday, June 3, 2008
Trillium Software News Items
A couple of big items hit the news wire today from Trillium Software that are significant for data quality enthusiasts.
Item One:
Trillium Software cleansed and matched the huge database of Loyalty Management Group (LMG), the database company that owns the Nectar and Air Miles customer loyalty schemes in the UK and Europe.
Significance:
LMG has saved £150,000 by using data quality software to cleanse its mailing list, which is the largest in Europe at some 10 million customers. I believe this speaks to Trillium Software’s outstanding scalability and global data support. This particular implementation runs on an Oracle database, with Trillium Software providing the data cleansing.
Item Two:
Trillium Software delivered the latest version of the Trillium Software System, version 11.5. The software now offers expanded cleansing capabilities across a broader range of countries.
Significance:
Again, global data is a key takeaway here. Handling all of the cultural challenges you encounter with international data sets is a problem that demands continual improvement from data quality vendors. Here, Trillium is leveraging its parent company’s acquisition of Global Address to improve the Trillium technology.
Item Three:
Trillium Software also released a mainframe edition of version 11.5.
Significance:
Trillium Software continues to support data quality processes on the mainframe. Unfortunately, you don’t see other enterprise software companies offering many new mainframe releases these days, despite the fact that the mainframe is still very much a viable and vibrant platform for managing data.
Monday, May 19, 2008
Unusual Data Quality Problems
When I talk to folks who are struggling with data quality issues, there are some who are worried that they have data unlike any data anyone has ever seen. Often there’s a nervous laugh in the voice as if the data is so unusual and so poor that an automated solution can’t possibly help.
Yes, there are wide variations in data quality and consistency and it might be unlike any we’ve seen. On the other hand, we’ve seen a lot of unusual data over the years. For example:
- A major motorcycle manufacturer used data quality tools to pull out nicknames from their customer records. Many of the names they had acquired for their prospect list were from motorcycle events and contests where the entries were, shall we say, colorful. The name fields contained data like “John the Mad Dog Smith” or “Frank Motor-head Jones”. The client used the tool to separate the name from the nickname, making it a more valuable marketing list.
- One major utility company used our data quality tools to identify and preserve notations on meter-reader records that were important for operational use but didn’t belong in the customer billing record. Upon analysis of the data, the company noticed random text like “LDIY” and “MOR” mixed into the customer records. After some investigation, they figured out that LDIY meant “Large Dog in Yard”, which was particularly important for meter readers. MOR meant “Meter on Right”, which was also valuable. The readers were given their own notes field, so that they could maintain the integrity of the name and address while also keeping this valuable data. It probably saved a lot of meter readers from dog bites.
- Banks have used our data quality tools to separate items like "John and Judy Smith/221453789 ITF George Smith". The organization wanted to consider this type of record as three separate records "John Smith" and "Judy Smith" and "George Smith" with obvious linkage between the individuals. This type of data is actually quite common on mainframe migrations.
- A food manufacturer standardizes and cleanses ingredient names to get better control of manufacturing costs. In data from their worldwide manufacturing plants, an ingredient might be “carrots” “chopped frozen carrots” “frozen carrots, chopped” “chopped carrots, frozen” and so on. (Not to mention all the possible abbreviations for the words carrots, chopped and frozen.) Without standardization of these ingredients, there was really no way to tell how many carrots the company purchased worldwide. There was no bargaining leverage with the carrot supplier, and all the other ingredient suppliers, until the data was fixed.
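The ingredient example above lends itself to a simple illustration. Here is a minimal sketch of how rule-based standardization might collapse those variants into one canonical form, using a token-normalization approach; the synonym table and canonical ordering are hypothetical examples I've made up, not the actual rules of any vendor's tool.

```python
# Illustrative sketch of rule-based ingredient standardization.
# The SYNONYMS table below is a hypothetical example, not a real rule set.

SYNONYMS = {
    "frzn": "frozen",
    "chpd": "chopped",
    "carrot": "carrots",
}

def standardize(ingredient: str) -> str:
    # Lowercase, drop commas, and split into tokens.
    tokens = ingredient.lower().replace(",", " ").split()
    # Expand known abbreviations to their full forms.
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    # Sort the unique tokens so word order no longer matters.
    return " ".join(sorted(set(tokens)))

variants = [
    "chopped frozen carrots",
    "frozen carrots, chopped",
    "chopped carrots, frozen",
    "Chpd Frzn Carrot",
]
# All four variants standardize to the same canonical string.
assert len({standardize(v) for v in variants}) == 1
```

With every plant's spelling reduced to one canonical form, totaling worldwide carrot purchases becomes a simple group-by, which is exactly the bargaining leverage the manufacturer was after.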
Not all data quality solutions can handle these types of anomalies; many will simply pass these "odd" values through without attempting to cleanse them. It’s key to have a system that will learn from your data and let you develop business rules that meet the organization’s needs.
Now there are times, quite frankly, when data gets so bad that automated tools can do nothing about it, but that’s where data profiling comes in. Before you attempt to cleanse or migrate data, you should profile it to gain a complete understanding of it. That lets you weigh the cost of fixing very poor data against the value it will bring to the organization.
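To make the profiling step concrete, here is a minimal sketch of the kind of summary a profiling pass produces before any cleansing begins: fill rate, distinct counts, and character patterns per column. The column name and sample rows are invented for illustration.

```python
# A minimal data-profiling sketch: summarize a column's fill rate,
# distinct values, and character patterns so you can judge whether
# the data is worth fixing. The "phone" column below is made up.
from collections import Counter
import re

def pattern(value: str) -> str:
    # Reduce a value to its shape: digits become 9, letters become A.
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile(rows, column):
    values = [r.get(column, "") for r in rows]
    filled = [v for v in values if v]
    return {
        "fill_rate": len(filled) / len(values),
        "distinct": len(set(filled)),
        "top_patterns": Counter(pattern(v) for v in filled).most_common(3),
    }

rows = [
    {"phone": "555-1234"},
    {"phone": "555-9876"},
    {"phone": "CALL BEFORE 9"},  # free text hiding in a phone field
    {"phone": ""},
]
p = profile(rows, "phone")
print(p["fill_rate"], p["top_patterns"])
```

Even this crude pass surfaces the kind of surprise the examples above describe: a stray free-text note ("CALL BEFORE 9") shows up as an outlier pattern, flagging a field that needs a business rule before migration.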
Wednesday, May 14, 2008
The Best Books on Data Governance
Is there a comprehensive book on data governance that we should all read to achieve success? At the time of this post, I'm not sure there is; I haven't seen it yet. If you think about it, such a book would have to make War and Peace look like a Harlequin novel in terms of size to cover all aspects of the topic. Instead, we really must become students of data governance and begin to understand large knowledge areas such as 1) how to optimize and manage processes; 2) how to manage teams and projects; 3) public relations and marketing for internal project promotion; and 4) how to implement technologies to achieve data governance, just to name a few.
I’ve recently added an Amazon widget to my blog that lists some printed books on data governance-related topics. The books cover the four areas I’ve mentioned. As summer vacation arrives, now is the time to buy your books for the beach and read up! After all, what could be more relaxing on a July afternoon than a big frozen margarita and the book “Business Process Improvement: The Breakthrough Strategy for Total Quality, Productivity, and Competitiveness” by James Harrington?
The Amazon affiliate program generates just a few pennies for each book, but what money it does generate will be donated to charity. The appeal of the Amazon widget is that it's a good way to store a list of books and provide direct links to buy. If you have some suggestions to add to the list, please share them.
EDIT: My book on data governance is now available on Amazon. The Data Governance Imperative.
Sunday, May 4, 2008
Data Governance Structure and Organization Webinar
My colleague Jim Orr just did a great job delivering a webinar on data governance. You can see a replay of the webinar in case you missed it. Jim is our Data Quality Practice Leader and he has a very positive point of view when it comes to developing a successful data governance strategy.
In this webinar, Jim talks exclusively about the structure and the organization behind data governance. If you believe that data governance is people, process and technology, this webinar covers the "people" side of the equation.
Sunday, April 27, 2008
The Solution Maturity Cycle

I saw the news about Informatica’s acquisition of Identity Systems, and it got me thinking. I recognize a familiar pattern that all too often occurs in the enterprise software business. I’m going to call it the Solution Maturity Cycle. It goes something like this:
1. The Emergence Phase: A young, fledgling company emerges that provides an excellent product that fills a need in the industry. This was Informatica in the 90’s. Rather than hand coding a system of metadata management, companies could use a cool graphical user interface to get the job done. Customers were happy. Informatica became a success. Life was good.
2. The Mashup Phase: Customers begin to realize that if they mash up the features of, say, an ETL tool and a data quality tool, they can reap huge benefits for their companies. Eventually, the companies see the benefit of working together, and even begin to talk to prospective customers together. This was Informatica in 2003-5, working with FirstLogic and Trillium Software. Customers could decide which solution to use. Customers were happy that they could mash up, and happy that others had found success in doing so.
3. The Market Consolidation Phase: Under pressure from stockholders to increase revenue, the company looks to buy a solution in order to sell it in-house. The pressure also comes from industry analysts, who, if they’re doing their job properly, interpret the mashup as a hole in the product. Unfortunately, the established and proven technology companies are too expensive to buy, so the company looks to a young, fledgling data quality company. The decision on which company to buy is influenced more by bean counters than technologists. Even if there are limitations in the fledgling’s technology, the sales force pushes hard to eliminate mashup implementations, so that annual maintenance revenue will be recognized. This is what happened with Informatica and Similarity Systems, in my opinion. Early adopters are confused by this and fearful that their mashup might not be supported. Some customers fight to keep their mashups; some yield to the pressure and install the new solution.
4. Buy and Grow Phase: When bean counters select technology to support the solution, they usually get some product synergies wrong. Sure, the acquisition works from a revenue-generating perspective, but from the technology solution perspective, it is limited. The customers are at the same time under pressure from the mega-vendors, who want to own the whole enterprise. What to do? Buy more technology. It’ll fill the holes, keep the mega-vendor wolves at bay, and build more revenue.
The Solution Maturity Cycle is something that we all must pay attention to when dealing with vendors. For example, I’m seeing phase 3 of this cycle occur in the SAP world, where SAP’s acquisition of Business Objects dropped several data quality solutions in SAP’s lap. Now, despite the many successful mashups of Trillium Software and SAP, customers are being shown other solutions from the acquisition. History makes me question whether an ERP vendor will be committed long term to the data quality market.
After a merger occurs, customers reach a critical decision point. Should you resist pulling out your mashups, or try to unify the solution under one vendor? It's a tough decision, and it may affect internal IT teams, causing conflict between those who have been working on the mashup and the mega-vendor team. In making this decision, there are several key questions to ask:
- Is the newly acquired technology in the vendor’s core competency?
- Is the vendor committed to interoperability with other enterprise applications, or just their own? How will this affect your efforts for an enterprise-wide data governance program?
- Is the vendor committed to continual improvement of this part of the solution?
- How big is the development team and how many people has the vendor hired from the purchased company? (Take names.)
- Can the vendor prove that taking out a successful solution to put in a new one will make you more successful?
- Are there any competing solutions within the vendor’s own company, poised to become the standard?
- Who has been successful with this solution, and do they have the same challenges that I have?
Wednesday, April 9, 2008
Must-read Analyst Reports on Data Governance
If you’re thinking of implementing a data governance strategy at your company, here are some key analyst reports I believe are a must-read.
Data Governance: What Works And What Doesn't by Rob Karel, Forrester
A high-level overview of data governance strategies. It’s a great report to hand to a C-level executive in your company who may need some nudging.
Data Governance Strategies by Philip Russom, TDWI
A comprehensive overview of data governance, including extensive research and case studies. This one is hot off the presses from TDWI. Sponsored by many of the top information quality vendors.
The Forrester Wave™: Information Quality Software by J. Paul Kirby, Forrester
This report covers the strengths and weaknesses of top information quality software vendors. Many of the vendors covered here have been gobbled up by other companies, but the report is still worth a read. $$
Best Practices for Data Stewardship and Magic Quadrant for Data Quality Tools by Ted Friedman, Gartner
I have included the names of two of Ted’s reports on this list, but Ted offers much insight in many forms. He has written and spoken often on the topic. (When you get to the Gartner web site, you're going to have to search on the above terms as Gartner makes it difficult to link directly.) $$
Ed Note: The latest quadrant (2008) is now available here.
The case for a data quality platform by Philip Howard, Bloor Research
Andy Hayler and Philip Howard are prolific writers on information quality at Bloor Research. They bring an international flair to the subject that you won’t find in the other reports on this list.
Sunday, April 6, 2008
Politics, Presidents and Data Governance
I was curious about the presidential candidates and their plans to build national ID cards and a database of citizens, so I set out to do some research on the candidates’ stances on this issue. It strikes me as a particularly difficult task, given the size and complexity of the database that would be needed. Just how realistic would each candidate’s data governance strategy be?
I searched the candidates’ web sites with the following Google commands:
database site:http://www.johnmccain.com
database site:http://www.barackobama.com
database site:http://www.hillaryclinton.com
Hardly scientific, but the results were interesting nonetheless. The candidates have very different data management plans for the country, and this simple search gave some insight into their data management priorities.
Clinton:
Focused on national health care and the accompanying data challenges.
• Patient Health Care Records Database
• Health Care Provider Performance Tracking Database
• Employer History of Complaints
Comments: It’s clear that starting a national database of doctors and patients is a step toward a national health plan. There are huge challenges with doctor data, however. Many doctors work in multiple locations, having a practice at a major medical center and a private practice, for example. Consolidating doctor lists from insurance companies would rely heavily on unique health care provider ID numbers, doctor age and sex, and factors other than name and address for information quality. This is an ambitious plan, particularly given data compliance regulations, but necessary for a national health plan.
Obama:
Not much about actual database plans, but Obama has commented in favor of:
• Lobbyist Activity Database
• National Sex Offender Database
Comments: Many states currently monitor sex offenders, so the challenge would be coordinating a process and managing the metadata from the states. Not a simple task to say the least. I suspect none of the candidates are really serious about this, but it’s a strong talk-track. Ultimately, this may be better left to the states to manage.
As far as the lobbyist activity database goes, I frankly can’t see how it’d work. Would lobbyists complete online forms describing their activities with politicians? If lobbyists have to describe their interactions with a politician, would they be given an open slate in which to scribble some notes about the event/gift/dinner/meeting topics? This would likely be chock full of unstructured data, and its usefulness would be questionable in my opinion.
McCain:
• Grants and Contracts Database
• Lobbyist Activity Database
• National Sex Offender Database
Comments: Adding the grants and contracts database to McCain’s plan, I see it as similar to Obama’s in that it amounts to storage of unstructured data.
To succeed in any of these plans from our major presidential candidates, I see a huge effort in the “people” and “process” components of data governance. Congress will have to enact laws that describe data models, data security, information quality, exceptions processing and much more. Clearly, this is not their area of expertise. Yet the candidates seem to be talking about technology as a magic wand to heal our country’s problems. It’s not going to be easy for any of them to make any of this a reality, even with all the government’s money.
Instead of these popular vote-grabbing initiatives, wouldn't the government be better served by a president who understands data governance? When you think about it, the US Government is making the same mistake that businesses make: growing and expanding data silos, leading to more and more inefficiency. I can’t help but think that what we really need is a federal information quality and metadata management agency (since the government likes acronyms, shall we call it FIQMM?) to oversee the government’s data. The agency could be empowered by the president to access government data, define data models, and provide the people, process and technology to improve efficiency. Imagine what efficiencies we could gain with a federal data governance strategy. Just imagine.

