
Saturday, November 12, 2011

The ‘Time’ Factor in Data Management

I've been thinking about the many ways time influences the data management world. When it comes to managing data, we think about improving processes, balancing the needs and desires of people, and how technology helps us manage it all. However, an often overlooked aspect of data management is time. Time impacts data management from many different directions.

Time Means Technology Will Improve
As time marches on, technology offers twists and turns to the data steward through innovation. Twenty years ago, mainframes ruled the world. We’ve migrated through relational databases on powerful servers to a place where we see our immediate future in cloud and big data. As technology shifts, you must consider the impact on your data.

The good news is that with these huge challenges, you also get access to new tools.  In general, tools have become less arcane and more business-user focused as time marches on. 

Time Causes People to Change

Like technology, people also change: they mature, switch careers and retire. With regard to data management, the corporation must think about the expertise needed to complete the data mission. Data management must pass the “hit by a bus” test, where the company would not suffer if one or more key people were to be hit by a Greyhound traveling from Newark to Richmond.

Here, time requires us to be more diligent in documenting our processes. It requires us to avoid undocumented hand-coding and pick a reproducible data management platform. It also helps to have third-party continuity, like consultants who, although they will also experience changes in personnel, will change on a different schedule than their clients.

Time Leads to Clarity in the Imperative of Data Management

With regard to data management, corporations go through a maturity process. They often start as chaotic, immature organizations and come to realize the power of data management in a tactical maturity stage. Finally, they recognize data management as a strategic initiative when they begin to govern the data. Throughout it all, people, processes and technologies change.

Knowing where you are in this maturity cycle can help you plan where you want to go from here and what tactics you need to put in place to get there. For example, very few companies go from chaotic, ad hoc data management straight to full-blown MDM. For the most part, they get there by making little changes, seeing the positive impact of those changes and wanting more. A chaotic organization might be more apt to evolve its data management maturity by consolidating two or more ERP systems and reveling in the efficiency.

Time Prevents Us from Achieving Successful Projects
When it comes to specific projects, taking too much time can lead to failure. In the not-so-distant past, circa 2007, the industry commonly took on massive, multi-year, multimillion-dollar MDM projects. We now know that these projects are not the best way to manage data. Why? Think about how much your own company has changed in the last two years. If it is a dynamic, growing company, it likely has different goals, different markets, different partners and new leadership. The world has changed significantly, too. Today’s worldwide economy is much different than it was even one year ago. (Have you heard about the recession and European debt crisis?) The goals you set for a project two years ago are unlikely to define success today.

Time makes us take an agile approach to data management. It requires that we pick off small portions of our problems, solve them, prove value and re-use what we’ve learned on the next agile project.  Limit and hold scope to achieve success.

Time Achieves Corporate Growth (which is counter to data management)
Companies that are just starting out generally have fewer data management problems than those that are mature. Time pushes our data complexity deeper and deeper. Therefore, time dictates that even small companies should have some sort of data management strategy. The good news is that this is now achievable with help from open source and lower-cost data management solutions. Proper data management tools are affordable for both the Fortune 1000 and small to medium-sized enterprises.

Time Holds Us Responsible
The longer a corporation is in business, the longer it can be held responsible for lower revenue, decreased efficiency and lack of compliance due to poor data management. The company decides how it is going to govern (or not govern) data, what data is acceptable in the CRM and who is responsible for the mistakes that happen because of poor data management. The longer you are in business, the more responsible you are for that governance. Time holds us responsible if the problems aren’t solved.

Time and Success Lead to Apathy

Finally, time often brings us success in data management. With success, there is a propensity for corporations to take their eye off the prize and spend money on more pressing issues. Time and success can lead to a certain apathy, a belief that the data management problem is solved. But as time marches on, new partners, new data sources and new business processes arrive. Time requires us to be ever vigilant in our efforts to manage data.

Tuesday, August 30, 2011

Top Ten Root Causes of Data Quality Problems: Part Four

Part 4 of 5: Data Flow
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them.  In part four, we examine some of the areas involving the pervasive nature of data and how it flows to and fro within an organization.

Root Cause Number Seven: Transaction Transition

More and more data is exchanged between systems through real-time (or near real-time) interfaces. As soon as the data enters one database, it triggers procedures necessary to send transactions to other downstream databases. The advantage is immediate propagation of data to all relevant databases.

However, what happens when transactions go awry? A malfunctioning system could cause problems with downstream business applications.  In fact, even a small data model change could cause issues.

Root Cause Attack Plan
  • Schema Checks – Employ schema checks in your job streams to make sure your real-time applications are producing consistent data. Schema checks do basic testing to make sure your data is complete and formatted correctly before loading.
  • Real-time Data Monitoring – One level beyond schema checks is to proactively monitor data with profiling and data monitoring tools. Tools like the Talend Data Quality Portal and others will ensure the data contains the right kind of information. For example, if your part numbers are always a certain shape and length, and contain a finite set of values, any variation on that attribute can be monitored. When variations occur, the monitoring software can notify you. A minimal sketch of both checks follows this list.
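
To make both ideas concrete, here is a minimal sketch in Python. This is not Talend's tooling; the field names, part-number pattern and prefix set are invented for illustration. The point is simply a completeness-and-format gate plus a value monitor sitting in front of the load:

    import re

    # Hypothetical rule for illustration: part numbers are two letters,
    # a dash and five digits, drawn from a small set of known prefixes.
    PART_NUMBER_PATTERN = re.compile(r"^[A-Z]{2}-\d{5}$")
    KNOWN_PREFIXES = {"AB", "CD", "EF"}
    REQUIRED_FIELDS = ("part_number", "description", "unit_price")

    def schema_check(record):
        """Basic completeness test before loading."""
        return [f"missing or empty field: {field}"
                for field in REQUIRED_FIELDS if not record.get(field)]

    def monitor_part_number(record):
        """Flag variations in shape, length or value set."""
        pn = record.get("part_number", "")
        if not PART_NUMBER_PATTERN.match(pn):
            return [f"unexpected shape or length: {pn!r}"]
        if pn[:2] not in KNOWN_PREFIXES:
            return [f"unknown prefix: {pn!r}"]
        return []

    # In a real job stream, you would quarantine bad rows and notify someone.
    for rec in ({"part_number": "AB-12345", "description": "gasket", "unit_price": 1.99},
                {"part_number": "XQ-12", "description": "", "unit_price": None}):
        findings = schema_check(rec) + monitor_part_number(rec)
        if findings:
            print("quarantine:", rec["part_number"], findings)

Real monitoring tools derive rules like these by profiling the data rather than hard-coding them, but the checks they run are of this flavor.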

Root Cause Number Eight: Metadata Metamorphosis

A metadata repository should be shareable across multiple projects, with an audit trail maintained on usage and access. For example, your company might have part numbers and descriptions that are universal to CRM, billing, ERP systems and so on. When a part number becomes obsolete in the ERP system, the CRM system should know. Metadata changes, and it needs to be shared.

In theory, documenting the complete picture of what is going on in the database and how various processes are interrelated would allow you to mitigate the problem entirely. Sharing the descriptions and part numbers among all applicable applications needs to happen. From there, you could analyze the data quality implications of any change in code, processes, data structure or data collection procedures, and thus eliminate unexpected data errors. In practice, this is a huge task.
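
As a thought experiment only (an invented sketch, not a description of any product), the shared-repository idea might look like this in Python: one definition store that keeps an audit trail and notifies subscribed systems when a definition changes.

    from datetime import datetime, timezone

    class MetadataRepository:
        """Toy shared repository: one source of truth for part metadata,
        an audit trail of every change, and subscriber callbacks standing
        in for downstream systems such as CRM, billing and ERP."""

        def __init__(self):
            self.parts = {}        # part_number -> metadata
            self.audit_log = []    # (timestamp, user, part_number, changes)
            self.subscribers = []  # notification callbacks

        def subscribe(self, callback):
            self.subscribers.append(callback)

        def update(self, user, part_number, **changes):
            self.parts.setdefault(part_number, {}).update(changes)
            self.audit_log.append(
                (datetime.now(timezone.utc), user, part_number, changes))
            for notify in self.subscribers:
                notify(part_number, changes)

    # When the ERP system obsoletes a part, the CRM system hears about it.
    repo = MetadataRepository()
    repo.subscribe(lambda pn, ch: print(f"CRM sees change to {pn}: {ch}"))
    repo.update("erp_admin", "AB-12345", status="obsolete")

The hard part in real life is not the mechanism but getting every application to agree to use it, which is why the task is so big.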

Root Cause Attack Plan
  • Predefined Data Models – Many industries now have basic definitions of what should be in any given set of data.  For example, the automotive industry follows certain ISO 8000 standards.  The energy industry follows Petroleum Industry Data Exchange standards or PIDX.  Look for a data model in your industry to help.
  • Agile Data Management – Data governance is achieved by starting small and building out a process that first fixes the most important problems from a business perspective. You can leverage agile solutions to share metadata and set up operational processes across the enterprise.

This post is an excerpt from a white paper available here. My final post on this subject will arrive in the days ahead.

Saturday, October 16, 2010

Is 99.8% data accuracy enough?

Ripped from recent headlines, we see how even a 0.2% failure rate can have a big impact.

WASHINGTON (AP) ― More than 89,000 stimulus payments of $250 each went to people who were either dead or in prison, a government investigator says in a new report.

Let’s take a good, hard look at this story. It begins with the US economy slumping. The president proposes, and passes through Congress, one of the biggest stimulus packages ever. The idea is sound to many: get America working by offering jobs in green energy and shovel-ready infrastructure projects. Among other actions, the plan is to give lower-income people some government money so they can stimulate the economy.

I’m not really here to praise or zing the wisdom of this. I’m just here to give the facts. In hindsight, it appears as though it hasn’t stimulated the economy as many had hoped, but that’s beside the point.

Continuing on, the government issues a $250 check to 52 million people on Social Security. It turns out that, of that number, nearly 100,000 people were in prison or dead, roughly 0.2% of the checks. Some checks are returned, some are cashed. Ultimately, the government loses $22.3 million on the 0.2% error.
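
The arithmetic is worth checking. Here is a quick back-of-the-envelope script using the AP's figures from the story above (rounded, and in no way official):

    checks_issued = 52_000_000   # Social Security recipients paid
    bad_checks = 89_000          # AP: "more than 89,000" dead or in prison
    amount = 250                 # dollars per stimulus check

    print(f"error rate: {bad_checks / checks_issued:.2%}")  # ~0.17%, i.e. roughly 0.2%
    print(f"dollars at risk: ${bad_checks * amount:,}")     # $22,250,000, about $22.3 million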

While $22.3 million is a HUGE number, 0.2% is a tiny number. It strikes at the heart of why data quality is so important. Social Security spokesman Mark Lassiter said, "…Each year we make payments to a small number of deceased recipients usually because we have not yet received reports of their deaths."

There is strong evidence that the SSA is hooked up to the right commercial data feeds and has the processes in place to use them. It seems as though the Social Security Administration is quite proactive in its search for the dead and imprisoned, but people die and go to prison all the time. They also move, get married and become independent of their parents.

If we try to imagine what it would take to achieve closer to 100% accuracy, the answer is up-to-the-minute reference data. It seems that the only real solution is legislation that requires these life-changing events to be reported to the federal government immediately. Should we mandate that the bereaved, or perhaps funeral directors, report a death right away to a central database? Even with such a law, there would still be a small percentage of checks issued while the recipient was alive and delivered after the recipient was dead. We’d have better accuracy on this issue, but not 100%.

While this story takes a poke at the SSA for sending checks to dead people, I have to applaud its achievement of 99.8% accuracy. It could be a lot worse, America. A lot worse.

Thursday, January 21, 2010

ETL, Data Quality and MDM for Mid-sized Business


Is data quality a luxury that only large companies can afford? Of course the answer is no. Your company should be paying attention to data quality whether it is a Fortune 1000 company or a startup. Like a toothache, poor data quality will never get better on its own.

As a company naturally grows, the effects of poor data quality multiply.  When a small company expands, it naturally develops new IT systems. Mergers often bring in new IT systems, too. The impact of poor data quality slowly invades and hinders the company’s ability to service customers, keep the supply chain efficient and understand its own business. Paying attention to data quality early and often is a winning strategy for even the small and medium-sized enterprise (SME).

However, SMEs have challenges with the investment needed for enterprise-level software. While it’s true that the benefit often outweighs the cost, it is difficult for the typical SME to invest in the licenses, maintenance and services needed to implement a major data integration, data quality or MDM solution.

At the beginning of this year, I started with a new employer, Talend. I became interested in them because they were offering something completely different in our world – open source data integration, data quality and MDM.  If you go to the Talend Web site, you can download some amazing free software, like:
  • a fully functional, very cool data integration package (ETL) called Talend Open Studio
  • a data profiling tool, called Talend Open Profiler, providing charts and graphs and some very useful analytics on your data
The two packages sit on top of a database, typically MySQL – also an open source success.

For these solutions, Talend uses a business model similar to what my friend Jim Harris has just blogged about – Freemium. Under this new model, free open source content is made available to everyone—providing the opportunity to “up-sell” premium content to a percentage of the audience. Talend works like this.  You can enhance your experience from Talend Open Studio by purchasing Talend Integration Suite (in various flavors).  You can take your data quality initiative to the next level by upgrading Talend Open Profiler to Talend Data Quality.

If you want to take the combined data integration and data quality to an even higher level, Talend just announced a complete Master Data Management (MDM) solution, which you can use in a more enterprise-wide approach to data governance. There’s a very inexpensive place to start and an evolutionary path your company can take as it matures its data management strategy.

These solutions have been made possible by the combined efforts of the open source community and Talend, the corporation. If you’d like, you can take a peek at some source code, use the basic software and try your hand at coding an enhancement. Sharing that enhancement with the community will only lead to a world full of better data, and that’s a very good thing.

Monday, December 21, 2009

The World is Addicted to Data (and that's good for us)


In the famous book “The Transparent Society”, David Brin asks us to consider some of the privacy ills we will face as technology improves and our society gains access to more data sets. The book was groundbreaking when it was written in 1998. It imagines the emergence of groups who are more powerful because they own the data. However, as we sit here ten years later with 20/20 hindsight, it’s clear that the existence of, and access to, specialized data sets makes our lives better, not worse.

There are countless examples of this daily improvement in our lives; here are some personal ones:
  • I was in the supermarket recently and per usual, there was a long line at the deli. On the other hand, there was no line at the “deli kiosk” so I gave it a try. Based on my frequent shopper card number and underlying database, the deli kiosk already knew my preferred brand and type of cheese and delicious deli meats. Ordering was a snap thanks to a database, and I didn’t even have to mispronounce “Deutschmacher” to the deli man, like I usually do.
  • For Thanksgiving, I visited some relatives that I don’t often see. My GPS led me there thanks to a geospatial database. It told me how long the trip would take based on traffic data, which is often aggregated from several sources, including road sensors and car and taxi fleets. I was also informed about all the coffee shops along the way, thanks to the data set provided by Dunkin’ Donuts. Before I left, I used Google Street View and Microsoft Bing’s Bird’s Eye view to see what the destination looked like. Ten years ago, all of this was pretty much unheard of, but thanks to the coming together of geospatial data, real-time traffic data, satellite and airplane imagery, street view imagery, Dunkin’ Donuts franchise data, and small, cheap processors, my trip was fantastic.
  • Fantasy Football is a relatively new phenomenon, made possible by our addiction to data. We know exactly where we stand on any given Sunday as player stats are made available instantly during the games. When Wes Welker scores, I see the six points reflected in my score instantly. Companies like STATS cover not only football but, according to their web site, 234 sports.
  • For iPhone users, there are tons of data-centric applications. For example, Wait Watchers is an app that uses user submissions to generate and display a table of the current ride wait times at major theme parks throughout the world. As this information is updated by users, other users at Disney can make decisions about whether to go to Space Mountain or It’s a Small World, for example.

In the corporate world, it’s much the same and even more important to our society. Marketing teams are addicted to information from web analytics and use marketing automation tools to track the success of their programs. Operations teams track assets like computers, buildings, trucks and people with data. Sales has tracked and will continue to track customers with data. Finance relies on the collision of credit score data with invoice and payment data, as well as making sure there is enough money in reserve to meet regulations. Executives will continue to rely on business intelligence and data. In fact, it’s hard to find anyone in the business world who doesn’t rely on data.

Of course, much of this is anecdotal. I haven’t found any specific study on the increase in database use, but we do know from an old IDC study that the number of servers in use worldwide, presumably some used for databases, roughly doubled from 2000 to 2005. A doubling of servers, combined with typically bigger hard drive capacities, points to higher database use.

It was difficult to imagine us here ten years ago, and it’s even more difficult to imagine where we’ll be at the beginning of 2020.  It seems to me that we'll have more opportunity to create and use information with applications on our mobile devices. The collision of iPhone/Droid devices with increasing bandwidths of 3G and 4G networks on the major mobile phone carriers tells me that data in the future will let us do things we can only imagine today.

The world is addicted to data and that bodes well for anyone who helps the world manage it. In 2010, no matter if the economy turns up or down, our industry will continue to feed the addiction to good, clean data.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.