I've been thinking about the many ways time influences the data management world. When it comes to managing data, we think about improving processes, balancing the needs and desires of people, and how technology helps us manage it all. However, an often overlooked aspect of data management is time. Time impacts data management from many different directions.
Time Means Technology Will Improve
As time marches on, technology offers twists and turns to the data steward through innovation. Twenty years ago, mainframes ruled the world. We've since migrated through relational databases on powerful servers to a place where we see our immediate future in cloud and big data. As technology shifts, you must consider the impact on your data.
The good news is that with these huge challenges, you also get access to new tools. In general, tools have become less arcane and more business-user focused as time marches on.
Time Causes People to Change
Just as technology changes, people mature, change careers and retire. With regard to data management, the corporation must think about the expertise needed to complete the data mission. Data management must pass the "hit by a bus" test, where the company would not suffer if one or more key people were hit by a Greyhound traveling from Newark to Richmond.
Here, time requires us to be more diligent in documenting our processes. It requires us to avoid undocumented hand-coding and pick a reproducible data management platform. It helps to have third-party continuity, like consultants who, although they also experience changes in personnel, change on a different schedule than their clients.
Time Leads to Clarity in the Imperative of Data Management
With regard to data management, corporations go through a maturity process. They often start as chaotic, immature organizations, then realize the power of data management in a tactical maturity stage. Finally, they recognize data management as a strategic initiative when they begin to govern the data. Throughout it all, people, processes and technologies change.
Knowing where you are in this maturity cycle can help you plan where you want to go from here and what tactics you need to put in place to get there. For example, very few companies go from chaotic, ad hoc data management to full-blown MDM in one leap. For the most part, they get there by making little changes, seeing the positive impact of those changes and wanting more. A chaotic organization might be more apt to evolve its data management maturity by consolidating two or more ERP systems and reveling in the resulting efficiency.
Time Prevents Us from Achieving Successful Projects
When it comes to specific projects, taking too much time can lead to failure. In the not-so-distant past, circa 2007, the industry commonly took on massive, multi-year, multimillion-dollar MDM projects. We now know that these projects are not the best way to manage data. Why? Think about how much your own company has changed in the last two years. If it is a dynamic, growing company, it likely has different goals, different markets, different partners and new leadership. The world has changed significantly, too. Today's worldwide economy is much different than it was even one year ago. (Have you heard about the recession and the European debt crisis?) The goals you set for a project two years ago are unlikely to spell success today.
Time makes us take an agile approach to data management. It requires that we pick off small portions of our problems, solve them, prove value and re-use what we’ve learned on the next agile project. Limit and hold scope to achieve success.
Time Brings Corporate Growth (Which Runs Counter to Data Management)
Companies that are just starting out generally have fewer data management problems than those that are mature. Time pushes our data complexity deeper and deeper. Therefore, time dictates that even small companies should have some sort of data management strategy. The good news is that this is now achievable with help from open source and lower-cost data management solutions. Proper data management tools are affordable for both the Fortune 1000 and small to medium-sized enterprises.
Time Holds Us Responsible
The longer a corporation is in business, the more it can be held responsible for lower revenue, decreased efficiency and lack of compliance due to poor data management. The company decides how it is going to govern (or not govern) data, what data is acceptable in the CRM and who answers for the mistakes that poor data management causes. The longer you are in business, the more responsible the corporation is for its governance, and time holds us responsible if the problems aren't solved.
Time and Success Lead to Apathy
Finally, time often brings us success in data management. With success, there is a propensity for corporations to take their eye off the prize and spend money on more pressing issues. Time and success can lead to a certain apathy, a belief that the data management problem is solved. But as time marches on, new partners, new data sources and new business processes arrive. Time requires us to be ever vigilant in our efforts to manage data.
Wednesday, August 31, 2011
Top Ten Root Causes of Data Quality Problems: Part Five
Part 5 of 5: People Issues
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them. Companies rely on data to make significant decisions that can affect customer service, regulatory compliance, supply chain and many other areas. As you collect more and more information about customers, products, suppliers, transactions and billing, you must attack the root causes of poor data quality.
Root Cause Number Nine: Defining Data Quality
More and more companies recognize the need for data quality, but there are different ways to clean data and improve data quality. You can:
- Write some code and cleanse manually (a minimal sketch follows this list)
- Handle data quality within the source application
- Buy tools to cleanse data
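If you take the hand-coded route from the first option above, a cleansing script might look something like the sketch below. The record layout, field names and rules here are hypothetical; your own data would dictate the real logic.

```python
import re

# Hypothetical record layout: assume customer records arrive as dicts with
# "name" and "phone" fields; the rules below are illustrative, not definitive.
def cleanse_record(record):
    cleaned = dict(record)

    # Collapse repeated whitespace in the name and normalize its case
    name = re.sub(r"\s+", " ", record.get("name", "")).strip()
    cleaned["name"] = name.title()

    # Keep only the digits in the phone number and flag anything that
    # is not the 10 digits a North American number should have
    digits = re.sub(r"\D", "", record.get("phone", ""))
    cleaned["phone"] = digits
    cleaned["phone_valid"] = len(digits) == 10

    return cleaned

print(cleanse_record({"name": "  aCME   corp ", "phone": "(603) 555-0199"}))
```

Hand-coding like this works for small, one-off fixes, but it is exactly the kind of undocumented effort that the attack plan below tries to avoid.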
Root Cause Attack Plan
- Standardize Tools – Whenever possible, choose tools that aren’t tied to a particular solution. Having data quality only in SAP, for example, won’t help your Oracle, Salesforce and MySQL data sets. When picking a solution, select one that is capable of accessing any data, anywhere, at any time. It shouldn't cost you a bundle to leverage a common solution across multiple platforms and applications.
- Data Governance – By setting up a cross-functional data governance team, you will have the people in place to define a common data model.
Root Cause Number Ten: Loss of Expertise
On almost every data-intensive project, there is one person whose legacy data expertise is outstanding. These are the folks who understand why some employee date-of-hire information is stored in the date-of-birth field and why some of the name attributes also contain tax ID numbers.
Data often serves as a kind of historical record for an organization. It may have come from legacy systems, and in some cases the same value in the same field means something entirely different in different records. Knowledge of these anomalies allows experts to use the data properly.
If you encounter this situation, there are some business processes you can follow.
Root Cause Attack Plan
- Profile and Monitor – Profiling the data will help you identify most of these types of issues. For example, if you have a tax ID number embedded in the name field, analysis will let you quickly spot it; a minimal sketch of that kind of check follows this list. Monitoring will prevent a recurrence.
- Document – Experts may be reluctant to do this for fear of losing job security, but make sure they document all of the anomalies and transformations that need to happen every time the data is moved.
- Use Consultants – Expert employees may be so valuable and busy that there is no time to document the legacy anomalies. Outside consulting firms are usually very good at documenting issues and providing continuity between legacy and new employees.
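As a rough illustration of the profiling idea above, here is a minimal sketch that scans a hypothetical name column for values shaped like tax IDs. A real profiling tool would run rules like this across every attribute and keep score over time.

```python
import re

# Hypothetical sample of name values pulled from a legacy table; in practice
# you would profile the real column with a query or a data quality tool.
names = [
    "ACME Manufacturing",
    "Smith, John 012-34-5678",   # a tax ID leaked into the name field
    "Globex Corporation",
]

# Simple profiling rule: flag any name value that contains something shaped
# like a nine-digit tax ID (with or without dashes).
tax_id_pattern = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")

suspects = [n for n in names if tax_id_pattern.search(n)]
print(f"{len(suspects)} of {len(names)} name values appear to embed a tax ID:")
for n in suspects:
    print(" -", n)
```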
This post is an excerpt from a white paper available here. More to come on this subject in the days ahead.
See also:
- Part One: The Basics
- Part Two: Renegades and Pirates
- Part Three: Secret Code and Corporate Evolution
- Part Four: Data Flow
Tuesday, August 30, 2011
Top Ten Root Causes of Data Quality Problems: Part Four
Part 4 of 5: Data Flow
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them. In part four, we examine the pervasive nature of data and how it flows to and fro within an organization.
Root Cause Number Seven: Transaction Transition
More and more data is exchanged between systems through real-time (or near real-time) interfaces. As soon as the data enters one database, it triggers procedures necessary to send transactions to other downstream databases. The advantage is immediate propagation of data to all relevant databases.
However, what happens when transactions go awry? A malfunctioning system could cause problems with downstream business applications. In fact, even a small data model change could cause issues.
Root Cause Attack Plan
- Schema Checks – Employ schema checks in your job streams to make sure your real-time applications are producing consistent data. Schema checks will do basic testing to make sure your data is complete and formatted correctly before loading.
- Real-time Data Monitoring – One level beyond schema checks is to proactively monitor data with profiling and data monitoring tools. Tools like the Talend Data Quality Portal and others will ensure the data contains the right kind of information. For example, if your part numbers are always a certain shape and length, and contain a finite set of values, any variation on that attribute can be monitored. When variations occur, the monitoring software can notify you.
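As a minimal sketch of the kind of shape-and-length check described above: the part-number pattern, field names and sample records below are assumptions for illustration, not your real rules.

```python
import re

# Hypothetical rule: part numbers are assumed to be a known two-letter prefix,
# a dash and four digits. Adjust the pattern to whatever your parts really look like.
PART_PATTERN = re.compile(r"^(AB|CD|EF)-\d{4}$")

def find_violations(batch):
    """Return the incoming records whose part number breaks the expected shape."""
    return [rec for rec in batch if not PART_PATTERN.match(rec.get("part_number", ""))]

incoming = [
    {"part_number": "AB-1234"},
    {"part_number": "ZZ-99"},     # wrong prefix and wrong length
    {"part_number": "CD-0042"},
]

violations = find_violations(incoming)
if violations:
    # In a real job stream this would raise an alert or quarantine the rows
    print("Data monitoring alert:", violations)
```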
Root Cause Number Eight: Metadata Metamorphosis
A metadata repository should be shared by multiple projects, with an audit trail maintained on usage and access. For example, your company might have part numbers and descriptions that are universal to CRM, billing, ERP systems, and so on. When a part number becomes obsolete in the ERP system, the CRM system should know. Metadata changes and needs to be shared.
In theory, documenting the complete picture of what is going on in the database and how various processes are interrelated would allow you to mitigate the problem completely. The descriptions and part numbers need to be shared among all applicable applications. You could then analyze the data quality implications of any change in code, processes, data structure or data collection procedures and thus eliminate unexpected data errors. In practice, this is a huge task.
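As a concrete, if hypothetical, illustration of why that sharing matters: a simple reconciliation between system extracts can surface part numbers that one application has obsoleted while another still treats as active. The sets and part numbers below are made up for the sketch; in practice they would be queried from the ERP and CRM.

```python
# Hypothetical extracts: in practice these sets would come from the ERP and
# CRM databases; here they are hard-coded to keep the sketch self-contained.
erp_active_parts = {"AB-1234", "CD-0042"}
crm_active_parts = {"AB-1234", "CD-0042", "EF-7777"}  # EF-7777 was obsoleted in the ERP

# Part numbers the CRM still treats as active but the ERP no longer recognizes
orphaned_in_crm = crm_active_parts - erp_active_parts
print("Active in CRM but obsolete in ERP:", sorted(orphaned_in_crm))
```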
Root Cause Attack Plan
- Predefined Data Models – Many industries now have basic definitions of what should be in any given set of data. For example, the automotive industry follows certain ISO 8000 standards. The energy industry follows Petroleum Industry Data Exchange standards or PIDX. Look for a data model in your industry to help.
- Agile Data Management – Data governance is achieved by starting small and building out a process that first fixes the most important problems from a business perspective. You can leverage agile solutions to share metadata and set up optional processes across the enterprise.
This post is an excerpt from a white paper available here. My final post on this subject in the days ahead.
Thursday, August 25, 2011
Top Ten Root Causes of Data Quality Problems: Part Two
Part 2 of 5: Renegades and Pirates
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them. In part two, we examine IT renegades and corporate pirates as two of the root causes for data quality problems.
Root Cause Number Three: Renegade IT and Spreadmarts
A renegade is a person who deserts and betrays an organizational set of principles. That’s exactly what some impatient business owners unknowingly do by moving data in and out of business solutions, databases and the like. Rather than wait for professional help from IT, eager business units may decide to create their own local applications without the knowledge of IT. While such an application may meet the immediate departmental need, it is unlikely to adhere to standards for data, data models or interfaces. The effort might start with copying a sanctioned database into a local application on team desktops. So-called “spreadmarts,” important pieces of data stored in Excel spreadsheets, are easily replicated across team desktops. In this scenario, you lose control of versions as well as standards: there are no backups, no version history and no business rules.
Root Cause Attack Plan
- Corporate Culture – There should be a consequence for renegade data, making it more difficult for the renegades to create local data applications.
- Communication – Educate and train your employees on the negative impact of renegade data.
- Sandbox – Having tools that can help business users and IT professionals experiment with the data in a safe environment is crucial. A sandbox, where users are experimenting on data subsets and copies of production data, has proven successful for many for limiting renegade IT.
- Locking Down the Data – The goal is a culture where creating unsanctioned spreadmarts is shunned. Some organizations have found success in locking down the data to make it more difficult to export.
Root Cause Number Four: Corporate Mergers
Corporate mergers increase the likelihood for data quality errors because they usually happen fast and are unforeseen by IT departments. Almost immediately, there is pressure to consolidate and take shortcuts on proper planning. The consolidation will likely include the need to share data among a varied set of disjointed applications. Many shortcuts are taken to “make it happen,” often involving known or unknown risks to the data quality.
On top of the quick schedule, merging IT departments may encounter culture clash and a different definition of truth. Additionally, mergers can result in a loss of expertise when key people leave midway through the project to seek new ventures.
Root Cause Attack Plan
- Corporate Awareness – Whenever possible, a civil division of labor should be mandated by management to avoid culture clashes and data grabs by the power hungry.
- Document – Your IT initiative should survive even if the entire team leaves, disbands or gets hit by a bus when crossing the street. You can do this with proper documentation of the infrastructure.
- Third-party Consultants – Management should be aware that there is extra work to do and that conflicts can arise after a merger. Consultants can provide the continuity needed to get through the transition.
- Agile Data Management – Choose solutions and strategies that will keep your organization agile, giving you the ability to divide and conquer the workload without expensive licensing of commercial applications.
Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.





