
Thursday, March 22, 2012

Big Data Hype is an Opportunity for Data Management Pros

Big Data is a hot topic in the data management world. Recently, I've seen press and vendors describing it with phrases like crucial, tremendous opportunity, overcoming vexing challenges, and enabling technology.  With all the hoopla, management is probably asking many of you about your Big Data strategy. It has risen to the corporate management level; your CxO is probably aware.

Most of the data management professionals I've met are fairly down-to-earth, pragmatic folks.  Data is either being managed correctly or it is not. The business rule works, or it does not. Marketing spin is evil. So many of you may treat the hype and noise around big data as just more noise to filter out, appropriately trying to look past the hype to the technology or business process that Big Data actually enhances.
However, in addition to filtering the big data hype down to its IT impact, data management professionals should also embrace it.

Sure, we want to handle the high-volume transactions that often come with big data, but we still have relational databases and unstructured data sources to deal with.  We still have business users treating Excel as a database, with who-knows-what in the spreadsheets.  We still have e-mail attachments from partners that need to be incorporated into our infrastructure.  We still have a wide range of data sources and targets to deal with, including, but not limited to, big data. In my last blog post, I wrote about how big data is just one facet of total data management.

The opportunity is for data management pros to think about their big data strategy holistically and solve some of their old and tired data management issues along the way. It's pretty easy to show management that Big Data demands a Total Data Management approach, one that tackles some of our worn-out and politically charged data governance issues, including:


  • Data Ownership – One barrier to big data management is unclear accountability for the data.  When you decide to plan for big data, you also need to decide who owns it, and all your other data sets for that matter.
  • Spreadmarts – Keeping unmanaged data out of spreadsheets is increasingly crucial in companies that must handle Big Data. So-called “spreadmarts,” important pieces of data stored in Excel spreadsheets, are easily replicated across team desktops, and you lose control of versions as well as standards. A well-managed big data environment, by contrast, makes it easy for everyone to use corporate information, no matter what its size.
  • Unstructured Data – Although big data tends to be more analytical than operational, it is most commonly unstructured.  A total data management approach takes unstructured data into account in either case. Having technology and processes that handle unstructured data, big or small, is crucial to total data management.
  • Corporate Strategy and Mergers – If your company grows through acquisition, managing big data means being able to handle not only your own data but also the data of the companies you acquire.  Since you don't know what systems those companies will have, a data governance strategy and flexible tools are essential.


My point is, with big data, skip the noise-filtering exercise you normally apply to the latest buzzword.  Instead, use the hype and buzz to your advantage to promote a holistic view of data management in your organization.

Tuesday, August 30, 2011

Top Ten Root Causes of Data Quality Problems: Part Four

Part 4 of 5: Data Flow
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them.  In part four, we examine some of the areas involving the pervasive nature of data and how it flows to and fro within an organization.

Root Cause Number Seven: Transaction Transition

More and more data is exchanged between systems through real-time (or near real-time) interfaces. As soon as the data enters one database, it triggers procedures necessary to send transactions to other downstream databases. The advantage is immediate propagation of data to all relevant databases.

However, what happens when transactions go awry? A malfunctioning system could cause problems with downstream business applications.  In fact, even a small data model change could cause issues.

Root Cause Attack Plan
  • Schema Checks – Employ schema checks in your job streams to make sure your real-time applications are producing consistent data.  Schema checks will do basic testing to make sure your data is complete and formatted correctly before loading.
  • Real-time Data Monitoring – One level beyond schema checks is to proactively monitor data with profiling and data monitoring tools.  Tools like the Talend Data Quality Portal can verify that the data contains the right kind of information.  For example, if your part numbers always have a certain shape and length and draw from a finite set of values, any variation in that attribute can be monitored. When variations occur, the monitoring software can notify you.
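
To make the part-number example concrete, here is a minimal, tool-agnostic sketch in Python of what such a monitoring rule might look like. The field names, part-number shape, and allowed category codes are invented for illustration; a real monitoring tool would apply this kind of rule at much larger scale and route alerts to your notification channel of choice.

    import re

    # Hypothetical rules for illustration: part numbers are two uppercase letters,
    # a dash, and five digits; the category code must come from a known set.
    PART_NUMBER_SHAPE = re.compile(r"^[A-Z]{2}-\d{5}$")
    VALID_CATEGORIES = {"ELEC", "MECH", "HYDR"}

    def check_record(record):
        """Return a list of rule violations for one incoming record."""
        problems = []
        part = record.get("part_number", "")
        if not PART_NUMBER_SHAPE.match(part):
            problems.append("part_number '%s' does not match the expected shape" % part)
        if record.get("category") not in VALID_CATEGORIES:
            problems.append("category '%s' is not in the allowed set" % record.get("category"))
        return problems

    # Check a small batch before it is propagated downstream.
    incoming = [
        {"part_number": "AB-12345", "category": "ELEC"},
        {"part_number": "ab12345", "category": "FOO"},  # both rules flag this one
    ]
    for row in incoming:
        for problem in check_record(row):
            print("ALERT:", problem)  # in practice, send e-mail, write to a queue, etc.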

Root Cause Number Eight: Metadata Metamorphosis

A metadata repository should be shareable across multiple projects, with an audit trail maintained on usage and access.  For example, your company might have part numbers and descriptions that are universal to CRM, billing, ERP systems, and so on.  When a part number becomes obsolete in the ERP system, the CRM system should know. Metadata changes, and those changes need to be shared.

In theory, documenting the complete picture of what is going on in the database and how the various processes are interrelated would allow you to mitigate the problem entirely. The descriptions and part numbers need to be shared among all applicable applications. You could then analyze the data quality implications of any change in code, processes, data structures, or data collection procedures and thus eliminate unexpected data errors. In practice, this is a huge task.

Root Cause Attack Plan
  • Predefined Data Models – Many industries now have basic definitions of what should be in any given set of data.  For example, the automotive industry follows certain ISO 8000 standards.  The energy industry follows Petroleum Industry Data Exchange standards or PIDX.  Look for a data model in your industry to help.
  • Agile Data Management – Data governance is achieved by starting small and building out a process that first fixes the most important problems from a business perspective. You can leverage agile solutions to share metadata and set up operational processes across the enterprise.

This post is an excerpt from a white paper available here. My final post on this subject will follow in the days ahead.

Friday, December 10, 2010

Six Data Management Predictions for 2011

This time of year everyone makes prognostications about the state of the data management field for 2011. I thought I’d take my turn by offering my predictions for the coming year.

Data will become more open
In the old days, good-quality reference data was an asset kept in the corporate lockbox. If you had a good reference table for common misspellings of parts, cities, or names, for example, the mindset was to keep it close and out of the wrong hands.  The data might have been sold for profit or simply not made available.  Today, there really are no “wrong hands”.  Governments and corporations alike are seeing the societal benefits of sharing information. More reference data is there for the taking on the internet from sites like data.gov and geonames.org.  That trend will continue in 2011.  Perhaps we'll even see some of the bigger players make announcements about the availability of their data. Are you listening, Google?

Business and IT will become blurry
It's becoming harder and harder to tell an IT guy from the head of marketing. That's because, in order to succeed, IT folks need to become more like marketers and vice versa.  In the coming year, the difference will be even less noticeable as business people get more and more involved in using data to their benefit.  Newsflash One: If you're in IT, you need marketing skills to pitch your projects and get funding.  Newsflash Two: If you're in business, you need to know enough about data management practices to succeed.

Tools will become easier to use
As the business users come into the picture, they will need access to the tools to manage data.  Vendors must respond to this new marketplace or die.

Tools will do less heavy lifting
Despite the improvements in the tools, corporations will turn to improving processes and reporting in order to achieve better data management. Dwindling are the days when we deal with data so poorly managed that it requires overly complicated data quality tools.  We're getting better at the data management process, and therefore the burden on the tools becomes lighter. Future tools will focus on supporting process improvement with workflow features, reporting, and better graphical user interfaces.

CEOs and Government Officials will gain enlightenment
Feeding off the success of a few pioneers in data governance, as well as the failures of past IT projects, CEOs and governments will gain enlightenment about managing their data and put teams in place to handle it.  It has taken decades of our sweet-talk and cajoling for governments and CEOs to reach this point, but I believe it is practically here.

We will become more reliant on data
Ten years ago, it was difficult to imagine where we are today with respect to our data addiction. Today, data is a pervasive part of our internet-connected society, living in our PCs, our TVs, our mobile phones, and many other devices. It's a huge part of our daily lives. As I've said in past posts, the world is addicted to data, and that bodes well for anyone who helps the world manage it. In 2011, no matter whether the economy turns up or down, our industry will continue to feed the addiction to good, clean data.

Tuesday, November 16, 2010

Ideas Having Sex: The Path to Innovation in Data Management

I read a recent analyst report on the data quality market and “enterprise-class” data quality solutions. As usual, the open source solutions were mentioned only in passing, while the data quality solutions of the past were given high marks. Some of the top-ranked solutions originated in the days when the mainframe was king. Some of the top contenders are still cobbled together from applications acquired in ill-conceived acquisitions. It got me thinking about the way we do business today and how much of it is changing.

Back in the 1990s or earlier, if you had an idea for a new product, you'd work with an internal team of engineers and build the individual parts.  That kind of innovation took time, since you didn't always have exactly the right people on the job.  It was slow and tedious, and the product was always confined by its own lineage.

The Android phone market is a perfect example of the modern way to innovate.  Today, when you want to build something groundbreaking like an Android phone, you pull in expertise from all around the world. Sure, Samsung might make the CPU and video processing chips, but Primax Electronics in Taiwan might make the digital camera and Broadcom in the US the touch screen, with many others contributing. Software vendors push the platform further with their cool apps. Innovation happens at breakneck speed because the Android is a collection of ideas that have sex and produce incredible offspring.

Isn't that really the model of a modern company?  You have ideas getting together and making new ideas. When there is free exchange between people, there is no need to re-invent something that has already been invented. See Matt Ridley's TED talk for more on this concept; the British author argues that, throughout history, the engine of human progress and prosperity has been "ideas having sex.”

The business model behind open source has a similar mission.  Open source simply creates better software. Everyone collaborates, not just within one company, but among an Internet-connected, worldwide community. As a result, the open source model often builds higher quality, more secure, more easily integrated software. It does so at a vastly accelerated pace and often at a lower cost.

So why do some industry analysts ignore it? There's no denying that there are capitalist and financial reasons.  I think if an industry analyst were to actually come out and say that the open source solution is the best, it would be career suicide. The old school would shun the analyst, making him less relevant. The way the industry pays and promotes analysts, and vice versa, seems to favor the enterprise application vendors.

Yet the open source community, along with Talend, has developed a very strong data management offering that deserves to be considered among the top of its class. The solution leverages other cutting-edge technologies. To name just a few examples:
  • the ability to scale out on distributed platform technology from Hadoop, enabling it to work with thousands of nodes and petabytes of data.
  • very strong, enterprise-class data profiling.
  • matching that users can actually use and tune without having to jump between multiple applications.
  • a platform that grows with your data management strategy, so that if your future is MDM, you can move there seamlessly without having to learn a new GUI.
The way we do business today has changed. Innovation can only happen when ideas have sex, as Matt Ridley puts it. As long as we’re engaged in exchange and specialization, we will achieve those new levels of innovation.

Monday, August 9, 2010

Data Quality Pro Discussion

Last week I sat down with Dylan Jones of DataQualityPro.com to talk about data governance. Here is the replay. We discussed a range of topics, including organic governance approaches, the challenges of defining data governance, industry adoption trends, policy enforcement vs. legislation, and much more.

Link

Friday, April 9, 2010

Links from my eLearning Webinar

I recently delivered a webinar on the Secrets of Affordable Data Governance. In the webinar, I promised to deliver links for lowering the costs of data management.  Here are those links:

  • Talend Open Source - Download free data profiling, data integration and MDM software.
  • US Census - Download census data for cleansing city names and states, with latitude and longitude appends.
  • Data.gov - The data available from the US government.
  • Geonames - Postal codes and other location reference data for almost every country in the world.
  • GRC Data - A source of low-cost customer reference data, including names, addresses, salutations, and more.
  • Regular Expressions - Check the shape of data in profiling software or within your database application; a small example appears below.
If you search on the term "download reference data", you will find many other sources.
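
As a small illustration of the Regular Expressions item above, the sketch below scores how well a column of values conforms to an expected shape. The patterns and sample values are simplified assumptions for the example; inside a database you would typically express the same idea with the engine's own regular-expression operator.

    import re

    # Simplified shape rules for common reference data; these patterns are
    # illustrative assumptions, not authoritative validation rules.
    SHAPES = {
        "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
        "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    }

    def conformance_rate(values, pattern):
        """Fraction of non-empty values that match the expected shape."""
        non_empty = [v for v in values if v]
        if not non_empty:
            return None
        return sum(1 for v in non_empty if pattern.match(v)) / len(non_empty)

    zip_codes = ["02139", "3021", "90210-1234", ""]
    print(conformance_rate(zip_codes, SHAPES["us_zip"]))  # 2 of 3 non-empty values conform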

Tuesday, February 16, 2010

The Secret Ingredient in Major IT Initiatives

One of my first jobs was as an assistant cook at a summer camp.  (In this case, the term ‘cook’ was loosely applied; it meant scrubbing pots and pans for the head cook.) It was there I learned that most cooks have ingredients they tend to use more often than others.  The cook at Camp Marlin tended to use honey where applicable.  Food TV star Emeril likes to use garlic and pork fat.  Some cooks add a little hot pepper to their chocolate recipes – it is said to bring out the flavor of the chocolate.  Definitely a secret ingredient.
For head chefs taking on major IT initiatives, the secret ingredient is data quality technology. Attention to data quality isn't a recipe on its own so much as something that makes an IT initiative better.  Let's take a look at how this happens.

Profiling
No matter what the project, data profiling provides a complete understanding of the data before the project team attempts to migrate it, which helps the team create a more accurate plan for integration.  Migrating data to your new solution as-is, on the other hand, is ill-advised; it can lead to major cost overruns and project delays as you load and reload the data.
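
For readers who have never seen profiling output, here is a minimal sketch of the column-level statistics a profile typically starts with, written in Python with invented sample data. A real profiling tool adds pattern analysis, value distributions, and cross-column checks on top of this.

    from collections import Counter

    def profile_column(name, values):
        """Produce a simple profile of one column before migration."""
        non_null = [v for v in values if v not in (None, "")]
        lengths = [len(str(v)) for v in non_null]
        return {
            "column": name,
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "min_length": min(lengths) if lengths else 0,
            "max_length": max(lengths) if lengths else 0,
            "top_values": Counter(non_null).most_common(3),
        }

    # Invented sample column: a profile like this quickly reveals nulls and
    # non-standard values ("Mass.") that would complicate a migration.
    print(profile_column("state", ["MA", "MA", "Mass.", None, "CT", ""]))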

Customer Relationship Management (CRM)
By using data quality technology in CRM, the organization will benefit from a cleaner customer list with fewer duplicate records. Data quality technology can work as a real-time process, limiting the number of typos and duplicates in the system and thus improving call center efficiency.  Data profiling can also help an organization understand and monitor the quality of a purchased list before integration, avoiding issues with third-party data.
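
As a rough illustration of the real-time idea, the sketch below compares a newly entered record against existing customers using a simple string-similarity score. The names and threshold are invented, and the comparison stands in for the far richer parsing, standardization, and matching a real data quality tool performs.

    from difflib import SequenceMatcher

    def similarity(a, b):
        """Rough similarity between two normalized strings, from 0.0 to 1.0."""
        return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

    def likely_duplicates(new_record, existing_records, threshold=0.85):
        """Return existing CRM records whose names closely match the new record."""
        return [
            rec for rec in existing_records
            if similarity(new_record["name"], rec["name"]) >= threshold
        ]

    existing = [{"name": "Acme Corporation"}, {"name": "Globex Inc"}]
    new_entry = {"name": "Acme Corporaton"}  # typo made at data entry
    print(likely_duplicates(new_entry, existing))  # flags "Acme Corporation"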

Enterprise Resource Planning (ERP) and Supply Chain Management (SCM)

If data is accurate, you will have a more complete picture of the supply chain. Data quality technology can be used to report inventory levels more accurately, lowering inventory costs. When you make it part of your ERP project, you may also be able to improve bargaining power with suppliers by gaining better intelligence about your corporate buying power.

Data Warehouse and Business Intelligence
Data quality helps disparate data sources act as one when they are migrated to a data warehouse; by standardizing the data, it makes the warehouse possible. You will be able to generate more accurate reports when trying to understand sales patterns, revenue, customer demographics, and more.
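
To illustrate what standardizing disparate data means in practice, here is a tiny sketch that maps source-specific state values onto a single standard before loading the warehouse. The crosswalk table is invented for the example and would normally be far larger and maintained as reference data.

    # Hypothetical crosswalk from source-system values to the warehouse standard.
    STATE_STANDARD = {"MA": "MA", "Mass.": "MA", "Massachusetts": "MA",
                      "CT": "CT", "Conn.": "CT"}

    def standardize_state(value):
        """Map a raw state value to the warehouse standard, or flag it for review."""
        return STATE_STANDARD.get(value.strip(), "UNRESOLVED")

    print([standardize_state(v) for v in ["Mass.", "CT", "Connecticut"]])
    # ['MA', 'CT', 'UNRESOLVED'] - unresolved values go to a data steward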

Master Data Management (MDM)
Data quality is a key component of master data management. An integral part of making applications communicate and share data is having standardized data.  MDM enhances the basic premise of data quality with additional features like persistent keys, a graphical user interface for managing matches, the ability to publish and subscribe to enterprise applications, and more.

So keep in mind, when you decide to improve data quality, it is often because you need to make a major IT initiative even stronger.  In most projects, data quality is the secret ingredient that makes your IT project extraordinary.  Share the recipe.

Thursday, January 21, 2010

ETL, Data Quality and MDM for Mid-sized Business


Is data quality a luxury that only large companies can afford?  Of course the answer is no. Your company should be paying attention to data quality whether you are a Fortune 1000 company or a startup. Like a toothache, poor data quality will never get better on its own.

As a company grows, the effects of poor data quality multiply.  When a small company expands, it naturally develops new IT systems, and mergers often bring in new IT systems, too. The impact of poor data quality slowly invades and hinders the company's ability to service customers, keep the supply chain efficient, and understand its own business. Paying attention to data quality early and often is a winning strategy even for the small and medium-sized enterprise (SME).

However, SMEs face challenges with the investment needed in enterprise-level software. While it's true that the benefit often outweighs the cost, it is difficult for the typical SME to invest in the licenses, maintenance, and services needed to implement a major data integration, data quality, or MDM solution.

At the beginning of this year, I started with a new employer, Talend. I became interested in them because they were offering something completely different in our world – open source data integration, data quality and MDM.  If you go to the Talend Web site, you can download some amazing free software, like:
  • a fully functional, very cool data integration package (ETL) called Talend Open Studio
  • a data profiling tool, called Talend Open Profiler, providing charts and graphs and some very useful analytics on your data
The two packages sit on top of a database, typically MySQL – also an open source success.

For these solutions, Talend uses a business model similar to what my friend Jim Harris has just blogged about – Freemium. Under this new model, free open source content is made available to everyone—providing the opportunity to “up-sell” premium content to a percentage of the audience. Talend works like this.  You can enhance your experience from Talend Open Studio by purchasing Talend Integration Suite (in various flavors).  You can take your data quality initiative to the next level by upgrading Talend Open Profiler to Talend Data Quality.

If you want to take the combined data integration and data quality to an even higher level, Talend just announced a complete Master Data Management (MDM) solution, which you can use in a more enterprise-wide approach to data governance. There’s a very inexpensive place to start and an evolutionary path your company can take as it matures its data management strategy.

The solutions have been made possible by the combined efforts of the open source community and Talend, the corporation. If you'd like, you can take a peek at some source code, use the basic software, and try your hand at coding an enhancement. Sharing that enhancement with the community will only lead to a world full of better data, and that's a very good thing.

Thursday, October 22, 2009

Book Review: Data Modeling for Business


A couple of weeks ago, I book-swapped with author Donna Burbank. She has a new book entitled Data Modeling for Business. Donna, an experienced consultant by trade, has teamed up with Steve Hoberman, a previously published author and technologist, and Chris Bradley, also a consultant, for an excellent exploration of the process of creating a data model. With a subtitle like “A handbook for Aligning the Business with IT using a High-Level Data Model,” I knew I was going to find some value in the swap.

The book describes in plain English the proper way to create a data model, but that simple description doesn't do it justice. The book is designed for those who are learning from scratch – those who only vaguely understand what a data model is. It uses commonly understood ideas to explain data modeling concepts. It describes the impact of the data model on a project's success and digs into setting up data definitions and the levels of detail necessary for them to be effective. All of this is accomplished in a very plain-talk, straightforward tone, without the pretentiousness you sometimes get in books about data modeling.

We often talk about the need for business and IT to work together to build a data governance initiative. But many, including myself, have pointed to the communication gap that can exist in a cross-functional team. In order to bridge the gap, a couple of things need to happen. First, IT teams need to expand their knowledge of business processes, budgets, and corporate politics. Second, business team members need to expand their knowledge of metadata and data modeling. This book provides an insightful education for the latter. In my book, The Data Governance Imperative, the goal was the former.

The book is well-written and complete. It's a perfect companion for those who are trying to build a knowledgeable, cross-functional team for data warehouse, MDM, or data governance projects. Therefore, I've added it to the recommended reading list on my blog.

Saturday, September 20, 2008

New Data Governance Books

A couple of new, important books hit the streets this month. I’m adding these books to my recommended reading list.

Data Driven: Profiting from Your Most Important Business Asset is Tom Redman's new book about making the most of your data to sharpen your company's competitive edge and enhance its profitability. I like how Tom uses real-life metaphors in this book to simplify the concepts of governing your data.

Master Data Management is David Loshin’s new book that provides help for both business and technology managers as they strive to improve data quality. Among the topics covered are strategic planning, managing organizational change and the integration of systems and business processes to achieve better data.

Both Tom and David have written several books on data quality and master data management, and I think their material gets stronger and stronger as they plug in new experiences and reference new strategies.

EDIT: In April of 2009, I also released my own book on data governance called "The Data Governance Imperative".
Check it out.

Sunday, March 16, 2008

Data Governance in a Recession

What effect will a recession have on your data governance projects? Some have predicted that the nation will fall into a recession in 2008, although others disagree. In other words, it depends on whom you believe as to our economic fate in 2008. Still, with even the hint that a recession is pending, companies often move to cut costs. These cuts tend to affect major IT initiatives like data governance.

For those of us in the IT and enterprise software business, CFO thinking runs counter to logic. During revenue-generating high times, IT tends to spend money to deliver automation that cuts costs, improves productivity, or both, so the money spent delivers something back. However, during tougher economic times, or even when those times are presumed to be around the corner, cost cutting comes to the forefront, preventing us from fixing the inefficiencies. When revenues are good, efficiencies can be improved through IT. When revenues are bad, efficiencies are thrown out the window.

Talk of a recession may slide your plans for big projects like master data management and data governance onto the back burner. Instead, you may be asked to be more tactical – solving problems at a project level rather than an enterprise level. Instead of setting strategy, you may be asked to migrate a database, cleanse a list for a customer mailing, etc. without devoting resources to establishing a clear corporate strategy.

The good news is that times will get better. If and when there is a recession, we most certainly DON’T want to have to rewire and re-do our efforts later on. If you are asked to become more tactical, there are some things to keep in mind that’ll save you strategic frustration:

  • Convince management that data governance will save money, despite the resources needed up-front. Any vendor worth their salt has case studies showing the return on investment and can help you make the case if you bring them into the process early.
  • If you have to stay tactical, make sure the tools and tactics you choose for the project-based initiatives have a life in the strategic initiative. In other words, don't cut costs on technology that won't scale. Don't choose tools with limitations like a lack of global support, poor connectivity, or limited performance if you'll need those capabilities later. Choosing those tools may hurt you when you want to go enterprise-wide; they'll get into your plumbing and will be hard to replace. They'll also get into the minds of your people, potentially requiring training and retraining. Even in tough economic times, you're setting the standard when you select tools. Don't let it come back to haunt you when times are good.
  • Make sure you understand all the pieces you need to buy early in the process. Many enterprise vendors require a LOT of different packages to do real-time data quality, for example. Hidden costs can be particularly problematic.
  • Make sure you understand all of the work involved, both in your project and in an enterprise implementation. There are big differences in the effort needed to get things done. Take the services effort into account during scoping.
  • If cutbacks are severe but the need is still great, consider software leasing and SaaS (Software as a Service) to minimize costs. Many vendors offer their enterprise software as a service. If times are tough, work with the vendor on alternative ways to purchase.

On another note, I want to thank Beth from the ‘Confessions of a Database Geek’ blog for the mention of my blog this week. If you’re a blogger, you know that every mention by other bloggers gives you street cred, and I am most appreciative of that. It's great to be mentioned by one of the best. Thanks Beth!


Monday, March 10, 2008

Approaching IT Projects with Data Quality in Mind


I co-authored a white paper at the end of 2006 with a simple goal: to talk directly to project managers about the process they go through when putting together a data intensive project. By “data intensive” project, I mean dealing with mergers and acquisitions data, CRM, ERP consolidation, Master Data Management, and any project where you have to move big data.

Project teams can be so focused on application features and functions that they sometimes miss the most important part. In the case of a merger, project teams must often deal with unknown data coming in from the merged company, which may require profiling as part of the project plan. In the case of a CRM system, companies are trying to consolidate whatever ad hoc systems are in place along with data from people who may care very little about data quality. In the case of master data management and data governance, the thought of sharing data across the enterprise brings to mind the need for a corporate data standard. Data-intensive projects may have different specific needs, but simply remembering to consider the data in your project will get you far.

To achieve real success, companies need to plan a way to manage data as part of the project steps. If you don't think about the data during project preparation, blueprinting, implementation, rollout preparation, go-live, and maintenance, your project is vulnerable to failure. Most commonly, delays and failures are due to the late-project realization that the data has problems. Knowing the data challenges you face early in the process is the key to success.

The white paper discusses the importance of involving business users in the project, and the best ways to do so, to ensure their needs are met. It covers ways to stay in scope on the project while considering the big picture and the ongoing concern of data quality within your organization. Finally, it covers how to incorporate technology throughout a project to expedite data quality initiatives. The white paper is still available today for download. Click here and see "Data Quality Essentials: For Any Data-Intensive Project."


Thursday, December 20, 2007

MDM Readiness Kit

I'm excited that Trillium Software is now offering a Master Data Management Readiness Kit. It represents some of the best thought leadership pieces that Trillium Software has produced yet, plus a smattering of industry knowledge about master data management. The kit is certainly worth a download if you're implementing, or thinking about implementing, a master data management strategy at your company.

The kit includes the Gartner Magic Quadrant for Data Quality Tools 2007, a Data Quality Project Planning Checklist for MDM, a UMB Bank Case Study for Data Governance, and a section on how to build a Business Case for Data Quality and MDM.

Sunday, December 16, 2007

Data Governance or Magic

Today, I wanted to report on something I have discovered - an extremely large data governance project. The project is shrouded in secrecy, but bits and pieces have come out that point to the largest data governance project in the world. I hesitate to give you the details. This quasi-governmental, cross-secular organization is one of the foundational organizations of our society. Having said that, not everyone recognizes it as an authority.

Some statistics: the database contains over 40 million names in the US alone. In Canada, Mexico, South America, and many countries in Europe, the names and addresses of up to 15 percent of the population are stored in this data warehouse. Along with geospatial information used to optimize product delivery, there's a huge amount of transactional data. Customers in the data warehouse are served for up to 12 years, at which point the trends show that most customers move on and eventually pass their memberships on to their children. Because of the nature of the work, there is sleep pattern information on each individual, as well as a transaction when they do something "nice" for society or pursue more "naughty" actions. For example, when an individual exhibits emotional outbursts, such as pouting or crying, this kicks off a series of events that affect a massive manufacturing facility and supply chain, staffed by thousands of specialty workers who adjust as the clients' disposition reports come into the system. Many of the clients are simply delivered coal, but other customers receive the toy, game, or new sled of their dreams. Complicating matters even more, the supply chain must deliver all products on a single day each year, December 25th.

I am, of course, talking about the implementation managed by Kris Kringle at the North Pole. I tried to find out more about the people, processes, and products in place, but apparently there is a custom application involved. According to Mr. Kringle, "Our elves use 'magic' to understand our customers and manage our supply chain, so there is no need for Teradata, SAP, Oracle, Trillium Software, or any other enterprise application in this case. Our magic solution has served us well for many years, and we plan to continue with this strategy for years to come." If only we could productize some of that Christmas magic.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.