Thursday, May 17, 2012

Naming your Data Management Project


In my line of work, I get to see many requests for proposals and sometimes I am invited to take part when a project is progressing.  I may be one of the only people on earth who takes pleasure in seeing companies improve their data management strategy, because I almost always see a huge return on investment. We’re making the world a better place by managing data the right way, so thanks to those who have made me part of your project.


I do have one word of advice for project managers, however. Please think when you name your projects. I can’t tell you how many times I’ve come into a project where some long description is the name of the project and it soon becomes an equally uncompelling acronym.  They are project names like:

  • Salesforce Marketing Analyst Data Mart and Sales Marketing Information Daily Audit or you can go by the catchy acronym SMADMASMIDA
  • Outlook Sales Partner Contact Daily Reconciliation or OSPCDR
  • Operational Business Intelligence for Marketing Analytics or OBIMA

The names and their acronyms are pretty close to meaningless.  People will be more excited by references to the news and pop culture than by intellectual terminology. It matters. Using the technical terms puts you in an elitist club of IT, and remember, we’re trying to break down the barriers between business and IT.

Some examples:

  • Any Business Intelligence project today that doesn’t have the name ‘Moneyball’ in the title is missing a huge opportunity.  Everyone knows what the movie Moneyball is about and the way that the Oakland A’s used business intelligence to win. Easy sale of your project to business.
  • Big Data initiatives could be named after Adele’s “Rolling in the Deep”.  Rolling in the deep is what a ship does while out at sea: the image is a small ship tossed on a very deep, dark ocean (of data). The song title is also an adaptation of the British slang phrase “roll deep”, which means to have a group who always has your back and can get you out of trouble. It’s a nice image for the pervasiveness of data, for strength in numbers and for data governance.

Of course, pop culture is a good way to start, but company culture and the history of your organization are also great inspiration for naming your project.   Given the French background of Talend, my current employer, a name for a data consolidation project might be something like ‘Pas de Deux’ which promotes a vision of a relationship between two people or things.

The point is, try to use the name of the project to promote a vision of the business problem you’re trying to solve.  It’ll play better with the business folks. The name matters.

Monday, April 2, 2012

Why Code Base is Important in Vendor Selection


The horticulture of software

Spring has sprung here in the northern hemisphere and minds turn to the plant life that will be sprouting all across our home towns. The new growth has me thinking about the similarities between horticulture and the code base of our data management solutions.

Reviewing software solutions before you buy is a major effort for users and/or vendor selection committees. Much time is spent on looking at whether the features of the product will meet team needs. Features are so important that companies will spend time to produce RFPs with extensive feature lists. They may even require a proof of concept; the vendor must install and test the solution in the purchaser’s work environment. This goes for those applications used to manage data, but also many other applications.

However, I believe that buyers should also look carefully at how the code base has grown. In the data management field, we have undergone decades of technology change combined with decades of market consolidation.  The code base for the application you’re about to buy may have grown from one of the following horticultural strategies:

  • Grafting  –  A large software company sees potential in the data management field and begins to acquire companies and graft them together to create a solution. Sometimes the acquisition isn’t done by technologists, but by upper management seeking to fill holes in the product line. Sometimes they even buy competing technologies, leaving everyone trying to figure out who will win. Sometimes the graft doesn’t take.
  • Old Growth – Companies have an existing technology that has worked for decades. However, back in 1990 when they released version 1.0, Java had not yet been released, let alone become the dominant force it is today.  FORTRAN was the preferred programming language and COBOL copybooks were the data model.  I know some companies in the data management market have spent millions updating old growth code to be more competitive in this market, and some others who have not.  This becomes a dilemma for all vendors at some point.  When do you prune out the dead wood?
  • Sapling – Companies who are just breaking into the data management marketplace and have a good-looking start for data management.  However, the sapling doesn’t yet have all the branches you want on it.  Will the sapling survive among the other deciduous solutions in the market?

When you’re selecting a vendor, you ideally want a code base that is mature, but not too mature.  You want limited grafting.   The growth of the code and the grafting affect:

  • Speed of innovation for the vendor
  • Customization for you
  • Future expansion for both of you
  • The age and experience of the technologists necessary to operate it
  • Consulting requirements
  • Ability to cross-train personnel (E.g. DI people running DQ and vice versa)

So, when you’re selecting a data management solution, or any technology solution, don’t just compare the features, but take a look at how the product grew to where it is today.  Look for the solution in the optimal stage of growth that will meet your needs today and those for the future.


Thursday, March 22, 2012

Big Data Hype is an Opportunity for Data Management Pros

Big Data is a hot topic in the data management world. Recently, I’ve seen press and vendors describing it with the words crucial, tremendous opportunity, overcoming vexing challenges, and enabling technology.  With all the hoopla, management is probably asking many of you about your Big Data strategy. It has risen to the corporate management level; your CxO is probably aware.

Most of the data management professionals I’ve met are fairly down-to-earth, pragmatic folks.  Data is being managed correctly or not. The business rule works, or it does not. Marketing spin is evil. In fact, the hype and noise around big data may be something to be filtered by many of you. You’re appropriately trying to look through the hype and get to the technology or business process that’s being enhanced by Big Data.
However, in addition to filtering through the big data hype to the IT impact, data management professionals should also embrace the hype.

Sure, we want to handle the high volume transactions that often come with big data, but we still have relational databases and unstructured data sources to deal with.  We still have business users using Excel for databases with who-knows-what in them.  We still have e-mail attachments from partners that need to be incorporated into our infrastructure.  We still have a wide range of data sources and targets that we have to deal with, including, but not limited to, big data. In my last blog post, I wrote about how big data is just one facet of total data management.

The opportunity is for data management pros to think about their big data management strategy holistically and solve some of their old and tired issues around data management. It’s pretty easy to draw a picture for management that Big Data needs to take a Total Data Management approach.  An approach that includes some of our worn-out and politically-charged data governance issues, including:


  • Data Ownership – One barrier to big data management is accountability for the data.  By deciding you are going to plan for big data, you also need to make decisions about who owns the big data, and all your data sets for that matter.
  • Spreadmarts – Keeping unmanaged data out of spreadsheets is increasingly crucial in companies who must handle Big Data. So-called “spreadmarts,” which are important pieces of data stored in Excel spreadsheets, are easily replicated to team desktops. In this scenario, you lose control of versions as well as standards. However, big data can help make it easy for everyone to use corporate information, no matter what size.
  • Unstructured Data – Although big data might tend to be more analytical than operational, big data is most commonly unstructured data.  A total data management approach takes into account unstructured data in either case. Having technology and processes that handle unstructured data, big or small, is crucial to total data management.
  • Corporate Strategy and Mergers – If your company is one that grows through acquisition, managing big data is about being able to handle, not only your own data, but the data of those companies you acquire.  Since you don’t know what systems those companies will have, a big data governance strategy and flexible tools are important to big data.


My point is, with big data, try to avoid the typical noise filtering exercises you normally take on the latest buzzword.  Instead, use the hype and buzz to your advantage to address a holistic view of data management in your organization.

Tuesday, January 24, 2012

Big Data, Enterprise Data and Discrete Data

Total Data Management©
The data management world is buzzing about big data.  Blog posts, articles and white papers covering this new area abound. Just about every data management vendor is scrambling to build tools to meet the needs of big data.

The world is correct to take notice. The ability for companies to handle big data represents exciting innovation where large relational databases with high price tags are sometimes replaced with flat files, technologies like Hadoop and intelligent parsers to create analytics from massive amounts of data.  It’s a game-changer for those in the Business Intelligence and relational database business.  It’s about managing an increasingly common huge data problem more effectively and at lower cost.

However, where there is big data, there is also enterprise (medium) data and discrete (small) data. With each size of data come very specific challenges.   



Technologies
  • Big data – Hadoop and flat files to reduce costs and avoid relational database costs.
  • Enterprise data – Relational databases.
  • Discrete data – Spreadsheets, flat files and flat databases. May come from other non-relational sources, such as e-mail attachments, social media JSON, and XML data.

Use Cases
  • Big data – Real-time analytics of a large number of transactions, including web analytics, SaaS up-time optimization and mission-critical analysis of transactions.
  • Enterprise data – Just about every business application today, including CRM, ERP, Data Warehouse, and MDM.
  • Discrete data – Companies with little or no data management strategy, or companies dealing with immature data architecture. Companies who receive mission-critical data via e-mail.  Companies who need to closely follow social media streams.

Innovation
  • Big data – Handles huge amounts of data that is predominantly used for business analytics and operational BI.
  • Enterprise data – Provides a powerful data management architecture that can be accessed by a common language (SQL).
  • Discrete data – Handles more diverse and more dynamic sources.

Positives
  • Big data – Replaces high-cost multi-server relational databases with lower-cost flat files and Hadoop server farms.
  • Enterprise data – Provides a scalable, reproducible environment in which database applications and solutions can be developed. Replaces unwieldy human-intensive data processes with a streamlined central repository of information. Used in many businesses in day-to-day operations.
  • Discrete data – ‘Simplifies’ the data management process to the point of being completely within the grasp of the business users without too much complicated technology.  In the long run, however, data management is more costly and unwieldy when it is in spreadmarts.

Negatives
  • Big data – Relatively new technology with a limited pool of Big Data experts. Legacy medium-sized systems can sometimes scale.
  • Enterprise data – Can be costly when data volumes become high, as new servers and new enterprise licenses get more common.  The number of sources and the diversity of data types also add cost.
  • Discrete data – Error-prone and labor intensive.

Cost Focus
  • Big data – Expertise.
  • Enterprise data – Servers and licenses; connectors and database technology.
  • Discrete data – Efficiency and productivity.

Growing Up
An organization’s data management maturity plays a role in big and little data.  If you’re still managing your customer list in a spreadsheet, it’s probably something you started when your company was fairly young.  Now the uses for the data have expanded, yet you are still stuck in the young company’s process. Something that was agile when you were young is inefficient today.

Your pain may also have something to do with your partners’ data management maturity.  While the other companies you do business with are good at what they do, supplying products and services to your company, they may not be as good at data management. The new parts catalog comes every so often as an e-mail attachment.  You need an efficient process to update whoever uses it.

No matter how mature you are, it is likely that you will have to deal with all types of data. When selecting tools, make sure you examine the cost and efficiency of all of these types, not just big data.


Tuesday, January 10, 2012

What is Data Governance?

I recently did a quick movie for a Talend promotion to define data governance. It turns out that defining data governance is trickier than you think. Here, I examine the characteristics of data management initiatives and how they define data governance.

Saturday, November 12, 2011

The ‘Time’ Factor in Data Management

I've been thinking about how many ways time influences the data management world. When it comes to managing data, we think about improving processes, coercing the needs and desires of people and how technology comes to help us manage it all. However, an often overlooked aspect of data management is time. Time impacts data management from many different directions.

Time Means Technology Will Improve
As time marches on, technology offers twists and turns to the data steward through innovation.  20 years ago, mainframes ruled the world.  We’ve migrated through relational databases on powerful servers to a place where we see our immediate future in cloud and big data. As technology shifts, you must consider the impact on your data.

The good news is that with these huge challenges, you also get access to new tools.  In general, tools have become less arcane and more business-user focused as time marches on. 

Time Causes People to Change

Like technology, people change: they mature, change careers, retire. With regard to data management, the corporation must think about the expertise needed to complete the data mission. Data management must pass the “hit by a bus” test where the company would not suffer if one or more key people were to be hit by a Greyhound traveling from Newark to Richmond.

Here, time is requiring us to be more diligent in documenting our processes.  It is requiring us to avoid undocumented hand-coding and pick a reproducible data management platform.  It helps to have third-party continuity, like consultants who, although they will also experience changes in personnel, will change on a different schedule than their clients.

Time Leads to Clarity in the Imperative of Data Management

With regard to data management, corporations have a maturity process they go through. They often start as chaotic immature organizations and realize the power of data management in a tactical maturity stage. Finally, they realize data management is a strategic initiative when they begin to govern the data.  Throughout it all, people, process and technologies change.

Knowing where you are in this maturity cycle can help you plan where you want to go from here and what tactics you need to put in place to get there. For example, very few companies go from chaotic, ad hoc data management straight to full-blown MDM. For the most part, they get there by making little changes, seeing the positive impact of those changes and wanting more. A chaotic organization is more apt to advance its data management maturity by consolidating two or more ERP systems and reveling in the resulting efficiency.

Time Prevents Us from Achieving Successful Projects
When it comes to specific projects, taking too much time can lead to failure.  In the not so distant past, circa 2007, the industry commonly took on massive, multi-year, multimillion-dollar MDM projects. We now know that these projects are not the best way to manage data. Why? Think about how much your own company has changed in the last two years.  If it is a dynamic, growing company, it likely has different goals, different markets, different partners and new leadership. The world has changed significantly, too.  Today’s worldwide economy is so much different than even one year ago. (Have you heard about the recession and European debt crisis?) The goals of a project that you set up two years ago will never achieve success today.

Time makes us take an agile approach to data management. It requires that we pick off small portions of our problems, solve them, prove value and re-use what we’ve learned on the next agile project.  Limit and hold scope to achieve success.

Time Achieves Corporate Growth (which is counter to data management)
Companies who are just starting out generally have fewer data management problems than those who are mature. Time pushes our data complexity deeper and deeper. Therefore time dictates that even small companies should have some sort of data management strategy.  The good news is that this is now achievable with help from open source and lower cost data management solutions. Proper data management tools are affordable by both Fortune 1000 and small to medium-sized enterprises.

Time Holds Us Responsible
That said, the longer a corporation is in business, the longer it can be held responsible for lower revenue, decreased efficiency and lack of compliance due to poor data management. The company decides how it is going to govern (or not govern) data, what data is acceptable in the CRM and who is responsible for the mistakes that happen due to poor data management. The longer you are in business, the more responsible the corporation is for its governance. Time holds us responsible if the problems aren’t solved.

Time and Success Lead to Apathy

Finally, time often brings us success in data management.  With success, there is a propensity for corporations to take their eye off the prize and spend monies on more pressing issues.  Time and success can lead to a certain apathy, believing that the data management problem is solved.  But, as time marches on, new partners, new data sources and new business processes appear. Time requires us to be ever vigilant in our efforts to manage data.

Wednesday, August 31, 2011

Top Ten Root Causes of Data Quality Problems: Part Five

Part 5 of 5: People Issues
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them.  Companies rely on data to make significant decisions that can affect customer service, regulatory compliance, supply chain and many other areas. As you collect more and more information about customers, products, suppliers, transactions and billing, you must attack the root causes of data quality. 

Root Cause Number Nine: Defining Data Quality

More and more companies recognize the need for data quality, but there are different ways to clean data and improve data quality.  You can:
  • Write some code and cleanse manually
  • Handle data quality within the source application
  • Buy tools to cleanse data
However, consider what happens when you have two or more of these types of data quality processes adjusting and massaging the data. Sales has one definition of customer, while billing has another.  Due to differing processes, they don’t agree on whether two records are a duplicate.
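To make the disagreement concrete, here is a minimal Python sketch of two departments applying different duplicate rules to the same pair of records. Everything in it is an assumption for illustration: the `Customer` fields, the two matching rules and the sample records are hypothetical, not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical record layout, for illustration only
    name: str
    email: str
    billing_address: str


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def sales_is_duplicate(a: Customer, b: Customer) -> bool:
    # Assumed sales rule: two records are the same customer if the e-mail matches
    return normalize(a.email) == normalize(b.email)


def billing_is_duplicate(a: Customer, b: Customer) -> bool:
    # Assumed billing rule: name and billing address must both match
    return (normalize(a.name) == normalize(b.name)
            and normalize(a.billing_address) == normalize(b.billing_address))


first = Customer("Pat Smith", "pat@example.com", "1 Main St, Boston")
second = Customer("Patricia Smith", "PAT@example.com", "22 Elm Ave, Newton")

sales_verdict = sales_is_duplicate(first, second)
billing_verdict = billing_is_duplicate(first, second)
print(f"Sales says duplicate: {sales_verdict}; billing says duplicate: {billing_verdict}")
```

With these sample records the two rules reach opposite verdicts, which is exactly the kind of conflict a shared definition of "customer" is meant to remove.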

Root Cause Attack Plan
  • Standardize Tools – Whenever possible, choose tools that aren’t tied to a particular solution. Having data quality only in SAP, for example, won’t help your Oracle, Salesforce and MySQL data sets.  When picking a solution, select one that is capable of accessing any data, anywhere, at any time.  It shouldn't cost you a bundle to leverage a common solution across multiple platforms and solutions.
  • Data Governance – By setting up a cross-functional data governance team, you will have the people in place to define a common data model.

Root Cause Number Ten: Loss of Expertise

On almost every data intensive project, there is one person whose legacy data expertise is outstanding. These are the folks who understand why some employee date of hire information is stored in the date of birth field and why some of the name attributes also contain tax ID numbers. 
Data might be a kind of historical record for an organization. It might have come from legacy systems. In some cases, the same value in the same field will mean a totally different thing in different records. Knowledge of these anomalies allows experts to use the data properly.
If you encounter this situation, there are some business processes you can follow.

Root Cause Attack Plan
  • Profile and Monitor – Profiling the data will help you identify most of these types of issues.  For example, if you have a tax ID number embedded in the name field, analysis will let you quickly spot it. Monitoring will prevent a recurrence.
  • Document – Although they may be reluctant to do so for fear of losing job security, make sure experts document all of the anomalies and transformations that need to happen every time the data is moved.
  • Use Consultants – Expert employees may be so valuable and busy that there is no time to document the legacy anomalies. Outside consulting firms are usually very good at documenting issues and providing continuity between legacy and new employees.
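As a rough illustration of the profiling step, the Python sketch below scans a name field for values that look like they contain a tax ID. The pattern is an assumption: it only covers two common US layouts (NNN-NN-NNNN and NN-NNNNNNN), and the function name and sample values are hypothetical. A real profiling tool would check far more than this.

```python
import re

# Assumed pattern: US-style tax IDs written as 123-45-6789 or 12-3456789
TAX_ID_PATTERN = re.compile(r"\b(?:\d{3}-\d{2}-\d{4}|\d{2}-\d{7})\b")


def find_embedded_tax_ids(names):
    """Return (row_number, value) pairs for names that contain a tax-ID-like token."""
    flagged = []
    for row_number, name in enumerate(names):
        if TAX_ID_PATTERN.search(name):
            flagged.append((row_number, name))
    return flagged


# Hypothetical sample of a legacy name field
sample_names = [
    "Jane Doe",
    "John Roe 123-45-6789",
    "Acme Supply 12-3456789",
    "Room 101 Ltd",
]

flagged = find_embedded_tax_ids(sample_names)
for row_number, name in flagged:
    print(f"row {row_number}: possible tax ID in name field: {name!r}")
```

Running the same check on a schedule against new data is one simple way to do the monitoring half of the job.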

This post is an excerpt from a white paper available here. More to come on this subject in the days ahead.


Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.