Saturday, August 28, 2010
ERP and SCM Data Profiling Techniques
In this YouTube tutorial for Talend, I walk through some techniques for profiling ERP, SCM and materials master data using Talend Open Profiler. In addition to basic profiling, the correlation analysis feature can be used to identify relationships between part numbers and descriptions.
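To make the part-number-to-description idea concrete, here is a minimal sketch in plain Python. It is not how Talend Open Profiler implements correlation analysis; it only shows the kind of question such an analysis answers: which part numbers are tied to more than one description? The function name `profile_part_descriptions` and the sample rows are hypothetical, made up for this illustration.

```python
from collections import defaultdict


def profile_part_descriptions(rows):
    """Return part numbers that map to more than one distinct description."""
    seen = defaultdict(set)
    for part_number, description in rows:
        # Normalize case and whitespace so trivial variants are not
        # reported as conflicting descriptions.
        key = part_number.strip().upper()
        seen[key].add(" ".join(description.split()).upper())
    return {part: sorted(descs) for part, descs in seen.items() if len(descs) > 1}


# Hypothetical sample records (not taken from any real ERP system).
sample_rows = [
    ("A-100", "Hex bolt 10mm"),
    ("A-100", "hex  bolt 10MM"),
    ("B-200", "Washer, steel"),
    ("B-200", "Gasket, rubber"),
    ("C-300", "Lock nut"),
]

conflicts = profile_part_descriptions(sample_rows)
print(conflicts)  # {'B-200': ['GASKET, RUBBER', 'WASHER, STEEL']}
```

Part A-100 is not flagged because its two descriptions differ only in case and spacing, while B-200 carries two genuinely different descriptions and deserves a closer look.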
Monday, August 16, 2010
Data Governance and Data Quality Insider 100th
I have reached my 100th post milestone. I hope you won't mind if I get a little introspective here and tell you a little about my social media journey over these past three years.
How did I get started? One day back in 2007, I disagreed with Vince McBurney’s post (topic unimportant now). I responded, and Vince politely told me to shut up and, if I really wanted to have an opinion, to write my own blog. I did. Thanks for the kick in the pants, Vince.
Some of my most popular posts over these past three years have been:
- Probabilistic Matching: Sounds like a good idea, but…
Here, I take a swipe at the sanctity of probabilistic matching. This is probably the post that has earned me the most hate mail. My stance is still that a hybrid approach, using both probabilistic and deterministic matching, is key to getting match results. Probabilistic alone is not the solution.
- Data Governance and the Coke Machine Syndrome
I recount a parable given to me by a well-respected boss in my past about meeting management. Meetings can take unexpected turns where huge issues can be settled in minutes, while insignificant ones can eat up the resources of your company. I probably wrote it just after a meeting.
- Data Quality Project Selection
A posting about picking the right data quality projects to work on.
- The “Do Nothing” Option
A posting that recounts a lesson I learned about selling the power of data quality to management.
Blogging has not always been easy. I’ve met some opposition along the way. There were times when my blogging was perceived as somehow threatening to corporate. At the time, blogging was new and corporations didn't know how to handle it. More companies now have definitive blogging policies and realize the positive impact it has.
What about the people I’ve met? I’ve gained a lot of friendships along the way with people I’ve yet to meet face-to-face. We’re able to build a community here in cyberspace – a data geek community that I am very fond of. I’m hesitant to write a list because I don’t want to leave anyone out, but you know who you are.
If you're thinking of blogging, please, find something you’re passionate about and write. You’ll have a great time!
Thursday, August 12, 2010
Change Management and Data Governance
As I read through the large amount of information on change management, I’m struck by the parallels between change management and data governance. In change management, the focus is on processes. It ensures that no matter what changes happen in a corporation, whether it’s downsizing or rapid growth, significant changes are implemented in an orderly fashion and make everyone more effective.
On the other hand, humans are resistant to change. Change management aims to gain buy-in from management to achieve the organization's goal of an orderly and effective transformation. Sound familiar? Data governance speaks to this ability to manage data properly, no matter what growth spurts, mergers or downsizing occur. It is about changing the hearts and minds of individuals to better manage data and achieve more success while doing so.
Change Management Models
As you examine data governance models, look toward the change management models that have been developed by vendors and analysts in the change management space. One that caught my attention was the ADKAR model developed by a company called Prosci. In this model, there are five specific stages that must be realized in order for an organization to change successfully. They include:
- Awareness - An organization must know why a specific change is necessary.
- Desire - The organization must have the motivation and desire to participate in the call for change.
- Knowledge - The organization must know how to change. Knowing why you must change is not enough.
- Ability - Every individual in the company must implement new skills and processes to make the necessary changes happen.
- Reinforcement - Individuals must sustain the changes, making them the new behavior, averting the tendency to revert back to their old processes.
I often talk about business users and IT working together to solve the data governance problem. By looking at the extensive information available on change management, you can learn a lot about making changes for data governance.
Monday, August 9, 2010
Data Quality Pro Discussion
Last week I sat down with Dylan Jones of DataQualityPro.com to talk about data governance. Here is the replay. We discussed a range of topics including organic governance approaches, challenges of defining data governance, industry adoption trends, policy enforcement vs legislature and much more.
Friday, July 30, 2010
Deterministic and Probabilistic Matching White Paper
The white paper is sort of a primer containing elementary principles of record matching. As the description says, it outlines the basic theories and strategies of record matching. It describes the nuances of deterministic and probabilistic matching and the algorithms used to identify relationships within records. It covers the processes to employ in conjunction with matching technology to transform raw data into powerful information that drives success in enterprise applications like CRM, data warehouse and ERP.
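For readers who want to see the difference between the two styles, here is a minimal sketch of a hybrid matcher in Python. It is not taken from the white paper. The deterministic step is an exact comparison on a trusted identifier, and the probabilistic step is a simplified Fellegi-Sunter style score that adds a log-weight for each field that agrees or disagrees. The field names, the m/u probabilities and the threshold are all assumptions chosen for illustration; in practice they would be estimated from your own data.

```python
import math

# Hypothetical m/u probabilities per field (assumed for illustration only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "zip": (0.90, 0.05),
    "first_name": (0.85, 0.02),
}


def normalize(value):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(value.split()).lower()


def deterministic_match(a, b, key="tax_id"):
    """Exact match on a trusted identifier, when both records have one."""
    return bool(a.get(key)) and a.get(key) == b.get(key)


def probabilistic_score(a, b):
    """Sum of log2 agreement/disagreement weights across the fields."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        va, vb = normalize(a.get(field, "")), normalize(b.get(field, ""))
        if not va or not vb:
            continue  # a missing value is neither agreement nor disagreement
        if va == vb:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def hybrid_match(a, b, threshold=10.0):
    """Try the deterministic rule first, then fall back to the score."""
    if deterministic_match(a, b):
        return True
    return probabilistic_score(a, b) >= threshold


# Hypothetical records for illustration.
r1 = {"tax_id": "123", "first_name": "Bob", "last_name": "Smith", "zip": "02110"}
r2 = {"tax_id": "123", "first_name": "Robert", "last_name": "Smyth", "zip": "02111"}
r3 = {"first_name": "Ann", "last_name": "Lee", "zip": "60601"}
r4 = {"first_name": " ann ", "last_name": "LEE", "zip": "60601"}
r5 = {"first_name": "Tom", "last_name": "Ray", "zip": "60601"}

print(hybrid_match(r1, r2))  # True: same tax_id, deterministic rule fires
print(hybrid_match(r3, r4))  # True: all three fields agree after normalizing
print(hybrid_match(r3, r5))  # False: only the zip code agrees
```

The deterministic rule catches the pair whose names are spelled differently but whose identifier is identical, and the score catches the pair with no identifier at all, which is the practical argument for combining the two.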
Wednesday, July 28, 2010
DGDQI Viewer Mail
Occasionally, I see things in my in-box that bother me, though. Here is one item that I’ll address in a post. The names have been changed to protect the innocent.
A public relations firm asked:
Hi Steve,
I wonder if you could answer these questions for me.
- What are the key business drivers for the advent of data governance software solutions?
- What industries can best take advantage of data governance software solutions?
- Do you see cloud computing-based data governance solutions developing?
I couldn’t answer these questions, because they all pre-supposed that data governance is a software solution. It made me wonder if I have made myself clear enough on the fact that data governance is mostly about changing the hearts and minds of your colleagues to re-think their opinion of data and its importance. Data governance is a company’s mindful decision that information is important and they’re going to start leveraging it. Yes, technology can help, but a complete data governance software solution would have more features than a Workchamp XL Swiss Army Knife. It would have to include data profiling, data quality, data integration, business process management, master data management, wikis, a messaging platform, a toothpick and a nail file in order to be complete.
Can you put all this on the cloud? Yes. Can you put the hearts and minds of your company on a cloud? If only it were that easy...
Wednesday, July 21, 2010
Lemonade Stand Data Quality
My children expressed interest in opening up a lemonade stand this weekend. I’m not sure if it’s done worldwide, but here in America every kid between the ages of five and twelve tries their hand at earning extra money during the summer months. Most parents in America indulge this because the whole point of a lemonade stand is really to learn about capitalism. You figure out your costs, how much the lemonade, ice and cups cost, then you charge a little more than what it costs you. At the end of the day, you can hope to show a little profit.
I couldn’t help but think there are lessons we can learn from the lemonade stand that apply to the way we manage our own data quality initiatives. Data governance programs and data quality projects are still driven by capitalism and lemonade stand fundamentals.
- Concept – Just as the lemonade stand requires your audience to have a clear understanding of the product and the price, so does data quality. In the data world, profiling can help you create an accurate assessment of your data and tell the world exactly what it is and how much it’s going to cost to fix.
- Marketing – My kids proved that more people will come to your lemonade stand if you shout out “Ice Cold Lemonade” and put a few flyers around the neighborhood. Likewise you need to tell management, business people and anyone who will listen about data quality – it’s ice cold and delicious.
- Pricing – A lemonade stand works by setting the right price. Too low and the profit will be too small; too high and no one will buy. In the data quality world, a project scoped with the proper amount of spend and the right return on investment will be successful.
- Location – While a busy street and a hot day make a profitable lemonade stand, data quality project managers know that you begin by picking the projects with the least effort and highest potential ROI. In turn, you get to open more lemonade stands and build your data quality projects into a data governance program.
When it comes down to it, data quality projects are a form of capitalism; you need to sell the customers a refreshing glass and keep them coming back for more.






