Tuesday, May 4, 2010
Are we ready for a National ID card in America?
However, I have my doubts that a group of Senators can really understand the enormous challenges of such a project. The issue is a politically charged one for certain, so that will be the focus. The details, which we all know contain the devil, will likely be forgotten.
I recall just a short time ago the US government’s Cash for Clunkers program. The program involved turning in your old “clunker” when you bought a new fuel-efficient car. The idea was to support the auto industry and get the gas guzzlers off the road. The devil was in the details, however. Rather than a secure web site with sufficient backbone to properly serve car dealerships, the program required that dealers complete pages and pages of paperwork… real paper paperwork… and fax it to the newly formed government agency for approval. Then the agency hired workers on the other end to enter the data. It was a business process that would have been appropriate for 1975, not 2010.
ACLU legislative counsel Christopher Calabrese said of this National ID program that “all of this will come with a new federal bureaucracy — one that combines the worst elements of the DMV and the TSA”. Based on recent history it’s an accurate description of what will likely happen.
If the government wants to do this thing, they need to bring in a dream team of database experts. Guys like Dr. Ralph Kimball or Bill Inmon, both of whom are world-renowned for data modeling, should contribute if they are willing. They should ask Dr. Rich Wang from MIT’s IQ program to be in charge of information quality issues. They should invite guys like Jim Harris to communicate the complex issues to the public. Also, they need to bring in folks with practical experience, like a Jill Dyche or Gwen Thomas. There are probably some others that I haven’t mentioned. Security experts, hardware scalability experts and business process experts need to be part of the mix to protect the citizenry of the United States. They would need to make a plan without bias toward any district or political action committee. That’s why a national database won’t happen.
Don’t get me wrong, if we do so, we could come up with much more efficient systems for checking backgrounds, I-9 job verification, international travel, and more. Identity theft is a big problem here and everywhere, but with a central citizen repository, the US could legislate a notification system when new bank accounts are opened in your name. The census would always show a more accurate number and wouldn't cost us billions and billions of dollars every ten years. Let's face it, the business process of the census, mailing paper forms and conducting door-to-door interviews, is outdated.
Let’s start this by making it voluntary. If you want to be in the database and avoid long lines at the airport, fine. If you want to be anonymous and wait, that’s fine, too. We’ll get the kinks worked out with the early adopters and roll it out to the laggards later.
What we’re really talking about here is a personal primary key. That data already exists in multiple linkable systems with your name and addresses (past and present) linking it. We as data professionals spend a lot of time and effort working with data to try to find these links. So why not have a primary key to link your personal data instead? Are you really giving up anything that DBAs haven't already figured out?
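To make the primary key point concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two record sets, the field names, the `person_id` column and the crude normalization rules are assumptions, not a description of any real system. It shows the kind of fuzzy name-and-address matching data professionals do today, next to the simple lookup a shared key would allow.

```python
import re

# Hypothetical records from two unrelated systems (all values are invented).
bank = [{"person_id": "P-1", "name": "Jane Q. Public", "address": "12 Elm St."}]
dmv = [{"person_id": "P-1", "name": "PUBLIC, JANE", "address": "12 Elm Street"}]


def fuzzy_key(name, address):
    """Build a crude match key from a name and an address.

    Lowercases, drops punctuation and single-letter initials, sorts the
    name tokens, and abbreviates "street" so common variants agree.
    """
    name_tokens = sorted(
        t for t in re.findall(r"[a-z]+", name.lower()) if len(t) > 1
    )
    addr = re.sub(r"\bstreet\b", "st", address.lower())
    addr_tokens = re.findall(r"[a-z0-9]+", addr)
    return (" ".join(name_tokens), " ".join(addr_tokens))


def link_by_key(left, right, key):
    """Join two record lists on a shared key column with a dictionary lookup."""
    index = {row[key]: row for row in right}
    return [(row, index[row[key]]) for row in left if row[key] in index]


# Without a shared key, we have to hope the normalized values line up.
same_person = fuzzy_key(bank[0]["name"], bank[0]["address"]) == fuzzy_key(
    dmv[0]["name"], dmv[0]["address"]
)

# With a shared key, linking is a plain join.
linked = link_by_key(bank, dmv, "person_id")
```

The fuzzy version only works here because the sample values were chosen to normalize cleanly. Real matching needs far more rules, which is exactly the effort a primary key would save.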
For those of you against a national database, I don’t think you have anything to fear. Call me a skeptic, but given the political divide between groups, it’s unlikely that any national database of citizens will be done within this decade. But if you’re listening, Senators, and you decide to move forward, make sure you have the right people, processes and technology in place to do it right.
Friday, April 9, 2010
Links from my eLearning Webinar
I recently delivered a webinar on the Secrets of Affordable Data Governance. In the webinar, I promised to deliver links for lowering the costs of data management. Here are those links:
- Talend Open Source - Download free data profiling, data integration and MDM software.
- US Census - Download census data for cleansing of city name and state with latitude and longitude appends.
- Data.gov - The data available from the US government.
- Geonames - Postal codes and other location reference data for almost every country in the world.
- GRC Data - A source of low-cost customer reference data, including names, addresses, salutations, and more.
- Regular Expressions - Check the shape of data in profiling software or within your database application.
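As a rough illustration of the regular expressions item, here is a small Python sketch that checks the "shape" of values. The sample values, the `shape` helper and the ZIP code pattern are my own assumptions for the example. Profiling tools have their own pattern features, so treat this as the idea rather than how any particular product does it.

```python
import re
from collections import Counter

# Assumed pattern: a US ZIP code, either 5 digits or ZIP+4.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")


def shape(value):
    """Reduce a value to its shape: letters become A, digits become 9."""
    value = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"\d", "9", value)


# Hypothetical column of postal codes with a few problems mixed in.
zips = ["02026", "02026-1234", "2026", "O2026", "02026"]

# Count how often each shape occurs; rare shapes are usually the bad data.
shape_counts = Counter(shape(z) for z in zips)

# Flag the values that fail the expected pattern.
bad_zips = [z for z in zips if not ZIP_RE.match(z)]
```

Counting shapes tells you what is in the column before you decide on a rule. The regular expression then enforces the rule once you know what "good" looks like.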
Friday, April 2, 2010
Donating the Data Quality Asset
It’s clear that every organization, no matter what the size or influence, can benefit from properly managing its data. Even charitable organizations can benefit with a cleaner customer list to get the word out when they need donations. Non-profits that handle charitable goods can benefit from better data in their inventory management. If food banks had a better way of managing data and soliciting volunteers, wouldn’t more people be fed? If churches kept better records of their members, would their positive influence be more widespread? If organizations that accept donated goods kept a better inventory system, wouldn’t more people benefit? The data asset is not limited to Fortune 1000 companies, but until recently, solutions to manage data properly were only available to the elite.
Open source is coming on strong, and it makes it easier for us to donate data quality. In the past, it may have been a challenge to get mega-vendors to donate high-end solutions, but these days we can make significant progress on the data quality problem at little or no solution cost. Solutions like Talend Open Profiler, Talend Open Studio, Pentaho and DataCleaner offer data integration and data profiling.
In my last post, I discussed the reference data that is now available for download. Reference data used to be proprietary and costly. It’s a new world – a better one for low-cost data management solutions.
Can we save the world through data quality? If we can help good people spread more goodness, then we can. Let’s give it a try.
Monday, February 22, 2010
Referential Treatment - The Open Source Reference Data Trend
Reference data is not limited to customer addresses, however. If everyone were to use the same reference data for parts, you could easily exchange procurement data between partners. If only certain values are allowed in any given table, the reference list also supports validation. With standards for procurement, supply chain, finance and accounting data, processes become more efficient. Organizations like the ISO and ECCMA are working on that.
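Validation against a list of allowed values can be sketched in a few lines of Python. The reference set, the supplier rows and the field names below are hypothetical, chosen only to show the idea.

```python
# Hypothetical reference data: the only unit-of-measure codes we accept.
ALLOWED_UNITS = {"EA", "KG", "M"}

# Hypothetical procurement rows received from a partner.
rows = [
    {"part": "bolt-10", "unit": "EA"},
    {"part": "wire-2", "unit": "meter"},
    {"part": "sand", "unit": "KG"},
]


def invalid_rows(records, field, allowed):
    """Return the records whose value for `field` is not in the reference set."""
    return [r for r in records if r.get(field) not in allowed]


rejected = invalid_rows(rows, "unit", ALLOWED_UNITS)
```

When both partners validate against the same reference set, a value like "meter" gets caught at the door instead of during month-end reconciliation.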
Availability of Reference Data
In the past, it was difficult to get your hands on reference data. Long ago, no one wanted to share reference data with you - you had to send your customer data to a service provider and get the enriched data back. Others struggled to develop reference data on their own. Lately I’m seeing more and more high quality reference data available for free on the Internet. For data jockeys, these are good times.
GeoNames
A good example of this is GeoNames. The GeoNames geographical database is available for download free of charge under a Creative Commons Attribution license. According to the web site, it “aggregates over 100 different data sets to build a list containing over eight million geographical names and consists of 7 million unique features whereof 2.6 million populated places and 2.8 million alternate names. The data is accessible free of charge through a number of web services and a daily database export.”
GeoNames combines geographical data such as names of places in various languages, elevation, population and others from various sources. All lat/long coordinates are in WGS84 (World Geodetic System 1984). Like Wikipedia, users may manually edit, correct and add new names.
US Census Data
Another rich set of reference data is the US Census “Gazetteer” data. Courtesy of the US government, you can download a database with the following fields:
- Field 1 - State Fips Code
- Field 2 - 5-digit Zipcode
- Field 3 - State Abbreviation
- Field 4 - Zipcode Name
- Field 5 - Longitude in Decimal Degrees (West is assumed, no minus sign)
- Field 6 - Latitude in Decimal Degrees (North is assumed, no plus sign)
- Field 7 - 2000 Population (100%)
- Field 8 - Allocation Factor (decimal portion of state within zipcode)
Sample record:
"25","02026","MA","DEDHAM",71.163741,42.243685,23782,0.003953
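If you want to load this layout yourself, here is a minimal Python sketch using only the standard library. The field names and the `parse_gazetteer_line` function are my own labels for the eight fields listed above, not anything published by the Census Bureau. Note the sign flip on longitude, since the file assumes West and omits the minus sign.

```python
import csv
import io

# Assumed names for the eight fields, in file order.
FIELDS = [
    "state_fips", "zipcode", "state", "name",
    "longitude", "latitude", "population", "allocation_factor",
]


def parse_gazetteer_line(line):
    """Parse one comma-separated gazetteer record into a typed dictionary."""
    row = next(csv.reader(io.StringIO(line)))
    record = dict(zip(FIELDS, row))
    # West is assumed in the file, so store longitude as a negative number.
    record["longitude"] = -float(record["longitude"])
    record["latitude"] = float(record["latitude"])
    record["population"] = int(record["population"])
    record["allocation_factor"] = float(record["allocation_factor"])
    return record


sample = '"25","02026","MA","DEDHAM",71.163741,42.243685,23782,0.003953'
record = parse_gazetteer_line(sample)
```

The ZIP code and FIPS code stay as strings on purpose. Converting "02026" to a number would drop the leading zero, which is a classic data quality bug in its own right.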
When I talk about reference data at parties, I immediately see eyes glaze over and it’s clear that my fellow party-goers want to escape my enthusiasm for it. But this availability of reference data is really great news! Together with the open source data integration tools like Talend Open Studio, we’re starting to see what I like to call “open source reference data” becoming available. It all makes the price of improving data quality much lower and our future much brighter.
There’s so much to talk about with regard to reference data and so many good sources. I plan to make more posts on this topic, but feel free to post your beloved reference data sources here in the comments section.
Tuesday, February 16, 2010
The Secret Ingredient in Major IT Initiatives
For head chefs taking on major IT initiatives, the secret ingredient is always data quality technology. Data quality is not the whole recipe for an IT initiative, but it makes the initiative better. Let’s take a look at how this happens.
Profiling
No matter what the project, data profiling gives the project team a complete understanding of the data before they attempt to migrate it. This helps the team create a more accurate plan for integration. Migrating data to your new solution as-is, by contrast, is ill-advised: it can lead to major cost overruns and project delays as you load and reload it.
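For a feel of what a basic profile contains, here is a small Python sketch. The `profile_column` function, the statistics it reports and the sample column are assumptions for illustration. A real profiling tool reports much more, but even these few numbers tend to surface surprises before a migration.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, empties, distinct values, and more."""
    non_empty = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "max_length": max((len(str(v)) for v in non_empty), default=0),
        "top_values": Counter(non_empty).most_common(3),
    }


# Hypothetical "state" column from a source system about to be migrated.
state_column = ["MA", "MA", "", None, "Mass."]
state_profile = profile_column(state_column)
```

In this made-up column, the profile shows two empty values and a five-character entry in what should be a two-letter field. Both are things you would rather learn before the first load than after the third reload.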
Customer Relationship Management (CRM)
By using data quality technology in CRM, the organization will benefit from a cleaner customer list with fewer duplicate records. Data quality technology can work as a real-time process, limiting the number of typos and duplicates in the system and thus improving call center efficiency. Data profiling can also help an organization understand and monitor the quality of a purchased list before integration, avoiding issues with third-party data.
Enterprise Resource Planning (ERP) and Supply Chain Management (SCM)
If data is accurate, you will have a more complete picture of the supply chain. Data quality technology can be used to more accurately report inventory levels, lowering inventory costs. When you make it part of your ERP project, you may also be able to improve bargaining power with suppliers by gaining improved intelligence about their corporate buying power.
Data Warehouse and Business Intelligence
Data quality helps disparate data sources act as one when migrated to a data warehouse. Data quality makes the data warehouse possible by standardizing disparate data. You will be able to generate more accurate reports when trying to understand sales patterns, revenue, customer demographics and more.
Master Data Management (MDM)
Data quality is a key component of master data management. An integral part of making applications communicate and share data is to have standardized data. MDM enhances the basic premise of data quality with additional features like persistent keys, a graphical user interface to mitigate matching, the ability to publish and subscribe to enterprise applications, and more.
So keep in mind, when you decide to improve data quality, it is often because of your need to make a major IT initiative even stronger. In most projects, data quality is the secret ingredient to make your IT projects extraordinary. Share the recipe.
Monday, February 1, 2010
A Data Governance Mission Statement
Every organization, including your data governance team, has a purpose and a mission. It can be very effective to communicate your mission in a mission statement to show the company that you mean business. When you show the value of your team, it can change your relationship with management for the better.
The mission statement should pay tribute to the mission of the organization with regard to values, while defining why the data governance organization exists and setting a big picture goal for the future.
The data governance mission statement could revolve around any of the following key components:
- increasing revenue
- lowering costs
- reducing risks (compliance)
- meeting any of the organization’s other policies such as being green or socially responsible
The most popular format seems to follow:
Our mission is to [purpose] by doing [high level initiatives] to achieve [business benefits]
So, let’s try one:
Our mission is to ensure that the highest quality data is delivered via company-wide data governance strategy for the purpose of improving the efficiency, increasing the profitability and lowering the risk of the business units we serve.
Flopped around:
Our mission is to improve the efficiency, increase the profitability and lower the business risks to Acme’s business units by ensuring that the highest quality data is delivered via company-wide data governance strategy.
Not bad, but a mission statement should be inspiring to the team and to management. Since the passions of the company described above are unknown, it’s difficult for a generic mission statement to be inspirational about the data governance program. That’s up to you.
Goals & Objectives
There are mission statements and there are objectives. While every mission statement should say who you are and why you exist, every objective should specify what you’re going to do and the results you expect. Objectives include activities that can be easily tracked, measured and achieved and that, of course, serve the mission. When you start data governance projects, you can look back to the mission statement to make sure you’re on track. Are you using your people and technology in a way that will benefit the company?
Staying On Mission
When you take on a new project, the mission statement can help protect you and ensure that the project is worthwhile for both the team and the company. The mission statement should be considered as a way to block busy-work and unimportant projects. In our mission statement example above, if the project doesn’t improve efficiency, increase profitability or lower business risk, it should not be considered.
In this case, you can clearly map three projects to the mission, but the fourth project is not as clear. Dig deeper into the mainframe project to see if any efficiency will come out of the migration. Is the data being used by anyone for a business purpose?
A Mission Never Ends
A mission statement is a written declaration of a data governance team's purpose and focus. This focus normally remains steady, while objectives may change often to adapt to changes in the business environment. A properly crafted mission statement will serve as a filter to separate what is important from what is not and to communicate your value to the entire organization.
Thursday, January 21, 2010
ETL, Data Quality and MDM for Mid-sized Business
As a company naturally grows, the effects of poor data quality multiply. When a small company expands, it naturally develops new IT systems. Mergers often bring in new IT systems, too. The impact of poor data quality slowly invades and hinders the company’s ability to service customers, keep the supply chain efficient and understand its own business. Paying attention to data quality early and often is a winning strategy for even the small and medium-sized enterprise (SME).
However, SMEs have challenges with the investment needed for enterprise-level software. While it’s true that the benefit often outweighs the costs, it is difficult for the typical SME to invest in the license, maintenance and services needed to implement a major data integration, data quality or MDM solution.
At the beginning of this year, I started with a new employer, Talend. I became interested in them because they were offering something completely different in our world – open source data integration, data quality and MDM. If you go to the Talend Web site, you can download some amazing free software, like:
- a fully functional, very cool data integration package (ETL) called Talend Open Studio
- a data profiling tool, called Talend Open Profiler, providing charts and graphs and some very useful analytics on your data
For these solutions, Talend uses a business model similar to what my friend Jim Harris has just blogged about – Freemium. Under this new model, free open source content is made available to everyone—providing the opportunity to “up-sell” premium content to a percentage of the audience. Talend works like this. You can enhance your experience from Talend Open Studio by purchasing Talend Integration Suite (in various flavors). You can take your data quality initiative to the next level by upgrading Talend Open Profiler to Talend Data Quality.
If you want to take the combined data integration and data quality to an even higher level, Talend just announced a complete Master Data Management (MDM) solution, which you can use in a more enterprise-wide approach to data governance. There’s a very inexpensive place to start and an evolutionary path your company can take as it matures its data management strategy.
The solutions have been made possible by the combined efforts of the open source community and Talend, the corporation. If you’d like, you can take a peek at some source code, use the basic software and try your hand at coding an enhancement. Sharing that enhancement with the community will only lead to a world full of better data, and that’s a very good thing.







