Thursday, February 19, 2009
Syncsort and Trillium Software Partnership
When you think of Syncsort, you think of, well… sorting. Syncsort's flagship product - a high-performance sort utility - has been used for years to cut processing time for large volumes of data. With multiple customer databases, for example, you may want to sort the files in different ways and compare them on many different keys. Sorting on multiple keys is a very resource-intensive data processing function, so maximizing sorting speed and efficiency is crucial.
Syncsort's sheer performance is made possible by a fast but proprietary sorting algorithm. Because of that performance boost, many Trillium Software customers use Syncsort as part of their batch data quality processes.
On the other hand, when your company is named after what you do, it's hard to change what you do. Syncsort's DMExpress has little to do with sorting; it is the company's low-cost ETL tool. Trillium Software recently announced connectivity between Syncsort and the Trillium Software System. Trillium Software's fast, scalable data cleansing combined with Syncsort's fast, scalable ETL makes for a great pairing.
I'm fascinated by some of the metrics that Syncsort has posted on their web site. An independent benchmark claims record ETL performance: DMExpress extracted, transformed, cleansed and loaded 5.4 TB of raw data into the Vertica Analytic Database in 57 minutes 21.51 seconds, using an HP BladeSystem c-Class running Red Hat. In other words, low-cost hardware and record performance. It beats the big boys of ETL on many levels.
Many of the case studies on Syncsort's web site are from companies that can finally afford to get rid of slow, hand-coded ETL processes. When you reduce extraction time by over 80%, as many of these companies report, you gain the ability to provide business intelligence that's far more current - and that's a big deal. For quick, low-cost ETL, DMExpress makes perfect sense.
Wednesday, February 11, 2009
Using Data Quality Tools to Look for Bad Guys
Most companies do not want to do business with bad guys - those on the FBI's most wanted list or international terrorists. Here in Boston, we're always on the lookout for James "Whitey" Bulger, a notorious mobster who has been on the FBI's most wanted list for years. But how do you really know if you're doing business with bad guys if you don't pay attention to data quality?
If you work for a financial organization, you may be mandated by your country's government to avoid doing business with the bad guys. The mandates are based on lists of terrorists published by the European Union, Australia, Canada and the United States. For example, in the U.S., the Treasury Department publishes a list of terrorists and narcotics traffickers. These individuals and companies are called "Specially Designated Nationals," or "SDNs." Their assets are blocked, and the Office of Foreign Assets Control (OFAC) bars U.S. companies from dealing with them. In the U.K., the Bank of England maintains a separate list with similar restrictions.
If your company fails to identify and block a bad guy (like Whitey here), there can be real-world consequences, such as an enforcement action against your bank or company and negative publicity. On the other hand, many hits may be "false positives," where a name is similar to a bad guy's name but the rest of the information provided by the applicant does not match the SDN list. False positives can make for poor customer relationships.
If you have to chase bad guys in your data, you need to make data quality a prerequisite. Data quality tools can help you both correctly identify foreign nationals on the SDN list and lower the number of false positives. If the data coming into your system is standardized and has all of the information required by your governance program, matching technologies can identify SDNs more easily and more automatically, and avoid those false positives.
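To make that concrete, here's a minimal sketch (in Python) of the standardize-then-match step a screening tool performs. The sample names, the standardize helper and the 0.85 threshold are my own inventions for illustration - real screening software works from the full OFAC list and uses far more sophisticated matching.

# Minimal sketch of watch-list screening with fuzzy name matching.
# The sample list, threshold, and standardization rules are illustrative only.
from difflib import SequenceMatcher

SDN_LIST = ["JAMES J BULGER", "ACME EXPORT TRADING CO"]  # stand-in for the real list

def standardize(name):
    """Uppercase, strip punctuation, collapse whitespace - a crude stand-in
    for the name standardization a data quality tool performs."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in name)
    return " ".join(cleaned.upper().split())

def screen(applicant, threshold=0.85):
    """Return (watch-list name, similarity score) pairs at or above the threshold."""
    candidate = standardize(applicant)
    hits = []
    for sdn in SDN_LIST:
        score = SequenceMatcher(None, candidate, sdn).ratio()
        if score >= threshold:
            hits.append((sdn, score))
    return hits

print(screen("James J. Bulger"))  # exact match after standardization: a hit
print(screen("James Bolger"))     # falls just below the threshold: no hit

Notice how standardization does the heavy lifting: once the names are normalized, a simple similarity ratio separates the real hit from the near-miss, which is exactly how cleaner input data reduces false positives.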
Saturday, January 31, 2009
Improving Communication on Data Governance Teams
If data governance is about enabling people to improve processes, your team should consider some tools to help those people communicate. Particularly if your data governance team is global, communication software can improve efficiency by working through some of the issues of a diverse team: when members are in different time zones, it is difficult to hold status meetings at a time that's convenient for all. The good news is that there are some fantastic software tools, including Web 2.0 tools, that can support communication in a data governance team.
I'm sure you've heard of, and used, most of these technologies. But have you considered using them on your data governance project?
Blogs
Blogs are a great way to provide commentary or news on your data governance project. The writer can use text, images, and links to other team members' blogs to inform and foster teamwork. A blog presents one person's perspective on the data governance project, but readers can leave comments and links to their own blogs. Blogs can educate and inform data governance groups, which can use them to debate unresolved issues or to continue discussions between meetings.
Data governance teams could designate certain team members to blog about the problems they are trying to solve and the projects they are working on. Over time, this type of blog would help keep a record of the processes used - what works and what doesn't. It can also be used to inform data stewards, data governance constituents and other readers about how the company is working to solve data quality issues.
RSS Feeds
The problem with blogs is that you have to revisit them frequently to keep up with the latest news. RSS feeds solve this by pushing crucial data governance information to the team, improving communication without requiring anyone to check each blog by hand.
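For example, here's a minimal sketch of pulling the latest posts from a team blog's feed with the Python feedparser library; the feed URL is a made-up placeholder for your own team's blog.

# Minimal sketch: poll a team blog's RSS feed so updates come to you.
# The URL is a hypothetical placeholder for your own team's feed.
import feedparser

feed = feedparser.parse("https://example.com/governance-blog/rss.xml")

print(feed.feed.get("title", "Team blog"))
for entry in feed.entries[:5]:  # the five most recent posts
    print(entry.get("published", "n/a"), "-", entry.title)
    print("   ", entry.link)

In practice you wouldn't write even this much code - a feed reader does it for you - but it shows how little plumbing is needed to route blog updates to the whole team.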
Wikis
Wikis can hold the latest corporate data policies, and they can be opened up to the corporation to provide communication across the enterprise.
There are a lot of wikis to choose from. Your best bet is to check out the comparison matrix at www.wikimatrix.org
Workflow
Let's not forget workflow tools. Workflow software is a genre of powerful collaboration tools and should be considered to improve the efficiency of your data governance process. With workflow tools, teams can manage the processes and coordination of the data governance team. The processes managed with workflow tools might include any of the following:
- work progress of a person or group
- business approval processes
- challenges of specific data governance technical processes like ETL or data profiling
- financial approval processes
Some examples of workflow tools include AtTask, Basecamp, Clarizen and SharePoint.
Friday, January 9, 2009
Starting Your Own Personal Data Quality Crusade
As I talk to people in the industry, many folks comment on their organization's lack of interest when it comes to information quality. People have the tendency to think that responsibility for information quality starts with someone else, not themselves. In truth, we all know that information quality is the responsibility of everyone in the organization, from the call center operators to the sales force to IT and beyond.
So why not start your own personal crusade, your own marketing initiative to drive home the power of information quality? Use the power of the e-mail signature to get your message across.
Use these graphics in your signature file to drive home the importance of IQ to your organization.
I may knock out a few more banners this weekend, but if you have your own ideas for a custom "Information Quality" banner, let me know and I'll post it.
Friday, January 2, 2009
Building a More Powerful Data Quality Scorecard
Most data governance practitioners agree that a data quality scorecard is an important tool in any data governance program. It provides comprehensive information about the quality of data in a database and, perhaps even more importantly, allows business users and technical users to collaborate on quality issues.
However, if we show that 7% of all tables have data quality issues, the number is useless - there is no context. You can’t say whether it is good or bad, and you can’t make any decisions based on this information. There is no value associated with the score.
In an effort to improve processes, data governance teams should roll the raw numbers up into slightly higher-level metrics. In their book "Journey to Data Quality," authors Lee, Pipino, Funk and Wang correctly suggest that making the measurements quantifiable and traceable provides the next level of transparency to the business. The metrics may be rolled up into a completeness rating, for example: if your database contains 100,000 name-and-address records and 3,500 are missing postal codes, then 3.5% of your postal codes fail and 96.5% pass. Similar simple formulas exist for accuracy, correctness, currency and relevance, too. However, this first aggregation still doesn't fully support data governance, because business users aren't thinking this way. They have processes that are supported by data, and it's still a stretch for them to see why this all matters.
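Here's a minimal sketch of that completeness formula, using the postal code example above; the function and field names are just illustrations.

# Minimal sketch: roll raw profiling counts up into a completeness rating.
# The counts mirror the postal code example in the post; names are illustrative.
def completeness(total_records, incomplete_records):
    """Percentage of records that pass the completeness check."""
    return 100.0 * (total_records - incomplete_records) / total_records

postal_code_pass = completeness(total_records=100_000, incomplete_records=3_500)
print(f"Postal code completeness: {postal_code_pass:.1f}% pass, "
      f"{100 - postal_code_pass:.1f}% fail")  # 96.5% pass, 3.5% fail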
Views of Data Quality Scorecard
You should plan to build data quality scorecards for different internal audiences - marketing, IT, C-level, etc.
You must design the scorecards to meet the interests of each audience, from technical through business and up to executive. At the base of a data quality scorecard is information about the quality of individual data records; this is the default information that most profilers deliver out of the box. In the middle are various score sets that allow your company to analyze and summarize data quality from different perspectives. As you aggregate scores, the high-level measures of data quality become more meaningful. If you define the objective of a data quality assessment project as calculating these different aggregations, you will have a much easier time maturing your data governance program. The business users and the C-level will begin to pay attention.
Business users are looking for whether the data supports the business process. They want to know if the data is facilitating compliance with laws. They want to decide whether their programs are "Go," "Caution" or "Stop," like a traffic light. They want to know whether the current processes are giving them good data so they can change those processes if necessary. You can only provide this by aggregating the information quality results and aligning those results with the business.
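To illustrate the traffic-light idea, here's a minimal sketch that maps an aggregated quality score to a status; the 98% and 90% thresholds are invented for the example - your own governance targets would set them.

# Minimal sketch: map an aggregated data quality score to a traffic-light status.
# The 98/90 thresholds are illustrative; real targets come from the governance program.
def traffic_light(score_pct):
    if score_pct >= 98.0:
        return "Go"
    if score_pct >= 90.0:
        return "Caution"
    return "Stop"

print(traffic_light(96.5))  # "Caution" for the postal code example above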
Tuesday, December 9, 2008
2009 MIT Information Quality Industry Symposium
This time of year, we’re all looking at our budgets and planning for 2009. I’d like to recommend an event that I’ve been participating in for the past several years – the MIT IQ symposium. It’s in my travel budget and I’m looking forward to going to this event again this year.
The symposium is a July event in Boston, a discussion and exchange of ideas about data quality between practitioners and academics. It is less commercial than a typical industry conference; this MIT event is more about the mission and philosophy of information quality.
Day one focuses on education, with highly qualified and very interesting speakers teaching you about enterprise architecture, data governance, business intelligence, data warehousing, and data quality. The latest methodologies, frameworks, and best-practice cases are the topics. On day two, the sessions deconstruct industry-specific topics, with government, healthcare and business tracks. On the last day, a half day, the sessions look more toward the future of information quality.
I’ve grown to really enjoy the presentations, information quality theory and hallway chat that you find here. If you have some travel budget, please consider earmarking some of it for this event.
Friday, December 5, 2008
Short Ham Rule and Data Governance
One of my old bosses, a long time IBM VP who was trained in the traditional Big Blue executive training program, used to refer to the “short ham” rule quite often. With my apologies for its lack of political correctness, the story goes something like this:
Sarah is recently married and, for the first time, decides to cook the Easter ham for her new extended family. Her spouse's sisters, mother and grandmother are all coming to dinner, and as a new bride, she is nervous. As the family arrives, she begins preparing the ham for dinner.
Sarah's sister-in-law Debbie helps with the preparation. As Sarah begins to put the ham into the oven, Debbie stops her. "You must cut off the back half of the ham before it goes into the oven," she says.
Sarah is nervous, but somehow musters the courage to ask a simple question - why? Debbie is shaken for a moment at the nerve of her new sister-in-law. How dare she question the family tradition?
Debbie pauses then says, “Well, I’m not sure. My Mom always does it. Let’s ask her why.”
When asked, Mom also hesitates. “Well, my Mom always cut off that part of the ham. I’m not sure why.”
Finally, the group turns to Grandma, who is sitting in her rocking chair listening to the discussion. By now, the entire party has heard about the outrageous boldness of Sarah. The party turns silent as the elder slowly begins to whisper her answer. "Well, I grew up in the Depression and we didn't have a pan big enough to fit the whole ham. So we'd cut off part of it and save it for another meal."
Three factors in the short ham story caused change. First, Sarah's courage to take on the project of cooking the ham started the change. Second, Sarah's willingness to listen and learn the family's processes gave her credibility in their eyes. Finally, it was Sarah's question - why? - that created change. It was only with that audacity that Sarah was able to educate the family and make the holiday feast more enjoyable.
The same can be said about leading your company toward data governance. You have to have the courage to take on new projects, understand the business processes, and ask why to become an agent for change in your organization. A leader has to get past resistance and convince others to embrace new ways of doing things.
Building credibility is the key to overcoming that resistance. If you sit down and work for a day in the billing center, the call center or a purchasing agent's job, for example, the people there will see that you understand them and care about their processes. At the very least, you could invite a business person to lunch to understand their challenges. You can win hearts and minds if you walk a mile in their shoes.