tag:blogger.com,1999:blog-68951755144295148122024-03-13T18:46:03.657-04:00Data Governance InsiderCovering the world of big data and data governance.Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.comBlogger130125tag:blogger.com,1999:blog-6895175514429514812.post-3438599196088758612017-04-17T11:36:00.000-04:002017-04-17T11:36:13.130-04:00Avoiding the three common myths of big data<br />
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="false"
DefSemiHidden="false" DefQFormat="false" DefPriority="99"
LatentStyleCount="371">
<w:LsdException Locked="false" Priority="0" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 6"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 7"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 8"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 9"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 1"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 2"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 3"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 4"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 5"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 6"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 7"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 8"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 9"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Normal Indent"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="footnote text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="annotation text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="header"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="footer"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index heading"/>
<w:LsdException Locked="false" Priority="35" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="caption"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="table of figures"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="envelope address"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="envelope return"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="footnote reference"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="annotation reference"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="line number"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="page number"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="endnote reference"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="endnote text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="table of authorities"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="macro"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="toa heading"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number 5"/>
<w:LsdException Locked="false" Priority="10" QFormat="true" Name="Title"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Closing"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Signature"/>
<w:LsdException Locked="false" Priority="1" SemiHidden="true"
UnhideWhenUsed="true" Name="Default Paragraph Font"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text Indent"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Message Header"/>
<w:LsdException Locked="false" Priority="11" QFormat="true" Name="Subtitle"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Salutation"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Date"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text First Indent"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text First Indent 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Heading"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text Indent 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text Indent 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Block Text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Hyperlink"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="FollowedHyperlink"/>
<w:LsdException Locked="false" Priority="22" QFormat="true" Name="Strong"/>
<w:LsdException Locked="false" Priority="20" QFormat="true" Name="Emphasis"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Document Map"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Plain Text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="E-mail Signature"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Top of Form"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Bottom of Form"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Normal (Web)"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Acronym"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Address"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Cite"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Code"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Definition"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Keyboard"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Preformatted"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Sample"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Typewriter"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Variable"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Normal Table"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="annotation subject"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="No List"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Outline List 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Outline List 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Outline List 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Simple 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Simple 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Simple 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Colorful 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Colorful 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Colorful 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 6"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 7"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 8"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 6"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 7"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 8"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table 3D effects 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table 3D effects 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table 3D effects 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Contemporary"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Elegant"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Professional"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Subtle 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Subtle 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Web 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Web 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Web 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Balloon Text"/>
<w:LsdException Locked="false" Priority="39" Name="Table Grid"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Theme"/>
<w:LsdException Locked="false" SemiHidden="true" Name="Placeholder Text"/>
<w:LsdException Locked="false" Priority="1" QFormat="true" Name="No Spacing"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading"/>
<w:LsdException Locked="false" Priority="61" Name="Light List"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 1"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 1"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 1"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 1"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 1"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 1"/>
<w:LsdException Locked="false" SemiHidden="true" Name="Revision"/>
<w:LsdException Locked="false" Priority="34" QFormat="true"
Name="List Paragraph"/>
<w:LsdException Locked="false" Priority="29" QFormat="true" Name="Quote"/>
<w:LsdException Locked="false" Priority="30" QFormat="true"
Name="Intense Quote"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 1"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 1"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 1"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 1"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 1"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 1"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 1"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 1"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 2"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 2"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 2"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 2"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 2"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 2"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 2"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 2"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 2"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 2"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 2"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" QFormat="true"
Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" QFormat="true"
Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" QFormat="true"
Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" QFormat="true"
Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" SemiHidden="true"
UnhideWhenUsed="true" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="TOC Heading"/>
<w:LsdException Locked="false" Priority="41" Name="Plain Table 1"/>
<w:LsdException Locked="false" Priority="42" Name="Plain Table 2"/>
<w:LsdException Locked="false" Priority="43" Name="Plain Table 3"/>
<w:LsdException Locked="false" Priority="44" Name="Plain Table 4"/>
<w:LsdException Locked="false" Priority="45" Name="Plain Table 5"/>
<w:LsdException Locked="false" Priority="40" Name="Grid Table Light"/>
<w:LsdException Locked="false" Priority="46" Name="Grid Table 1 Light"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark"/>
<w:LsdException Locked="false" Priority="51" Name="Grid Table 6 Colorful"/>
<w:LsdException Locked="false" Priority="52" Name="Grid Table 7 Colorful"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 1"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 1"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 1"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 1"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 1"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 2"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 2"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 2"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 2"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 2"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 3"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 3"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 3"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 3"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 3"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 4"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 4"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 4"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 4"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 4"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 5"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 5"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 5"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 5"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 5"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 6"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 6"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 6"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 6"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 6"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="46" Name="List Table 1 Light"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark"/>
<w:LsdException Locked="false" Priority="51" Name="List Table 6 Colorful"/>
<w:LsdException Locked="false" Priority="52" Name="List Table 7 Colorful"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 1"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 1"/>
</w:LatentStyles>
</xml><![endif]-->
<br />
<div class="MsoNormal">
There are many common myths when it comes to big data
analytics. Like the lost city of Atlantis and the Bermuda Triangle, they seem
to be ubiquitous in the teachings of big data. Let’s explore three of them.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkFbcJnCqqaP4m9h_SCqvY5L3vXpKGV7w1evAVzazuj_4gRY9C7u6OgM9UJK7H-zpI1H7qKL-pz7id3QYNncEqmUbHjDUILLdsaxDevfGt70HgabPQ4UTWRrA5Q1QnHsBnOtjWTeKQNmo/s1600/iStock-485017745.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="133" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkFbcJnCqqaP4m9h_SCqvY5L3vXpKGV7w1evAVzazuj_4gRY9C7u6OgM9UJK7H-zpI1H7qKL-pz7id3QYNncEqmUbHjDUILLdsaxDevfGt70HgabPQ4UTWRrA5Q1QnHsBnOtjWTeKQNmo/s200/iStock-485017745.jpg" width="200" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Myth One: Big Data = Hadoop</b></div>
<div class="MsoNormal">
Discussions about big data often jump straight to how to solve every
problem on Hadoop. However, Hadoop is not the only solution for big data
analytics. Many of the bigger vendors in the RDBMS space were handling very
large data sets for years before the emergence of Hadoop-based big data. The
fact is, Hadoop is not a database and has severe limitations on 1) the depth of
analytics it can perform; 2) the number of concurrent queries it can handle; and
3) database standards such as ACID compliance and SQL compliance.</div>
<div class="MsoNormal">
For example, <a href="http://ww.vertica.com/" target="_blank">Vertica</a> has been handling huge loads of data. One customer loads 60 TB per hour into
Vertica and has thousands of users (and applications) running analytics on it.
That is an extreme example of big data, but it is proof that other solutions
can scale to almost any workload. Hadoop is fantastic on the cost front, but it is
not the only solution for big data.</div>
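<div class="MsoNormal">
To put the 60 TB per hour figure in perspective, here is a rough back-of-the-envelope conversion. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes) and a perfectly steady load, neither of which the customer example states, and the function names are made up for illustration.</div>

```python
# Assumption: decimal units, 1 TB = 10**12 bytes and 1 GB = 10**9 bytes.
TB = 10**12
GB = 10**9
SECONDS_PER_HOUR = 3600


def sustained_rate_gb_per_s(tb_per_hour):
    """Convert an hourly load volume in TB into a steady rate in GB per second."""
    return tb_per_hour * TB / SECONDS_PER_HOUR / GB


def daily_volume_tb(tb_per_hour):
    """Volume loaded in 24 hours if the hourly rate never changes."""
    return tb_per_hour * 24


if __name__ == "__main__":
    print(f"{sustained_rate_gb_per_s(60):.2f} GB/s sustained")
    print(f"{daily_volume_tb(60)} TB per day")
```

<div class="MsoNormal">
Under those assumptions, the load works out to roughly 16.7 GB every second, or 1,440 TB in a day if the rate were held around the clock.</div>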
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Myth Two: Databases
are too expensive and incapable of big data</b></div>
<div class="MsoNormal">
I see in the media and in the white papers I read that
relational databases aren’t capable of performing analytics on big data. It’s true that <u>some</u> RDBMSs
cannot handle big data. It’s also true that some legacy databases charge you a
lot and yet don’t seem to be able to scale. However, database companies like
Vertica that have adopted columnar, MPP architectures with greater scalability and a
simplified pricing model will often fit the bill for many companies. </div>
<div class="MsoNormal">
These systems are often not perceived as cost-effective
solutions. The truth is that you’re paying for a staff of engineers who can
debug and build stronger, better products. Although open source is easy to
adopt and easy to test, most companies I see end up investing more in
engineering to support their open source solutions. You can
pay licensing costs or you can pay engineers, but either way, there is a cost. </div>
<div class="MsoNormal">
One of the biggest benefits of open source is that it has
driven down the cost of all analytical platforms, so some of the new platforms
like Vertica have much lower costs than your legacy data warehouse technology.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Myth Three: Big Data
= NoSQL or SPARK</b></div>
<div class="MsoNormal">
Again, I see other new technologies being described as the champion
of big data. The truth is that the use cases for Hadoop, NoSQL and Spark are all
slightly different. These nuances are
crucial when deciding how to architect your big data platform. </div>
<div class="MsoNormal">
NoSQL is best when you don’t want to spend the time putting
structure on the data. You can load data
into NoSQL databases with less attention to structure and analyze it. However, it’s storage optimizations and the way
data is laid out that make a platform capable of big data analytics at the petabyte
scale, so don’t expect a NoSQL solution to scale that far. Spark is great for fast
in-memory analytics, particularly operational analytics, but it’s also hard
to scale if you need to keep all of the data in memory in order to run
fast. That hardware gets expensive. Most successful architectures
that I’ve seen use Spark for fast-running queries on data streams, then
hand the data off to other solutions for the deep analysis. Vertica and other solutions are really best
for deep analysis of a lot of data and potentially a lot of concurrent
users. Analytical systems need to
support things like mixed workload management so that if you have concurrent
users and a whopper of a query comes in, it won’t eat up all the resources and
drag down the shorter queries. Analytical systems also need optimizations for disk
access, since you can’t always load petabytes into memory. This is the domain
of Vertica.</div>
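<div class="MsoNormal">
The idea behind mixed workload management can be sketched in a few lines. This is a toy illustration only, not how Vertica or any other product implements it: the class name, the slot counts and the 60-second threshold are all assumptions chosen for the example. Long queries are routed to a small pool of their own so they cannot occupy the slots that short queries depend on.</div>

```python
from concurrent.futures import ThreadPoolExecutor


class WorkloadManager:
    """Toy router that gives long and short queries separate worker pools."""

    def __init__(self, heavy_slots=1, light_slots=4, heavy_threshold_s=60.0):
        # Hypothetical settings: one slot for whoppers, four for short queries.
        self.heavy_pool = ThreadPoolExecutor(max_workers=heavy_slots)
        self.light_pool = ThreadPoolExecutor(max_workers=light_slots)
        self.heavy_threshold_s = heavy_threshold_s

    def pool_name(self, estimated_seconds):
        """Classify a query by its estimated run time."""
        if estimated_seconds >= self.heavy_threshold_s:
            return "heavy"
        return "light"

    def submit(self, estimated_seconds, fn, *args):
        """Run fn(*args) in the pool that matches the estimate."""
        if self.pool_name(estimated_seconds) == "heavy":
            return self.heavy_pool.submit(fn, *args)
        return self.light_pool.submit(fn, *args)

    def shutdown(self):
        """Wait for outstanding work and release both pools."""
        self.heavy_pool.shutdown(wait=True)
        self.light_pool.shutdown(wait=True)


if __name__ == "__main__":
    manager = WorkloadManager()
    # A stand-in "query"; a real system would call a database driver here.
    future = manager.submit(0.5, sum, [1, 2, 3])
    print(manager.pool_name(3600), manager.pool_name(0.5), future.result())
    manager.shutdown()
```

<div class="MsoNormal">
Because the heavy pool has a fixed number of slots, a second whopper simply waits its turn instead of starving the short queries. Real systems also budget memory and CPU per pool, which this sketch leaves out.</div>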
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Today’s Modern
Architecture</b></div>
<div class="MsoNormal">
In a modern architecture, you may have to rely on a
multitude of solutions to solve your big data challenges. If you have only a couple of terabytes,
almost any of the solutions mentioned will do the trick. However, if you eventually want to scale into
the tens or hundreds of terabytes (or more), using one solution for a varied
analytical workload will start to show signs of strain. It’s then that you need
to explore a hybrid solution and use the right tool for the right job.</div>
<div class="MsoNormal">
<br /></div>
<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-86098548755640848452016-11-03T10:48:00.003-04:002016-11-03T10:48:54.138-04:00MPP Analytical Database vs SQL on HadoopUsers find lower licensing costs when storing data in Hadoop—although they often do pay for subscriptions. Storing data efficiently in a cluster of nodes is the table stakes for data management today. However, it’s important to remember what happens next. The next step is often about performing analytics on the data as it sits in the Hadoop cluster. When it comes to this, our internal benchmarking testing reveals limitations of the Apache Hadoop platform.<br />
<br />
Because I work at Vertica, I recently got some metrics from a team of Vertica engineers who set up a 5-node cluster of Hewlett Packard Enterprise DL380 ProLiant servers. They created 3 TB of data in ORC, Parquet, and our own ROS format. Then they ran the TPC-DS benchmark queries against Vertica, Impala, Hive on Tez, and even Apache Spark. They took a look at CDH 5.7.1 and Impala 2.5.0 and HDP 2.4.2 Hawq 2.0 in comparison to <a href="http://hpsw.co/cVPTRc9" target="_blank">Vertica</a>.<br />
<br />
<b>Performing Complex Analytics</b><br />
They first took note of whether all the benchmark queries would run. This becomes important when you’re thinking about the analytical workload. Do you plan to perform any complex analytics? In these tests, Vertica completed 100% of the TPC-DS queries, while none of the others could.<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwb4GK0TrUzl-X0amXn5YYKoTXJMCep9w5pJe2kB_BI90TeK5LodJYeGjn8VdvJYmHEg4PKFVqhKqMS5l_8MRhuXjIeh_tMVZaOZ-sIKkN1OGIKp0KTKEbSLFS9j5Fua8Yh48dNOH8Inw/s1600/PassFail.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="177" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwb4GK0TrUzl-X0amXn5YYKoTXJMCep9w5pJe2kB_BI90TeK5LodJYeGjn8VdvJYmHEg4PKFVqhKqMS5l_8MRhuXjIeh_tMVZaOZ-sIKkN1OGIKp0KTKEbSLFS9j5Fua8Yh48dNOH8Inw/s400/PassFail.png" width="400" /></a><br />
<br />
<br />
For example, if you want to perform time series analytics and the queries are not available, how much will it cost you to engineer a solution? How many lines of code will you have to write and maintain to accomplish the desired analytics? Hadoop-based solutions often lack out-of-the-box geospatial, pattern matching, machine learning, and data preparation functions – these types of analytics are not part of the benchmark.<br />
<br />
<b>Achieving Top Speed</b><br />
Let’s assume that you don’t need to run all of the TPC-DS queries or that you can spend the time and resources to modify them. When I compared the performance metrics on just the queries that would run, Hadoop-based solutions were not comparable in performance either. For example, the numbers for Impala were as follows:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnuxQ28Y6oz8YinkkoQOWRedhBGEeqzSgjWCG0gp6sj6PkQrpkVYAPCWoKgavD33gzuCFzwuXfludgeNtCH9p_dExdzw_tNbsqAc9BJvbb3rfIqFUtnhEM1FvUDE9oNleUVBBr__rlz_Y/s1600/Impala.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnuxQ28Y6oz8YinkkoQOWRedhBGEeqzSgjWCG0gp6sj6PkQrpkVYAPCWoKgavD33gzuCFzwuXfludgeNtCH9p_dExdzw_tNbsqAc9BJvbb3rfIqFUtnhEM1FvUDE9oNleUVBBr__rlz_Y/s400/Impala.png" width="400" /></a></div>
<br />
<br />
This was actually the closest result. On the 59 queries that Hive on Tez could complete, it took about 21 hours to Vertica’s 90 minutes. Apache Spark took 11 hours to complete 29 queries while Vertica took 25 minutes.<br />
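<br />
To make those gaps easier to compare, the elapsed times quoted above can be turned into ratios. This is just arithmetic on the rounded figures in this post (the Hive on Tez number is "about 21 hours"), so treat the results as approximate; the function name is made up for illustration.<br />

```python
# Elapsed times quoted in the post, converted to minutes.
RESULTS = {
    "Hive on Tez (59 queries)": {"hadoop_min": 21 * 60, "vertica_min": 90},
    "Apache Spark (29 queries)": {"hadoop_min": 11 * 60, "vertica_min": 25},
}


def slowdown(slow_minutes, fast_minutes):
    """Return how many times longer the slower run took."""
    return slow_minutes / fast_minutes


if __name__ == "__main__":
    for name, t in RESULTS.items():
        ratio = slowdown(t["hadoop_min"], t["vertica_min"])
        print(f"{name}: about {ratio:.1f}x longer than Vertica")
```

On these figures, Hive on Tez took roughly 14 times as long as Vertica on the queries it could finish, and Apache Spark roughly 26 times as long.<br />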
<br />
<b>Handling Concurrent Queries</b><br />
In the metrics I received, I also found that Hadoop-based solutions had limitations on the number of concurrent queries they can run before the query fails. In the tests, the engineers continually and simultaneously ran 1 long, 2 medium, and 5 short-running queries. In most cases, the Hadoop-based solutions choked on the long query and sped through the short queries in a reasonable time. Vertica completed all queries, every time.<br />
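<br />
If you want to try a similar mixed-concurrency test on your own platform, a harness along these lines is one way to start. It is a minimal sketch, not the engineers' actual test rig: <code>run_query</code> is a hypothetical stand-in that only sleeps, the durations are scaled down to fractions of a second, and you would replace the sleep with a call to your own database driver.<br />

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Workload mix described in the post: 1 long, 2 medium, and 5 short queries.
# The durations are placeholders, scaled down so the sketch runs quickly.
WORKLOAD = [("long", 0.20)] + [("medium", 0.05)] * 2 + [("short", 0.01)] * 5


def run_query(kind, seconds):
    """Hypothetical stand-in for a real database call; it only sleeps."""
    start = time.perf_counter()
    try:
        time.sleep(seconds)  # replace with a real driver call
        ok = True
    except Exception:
        ok = False
    return kind, ok, time.perf_counter() - start


def run_round(workload):
    """Submit every query in the mix at once and collect the outcomes."""
    with ThreadPoolExecutor(max_workers=len(workload)) as pool:
        futures = [pool.submit(run_query, kind, secs) for kind, secs in workload]
        return [f.result() for f in futures]


def summarize(outcomes):
    """Count completed and failed queries per kind as (completed, failed)."""
    summary = {}
    for kind, ok, _elapsed in outcomes:
        done, failed = summary.get(kind, (0, 0))
        summary[kind] = (done + int(ok), failed + int(not ok))
    return summary


if __name__ == "__main__":
    # Repeat the mix a few times to mimic running it continually.
    for round_no in range(3):
        print(round_no, summarize(run_round(WORKLOAD)))
```

Counting failures per query type, rather than just timing the whole batch, is what shows whether the long query is the one that fails under load.<br />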
<br />
<b>An Analytical Database is the Right Way to Perform Big Data Analytics</b><br />
Although they are an inexpensive way to store data, Hadoop-based solutions are no match for columnar analytical databases like Vertica for big data analytics.<br />
<br />
Hadoop-based solutions cannot:<br />
<ul>
<li>Perform ANSI SQL analytics at the same level; they often fail on the TPC-DS benchmark queries</li>
<li>Deliver analytics as fast; they are sometimes significantly slower than a column store</li>
<li>Offer the concurrency of an analytical database for a combination of long-, medium- and short-running queries</li>
</ul>
<br />
Hadoop is a good platform for low-cost storage for big data and data transformation. It delivers some level of analytics for a small number of users or data scientists. But if you need to provide your organization with robust advanced analytics for hundreds or thousands of concurrent users and achieve excellent performance, a big data analytics database is the best solution.<br />
Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-86740447544614577752016-02-23T15:52:00.001-05:002016-02-23T15:53:34.805-05:00Why you may need yet another database - operational vs analytical systems<!--[if !mso]>
<![endif]--><br />
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="false"
DefSemiHidden="false" DefQFormat="false" DefPriority="99"
LatentStyleCount="371">
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" QFormat="true"
Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" QFormat="true"
Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" QFormat="true"
Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" QFormat="true"
Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" SemiHidden="true"
UnhideWhenUsed="true" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="TOC Heading"/>
<w:LsdException Locked="false" Priority="41" Name="Plain Table 1"/>
<w:LsdException Locked="false" Priority="42" Name="Plain Table 2"/>
<w:LsdException Locked="false" Priority="43" Name="Plain Table 3"/>
<w:LsdException Locked="false" Priority="44" Name="Plain Table 4"/>
<w:LsdException Locked="false" Priority="45" Name="Plain Table 5"/>
<w:LsdException Locked="false" Priority="40" Name="Grid Table Light"/>
<w:LsdException Locked="false" Priority="46" Name="Grid Table 1 Light"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark"/>
<w:LsdException Locked="false" Priority="51" Name="Grid Table 6 Colorful"/>
<w:LsdException Locked="false" Priority="52" Name="Grid Table 7 Colorful"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 1"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 1"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 1"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 1"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 1"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 2"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 2"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 2"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 2"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 2"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 3"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 3"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 3"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 3"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 3"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 4"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 4"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 4"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 4"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 4"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 5"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 5"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 5"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 5"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 5"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 6"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 6"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 6"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 6"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 6"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="46" Name="List Table 1 Light"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark"/>
<w:LsdException Locked="false" Priority="51" Name="List Table 6 Colorful"/>
<w:LsdException Locked="false" Priority="52" Name="List Table 7 Colorful"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 1"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 1"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 1"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 1"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 1"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 2"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 2"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 2"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 2"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 2"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 3"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 3"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 3"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 3"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 3"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 4"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 4"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 4"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 4"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 4"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 5"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 5"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 5"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 5"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 5"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 6"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 6"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 6"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 6"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 6"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 6"/>
</w:LatentStyles>
</xml><![endif]-->
<br />
<div class="MsoNormal">
<span style="mso-bidi-font-weight: bold;">If your company has
long been, say, an Oracle shop, yet you have a purchase order in hand for
yet another database, you may be wondering why you need another
one.</span></div>
<div class="MsoNormal">
<span style="mso-bidi-font-weight: bold;">Let’s face it: when
it comes to performing analytics on big data, there are major structural
differences in the ways that databases work. Your project team is asking
you for the technology best suited to
the problem at hand. You need to know
that databases tend to specialize, and that each type offers different characteristics and benefits
to an organization.</span></div>
<div class="MsoNormal">
Let’s start exploring this concept by considering a
challenge where multiple analytical environments are needed to solve a problem. For example, consider a security analytics
application where a company wants to both (a) watch the live stream of web and
application logs and be alerted immediately to unusual activity in order to
thwart an attack, and (b) perform forensic analysis of, say, three months of log
data to determine vulnerabilities and understand completely what has happened
in the past. You need to be able to look
quickly at both the stream and the data lake for answers.</div>
<div class="MsoNormal">
Unfortunately, no single product on the market is the ideal
choice for both of these tasks, particularly if we need to accomplish
them with huge volumes of data. Be suspicious of any vendor who claims to
specialize in both, because the very underpinnings of a database are usually
designed with one or the other (or something completely different) in
mind. Either you use a ton of memory and
cache for quick storage and in-memory analytics, or you optimize the data as
it’s stored to enhance the performance of long-running queries.</div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjH46JDK5V0THvrP7_sp7s4s6cPDXGOtYaI1SmLmyGVes98k-vARueC_0Aw8unMHEBa9WbBBjXqjFK9QNeCxwhc72isoavI7QgHrPoqw0WlLQvuneHwG6iOdAJcSl34fNeR-HnlOtKKpdY/s1600/analytics.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="145" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjH46JDK5V0THvrP7_sp7s4s6cPDXGOtYaI1SmLmyGVes98k-vARueC_0Aw8unMHEBa9WbBBjXqjFK9QNeCxwhc72isoavI7QgHrPoqw0WlLQvuneHwG6iOdAJcSl34fNeR-HnlOtKKpdY/s400/analytics.png" width="400" /></a></div>
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">A Database is Not Just
a Database</b></div>
<div class="MsoNormal">
<a href="https://www.blogger.com/null" name="OLE_LINK2"></a><span style="mso-bookmark: OLE_LINK2;">Two common types of databases used in the above
scenario are operational and analytical. In operational systems, the goal is to
ingest data quickly with minimal transformations. The analytics performed there
often look at the stream of data, searching for outliers or
interruptions in normal operations. You may hear these referred to as
“small queries” because they tend to look at smaller amounts of data and ask
simpler questions. </span></div>
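<div class="MsoNormal">
To make the idea of a “small query” concrete, here is a minimal Python sketch that watches a stream of per-minute counts and flags values far from the recent average. The metric (failed logins per minute), the window size and the threshold are all assumptions chosen for illustration; a real operational system would use its own detection logic.</div>

```python
from collections import deque
from statistics import mean, pstdev


def find_outliers(counts, window=5, threshold=3.0):
    """Yield (index, value) for values far from the recent rolling average.

    counts: iterable of numbers, e.g. failed logins per minute (hypothetical metric).
    """
    recent = deque(maxlen=window)  # only a small, recent slice of data is kept
    for i, value in enumerate(counts):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip a perfectly flat window, where sigma is zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        recent.append(value)


if __name__ == "__main__":
    failed_logins_per_minute = [4, 5, 6, 5, 4, 5, 90, 5, 4]
    print(list(find_outliers(failed_logins_per_minute)))
```

<div class="MsoNormal">
Note that the function only ever holds the last few values in memory, which is the sense in which these queries are “small.”</div>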
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-bookmark: OLE_LINK1;"><span style="mso-bookmark: OLE_LINK2;">On the other hand, analytical databases are more closely tied to the
questions that the business wants to answer from the data. To answer
questions like “how many widgets did we sell last year by region” as quickly as possible, data
is modeled around those questions. These are often where long
queries are executed, queries that involve JOINs across lots of data. Highly
scalable databases are often the best solution here, since adding
hardware lets you give access to more information consumers and
democratize the analytics. Columnar
databases like Vertica fit the bill very well for analytics because they do
just that: they organize the data for fast analytics at petabyte scale. </span></span></div>
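<div class="MsoNormal">
As a rough illustration of why a column-oriented layout suits this kind of question, the Python sketch below pivots a few invented order records into columns and then answers “units sold by region” by scanning only the two columns it needs. This is a toy model under stated assumptions, not a description of how Vertica or any other product stores data; the records and field names are hypothetical.</div>

```python
from collections import defaultdict

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "East", "units": 10, "customer": "Acme"},
    {"order_id": 2, "region": "West", "units": 7, "customer": "Globex"},
    {"order_id": 3, "region": "East", "units": 5, "customer": "Initech"},
]


def to_columns(records):
    """Pivot row records into one list per column (a column-oriented layout)."""
    columns = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)


def units_by_region(columns):
    """Answer 'units sold by region' by scanning only the two columns needed."""
    totals = defaultdict(int)
    for region, units in zip(columns["region"], columns["units"]):
        totals[region] += units
    return dict(totals)


if __name__ == "__main__":
    print(units_by_region(to_columns(rows)))
```

<div class="MsoNormal">
The aggregate never touches the order_id or customer columns, which hints at why wide tables with many unused fields can be scanned faster in a columnar layout.</div>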
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-bookmark: OLE_LINK1;"><span style="mso-bookmark: OLE_LINK2;"><b style="mso-bidi-font-weight: normal;">Enter the Messaging Bus</b></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: OLE_LINK1;"><span style="mso-bookmark: OLE_LINK2;">If you agree that sometimes we need a nimble operational database to
fly through our small queries and a full-fledged MPP (massively parallel processing) system to do our heavy
lifting, then how do we reconcile data between the systems? In the past,
practitioners would write custom code to have the systems share data, but the
complexity of doing this is high, given that data models and applications are always
changing. An easier approach in the recent past was to create data
integration (ETL) jobs. The ETL tooling would
help manage the metadata, data models and any change in the applications.</span></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-bookmark: OLE_LINK1;"><span style="mso-bookmark: OLE_LINK2;">Today, the choice is often a messaging bus. Apache Kafka is often
used to take on this task because it’s fast and scalable. It uses a publish-subscribe
model: producers publish data to named topics, and any application that subscribes to a topic receives that data. Having
one standard for data sharing makes sense for both the users and software
developers. If you want to make a new
database part of your ecosystem, sharing data is simplified if it supports
Kafka or another messaging bus technology.</span></span></div>
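<div class="MsoNormal">
The publish-subscribe idea can be sketched in a few lines of Python. The MessageBus class below is a hypothetical in-process stand-in written for this post, not Kafka’s API; it only shows how one published message can reach both an operational and an analytical consumer without either system knowing about the other. The topic name and message fields are invented.</div>

```python
from collections import defaultdict


class MessageBus:
    """A toy in-process bus; not Kafka's API, just the publish-subscribe idea."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber to the topic receives the message.
        for callback in self._subscribers[topic]:
            callback(message)


operational_store = []  # stands in for the fast, operational system
analytical_store = []   # stands in for the columnar, analytical system

bus = MessageBus()
bus.subscribe("web-logs", operational_store.append)
bus.subscribe("web-logs", analytical_store.append)

# The producer publishes once; both consumers get the event.
bus.publish("web-logs", {"path": "/login", "status": 401})
```

<div class="MsoNormal">
Adding a third database to this ecosystem would mean one more subscribe call, with no change to the producer.</div>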
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>Who is doing this today?</b></div>
<div class="MsoNormal">
As I mentioned earlier, for many companies, the solution is
to have both analytical and operational systems. With today’s big data
workloads, companies like Playtika, for example, have implemented Kafka and
Spark to handle operational data and a columnar database for in-depth analytics. You
can read more about Playtika’s story <a href="http://vertica.tips/2015/08/31/playtika-is-winning-at-streaming-data-transformations/" target="_blank">here</a>. These
architectures may be more complex, but they have the huge benefit of being able to
handle just about any workload thrown at them. They can handle the volume
and veracity of data while maximizing the value it can bring to the
organization.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">That’s not all</b></div>
<div class="MsoNormal">
There are other specialists in the database world. For example, graph databases apply graph
theory to the storage of information about the relationships between entries. Think
about social media, where understanding the relationships between people is the
goal, or recommendation engines that infer a buyer’s affinity for an
item based on their purchase history. Relationship queries in a standard relational database can
be slow and unpredictable. Graph databases are designed specifically for this
sort of thing. More about that topic can be found in <a href="http://www.odbms.org/2015/06/uplevel-big-data-analytics-with-hp-vertica-part-1-graph-in-a-relational-database-seriously/" target="_blank">Walt Maguire’s excellent blog posts</a>. </div>
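<div class="MsoNormal">
For a feel of what a relationship query looks like when connections are stored directly, here is a small Python sketch of a “friends of friends” lookup over an adjacency list. The people and connections are invented, and this is only an illustration of the traversal idea, not how any particular graph database is implemented.</div>

```python
# Hypothetical social connections, stored as an adjacency list:
# each person maps directly to the set of people they are connected to.
graph = {
    "ann": {"bob", "cara"},
    "bob": {"ann", "dev"},
    "cara": {"ann", "dev", "eli"},
    "dev": {"bob", "cara"},
    "eli": {"cara"},
}


def friends_of_friends(graph, person):
    """Return people two hops away who are not already direct connections."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        # Follow each stored link instead of joining a table to itself.
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


if __name__ == "__main__":
    print(sorted(friends_of_friends(graph, "ann")))
```

<div class="MsoNormal">
In a relational model the same question typically needs a self-join on a connections table for each hop, which is where such queries tend to slow down as the data grows.</div>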
<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-7788751056154735032016-02-01T12:45:00.001-05:002016-02-01T12:45:30.069-05:00The Format War for Hadoop Structured Data<br />
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="false"
DefSemiHidden="false" DefQFormat="false" DefPriority="99"
LatentStyleCount="371">
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text Indent 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Block Text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Hyperlink"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="FollowedHyperlink"/>
<w:LsdException Locked="false" Priority="22" QFormat="true" Name="Strong"/>
<w:LsdException Locked="false" Priority="20" QFormat="true" Name="Emphasis"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Document Map"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Plain Text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="E-mail Signature"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Top of Form"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Bottom of Form"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Normal (Web)"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Acronym"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Address"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Cite"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Code"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Definition"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Keyboard"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Preformatted"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Sample"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Typewriter"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Variable"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Normal Table"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="annotation subject"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="No List"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Outline List 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Outline List 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Outline List 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Simple 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Simple 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Simple 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Colorful 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Colorful 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Colorful 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 6"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 7"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 8"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 6"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 7"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 8"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table 3D effects 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table 3D effects 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table 3D effects 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Contemporary"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Elegant"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Professional"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Subtle 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Subtle 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Web 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Web 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Web 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Balloon Text"/>
<w:LsdException Locked="false" Priority="39" Name="Table Grid"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Theme"/>
<w:LsdException Locked="false" SemiHidden="true" Name="Placeholder Text"/>
<w:LsdException Locked="false" Priority="1" QFormat="true" Name="No Spacing"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading"/>
<w:LsdException Locked="false" Priority="61" Name="Light List"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 1"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 1"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 1"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 1"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 1"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 1"/>
<w:LsdException Locked="false" SemiHidden="true" Name="Revision"/>
<w:LsdException Locked="false" Priority="34" QFormat="true"
Name="List Paragraph"/>
<w:LsdException Locked="false" Priority="29" QFormat="true" Name="Quote"/>
<w:LsdException Locked="false" Priority="30" QFormat="true"
Name="Intense Quote"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 1"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 1"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 1"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 1"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 1"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 1"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 1"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 1"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 2"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 2"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 2"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 2"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 2"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 2"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 2"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 2"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 2"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 2"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 2"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" QFormat="true"
Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" QFormat="true"
Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" QFormat="true"
Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" QFormat="true"
Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" SemiHidden="true"
UnhideWhenUsed="true" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="TOC Heading"/>
<w:LsdException Locked="false" Priority="41" Name="Plain Table 1"/>
<w:LsdException Locked="false" Priority="42" Name="Plain Table 2"/>
<w:LsdException Locked="false" Priority="43" Name="Plain Table 3"/>
<w:LsdException Locked="false" Priority="44" Name="Plain Table 4"/>
<w:LsdException Locked="false" Priority="45" Name="Plain Table 5"/>
<w:LsdException Locked="false" Priority="40" Name="Grid Table Light"/>
<w:LsdException Locked="false" Priority="46" Name="Grid Table 1 Light"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark"/>
<w:LsdException Locked="false" Priority="51" Name="Grid Table 6 Colorful"/>
<w:LsdException Locked="false" Priority="52" Name="Grid Table 7 Colorful"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 1"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 1"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 1"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 1"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 1"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 2"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 2"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 2"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 2"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 2"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 3"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 3"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 3"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 3"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 3"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 4"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 4"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 4"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 4"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 4"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 5"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 5"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 5"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 5"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 5"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 6"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 6"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 6"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 6"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 6"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="46" Name="List Table 1 Light"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark"/>
<w:LsdException Locked="false" Priority="51" Name="List Table 6 Colorful"/>
<w:LsdException Locked="false" Priority="52" Name="List Table 7 Colorful"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 1"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 1"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 1"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 1"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 1"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 2"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 2"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 2"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 2"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 2"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 3"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 3"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 3"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 3"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 3"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 4"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 4"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 4"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 4"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 4"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 5"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 5"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 5"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 5"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 5"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 6"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 6"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 6"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 6"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 6"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 6"/>
</w:LatentStyles>
</xml><![endif]--><!--[if gte mso 10]>
<![endif]--><br />
<div class="MsoNormal">
A war is raging that pits Hadoop distribution vendors
against each other in determining exactly how to store structured big data. The
battle is between the ORC file format, spearheaded by Hortonworks, and the
Parquet file format, promoted by Cloudera. </div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlt1Auj9Ljc4bsn6v29yKBABnt2-RQtZ82-ze-VRZr77X8CPvO4-wXw75WQcKhPQRD3EyE3k9x60jwoaS6OikUP8-a_lkSjaWjODdxJBzKjhybNmU1yFJ4oKN_tSYch6g3gWfL6hOjdJQ/s1600/Battle.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlt1Auj9Ljc4bsn6v29yKBABnt2-RQtZ82-ze-VRZr77X8CPvO4-wXw75WQcKhPQRD3EyE3k9x60jwoaS6OikUP8-a_lkSjaWjODdxJBzKjhybNmU1yFJ4oKN_tSYch6g3gWfL6hOjdJQ/s320/Battle.jpg" width="320" /></a></div>
<div class="MsoNormal">
ORC and Parquet are separate Apache projects with the
similar goal of providing very fast analytics. To achieve that performance, both formats
store data in columns rather
than rows. This lets most analytic queries run faster than if the data
were stored in rows or in some semi-structured format. Both also support
compression; data stored in columns tends to compress very
efficiently. It’s easier to compress a
column of dates, for example, than it is to compress a mix of numbers, dates and
strings. Compression reduces the amount of data read from disk, a common bottleneck for
analytics.</div>
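<div class="MsoNormal">
To make the compression point concrete, here is a minimal Python sketch. It assumes a tiny, made-up table and uses run-length encoding, one simple scheme chosen purely for illustration; the table, the column names and the <code>run_length_encode</code> helper are hypothetical, and ORC, Parquet and ROS each use their own, more elaborate encodings.</div>

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical table, sorted by date: (date, amount, customer).
rows = [
    ("2016-01-14", 10, "acme"),
    ("2016-01-14", 25, "acme"),
    ("2016-01-14", 25, "globex"),
    ("2016-01-15", 10, "globex"),
    ("2016-01-15", 10, "initech"),
    ("2016-01-15", 40, "initech"),
]

# Row layout: values of different types are interleaved record by record,
# so neighbouring values never repeat in this example.
row_layout = [value for row in rows for value in row]
row_runs = run_length_encode(row_layout)

# Column layout: each column holds a single type and similar values sit together.
date_column, amount_column, customer_column = (list(col) for col in zip(*rows))
column_runs = [
    run_length_encode(col)
    for col in (date_column, amount_column, customer_column)
]
column_run_count = sum(len(runs) for runs in column_runs)

print(len(row_layout), "values stored as", len(row_runs), "runs in row layout")
print(len(row_layout), "values stored as", column_run_count, "runs in column layout")
```

<div class="MsoNormal">
In this toy example the row layout has no repeats to collapse, while the date column alone shrinks from six values to two runs. Real data sets are far larger, but the principle is the same.</div>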
<div class="MsoNormal">
If you’re part of the HPE <a href="http://www8.hp.com/us/en/software-solutions/sql-hadoop-big-data-analytics/" target="_blank">Vertica</a>
community, the goals of ORC and Parquet may sound familiar. Columnar databases, including Vertica, have
had columnar formats as part of the core product since the beginning. Before
ORC and Parquet were in incubation, Vertica developed the ROS format for
columnar, compressed big data storage. Over the years, we have tuned and enhanced the format by adding a large
number of compression algorithms designed to make the data storage and
retrieval very efficient. We’ve thought
through features like backup and restore. After all, with a columnar store
database, the concept of incremental backup/restore changes quite a bit. We’ve had time to think through security,
encryption and a long list of challenges when managing data in columnar format.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">ORC vs. Parquet – War,
what is it good for?</b></div>
<div class="MsoNormal">
Which format is better? Hortonworks has argued that ORC is
ahead of Parquet in its predicate pushdown capabilities. In layman’s terms, this claim is about
performing analytics closer to where the data sits rather than generating
excess network traffic. Cloudera has argued for Parquet on the strength of its efficient C++
code base. It also argues that ORC data
containers are primarily described with Hive, while Parquet’s data containers can
be described using Hive, Thrift and Avro. The important thing to remember is that if you have chosen Hortonworks
as your Hadoop distribution, it may be a little tricky to perform analytics on
Parquet. Accessing ORC files from
Cloudera might also be a challenge.</div>
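<div class="MsoNormal">
For readers who want to see the idea behind predicate pushdown, here is a rough Python sketch. It assumes a made-up file layout in which each block of a column carries min/max statistics; the <code>blocks</code> structure and both scan functions are hypothetical and do not reflect the actual ORC or Parquet reader APIs.</div>

```python
# Hypothetical column file: blocks of values, each with min/max statistics.
blocks = [
    {"min": 1, "max": 100, "values": [1, 42, 100]},
    {"min": 101, "max": 200, "values": [101, 150, 200]},
    {"min": 201, "max": 300, "values": [201, 250, 300]},
]


def scan_without_pushdown(blocks, threshold):
    """Move every value to the query engine, then filter there."""
    moved = [value for block in blocks for value in block["values"]]
    matches = [value for value in moved if value > threshold]
    return matches, len(moved)


def scan_with_pushdown(blocks, threshold):
    """Filter at the storage layer and skip blocks that cannot match."""
    matches, values_read = [], 0
    for block in blocks:
        # The statistics show whether any value in this block can pass the filter.
        if block["max"] > threshold:
            values_read += len(block["values"])
            matches.extend(v for v in block["values"] if v > threshold)
    return matches, values_read


print(scan_without_pushdown(blocks, 220))
print(scan_with_pushdown(blocks, 220))
```

<div class="MsoNormal">
Both scans return the same answer, but the pushdown version touches only one of the three blocks. On a real cluster, that difference shows up as less disk access and less network traffic.</div>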
<div class="MsoNormal">
At HPE, our goal is to seamlessly support ORC, Parquet and
ROS as part of the Vertica analytics platform. Vertica has developed an ORC
reader, in collaboration with Hortonworks, to be super-efficient at performing
analytics on ORC files. <b style="mso-bidi-font-weight: normal;">Just this week we also announced certification
of Vertica on the CDH 5 platform</b> and we have connectors into Parquet via
our HDFS connector. We’re also working
with Cloudera to continuously optimize our Parquet file access. The goal is to
read, write and federate multiple formats to minimize unnecessary data movement
and transformations. For the information workers who need to run analytics, it
shouldn’t matter where the data sits or in what format.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">On the Horizon – Kudu</b></div>
<div class="MsoNormal">
The aforementioned file formats are tied to analytical use
cases. In other words, if you have
petabytes of data in your data lake and you need to crunch through it in short
order, ORC, Parquet and ROS are valuable. However, Cloudera recently announced a new data structure and project
called Kudu (<a href="https://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/">link</a>)
that also addresses the needs of an operational analytics use case – one where
you need to run small queries on smaller data sets, particularly as they are
ingested into the data lake. It’s still in incubation, but if the vision is
realized, it will mean better efficiency and easier implementation for
companies that need to run both analytical and operational systems. I’ll explore this and its tie to Kafka and
Spark in my next post.</div>
<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-55778717169748942202016-01-14T12:51:00.000-05:002016-01-14T12:52:19.263-05:00What’s in store for Big Data Analytics in 2016<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiugdgIzPI-aLlhcGNEFvIK04jCCp-MY9iNuRWYLF7n4rZqW49caogvGyDn3dPgKIWloZ3fkd2Gpu2AzpXGJEQOi3d3ClmR0YFQ7jpR7D4CzQikcK0CsAFs1SmOlXtUeKDcfuL_AC8ovmk/s1600/original.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="132" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiugdgIzPI-aLlhcGNEFvIK04jCCp-MY9iNuRWYLF7n4rZqW49caogvGyDn3dPgKIWloZ3fkd2Gpu2AzpXGJEQOi3d3ClmR0YFQ7jpR7D4CzQikcK0CsAFs1SmOlXtUeKDcfuL_AC8ovmk/s200/original.jpg" width="200" /></a></div>
It’s that time of year again for predictions on all sorts of topics. Worthy, solid predictions are often made by taking past and present trends and projecting them into the coming year. Since I spend a lot of time studying trends in big data and analytics, I’m going to offer my predictions for the upcoming year.<br />
<br />
<b>Big Data will Triumph over Global Troubles</b><br />
While there were awesome use cases, big data in 2015 was still somewhat of a science experiment. This year there is hope for major breakthroughs in solving some of the world’s most challenging problems with big data. Organizations are already doing amazing things, but we’re just scratching the surface of what we can accomplish with big data. I’ve had several conversations with clients who are looking to map the human genome and tackle problems like cancer, Alzheimer’s disease and more by mapping the genes linked to them. I believe breakthroughs are imminent here as our ability to handle huge data volumes and perform faster and faster analytics improves.<br />
<br />
But that’s not all. People are using big data science for transportation research, making planes, trains and automobiles smarter and more efficient. Non-profits are using big data to drive decisions about conservation and ecology. We have a real opportunity this year to make the world a better place with big data. Data is the new currency in scientific breakthroughs. The capability we now have to crunch through it with our algorithms is the disruptor.<br />
<br />
<b>Algorithms will be the New Edge</b><br />
2016 is sure to be a year for using algorithms, specifically predictive analytics, to boost company revenue. Analysts like Gartner predict that differentiated algorithms alone will help corporations achieve a boost of 5% to 10% in revenue in the near future. Algorithms will make the best use of the huge volume of customer-generated data that we get from our phones, devices and the internet of things to formulate more helpful, targeted offers for prospects and customers. New, younger companies will leverage predictive analytics to disrupt their markets and potentially unseat the established leaders. Predictive analytics can serve to improve power delivery and consumption, medical research and treatment, and other lofty human problems, in addition to generating new revenue.<br />
<br />
It’s difficult to see whether the algorithms themselves will be an emerging market, as some analysts say, or whether we will share most of our algorithms in our communities of data scientists. I think society will benefit more from an open source approach here, and the young minds who develop the algorithms will probably be more willing to take an open approach. Think about it, if you could predict Alzheimer’s disease with your algorithm, wouldn’t you want to share it with the world?<br />
<br />
<b>Hybrid Architectures will Rule in 2016</b><br />
Companies are adopting a strategy where they use the right tool for the right job when it comes to big data analytics. This means that daily analytics on proprietary data are run on premises in ever-growing data warehouses. Small, short-lived projects are often deployed in the cloud, and Hadoop is often used to keep costs low on data that is important, or data that needs to be farmed for mission-critical information. Finally, technologies like Spark, though still in their infancy, are emerging to help with real-time, operational analytics.<br />
<br />
It will be up to the vendors and open source community to provide some consistency across these different deployment strategies. Information workers really won’t care where a workload is running, just that they can use their favorite visualization tools, SQL, R and Python. Sometimes these workloads run in their own environment, but vendors can help reduce the work involved if, for example, you want to move your cloud project on premises. By offering a consistent SQL, for example, across these deployment architectures, you can avoid the headaches of a hybrid environment.<br />
<b><br /></b>
<b>Open Source will Attain New Maturity</b><br />
I’ve written many times about the hype around Hadoop and the maturity of the Hadoop platform by comparison to commercially available software. Let’s face it, many open source solutions for big data analytics were somewhat immature in 2015. As I mentioned in my last post, it’s a matter of taking software that is extremely useful and spending a few years to overcome shortcomings and build out a complete platform for big data analytics. This year, the Hadoop community will build it out to be a more complete platform. My prediction is that we’ll see greater maturity in 2016. With greater maturity will come wider adoption.<br />
<br />
That said, I have observed that the open source community tends to focus on the start and not the finish. For example, over the past few years, SQL users have heard about many flavors of SQL on Hadoop. Spark seems to be the latest and coolest new project offering SQL analytics on big data and it shows great promise. However, the shift seems to be toward new projects and away from making the legacy projects work better.<br />
<b><br /></b>
<b>Hewlett Packard Enterprise’s Role</b><br />
I was inspired to write these predictions by a webinar that I attended in which some of the executives of Hewlett Packard Enterprise and influencers gave their vision of 2016. For more information, watch the replay video <a href="http://hpsw.co/JDuuL6O" target="_blank">here</a>. Hewlett Packard Enterprise (HPE) has a role to play in making these predictions come true. HPE’s vision starts with the understanding that data fuels the new style of business driving the idea economy. Data will distinguish disruptors from the disrupted. Big data promises new customers, better experiences and new revenue streams. But all opportunities come with challenges. The recipe for success is continuously iterating on what questions to ask, which data to analyze and how to use the insights at all levels of your organization.<br />
<br /><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-82576994522930306582013-11-10T19:24:00.002-05:002013-11-10T19:26:04.679-05:00Big Data is Not Just Hadoop<h4>
Hybrid Solutions will Solve our Big Data Problems for Years to Come<span style="font-weight: normal;"> </span></h4>
<span style="font-weight: normal;">When I talk to the people on the front line of big data, I notice that the most common use case of big data is to provide visualization and analytics across the types of data and volumes of data we have in the modern world. For many, it’s an expansion of the power of the data warehouse to deal with the new data-bloated world in which we live.</span><br />
<br />
<span style="font-weight: normal;"> Today, you have bigger volumes, more sources and you are being asked to turn around analytics even faster than before. Overnight runs are still in use, but real-time analytics are becoming more and more expected by our business users. </span><br />
<br />
<span style="font-weight: normal;">To deal with the new volumes of data, the yellow elephant craze is in full swing and many companies are looking for ways to use Hadoop to store and process big data. Last week at Strata/Hadoop World, many of the keynote speeches talked about the fact that there are really no limits to Hadoop. I agree. However, in data governance, you must consider not only the technical solutions, but also the processes and people in your organization, and you must fit the solutions to the people and process. </span><br />
<br />
As powerful as Hadoop is, there is still a shortage of <a href="http://en.wikipedia.org/wiki/Map/reduce" target="_blank">Map/Reduce</a> coders and <a href="http://pig.apache.org/" target="_blank">Pig scripters</a>. There are still talented analytics professionals who aren't experts <a href="http://en.wikipedia.org/wiki/R_%28programming_language%29" target="_blank">in R</a> yet. This shortage will be with us for decades as a new generation of IT workers is trained in Hadoop.<br />
<br />
This is in part why so many Hadoop distributions are in the process of putting SQL on Hadoop. This is also why many traditional analytics vendors are adding Hadoop and ways to access the Hadoop cluster from their SQL-based applications. The two worlds are colliding and it's very good for the world of analytics.<br />
<br />
I’ve blogged about the <a href="http://data-governance.blogspot.com/2012/01/big-data-enterprise-data-and-discrete.html">cost of big data solutions</a>, traditional enterprise solutions and how they differ. In short, you tend to spend money on licenses when you have an old school analytics solution, while your money goes to expertise and training if you adopt a Hadoop-centric approach. But even this line is getting blurry with SQL-based solutions opening up their queries to Hadoop storage. Analytical databases can deliver fast big data analytics with access to Hadoop, as well as compression and columnar storage when the data is stored within. You don’t even need open source to have a term license model today. Term licenses are increasingly available in other data storage solutions, as are pay-per-use models that charge per terabyte.<br />
<br />
If you have a big data problem that needs to be solved, don’t jump right on the Hadoop bandwagon. Consider the impact that big data will have on your solutions and on your teams and take a long look at the new generation of columnar data storage and SQL-centric analytical platforms to get the job done.<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com2tag:blogger.com,1999:blog-6895175514429514812.post-44508902602658649202013-01-20T21:00:00.000-05:002013-01-20T21:00:01.781-05:00Top Four Reasons Why Financial Services Companies Need Solid Data Governance<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFPgpmT3WchKdEJJf-wNQlJN3I-LhWps7Ogzg8GhJyook0icM0OoEPcCpJAUFqVvvrCa9P7dUeDwQxvHaS84MXmAN6EJ_wG42tyj4o4C8ixit7O5tKrKUZLIEP7dRbBjORvIdQ7S0lahc/s1600/Banks.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="131" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFPgpmT3WchKdEJJf-wNQlJN3I-LhWps7Ogzg8GhJyook0icM0OoEPcCpJAUFqVvvrCa9P7dUeDwQxvHaS84MXmAN6EJ_wG42tyj4o4C8ixit7O5tKrKUZLIEP7dRbBjORvIdQ7S0lahc/s200/Banks.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Image licensed from iStockPhoto</td></tr>
</tbody></table>
In working with clients in the financial services business, I’ve noticed that there is a common set of reasons why they adopt data governance. When it comes down to proving the value of data management, it’s all about revenue, efficiency and compliance. <br />
<b><br />Number One - Accurate Risk Assessment</b><br />
Based on regulations like Sarbanes-Oxley and Dodd-Frank, a financial services company's risk and assurance teams are often asked to determine the amount of regulatory capital reserves when building credit risk models. A crucial part of this function is understanding the effect the underlying data has on the accuracy of the calculations. Teams must be able to attest to the quality of the data by having in place the appropriate monitoring, controls, and alerts. They must provide regulators with information they can believe in.<br />
<br />
Data champions in this field must be able to draw the link between the regulations and data. They must assess the alignment of data and processes that support your models, quantify the impact of poor data quality on your regulatory capital calculations, and put into place monitoring and governance to manage this data over time. <br />
<b><br />Number Two – Process Efficiency</b><br />
If your team is spending a lot of time checking and rechecking your reports, it can be quite inefficient. When one report conflicts with another, it can cast doubt on the validity of all reports. There is likely a data quality issue behind it. The problem manifests itself as a huge time-suck on monthly and quarterly closes. Data champions must point to this inefficiency in order to put in place a solid data management strategy. <br />
<b><br />Number Three - Anti-money Laundering</b><br />
Financial services companies need to be vigilant about money laundering. To do this, some look for currency transactions designed to evade current reporting requirements. If a client is making five deposits of $3,000 each in a single day, for example, it may be an attempt to keep under the radar on reporting. Data quality must help identify these transactions, even if the client is making deposits from different branches, using different deposit mechanisms (ATM or Customer Service Rep.) and even when they are using slight variations of their name.<br />
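<br />
As a rough sketch of the deposit check described above, the Python below totals each client's deposits per day across branches and channels and flags the days where several small deposits add up past a reporting threshold. The $10,000 threshold, the record layout and the function names are all assumptions made for illustration, and the name normalization is deliberately crude; a production system would lean on proper fuzzy matching and a real customer key.<br />

```python
from collections import defaultdict
from datetime import date

# Assumed reporting threshold, for illustration only.
REPORTING_THRESHOLD = 10_000


def normalize_name(name):
    """Lowercase, strip punctuation and sort tokens so 'Smith, John' equals 'John Smith'."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(sorted(cleaned.split()))


def flag_possible_structuring(deposits, threshold=REPORTING_THRESHOLD):
    """Return (normalized name, day) pairs whose sub-threshold deposits sum to the threshold or more."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for dep in deposits:
        if dep["amount"] >= threshold:
            continue  # already reportable on its own, not the pattern we are after
        key = (normalize_name(dep["name"]), dep["day"])
        totals[key] += dep["amount"]
        counts[key] += 1
    return sorted(k for k in totals if counts[k] > 1 and totals[k] >= threshold)


# Hypothetical sample: one client, five $3,000 deposits in a day, made at
# different branches and channels under slightly different spellings.
day = date(2013, 1, 15)
deposits = [
    {"name": "John Q. Smith", "amount": 3000, "day": day, "branch": "Main St", "channel": "ATM"},
    {"name": "Smith, John Q", "amount": 3000, "day": day, "branch": "Airport", "channel": "CSR"},
    {"name": "JOHN Q SMITH", "amount": 3000, "day": day, "branch": "Mall", "channel": "ATM"},
    {"name": "john q. smith", "amount": 3000, "day": day, "branch": "Main St", "channel": "CSR"},
    {"name": "Smith John Q.", "amount": 3000, "day": day, "branch": "Harbor", "channel": "ATM"},
    {"name": "Mary Jones", "amount": 3000, "day": day, "branch": "Main St", "channel": "ATM"},
]
print(flag_possible_structuring(deposits))
```

Grouping on the normalized name rather than on branch or channel is the point of the exercise: the pattern only shows up once the records are consolidated into a single view of the client.<br />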
<br />
Other systems monitor wire transfers to look for countries or individuals that appear on a list compiled by Treasury’s Office of Foreign Assets Control (OFAC). Being able to successfully match your clients against the OFAC list using fuzzy matching is crucial to success.<br />
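<br />
To make the fuzzy matching idea concrete, here is a minimal sketch using only Python's standard library. It scores a client name against each entry on a watch list with <code>difflib.SequenceMatcher</code> and keeps the entries above a cutoff. The watch list entries, the 0.85 cutoff and the function names are hypothetical; this is not how any particular screening product works, and real screening also has to weigh aliases, transliterations, dates of birth and addresses.<br />

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip punctuation and sort tokens so word order does not matter."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(sorted(cleaned.split()))


def similarity(a, b):
    """Similarity between two names, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen_client(client_name, watch_list, min_score=0.85):
    """Return (watch list entry, score) pairs at or above min_score, best match first."""
    scored = [(entry, round(similarity(client_name, entry), 2)) for entry in watch_list]
    hits = [pair for pair in scored if pair[1] >= min_score]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical watch list entries, not real OFAC data.
watch_list = ["Doe, Jonathan", "Alice Wong"]
print(screen_client("Jonathon Doe", watch_list))
```

The cutoff is a business decision as much as a technical one: set it too high and a misspelled name slips through, set it too low and your compliance team drowns in false positives.<br />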
<br />
<b>Number Four – Revenue</b><br />
Despite all of the regulations and reporting that banks must attend to, there is still an obligation to stockholders to make money while providing excellent service to the customers. Revenue hinges upon a consistent, current and relevant view of clients across all of the bank’s products. Poor data management creates significant hidden cost and can hinder your ability to recognize and understand opportunity – where you can up-sell and cross-sell your customers. Data champions and data scientists must work with the marketing teams to identify and tackle the issues here. Knowing when and how to ask the customer for new business can lead to significant growth.<br />
<br />
These are just some examples that are very common to financial services. In my experience, most financial services companies have all of these issues to some degree, but tackle them with an agile approach, taking a small portion of one of these problems and solving it little by little. Along the way, they track the value delivered and the potential value of further investment.<br />
<br /><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com5tag:blogger.com,1999:blog-6895175514429514812.post-1841216373702159742013-01-06T17:01:00.001-05:002013-01-07T15:30:13.214-05:00Big Data After the Hype<h4>
Total Data Management</h4>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUSdz2f_gP842vfUHp-e_dy4zV82L2dBpo1OJszcIj_sWaB9HeAiz9jGtMAaJdZFNq8YLfPlIr7PfAarm3_HAIKADiAUR7_Wt2MtHxOC0bwXBx0SEjRSAuGCytt_RC11blyekrwl8F7vs/s1600/Checklistblu.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUSdz2f_gP842vfUHp-e_dy4zV82L2dBpo1OJszcIj_sWaB9HeAiz9jGtMAaJdZFNq8YLfPlIr7PfAarm3_HAIKADiAUR7_Wt2MtHxOC0bwXBx0SEjRSAuGCytt_RC11blyekrwl8F7vs/s200/Checklistblu.jpg" width="200" /></a></div>
This year, I’ve been following the meteoric rise of big data. It has been a boon for vendors who are venturing into this area. It has produced countless start-ups and much buzz in the data management world.<br />
<br />
However, when it comes down to it, what we’re really talking about here is data management and data governance. Whether you have to deal with big data, enterprise data or spread-marts, data needs to be managed no matter what size. The tides are turning for a total data management approach. <a href="http://www.talend.com/resources/whitepapers/how-big-is-big-data-adoption" target="_blank">Recent surveys</a> show that despite the market hype, most technologists and business users feel that big data is an off-shoot of data management, not a branch of technology in itself. <br />
<br />
So, why the hype? I'm convinced it is mostly vendor-generated. In 2010, when big data began to gain notoriety, there was a disconnect for some vendors. While partnered with traditional enterprise data management companies like the Oracles and IBMs of this world, not all vendors were prepared for the growing popularity of open source and Hadoop. Others were (and still are) better positioned. They began talking about big data as a product differentiator. Vendors who don’t have the basic architecture for managing data in Hadoop have struggled and will continue to struggle. <br />
<br />
For example, ETL tools that have a basic connection to move data in and out of HBase, Hortonworks and Cloudera can’t stop there. The power of Hadoop must be harnessed, and it’s not always an easy thing to do when your technology requires executables tied to CPUs. One of the powerful things about Hadoop is that it scales based on tools and languages like Pig, Sqoop and Java without having to install anything. Want to expand the number of servers? Add a datanode server, tell the namenode and rebalance, and you’re off and running. However, even this simple innovation is more difficult on some vendors’ architectures than others.<br />
<br />
Another rethinking that is taking place in the market concerns the long-standing CPU-based pricing structure. Vendors who keep their pricing structure based on core processors for Hadoop will continue to struggle because it runs counter to the power of Hadoop. You hear about the volume, velocity and variety. Technically, if you want to step up the volume with another datanode, it’s no big deal. However, it becomes a big deal if you have to renegotiate a vendor contract each time.<br />
<br />
Last year, around this time, <a href="http://data-governance.blogspot.com/2012/01/big-data-enterprise-data-and-discrete.html" target="_blank">I wrote about</a> the various costs associated with the scale of data. In summary, the costs of licenses and connectors are bigger for enterprise data, while the costs associated with skills are more likely to affect you with big data. There will come a time when the skills gap will be closed, however.<br />
<br />
In 2013, we’ll begin to see the un-hyping of big data in favor of this total data management approach. For buyers, big data will be a tick-box in their RFPs in the effort to manage data, no matter what the size.<br />
<br /><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-47334289013944868072012-08-09T18:03:00.001-04:002012-08-09T18:06:39.003-04:00Big Data, Good and Evil<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8n9yrVUC4XvFcC_lifxbD-ZinyVGFoRZifYzvYdXEOwTY622-Q-arcyDSLXB_QYQA8QjXZnM3abku9Bk-D_a5gQHeTtM6JztpI92ML7mutuQO_R8te1OUQbb6_2_ll8VQk7sS2Nhc92Y/s1600/GoodEvil.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="199" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8n9yrVUC4XvFcC_lifxbD-ZinyVGFoRZifYzvYdXEOwTY622-Q-arcyDSLXB_QYQA8QjXZnM3abku9Bk-D_a5gQHeTtM6JztpI92ML7mutuQO_R8te1OUQbb6_2_ll8VQk7sS2Nhc92Y/s200/GoodEvil.jpg" width="200" /></a></div>
As I get involved more and more in the world of Big Data, I find myself reflecting upon where it all will go. Big Data could help us live better lives by solving crimes, predicting scientific outcomes, detecting fraud and, of course, optimizing our marketing so that we don’t bother people who don’t want our products and target them when we think they do. While the ‘goodness’ of some of those items is decidedly debatable, that’s the bright side. Big Data does represent a paradigm shift for our society, but since it’s still young, we’re just not sure exactly how big Big Data is yet.<br />
<br />
When I write about Big Data, I’m talking about leveraging new sources of data like social media, transaction data, sensor data, networked devices and more. These data sources tend to be… well, big. Mashing them up with your traditional CRM data or supply chain data can tell you some fascinating things. They even tell you some interesting things all by themselves. They can give you information that wasn’t possible to attain until recently, when we achieved the technology and ability to handle Big Data in a meaningful way. We are already starting to see amazing <a href="http://www.talend.com/blog/2012/05/14/the-new-use-cases-of-big-data/">case studies from Big Data</a>.<br />
<br />
On the other hand, there is potential folly. Despite the absolute evolutionary power that Big Data can bring to us, it’s also human nature for some to abuse it. When technological evolution brought us snail-mail, many abused it with junk mail. When technology brought us e-mail, a few abused it by spamming us. Abuse is my biggest concern. The potential abuse with Big Data is that corporations completely figure out what makes us tick, thereby giving them unprecedented power over our buying decisions. It could lead to social issues, too. For example, if Big Data says that people who eat cheeseburgers after 9 PM are more likely to get a heart attack, do we justify outlawing cheeseburgers after 9? I'd rather make my own decisions.<br />
<br />
The movie “Minority Report” starring Tom Cruise comes to mind. As truth imitates fiction, I can’t help but think of the <a href="http://www.youtube.com/watch?v=oBaiKsYUdvg">mall scene from the movie</a> which overall painted a fairly grim picture of marketing in the future. Now, I see it as prophetic. <br />
<br />
This type of marketing already exists within some free online e-mail systems. For example, if I’m e-mailing my friends about a trip to Vegas or gambling, or even when I post this blog that mentions Vegas, it’s no mistake when ads for Caesars Palace appear. It’s cool, and yet I am uneasy. Will future employers use big data to help decide if I am worthy of work? Will my e-mail conversations about Las Vegas lead them to believe I am a compulsive gambler, thus giving the edge to someone else? If so, what is my recourse to set the record straight?<br />
<br />
Government has reportedly been getting in on big data, too. A <a href="http://www.wired.com/threatlevel/2012/03/ff_nsadatacenter/">recent Wired magazine story</a> talked about a huge government facility in Utah. While there is clearly a "good" aspect to this big data, namely the catching of bad guys, the most troubling aspect of this might be that the citizens have no control of their own data. Oversight on what can and cannot be done with the wealth of information at this facility is unclear.<br />
<br />
That said, I generally have an overall positive view of the good that Big Data will bring to society, and the positive influence it will have on data management professionals. We have a society today that is more open and more willing to post private information to the public. Society is therefore more tolerant today and will be even more so in the future.<br />
<br />
Ultimately, when and if Big Data becomes abusive to privacy, overzealous capitalism, social issues, et al, expect capitalism to also solve it. Look for companies who set up online e-mail and promote the fact that they don’t track conversations. Look for utilities to overwhelm any negative information about you in the Big Data universe with positive information. We could be looking at a cottage industry of managing and protecting your Big Data image.<br />
<br /><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-80483616362739347022012-05-17T13:31:00.002-04:002012-05-17T13:44:09.973-04:00Naming your Data Management Project<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjx-vZGoslR45pK42NYxNnzUwd-oMY2mWd2yUs6kKtJWiN4wS78tgJKsiBRBSFp_pLKvXwQXCK3-81Jzbkx2PqquxfRtewVCyL0BpTBZXgw3h0yr3JGpYpfW2kkucIwEDiqACHhG6PlwcQ/s1600/hellobadge2.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="129" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjx-vZGoslR45pK42NYxNnzUwd-oMY2mWd2yUs6kKtJWiN4wS78tgJKsiBRBSFp_pLKvXwQXCK3-81Jzbkx2PqquxfRtewVCyL0BpTBZXgw3h0yr3JGpYpfW2kkucIwEDiqACHhG6PlwcQ/s200/hellobadge2.jpg" width="200" /></a></div>
<div style="text-align: left;">
In my line of work, I get to see many requests for proposals and sometimes I am invited to take part when a project is progressing. I may be one of the only people on earth who gets pleasure in companies improving their data management strategy because I almost always see a huge return on investment. We’re making the world a better place by managing data the right way, so thanks to those who have made me part of your project.</div>
<br />
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I do have one word of advice for project managers, however. Please think when you name your projects. I can’t tell you how many times I’ve come into a project where some long description is the name of a project and it soon becomes an equally uncompelling acronym. They are project names like:</div>
<br />
<ul>
<li>Salesforce Marketing Analyst Data Mart and Sales Marketing Information Daily Audit or you can go by the catchy acronym SMADMASMIDA</li>
<li>Outlook Sales Partner Contact Daily Reconciliation or OSPCDR</li>
<li>Operational Business Intelligence for Marketing Analytics or OBIMA</li>
</ul>
<br />
The names and their acronyms are pretty close to meaningless. People will be more excited by references to the news and pop culture than by intellectual terminology. It matters. Using the technical terms puts you in an elitist club of IT, and remember, we’re trying to break down the barriers between business and IT.<br />
<br />
Some examples:<br />
<br />
<ul>
<li>Any Business Intelligence project today that doesn’t have the name ‘Moneyball’ in the title is missing a huge opportunity. Everyone knows what the movie Moneyball is about and the way that the Oakland A’s used business intelligence to win. Easy sale of your project to business.</li>
<li>Big Data initiatives could be named after Adele’s “Rolling in the Deep”. Rolling in the Deep is what a ship does while out at sea. The image is a small ship tossed on a very deep, dark ocean (of data).</li>
<li>The song title is an adaptation of a British slang phrase “roll deep” which means to have a group who always has your back, who can get you out of trouble. It’s a nice image to signify the pervasiveness of data, the fact that there is strength in numbers and for data governance. </li>
</ul>
<br />
Of course, pop culture is a good way to start, but company culture and the history of your organization are also great inspiration for naming your project. Given the French background of Talend, my current employer, a name for a data consolidation project might be something like ‘Pas de Deux’ which promotes a vision of a relationship between two people or things.<br />
<br />
The point is, try to use the name of the project to promote a vision of the business problem you’re trying to solve. It’ll play better with the business folks. The name matters.<br />
<div>
<br /></div><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com3tag:blogger.com,1999:blog-6895175514429514812.post-76333673626845835732012-04-02T14:39:00.000-04:002012-04-02T14:40:37.872-04:00Why Code Base is Important in Vendor Selection<br />
<i>The horticulture of software</i><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNuOg6osq09qG9vMhtM7zwsga1t1EqBooXH2wGM69_mCGPsD3Y3orQKeM6Ok_9tciF-DmpVInBhhqJXYoOGqBIOwC45pBbZZuY1IXdsYl417Dkw9H6dDBBFvrh84RiGxqJR3d5EQ2TYWc/s1600/Growthl.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNuOg6osq09qG9vMhtM7zwsga1t1EqBooXH2wGM69_mCGPsD3Y3orQKeM6Ok_9tciF-DmpVInBhhqJXYoOGqBIOwC45pBbZZuY1IXdsYl417Dkw9H6dDBBFvrh84RiGxqJR3d5EQ2TYWc/s200/Growthl.jpg" width="128" /></a></div>
Spring has sprung here in the northern hemisphere and minds turn to the plant life that will be sprouting all across our home towns. The new growth has me thinking about the similarities between horticulture and the code base of our data management solutions.<br />
<br />
Reviewing software solutions before you buy is a major effort for users and/or vendor selection committees. Much time is spent on looking at whether the features of the product will meet team needs. Features are so important that companies will spend time to produce RFPs with extensive feature lists. They may even require a proof of concept; the vendor must install and test the solution in the purchaser’s work environment. This goes for those applications used to manage data, but also many other applications.<br />
<br />
However, I believe that buyers should carefully look at the style of growth of the code base. In the data management field, we have undergone decades of technology combined with decades of market consolidation. The code base for the application you’re about to buy may have grown from one of the following horticultural strategies:<br />
<br />
<ul>
<li><b>Grafting </b>
– A large software company sees potential in the data management field and begins to acquire companies and graft them together to create a solution. Sometimes the acquisition isn’t done by technologists, but by upper management seeking to fill holes in the product line. Sometimes they even buy competing technologies, leaving everyone trying to figure out who will win. Sometimes the graft doesn’t take.</li>
<li><b>Old Growth </b>– Companies have an existing technology that has worked for decades. However, back in 1990 when they released version 1.0, Java had not yet arrived, let alone become the dominant force it is today. FORTRAN was the preferred programming language and COBOL copybooks were the data model. I know some companies in the data management market have spent millions updating old growth code to be more competitive in this market, and some others that have not. This becomes a dilemma for all vendors at some point. When do you prune out the dead wood?</li>
<li><b>Sapling </b>– Companies who are just breaking into the data management marketplace and have a good-looking start for data management. However, the sapling doesn’t yet have all the branches you want on it. Will the sapling survive among the other deciduous solutions in the market?</li>
</ul>
<br />
When you’re selecting a vendor, you ideally want a code base that is mature, but not too mature. You want limited grafting. The growth of the code and the grafting affect:<br />
<br />
<ul>
<li>Speed of innovation for the vendor</li>
<li>Customization for you</li>
<li>Future expansion for both of you</li>
<li>The age and experience of the technologists necessary to operate it</li>
<li>Consulting requirements</li>
<li>Ability to cross-train personnel (e.g., DI people running DQ and vice versa)</li>
</ul>
<br />
So, when you’re selecting a data management solution, or any technology solution, don’t just compare the features, but take a look at how the product grew to where it is today. Look for the solution in the optimal stage of growth that will meet your needs today and those for the future.<br />
<br />
<br /><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-62627542043652133812012-03-22T15:11:00.000-04:002012-03-22T15:11:03.641-04:00Big Data Hype is an Opportunity for Data Management Pros<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXMKdmCvJmKR3QEVjfZ7s3vnJ1_BkJIMB6F835KteehdtrVzePM6NBY2QcAaHeOnpoIykLBl2sDB4rZGsjbaBDRpOzGnUrAhsjWvHEt0tqAUPi00PYU2FdOEniKgeHhR7j7EKPMdDZ1FE/s1600/Big+Data.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXMKdmCvJmKR3QEVjfZ7s3vnJ1_BkJIMB6F835KteehdtrVzePM6NBY2QcAaHeOnpoIykLBl2sDB4rZGsjbaBDRpOzGnUrAhsjWvHEt0tqAUPi00PYU2FdOEniKgeHhR7j7EKPMdDZ1FE/s200/Big+Data.jpg" width="197" /></a></div>Big Data is a hot topic in the data management world. Recently, I’ve seen press and vendors describing it with the words crucial, tremendous opportunity, overcoming vexing challenges, and enabling technology. With all the hoopla, management is probably asking many of you about your Big Data strategy. It has risen to the corporate management level; your CxO is probably aware.<br />
<br />
Most of the data management professionals I’ve met are fairly down-to-earth, pragmatic folks. Data is being managed correctly or not. The business rule works, or it does not. Marketing spin is evil. In fact, the hype and noise around big data may be something to be filtered by many of you. You’re appropriately trying to look through the hype and get to the technology or business process that’s being enhanced by Big Data.<br />
However, in addition to filtering through the big data hype to the IT impact, data management professionals should also embrace the hype.<br />
<br />
Sure, we want to handle the high volume transactions that often come with big data, but we still have relational databases and unstructured data sources to deal with. We still have business users using Excel for databases with who-knows-what in them. We still have e-mail attachments from partners that need to be incorporated into our infrastructure. We still have a wide range of data sources and targets that we have to deal with, including, but not limited to, big data. In my last blog post, I wrote about how big data is just one facet of total data management.<br />
<br />
The opportunity is for data management pros to think about their big data management strategy holistically and solve some of their old and tired issues around data management. It’s pretty easy to draw a picture for management that Big Data needs to take a Total Data Management approach, one that addresses some of our worn-out and politically charged data governance issues, including:<br />
<br />
<br />
<ul><li>Data Ownership – One barrier to big data management is accountability for the data. By deciding you are going to plan for big data, you also need to make decisions about who owns the big data, and all your data sets for that matter.</li>
<li>Spreadmarts – Keeping unmanaged data out of spreadsheets is increasingly crucial in companies that must handle Big Data. So-called “spreadmarts,” which are important pieces of data stored in Excel spreadsheets, are easily replicated to team desktops. In this scenario, you lose control of versions as well as standards. However, big data can help make it easy for everyone to use corporate information, no matter what size.</li>
<li>Unstructured Data – Although big data might tend to be more analytical than operational, big data is most commonly unstructured data. A total data management approach takes into account unstructured data in either case. Having technology and processes that handle unstructured data, big or small, is crucial to total data management.</li>
<li>Corporate Strategy and Mergers – If your company is one that grows through acquisition, managing big data is about being able to handle not only your own data but also the data of those companies you acquire. Since you don’t know what systems those companies will have, a big data governance strategy and flexible tools are important to big data.</li>
</ul><br />
<br />
My point is, with big data, try to avoid the typical noise filtering exercises you normally take on the latest buzzword. Instead, use the hype and buzz to your advantage to address a holistic view of data management in your organization.<br />
<div><br />
</div><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-89357758833421671252012-01-24T10:14:00.001-05:002012-02-13T11:06:43.905-05:00Big Data, Enterprise Data and Discrete Data<b>Total Data Management</b>©<br />
The data management world is buzzing about big data. There are many blog posts, articles and white papers covering this new area. Just about every data management vendor is scrambling to build tools to meet the needs of big data.<br />
<br />
The world is correct to take notice. The ability of companies to handle big data represents exciting innovation, where large relational databases with high price tags are sometimes replaced with flat files, technologies like Hadoop and intelligent parsers to create analytics from massive amounts of data. It’s a game-changer for those in the Business Intelligence and relational database business. It’s about managing an increasingly common huge data problem more effectively and at lower cost.<br />
<br />
However, where there is big data, there is also enterprise (medium) data and discrete (small) data. With each size of data come very specific challenges. <br />
<br />
<br />
<table align="left" border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: none; margin-left: 6.75pt; margin-right: 6.75pt; mso-border-alt: solid windowtext .5pt; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-table-anchor-horizontal: margin; mso-table-anchor-vertical: paragraph; mso-table-left: left; mso-table-lspace: 9.0pt; mso-table-rspace: 9.0pt; mso-table-top: 131.65pt; mso-yfti-tbllook: 1184;"><tbody>
<tr style="height: 14.25pt; mso-yfti-firstrow: yes; mso-yfti-irow: 0; page-break-inside: avoid;"> <td style="border: solid windowtext 1.0pt; height: 8.25pt; mso-border-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-left: none; border: solid windowtext 1.0pt; height: 8.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 139.5pt;" valign="top" width="186"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><b><span style="font-size: x-small;">BIG DATA<o:p></o:p></span></b></div></td> <td style="border-left: none; border: solid windowtext 1.0pt; height: 8.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.9in;" valign="top" width="182"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><b><span style="font-size: x-small;">ENTERPRISE DATA<o:p></o:p></span></b></div></td> <td style="border-left: none; border: solid windowtext 1.0pt; height: 8.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.8in;" valign="top" width="173"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;"><b>DISCRETE DATA</b><o:p></o:p></span></div></td> </tr>
<tr style="height: 98.95pt; mso-yfti-irow: 1; page-break-inside: avoid;"> <td style="border-top: none; border: solid windowtext 1.0pt; height: 98.95pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;"><b>Technologies</b><o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 98.95pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 139.5pt;" valign="top" width="186"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Hadoop and flat files to reduce costs and avoid relational database costs.<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 98.95pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.9in;" valign="top" width="182"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Relational databases<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 98.95pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.8in;" valign="top" width="173"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Spreadsheets and flat files and flat databases. 
May come from other non-relational sources, such as e-mail attachments, social media JSON, and XML data.<o:p></o:p></span></div></td> </tr>
<tr style="height: 98.95pt; mso-yfti-irow: 2; page-break-inside: avoid;"> <td style="border-top: none; border: solid windowtext 1.0pt; height: 98.95pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;"><b>Use Cases</b><o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 98.95pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 139.5pt;" valign="top" width="186"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Real-time analytics of a large number of transactions, including web analytics, SaaS up-time optimization, mission-critical analysis of transactions<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 98.95pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.9in;" valign="top" width="182"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Just about every business application today, including CRM, ERP, Data Warehouse, and MDM.<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 98.95pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.8in;" valign="top" width="173"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span 
style="font-size: x-small;">Companies with no or little data management strategy, or for those companies dealing with immature data architecture. Companies who receive mission-critical data via e-mail. Companies who need to closely follow social media streams.<o:p></o:p></span></div></td> </tr>
<tr style="height: 56.2pt; mso-yfti-irow: 3; page-break-inside: avoid;"> <td style="border-top: none; border: solid windowtext 1.0pt; height: 56.2pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;"><b>Innovation</b><o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 56.2pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 139.5pt;" valign="top" width="186"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Handles huge amounts of data that is predominantly used for business analytics and operational BI.<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 56.2pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.9in;" valign="top" width="182"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Provides a powerful data management architecture that can be accessed by a common language (SQL).<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 56.2pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.8in;" valign="top" width="173"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Handles more diverse and 
more dynamic sources.<o:p></o:p></span></div></td> </tr>
<tr style="height: 14.25pt; mso-yfti-irow: 4; page-break-inside: avoid;"> <td style="border-top: none; border: solid windowtext 1.0pt; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;"><b>Positives</b><o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 139.5pt;" valign="top" width="186"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Replaces high-cost multi-server relational databases with lower-cost flat files and Hadoop server farms.<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.9in;" valign="top" width="182"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Provides a scalable, reproducible environment in which database applications and solutions can be developed. Replaces unwieldy human-intensive data processes with a streamlined central repository of information. 
Used in many businesses in day-to-day operations.<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.8in;" valign="top" width="173"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">‘Simplifies’ the data management process to the point of being completely within the grasp of the business users without too much complicated technology. In the long run, however, data management is more costly and unwieldy when it is in spreadmarts.<o:p></o:p></span></div></td> </tr>
<tr style="height: 14.25pt; mso-yfti-irow: 5; page-break-inside: avoid;"> <td style="border-top: none; border: solid windowtext 1.0pt; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;"><b>Negatives</b><o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 139.5pt;" valign="top" width="186"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Relatively new technology with limited pool of Big Data experts. Legacy medium-sized systems can sometimes scale.<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.9in;" valign="top" width="182"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Can be costly when data volumes become high, as new servers and new enterprise licenses get more common. 
Also, the number of sources and diversity of data types.<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.8in;" valign="top" width="173"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Error-prone and labor intensive. <o:p></o:p></span></div></td> </tr>
<tr style="height: 14.25pt; mso-yfti-irow: 6; mso-yfti-lastrow: yes; page-break-inside: avoid;"> <td style="border-top: none; border: solid windowtext 1.0pt; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;"><b>Cost Focus</b><o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 139.5pt;" valign="top" width="186"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Expertise<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.9in;" valign="top" width="182"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Servers and licenses/ Connectors and database technology<o:p></o:p></span></div></td> <td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 14.25pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.8in;" valign="top" width="173"><div class="MsoNormal" style="margin-bottom: 0.0001pt;"><span style="font-size: x-small;">Efficiency and productivity </span><o:p></o:p></div></td> </tr>
</tbody></table><br />
<br clear="all" />
<b>Growing Up</b><br />
An organization’s data management maturity plays a role in big and little data. If you’re still managing your customer list in a spreadsheet, it’s probably something you started when your company was fairly young. Now the uses for the data have expanded, but you are still stuck in the young company’s process. Something that was agile when you were young is inefficient today.<br />
<br />
Your pain may also have something to do with your partners’ data management maturity. While the other companies you do business with are good at what they do, supplying products and services to your company, they may not be as good at data management. The new parts catalog comes every so often as an e-mail attachment. You need an efficient process to update whoever uses it.<br />
<br />
No matter how mature you are, it is likely that you will have to deal with all types of data. When selecting tools, make sure you examine the cost and efficiency of all of these types, not just big data.<br />
<br />
<div><br />
</div><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com2tag:blogger.com,1999:blog-6895175514429514812.post-91512679408382836082012-01-10T15:40:00.000-05:002012-01-10T15:40:40.884-05:00What is Data Governance?I recently did a quick movie for a <a href="http://www.talend.com/">Talend </a>promotion to define data governance. It turns out that defining data governance is trickier than you think. Here, I examine the characteristics of data management initiatives and how they define data governance.<br />
<br />
<object data="http://www.brainshark.com/brainshark/viewer/getplayer.ashx" height="366" id="bsplayer59581" name="bsplayer59581" type="application/x-shockwave-flash" width="440"><param name="movie" value="http://www.brainshark.com/brainshark/viewer/getplayer.ashx" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="flashvars" value="pi=789047103&dm=5&pause=1&eurl=zHBzuyGQPz3T2ez0" /><a href="http://www.brainshark.com/brainshark/viewer/fallback.ashx?pi=789047103"><video width="440" height="366" controls="true" poster="http://www.brainshark.com/brainshark/brainshark.net/common/getimage.ashx?pi=789047103&w=440&h=366&sln=1"><source src="http://www.brainshark.com/brainshark/brainshark.net/apppresentation/getmovie.aspx?pi=789047103&fmt=2" /><img src="http://www.brainshark.com/brainshark/brainshark.net/apppresentation/splash.aspx?pi=789047103" width="440" height="366" border="0" /></video></a></object><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-16916204969336990822011-11-12T10:35:00.000-05:002011-11-12T10:35:58.016-05:00The ‘Time’ Factor in Data Management<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEd9Gba7CSXFCvQPMpNGRm0QJQR1Ux8MOpHQ8rPiqoDn5lqjbFFm3j41AzX6PLUWhDRNFMgK0U7DKeSRUe5Z2DkVe7_DZnfU2cyuC6OXyIVFekMtWA4rkCzdSfGbgCK551K5_IOQLV-VU/s1600/TIme.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="132" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEd9Gba7CSXFCvQPMpNGRm0QJQR1Ux8MOpHQ8rPiqoDn5lqjbFFm3j41AzX6PLUWhDRNFMgK0U7DKeSRUe5Z2DkVe7_DZnfU2cyuC6OXyIVFekMtWA4rkCzdSfGbgCK551K5_IOQLV-VU/s200/TIme.jpg" 
width="200" /></a></div>I've been thinking about how many ways time influences the data management world. When it comes to managing data, we think about improving processes, coercing the needs and desires of people and how technology comes to help us manage it all. However, an often overlooked aspect of data management is time. Time impacts data management from many different directions.<br />
<br />
<b>Time Means Technology Will Improve</b><br />
As time marches on, technology offers twists and turns to the data steward through innovation. Twenty years ago, mainframes ruled the world. We’ve migrated through relational databases on powerful servers to a place where we see our immediate future in cloud and big data. As technology shifts, you must consider the impact on your data.<br />
<br />
The good news is that with these huge challenges, you also get access to new tools. In general, tools have become less arcane and more business-user focused as time marches on. <br />
<b><br />
Time Causes People to Change</b><br />
Like changes in technology, people also mature, change careers, retire. With regard to data management, the corporation must think about the expertise needed to complete the data mission. Data management must pass the “hit by a bus” test where the company would not suffer if one or more key people were to be hit by a Greyhound traveling from Newark to Richmond.<br />
<br />
Here, time is requiring us to be more diligent in documenting our processes. It is requiring us to avoid undocumented hand-coding and pick a reproducible data management platform. It helps to have third-party continuity, like consultants who, although they will also experience changes in personnel, will change on a different schedule than their clients.<br />
<b><br />
Time Leads to Clarity in the Imperative of Data Management</b><br />
With regard to data management, corporations have a maturity process they go through. They often start as chaotic immature organizations and realize the power of data management in a tactical maturity stage. Finally, they realize data management is a strategic initiative when they begin to govern the data. Throughout it all, people, process and technologies change.<br />
<br />
Knowing where you are in this maturity cycle can help you plan where you want to go from here and what tactics you need to put in place to get there. For example, very few companies go from chaotic, ad hoc data management to full-blown MDM. Rather, a chaotic organization is more apt to evolve its data management maturity by consolidating two or more ERP systems and reveling in the resulting efficiency. For the most part, companies get there by making little changes, seeing the positive impact of those changes and wanting more.<br />
<br />
<b>Time Prevents Us from Achieving Successful Projects</b><br />
When it comes to specific projects, taking too much time can lead to failure. In the not so distant past, circa 2007, the industry commonly took on massive, multi-year, multimillion-dollar MDM projects. We now know that these projects are not the best way to manage data. Why? Think about how much your own company has changed in the last two years. If it is a dynamic, growing company, it likely has different goals, different markets, different partners and new leadership. The world has changed significantly, too. Today’s worldwide economy is so much different than even one year ago. (Have you heard about the recession and European debt crisis?) The goals of a project that you set up two years ago will never achieve success today. <br />
<br />
Time makes us take an agile approach to data management. It requires that we pick off small portions of our problems, solve them, prove value and re-use what we’ve learned on the next agile project. Limit and hold scope to achieve success.<br />
<br />
<b>Time Achieves Corporate Growth</b> (which is counter to data management)<br />
Companies that are just starting out generally have fewer data management problems than those that are mature. Time pushes our data complexity deeper and deeper. Therefore, time dictates that even small companies should have some sort of data management strategy. The good news is that this is now achievable with help from open source and lower-cost data management solutions. Proper data management tools are affordable for both the Fortune 1000 and small to medium-sized enterprises.<br />
<br />
<b>Time Holds Us Responsible</b><br />
That said, the longer a corporation is in business, the longer it can be held responsible for lower revenue, decreased efficiency and lack of compliance due to poor data management. The company decides how it is going to govern (or not govern) data, what data is acceptable in the CRM and who is responsible for the mistakes that happen due to poor data management. The longer you are in business, the more responsible the corporation is for its governance. Time holds us responsible if the problems aren’t solved.<br />
<b><br />
Time and Success Lead to Apathy </b><br />
Finally, time often brings us success in data management. With success, there is a propensity for corporations to take the eye off the prize and spend monies on more pressing issues. Time and success can lead to a certain apathy, believing that the data management problem is solved. But, as time marches on, new partners, new data sources, new business processes. Time requires us to be ever vigilant in our efforts to manage data.<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-52704319201448069692011-08-31T05:09:00.010-04:002011-08-31T05:09:00.175-04:00Top Ten Root Causes of Data Quality Problems: Part Five<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsIas41PxndLR4RbKAg1tcY81Q_o9_7uohBAtSMytn3o6EqVZXDJwl4APpJwBwId396EuaIPL-mkGU_6Dcm7Mtf75TJpOD24-a1sG8A7kmbYbkMWWRHK0Bt5yn4bSUamXsoUutR_bKl0I/s1600/Checklistblu.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsIas41PxndLR4RbKAg1tcY81Q_o9_7uohBAtSMytn3o6EqVZXDJwl4APpJwBwId396EuaIPL-mkGU_6Dcm7Mtf75TJpOD24-a1sG8A7kmbYbkMWWRHK0Bt5yn4bSUamXsoUutR_bKl0I/s200/Checklistblu.jpg" width="200" /></a></div><b>Part 5 of 5: People Issues</b><br />
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them. Companies rely on data to make significant decisions that can affect customer service, regulatory compliance, supply chain and many other areas. As you collect more and more information about customers, products, suppliers, transactions and billing, you must attack the root causes of data quality. <br />
<b><br />
Root Cause Number Nine: Defining Data Quality</b><br />
More and more companies recognize the need for data quality, but there are different ways to clean data and improve data quality. You can:<br />
<ul><li>Write some code and cleanse manually</li>
<li>Handle data quality within the source application</li>
<li>Buy tools to cleanse data</li>
</ul>However, consider what happens when you have two or more of these types of data quality processes adjusting and massaging the data. Sales has one definition of customer, while billing has another. Due to differing processes, they don’t agree on whether two records are a duplicate.<br />
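To make the disagreement concrete, here is a minimal sketch in Python. The two match rules, the field names and the sample records are all hypothetical, invented only to show how two departments can reach opposite answers on the same pair of records.<br />

```python
# Hypothetical match rules and records, for illustration only.

def sales_key(rec):
    # Assumed rule: sales treats records with the same e-mail as one customer.
    return rec["email"].strip().lower()

def billing_key(rec):
    # Assumed rule: billing requires the same name and postal code.
    return (rec["name"].strip().lower(), rec["postal_code"].strip())

def is_duplicate(a, b, key):
    # Two records are duplicates under a rule when their keys are equal.
    return key(a) == key(b)

a = {"name": "W. W. Grainger", "email": "ap@example.com", "postal_code": "60045"}
b = {"name": "Grainger", "email": "AP@example.com", "postal_code": "60045"}

print(is_duplicate(a, b, sales_key))    # True: the e-mails match after lowercasing
print(is_duplicate(a, b, billing_key))  # False: the names differ
```

Until both departments share one rule, neither answer is wrong, which is why the definition has to be agreed on before any tool is chosen.<br />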
<br />
<b>Root Cause Attack Plan</b><br />
<ul><li>Standardize Tools – Whenever possible, choose tools that aren’t tied to a particular solution. Having data quality only in SAP, for example, won’t help your Oracle, Salesforce and MySQL data sets. When picking a solution, select one that is capable of accessing any data, anywhere, at any time. It shouldn't cost you a bundle to leverage a common solution across multiple platforms and solutions.</li>
<li>Data Governance – By setting up a cross-functional data governance team, you will have the people in place to define a common data model.</li>
</ul><b><br />
Root Cause Number Ten: Loss of Expertise</b><br />
On almost every data-intensive project, there is one person whose legacy data expertise is outstanding. These are the people who understand why some employee date of hire information is stored in the date of birth field and why some of the name attributes also contain tax ID numbers. <br />
Data might be a kind of historical record for an organization. It might have come from legacy systems. In some cases, the same value in the same field will mean a totally different thing in different records. Knowledge of these anomalies allows experts to use the data properly. <br />
If you encounter this situation, there are some business processes you can follow.<br />
<br />
<b>Root Cause Attack Plan</b><br />
<ul><li>Profile and Monitor – Profiling the data will help you identify most of these types of issues. For example, if you have a tax ID number embedded in the name field, analysis will let you quickly spot it. Monitoring will prevent a recurrence.</li>
<li>Document – Although they may be reluctant to do so for fear of losing job security, make sure experts document all of the anomalies and transformations that need to happen every time the data is moved.</li>
<li>Use Consultants – Expert employees may be so valuable and busy that there is no time to document the legacy anomalies. Outside consulting firms are usually very good at documenting issues and providing continuity between legacy and new employees.</li>
</ul><br />
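As a rough illustration of the profiling step, the Python sketch below scans a name column for values that contain something shaped like a tax ID. The pattern assumes US-style identifiers (12-3456789 or 123-45-6789) and the sample names are made up; a real profiling tool would cover far more cases.<br />

```python
import re

# Assumed pattern: US-style identifiers such as 12-3456789 or 123-45-6789.
TAX_ID = re.compile(r"\b(?:\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b")

def find_embedded_tax_ids(names):
    """Return (row index, value) pairs for names containing a tax-ID-like token."""
    return [(i, n) for i, n in enumerate(names) if TAX_ID.search(n)]

# Made-up sample data.
names = [
    "Acme Supply",
    "Jones Hardware 12-3456789",
    "Mary Smith",
    "R. Lee 123-45-6789",
]

flagged = find_embedded_tax_ids(names)
for row, value in flagged:
    print(row, value)
```

Running the same check on a schedule is one simple form of the monitoring described above.<br />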
This post is an excerpt from a white paper available <a href="http://info.talend.com/DQ_10_Root_Causes.html?src=datagovblog">here</a>. More to come on this subject in the days ahead.<br />
<br />
See also:<br />
<ul><li>Part One: <a href="http://data-governance.blogspot.com/2011/08/top-ten-root-causes-of-data-quality.html">The Basics</a></li>
<li>Part Two: <a href="http://data-governance.blogspot.com/2011/08/top-ten-root-causes-of-data-quality_25.html">Renegades and Pirates</a></li>
<li>Part Three: <a href="http://data-governance.blogspot.com/">Secret Code and Corporate Evolution</a></li>
<li>Part Four: <a href="http://data-governance.blogspot.com/">Data Flow</a></li>
</ul><br />
<br />
<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-54713383058205382222011-08-30T05:09:00.001-04:002011-08-30T05:09:00.919-04:00Top Ten Root Causes of Data Quality Problems: Part Four<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5jLTdO9DEv4dNFhR154FEncYHeEw2nmkpxY8Q7nHbdLCiLTP9MyiwsDUsJxhcE5pMLBLaERLsqjrmRDcmjEKDIImKeyojgrOgYVL0TwHHk0-ZrRwss3J6G1tR2ba-EAZ-7zCwKjdRfdQ/s1600/Checklistgrn.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5jLTdO9DEv4dNFhR154FEncYHeEw2nmkpxY8Q7nHbdLCiLTP9MyiwsDUsJxhcE5pMLBLaERLsqjrmRDcmjEKDIImKeyojgrOgYVL0TwHHk0-ZrRwss3J6G1tR2ba-EAZ-7zCwKjdRfdQ/s200/Checklistgrn.jpg" width="200" /></a></div><b>Part 4 of 5: Data Flow</b><br />
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them. In part four, we examine some of the areas involving the pervasive nature of data and how it flows to and fro within an organization.<br />
<b><br />
Root Cause Number Seven: Transaction Transition</b><br />
More and more data is exchanged between systems through real-time (or near real-time) interfaces. As soon as the data enters one database, it triggers procedures necessary to send transactions to other downstream databases. The advantage is immediate propagation of data to all relevant databases.<br />
<br />
However, what happens when transactions go awry? A malfunctioning system could cause problems with downstream business applications. In fact, even a small data model change could cause issues.<br />
<br />
<b>Root Cause Attack Plan</b><ul><li>Schema Checks – Employ schema checks in your job streams to make sure your real-time applications are producing consistent data. Schema checks will do basic testing to make sure your data is complete and formatted correctly before loading.</li>
<li>Real-time Data Monitoring – One level beyond schema checks is to proactively monitor data with profiling and data monitoring tools. Tools like the <a href="http://www.talend.com/products-data-quality/talend-data-quality.php%20">Talend Data Quality Portal</a> and others will ensure the data contains the right kind of information. For example, if your part numbers are always a certain shape and length, and contain a finite set of values, any variation on that attribute can be monitored. When variations occur, the monitoring software can notify you.</li>
</ul><b><br />
Root Cause Number Eight: Metadata Metamorphosis</b><br />
A metadata repository should be shareable across multiple projects, with an audit trail maintained on usage and access. For example, your company might have part numbers and descriptions that are universal to CRM, billing, ERP systems, and so on. When a part number becomes obsolete in the ERP system, the CRM system should know. Metadata changes and needs to be shared.<br />
<br />
In theory, documenting the complete picture of what is going on in the database and how various processes are interrelated would allow you to completely mitigate the problem. Sharing the descriptions and part numbers among all applicable applications needs to happen. You could then analyze the data quality implications of any changes in code, processes, data structure, or data collection procedures and thus eliminate unexpected data errors. In practice, this is a huge task.<br />
<b><br />
Root Cause Attack Plan </b><ul><li>Predefined Data Models – Many industries now have basic definitions of what should be in any given set of data. For example, the automotive industry follows certain ISO 8000 standards. The energy industry follows Petroleum Industry Data Exchange standards or PIDX. Look for a data model in your industry to help.</li>
<li>Agile Data Management – Data governance is achieved by starting small and building out a process that first fixes the most important problems from a business perspective. You can leverage agile solutions to share metadata and set up optional processes across the enterprise.</li>
</ul><br />
This post is an excerpt from a white paper available <a href="http://info.talend.com/DQ_10_Root_Causes.html?src=datagovblog">here</a>. My final post on this subject will come in the days ahead.<br />
<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-78869586581239872832011-08-29T05:09:00.002-04:002011-08-29T05:09:00.318-04:00Top Ten Root Causes of Data Quality Problems: Part Three<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2ceDwILCpkSZuJ6rAwJXSJKCov0W3yS8Vjbs7Z6EzwCX_HKbsky48G1WyEoBStIgHYt9Lkh0XNTcvh8dnfRDE8jDZ1qCtwFqK_x2uoLq1gFBc53yDYQpb2gCc_p7NjTfvjhw9NEstkdA/s1600/Checklistblu.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2ceDwILCpkSZuJ6rAwJXSJKCov0W3yS8Vjbs7Z6EzwCX_HKbsky48G1WyEoBStIgHYt9Lkh0XNTcvh8dnfRDE8jDZ1qCtwFqK_x2uoLq1gFBc53yDYQpb2gCc_p7NjTfvjhw9NEstkdA/s200/Checklistblu.jpg" width="200" /></a></div><b>Part 3 of 5: Secret Code and Corporate Evolution</b><br />
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them. In part three, we examine secret code and corporate evolution as two of the root causes for data quality problems.<br />
<b><br />
</b><b>Root Cause Number Five: </b><b>Corporate Evolution</b><br />
<i>Change is good… except for data quality</i><br />
An organization undergoes business process change to improve itself. Good, right? Prime examples include:<br />
<ul><li>Company expansion into new markets</li>
<li>New partnership deals</li>
<li>New regulatory reporting laws</li>
<li>Financial reporting to a parent company</li>
<li>Downsizing </li>
</ul>If data quality is defined as “fitness for purpose,” what happens when the purpose changes? It’s these new data uses that bring about changes in perceived level of data quality even though underlying data is the same. It’s natural for data to change. As it does, the data quality rules, business rules and data integration layers must also change.<br />
<br />
<b>Root Cause Attack Plan</b> <br />
<ul><li>Data Governance – By setting up a cross-functional data governance team, you will always have a team looking at the changes your company is undergoing and considering their impact on information. In fact, this should be in the charter of a data governance team.</li>
<li>Communication – Regular communication and a well-documented metadata model will make the process of change much easier.</li>
<li>Tool Flexibility – One of the challenges of buying data quality tools embedded within enterprise applications is that they may not work in all enterprise applications. When you choose tools, make sure they are flexible enough to work with data from any application and that the vendor is committed to flexibility and openness.</li>
</ul><br />
<b>Root Cause Number Six: </b><b>Secret Code</b><br />
Databases rarely begin their life empty. The starting point is typically a data conversion from some previously existing data source. The problem is that while the data may work perfectly well in the source application, it may fail in the target. It’s difficult to see all the custom code and special processes that happen beneath the data unless you profile. <br />
<br />
<b>Root Cause Attack Plan</b><br />
<ul><li>Profile Early and Often – Don’t assume your data is fit for purpose because it works in the source application. Profiling will give you an exact evaluation of the shape and syntax of the data in the source. It also will let you know how much work you need to do to make it work in the target.</li>
<li>Corporate Standards – Data governance will help you define corporate standards for data quality. </li>
<li>Apply Reusable Data Quality Tools When Possible – Rather than custom code in the application, a better strategy is to let data quality tools apply standards. Data quality tools will apply corporate standards in a uniform way, leading to more accurate sharing of data.</li>
</ul><br />
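One simple way to see the shape and syntax of a source column is to reduce each value to a pattern and count the patterns. The Python sketch below does this for a made-up list of part numbers; the A/9 notation and the sample values are assumptions for illustration, not the output of any particular profiling product.<br />

```python
from collections import Counter

def shape(value):
    """Map letters to 'A' and digits to '9'; keep other characters as they are."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

def profile_shapes(values):
    """Count how many values share each shape."""
    return Counter(shape(v) for v in values)

# Made-up part numbers from a hypothetical source system.
part_numbers = ["AB-1234", "CD-5678", "EF-9012", "1234AB", "GH-34"]

profile = profile_shapes(part_numbers)
for pattern, count in profile.most_common():
    print(pattern, count)
```

The dominant shape suggests the rule the source application enforces, and the rare shapes are the records most likely to fail in the target.<br />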
This post is an excerpt from a white paper available <a href="http://info.talend.com/DQ_10_Root_Causes.html?src=datagovblog">here</a>. The final posts on this subject will come in the days ahead.<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-31301865542323918282011-08-25T05:09:00.010-04:002011-08-25T05:09:00.308-04:00Top Ten Root Causes of Data Quality Problems: Part Two<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAv0YuRemr2XPcwvXDEkSWBqAgxvNHbGIrE6p-5os3NBlOQ4UE854it9coyjAhUztzy1Wbx-BESU0vMENrptIAfY6ZAE3vu4jKkcIwOb6rELj0xzTNc-ErUo1DWJVAooJx07fOiJBykjw/s1600/Checklist.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAv0YuRemr2XPcwvXDEkSWBqAgxvNHbGIrE6p-5os3NBlOQ4UE854it9coyjAhUztzy1Wbx-BESU0vMENrptIAfY6ZAE3vu4jKkcIwOb6rELj0xzTNc-ErUo1DWJVAooJx07fOiJBykjw/s200/Checklist.jpg" width="200" /></a></div><b>Part 2 of 5: Renegades and Pirates</b><br />
In this continuing series, we're looking at root causes of data quality problems and the business processes you can put in place to solve them. In part two, we examine IT renegades and corporate pirates as two of the root causes for data quality problems.<br />
<br />
<b>Root Cause Number Three: Renegade IT and Spreadmarts</b><br />
A renegade is a person who deserts and betrays an organizational set of principles. That’s exactly what some impatient business owners unknowingly do by moving data in and out of business solutions, databases and the like. Rather than wait for some professional help from IT, eager business units may decide to create their own set of local applications without the knowledge of IT. While the application may meet the immediate departmental need, it is unlikely to adhere to standards of data, data model or interfaces. The database might start by making a copy of a sanctioned database to a local application on team desktops. So-called “spreadmarts,” which are important pieces of data stored in Excel spreadsheets, are easily replicated to team desktops. In this scenario, you lose control of versions as well as standards. There are no backups, versioning or business rules.<br />
<br />
<b>Root Cause Attack Plan</b> <br />
<ul><li>Corporate Culture – There should be a consequence for renegade data, making it more difficult for the renegades to create local data applications.</li>
<li>Communication – Educate and train your employees on the negative impact of renegade data.</li>
<li>Sandbox – Having tools that can help business users and IT professionals experiment with the data in a safe environment is crucial. A sandbox, where users experiment on data subsets and copies of production data, has proven successful for many organizations in limiting renegade IT.</li>
<li>Locking Down the Data – A culture where creating unsanctioned spreadmarts is shunned is the goal. Some organizations have found success in locking down the data to make it more difficult to export.</li>
</ul><b><br />
Root Cause Number Four: Corporate Mergers</b><br />
Corporate mergers increase the likelihood of data quality errors because they usually happen fast and are unforeseen by IT departments. Almost immediately, there is pressure to consolidate and take shortcuts on proper planning. The consolidation will likely include the need to share data among a varied set of disjointed applications. Many shortcuts are taken to “make it happen,” often involving known or unknown risks to the data quality. <br />
On top of the quick schedule, merging IT departments may encounter culture clash and differing definitions of truth. Additionally, mergers can result in a loss of expertise when key people leave midway through the project to seek new ventures.<br />
<br />
<b>Root Cause Attack Plan</b><br />
<ul><li>Corporate Awareness – Whenever possible, a civil division of labor should be mandated by management to avoid culture clashes and data grabs by the power hungry.</li>
<li>Document – Your IT initiative should survive even if the entire team leaves, disbands or gets hit by a bus when crossing the street. You can do this with proper documentation of the infrastructure.</li>
<li>Third-party Consultants – Management should be aware that there is extra work to do and that conflicts can arise after a merger. Consultants can provide the continuity needed to get through the transition.</li>
<li>Agile Data Management – Choose solutions and strategies that will keep your organization agile, giving you the ability to divide and conquer the workload without expensive licensing of commercial applications.</li>
</ul>This post is an excerpt from a white paper available <a href="http://info.talend.com/DQ_10_Root_Causes.html?src=datagovblog">here</a>. More to come on this subject in the days ahead.<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-26248801945980422862011-08-24T05:01:00.000-04:002011-08-24T05:01:25.911-04:00Top Ten Root Causes of Data Quality Problems: Part One<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3CM-G7_w0c5gViI8eXiymdSrMLCN_PNahNnB__ISzwH_M9mnm0LAte39bJENNXG9zJZK6x1IbzZjrtn8UeW0OWae_yjMoLtKgtK5zT3-LaTY0gJy5aoAWe7NLo0QrxNBUZV1cAnvGbzg/s1600/Checklist.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3CM-G7_w0c5gViI8eXiymdSrMLCN_PNahNnB__ISzwH_M9mnm0LAte39bJENNXG9zJZK6x1IbzZjrtn8UeW0OWae_yjMoLtKgtK5zT3-LaTY0gJy5aoAWe7NLo0QrxNBUZV1cAnvGbzg/s200/Checklist.jpg" width="200" /></a></div><b>Part 1 of 5: The Basics</b><br />
We all know data quality problems when we see them. They can undermine your organization’s ability to work efficiently, comply with government regulations and make revenue. The specific technical problems include missing data, misfielded attributes, duplicate records and broken data models to name just a few.<br />
But rather than merely patching up bad data, most experts agree that the best strategy for fighting data quality issues is to understand the root causes and put new processes in place to prevent them. This five-part blog series discusses the top ten root causes of data quality problems and suggests steps the business can implement to prevent them. <br />
In this first blog post, we'll confront some of the more obvious root causes of data quality problems.<br />
<br />
<b>Root Cause Number One: Typographical Errors and Non-Conforming Data</b><br />
Despite a lot of automation in our data architecture these days, data is still typed into Web forms and other user interfaces by people. A common source of data inaccuracy is that the person manually entering the data just makes a mistake. People mistype. They choose the wrong entry from a list. They enter the right data value into the wrong box.<br />
<br />
Given complete freedom on a data field, those who enter data have to go from memory. Is the vendor named Grainger, WW Granger, or W. W. Grainger? Ideally, there should be a corporate-wide set of reference data so that forms help users find the right vendor, customer name, city, part number, and so on.<br />
<br />
<b>Root Cause Attack Plan </b><br />
<ul><li>Training – Make sure that those people who enter data know the impact they have on downstream applications.</li>
<li>Metadata Definitions – By locking down exactly what people can enter into a field using a definitive list, many problems can be alleviated. This metadata (for vendor names, part numbers, and so on) can become part of data quality in data integration, business applications and other solutions.</li>
<li>Monitoring – Make public the results of poorly entered data and praise those who enter data correctly. You can keep track of this with data monitoring software such as the Talend Data Quality Portal.</li>
<li>Real-time Validation – In addition to forms validation, data quality tools can be implemented to validate addresses, e-mail addresses and other important information as it is entered. Ensure that your data quality solution provides the ability to deploy data quality in application server environments, in the cloud or in an enterprise service bus (ESB).</li>
</ul><br />
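The reference data idea can be sketched in a few lines of Python using the standard library's difflib module. The vendor list, the function name and the 0.6 similarity cutoff are assumptions chosen for illustration; a production form would draw on the corporate reference data and a tuned matching rule.<br />

```python
import difflib

# Hypothetical corporate reference list of vendor names.
REFERENCE_VENDORS = ["W. W. Grainger", "Fastenal", "McMaster-Carr"]

def suggest_vendor(typed, reference=REFERENCE_VENDORS, cutoff=0.6):
    """Return the closest reference vendor for a typed name, or None if nothing is close."""
    # Compare in lowercase, then map the match back to its canonical spelling.
    lowered = {name.lower(): name for name in reference}
    matches = difflib.get_close_matches(
        typed.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None

for typed in ["Grainger", "WW Granger", "W. W. Grainger"]:
    print(typed, "->", suggest_vendor(typed))
```

All three spellings from the example above resolve to the single reference entry, so the form can offer the canonical name instead of storing whatever was typed.<br />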
<b>Root Cause Number Two: Information Obfuscation</b><br />
Data entry errors might not be completely by mistake. How often do people give incomplete or incorrect information to safeguard their privacy? If there is nothing at stake for those who enter data, there will be a tendency to fudge.<br />
<br />
Even if the people entering data want to do the right thing, sometimes they cannot. If a field is not available, an alternate field is often used. This can lead to such data quality issues as having Tax ID numbers in the name field or contact information in the comments field.<br />
<br />
<b>Root Cause Attack Plan</b><br />
<ul><li>Reward – Offer an incentive for those who enter personal data correctly. This should be focused on those who enter data from the outside, like those using Web forms. Employees should not need a reward to do their job. The type of reward will depend upon how important it is to have the correct information.</li>
<li>Accessibility – As a technologist in charge of data stewardship, be open and accessible to criticism from users. Give them a voice when process changes require technology changes. If you’re not accessible, users will look for quiet ways around your forms validation. </li>
<li>Real-time Validation – In addition to forms validation, data quality tools can be implemented to validate addresses, e-mail addresses and other important information as it is entered.</li>
</ul>This post is an excerpt from a white paper available <a href="http://info.talend.com/DQ_10_Root_Causes.html?src=datagovblog">here</a>. More to come on this subject in the days ahead.<br />
<br />
<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com4tag:blogger.com,1999:blog-6895175514429514812.post-9926871920698975522011-06-13T14:50:00.002-04:002011-06-13T15:20:43.471-04:00The Differences Between Small and Big Data<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRpnLl5HSsDgbwhT7lRrr4tf1f4DBa24mQLk3PQCSk0khgxGfwonYTnTYE6lhf304swbNUdhexsDXI7pshjXDJAfbRso3LqvKTKkWGjMmUJuxxHQeGzfKOjjKKFT26jrLfHIRbh7KTOaA/s1600/Fish.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="136" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRpnLl5HSsDgbwhT7lRrr4tf1f4DBa24mQLk3PQCSk0khgxGfwonYTnTYE6lhf304swbNUdhexsDXI7pshjXDJAfbRso3LqvKTKkWGjMmUJuxxHQeGzfKOjjKKFT26jrLfHIRbh7KTOaA/s200/Fish.jpg" width="200" /></a></div>There is a lot of buzz today about big data and companies stepping up to meet the challenge of ever increasing data volumes. At the center of it all are Hadoop and the Cloud. Hadoop can intelligently manage the distribution of processing and your files. It manages the infrastructure needed to break down big data into more manageable chunks for processing by multiple servers. Likewise, a cloud strategy can take data management outside the walls of a corporation into a highly scalable infrastructure.<br />
<br />
Do you have big data? It’s difficult to know precisely whether you do because big data is vaguely defined. You may qualify for big data technology if you face hundreds of gigabytes of data, or it may take hundreds or thousands of terabytes. The classification of “big data” is not defined strictly by data size, but by other business factors, too. Your data management infrastructure needs to take into account factors like future data volumes, peaks and lulls in requirements, business requirements and much more.<br />
<b><br />
Small and Medium-Sized Data</b><br />
What about “small” and medium-sized data? For example, data from spreadsheets, the occasional flat file, leads from a trade show, and catalog data from vendors may be vital to your business processes. With a new industry focus on transparency, business user involvement and sharing of data, small data is a constant issue. Spreadsheets and flat files are the preferred method to share data today because most companies have some process for handling them. When you get these small to medium-sized data sets, it is still necessary to:<br />
<ul><li>profile them</li>
<li>integrate them into your relational database</li>
<li>aggregate data from these sources, or extract only the vital parts</li>
<li>apply data quality standards when necessary</li>
<li>use them as part of a master data management (MDM) initiative</li>
</ul><br />
<b>The Different Goals of Big Data and Little Data </b><br />
With big data, the concern is usually about your data management technology’s ability to handle massive quantities in order to provide you aggregates that are meaningful. You need solutions that will scale to meet your data management needs. However, handling small and medium data sets is more about short and long term costs. How can you quickly and easily integrate data without a lot of red tape, big license fees, pain and suffering?<br />
<br />
Think about it. When you need to handle small and medium data, you have options:<br />
<ul><li>Hand-coding: Using hand-coding is sometimes faster than any solution and it still may be OK for ad-hoc, one-off data integration. Once you find yourself hand-coding again and again, you’ll find yourself rethinking that strategy. Eventually managing all that code will waste time and cost you a bundle. If your data volumes grow, hand-coding quickly becomes obsolete due to lack of scaling. Hand-coding gets high marks on speed to value, but falters in sustainability and long-term costs.</li>
<li>Open Source: Open source data management tools provide a quick way to get started, low overall costs and high sustainability. By just downloading and learning the tools, you’re on your way to getting data management done. The open source solutions may have some limitations on scalability, but most open source providers have low-cost commercial upgrades that meet these needs. In other words, it's easy to start today and leverage Hadoop and the Cloud if you need it later. Open source gets high marks on speed to value, sustainability and costs.</li>
<li>Traditional Data Management Vendors: Small data is a tough issue for the mega-vendors. Even for 50K-100K records, the license cost in both the short term and long term could be prohibitive. The mega-vendor solutions do tend to scale well, making them sustainable at a cost. However mergers in the data management business do happen. The sustainability of a product can be affected by these mergers. Commercial vendors get respectable marks in speed to value and sustainability, but falter in high up-front costs and maintenance fees.</li>
</ul>I've heard it a million times in this business - start small and fast with technology that gives you a fast success but also scales to future tasks. <br />
<ul></ul><div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com2tag:blogger.com,1999:blog-6895175514429514812.post-87223106630732284102011-05-16T18:33:00.000-04:002011-05-16T18:33:00.606-04:00The Butterfly Effect and Data Quality<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivKO7wpliMKwF2nOBLVhyq7FzMD8eYIMeAL2hXWIP3YV7H7f6EqV3dvqt76XqJLIKSv8LW0JSwW1LC0PiZGtvHDn8bgAgRckWnCQIuYByr0p4hINfSD5IcFWpn0BBVuoQKKfPpG_iJzDY/s1600/Butterfly.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="178" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivKO7wpliMKwF2nOBLVhyq7FzMD8eYIMeAL2hXWIP3YV7H7f6EqV3dvqt76XqJLIKSv8LW0JSwW1LC0PiZGtvHDn8bgAgRckWnCQIuYByr0p4hINfSD5IcFWpn0BBVuoQKKfPpG_iJzDY/s200/Butterfly.jpg" width="200" /></a></div>I just wrote a paper called the <a href="http://www.talend.com/document-download.php?doc=butterfly&src=DataGovernanceBlog">‘Butterfly Effect’ of poor data quality</a> for Talend.<br />
<br />
The term butterfly effect refers to the way a minor event – like the movement of a butterfly’s wing – can have a major impact on a complex system – like the weather. The movement of the butterfly wing represents a small change in the initial condition of the system, but it starts a chain of events: moving pollen through the air, which causes a gazelle to sneeze, which triggers a stampede of gazelles, which raises a cloud of dust, which partially blocks the sun, which alters the atmospheric temperature, which ultimately alters the path of a tornado on the other side of the world.<br />
<br />
Enterprise data is equally susceptible to the butterfly effect. When poor quality data enters the complex system of enterprise data, even a small error – the transposed letters in a street address or part number – can lead to 1) revenue loss; 2) process inefficiency; and 3) failure to comply with industry and government regulations. Organizations depend on the movement and sharing of data throughout the organization, so the impact of data quality errors is costly and far reaching. Data issues often begin with a tiny mistake in one part of the organization, but the butterfly effect can produce far reaching results. <br />
<br />
<b>The Pervasiveness of Data</b><br />
When data enters the corporate ecosystem, it rarely stays in one place. Data is pervasive. As it moves throughout a corporation, data impacts systems and business processes. The negative impact of poor data quality reverberates as it crosses departments, business units and cross-functional systems.<br />
<ul><li><b>Customer Relationship Management (CRM)</b> - By standardizing customer data, you will be able to offer better, more personalized customer service. And you will be better able to contact your customers and prospects for cross-sell, up-sell, notification and services.</li>
<li><b>ERP / Supply Chain Data</b> - If you have clean data in your supply chain, you can achieve some tangible benefits. First, you will have a clear picture of delivery times on orders because the supply chain is completely transparent. Next, you will avoid unnecessary warehouse costs by holding the right amount of inventory in stock. Finally, you will be able to see all the buying patterns and use that information when negotiating supply contracts.</li>
<li><b>Orders / Billing System </b>- If you have clean data in your billing systems, you can achieve the tangible benefits of more accurate financial reporting and correct invoices that reach the customer in a timely manner. An accurate bill not only builds trust among workers in the billing department; customer attrition rates will also be lower when invoices are delivered accurately and on time.</li>
<li><b>Data Warehouse</b> - If you have standardized the data feeding into your data warehouse, you can dramatically improve business intelligence. Employees can access the data warehouse and be assured that the data they use for reports, analysis and decision making is accurate. Using the clean data in a warehouse can help you find trends, see relationships between data, and understand the competition in a new light.</li>
</ul>To read more about the butterfly effect of data quality, <a href="http://www.talend.com/document-download.php?doc=butterfly&src=DataGovernanceBlog">download it from the Talend site</a>.<div class="blogger-post-footer">Covering the world of data integration, data governance, and data quality from the perspective of an industry insider.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com1tag:blogger.com,1999:blog-6895175514429514812.post-60959268091938521742011-05-09T14:38:00.000-04:002011-05-09T14:38:00.378-04:00MIT Information Quality SymposiumThis year I’m planning to attend the <a href="http://mitiq.mit.edu/iqis/2011/">MIT IQ symposium</a> again. I’m also one of the vice chairs of the event. The symposium, held in Boston in July, is a forum where practitioners and academics discuss and exchange ideas about data quality. <br />
<br />
I return to this conference and participate in the planning every year because I think it’s one of the most important data quality events. The people here really do change the course of information management. On these hot summer days in Boston, government, healthcare and general business professionals collaborate on the latest updates about data quality. This event has the potential to dramatically change the world – the people, organizations, and governments who manage data. I’ve grown to really enjoy the combination of ground-breaking presentations, high-ranking government officials, sharp consultants and MIT hallway chat that you find here.<br />
<br />
If you have some travel budget, please consider joining me for this event.Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-84457728129351647512011-04-29T10:47:00.006-04:002011-04-29T10:47:00.829-04:00Open Source and Data QualityMy latest video on the <a href="http://www.youtube.com/user/TalendChannel#p/u/3/jP7T2ga_rf8">Talend Channel</a> about data quality and open source.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><object width="320" height="266" class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://3.gvt0.com/vi/jP7T2ga_rf8/0.jpg"><param name="movie" value="http://www.youtube.com/v/jP7T2ga_rf8&fs=1&source=uds" /><param name="bgcolor" value="#FFFFFF" /><embed width="320" height="266" src="http://www.youtube.com/v/jP7T2ga_rf8&fs=1&source=uds" type="application/x-shockwave-flash"></embed></object></div><br />
This was filmed in the Paris office in January. I can get excited in any time zone when it comes to data quality.Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0tag:blogger.com,1999:blog-6895175514429514812.post-72219200598555405242011-04-25T16:28:00.000-04:002011-04-25T16:28:31.487-04:00Data Quality Scorecard: Making Data Quality Relevant <div class="MsoNormal">Most data governance practitioners agree that a data quality scorecard is an important tool in any data governance program. It provides comprehensive information about quality of data in a database, and perhaps even more importantly, allows business users and technical users to collaborate on the quality issue.</div><div class="MsoNormal"><br />
</div><div class="MsoNormal">However, there are multiple levels of metrics that you should consider:</div><table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: medium none;"><tbody>
<tr> <td style="border: 1pt solid windowtext; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 2.75in;" valign="top" width="264"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>METRIC CLASSIFICATION</b></div></td> <td style="border-color: windowtext windowtext windowtext -moz-use-text-color; border-style: solid solid solid none; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 3.45in;" valign="top" width="331"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><b>EXAMPLES</b></div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 24pt;">1</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.75in;" valign="top" width="264"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Metrics that the technologists use to fix data quality problems</div><div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><br />
</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 3.45in;" valign="top" width="331"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">7% of the e-mail attribute is blank. 12% of the e-mail attribute does not follow the standard e-mail syntax. 13% of our US mail addresses fail address validation.</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 24pt;">2</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.75in;" valign="top" width="264"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Metrics business people use to make decisions about the data</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 3.45in;" valign="top" width="331"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">9% of my contacts have invalid e-mails. <span> </span>3% have both invalid e-mails and invalid addresses.</div></td> </tr>
<tr> <td style="border-color: -moz-use-text-color windowtext windowtext; border-style: none solid solid; border-width: medium 1pt 1pt; padding: 0in 5.4pt; width: 0.45in;" valign="top" width="43"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;"><span style="font-size: 24pt;">3</span></div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 2.75in;" valign="top" width="264"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">Metrics managers use to get a big picture</div></td> <td style="border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; border-style: none solid solid none; border-width: medium 1pt 1pt medium; padding: 0in 5.4pt; width: 3.45in;" valign="top" width="331"> <div class="MsoNormal" style="line-height: normal; margin-bottom: 0.0001pt;">This customer data is good enough to use for a campaign.</div></td> </tr>
</tbody></table><div class="MsoNormal"><br />
</div><div class="MsoNormal">All levels are important for the various members of the data governance team. Level one shows the steps you need to take to fix the data. Level two gives context for the task at hand. Level three tells the uninformed about the business issue without requiring them to dig into the details.</div><div class="MsoNormal"><br />
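The three levels can be sketched in a few lines of Python. This is only an illustration, not the method of any particular profiling tool: the Contact record, the build_scorecard function, the simplified e-mail pattern and the 90% campaign threshold are all assumptions made up for the example, and the address_valid flag is assumed to come from a separate address validation step.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Deliberately simple syntax check (assumption); a real program would use a stricter validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class Contact:
    """Hypothetical contact record used only for this sketch."""
    email: Optional[str]
    address_valid: bool  # assumed result of an external address validation step


def pct(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1) if total else 0.0


def email_blank(c: Contact) -> bool:
    return not (c.email or "").strip()


def email_bad_syntax(c: Contact) -> bool:
    return not email_blank(c) and EMAIL_RE.match(c.email.strip()) is None


def build_scorecard(contacts, campaign_threshold: float = 90.0) -> dict:
    total = len(contacts)

    # Level 1: attribute-level metrics that technologists use to fix the data.
    blank = sum(email_blank(c) for c in contacts)
    bad_syntax = sum(email_bad_syntax(c) for c in contacts)
    bad_address = sum(not c.address_valid for c in contacts)

    # Level 2: record-level metrics that business people use to make decisions.
    invalid_email = blank + bad_syntax  # the two groups never overlap
    both_invalid = sum(
        (email_blank(c) or email_bad_syntax(c)) and not c.address_valid
        for c in contacts
    )

    # Level 3: a single roll-up for managers. Here a contact counts as
    # reachable if at least one channel is valid (hypothetical rule).
    reachable_pct = pct(total - both_invalid, total)

    return {
        "level1": {
            "email_blank_pct": pct(blank, total),
            "email_bad_syntax_pct": pct(bad_syntax, total),
            "address_invalid_pct": pct(bad_address, total),
        },
        "level2": {
            "contacts_invalid_email_pct": pct(invalid_email, total),
            "contacts_invalid_email_and_address_pct": pct(both_invalid, total),
        },
        "level3": {
            "reachable_pct": reachable_pct,
            "good_enough_for_campaign": reachable_pct >= campaign_threshold,
        },
    }


if __name__ == "__main__":
    sample = [
        Contact("a@example.com", True),
        Contact(None, True),
        Contact("not-an-email", False),
    ]
    print(build_scorecard(sample))
```

The same records feed all three levels; only the aggregation changes, which is why the technical, business and management views stay consistent with one another.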
</div><div class="MsoNormal">So, when you’re building your DQ metrics, remember to roll up the attribute-level metrics into progressively higher-level formulations. You must design the scorecards to meet the needs of the different audiences, from technical through to business and up to executive. At the beginning of a data quality scorecard is information about the data quality of individual data attributes. This is the default information that most profilers will deliver out of the box. In the middle are various score sets that allow your company to analyze and summarize data quality from different perspectives. As you aggregate scores, the high-level measures of data quality become more meaningful. If you define the objective of a data quality assessment project as calculating these different aggregations, you will have a much easier time maturing your data governance program. The business users and C-level executives will begin to pay attention.</div>Steve Sarsfieldhttp://www.blogger.com/profile/12892788380306110697noreply@blogger.com0