You Want the Truth? You Can’t Handle the Truth!

So, I’m a complete sucker for the courtroom scene in the Rob Reiner film “A Few Good Men” for a number of reasons: it was written by Aaron Sorkin (loved The West Wing and Sports Night), it’s a classic Jack Nicholson scene, it is one of Tom Cruise’s roles where he actually does some acting, and it’s a great Six Degrees of Kevin Bacon movie (I mean, it’s got everyone from Demi Moore to Cuba Gooding Jr. to Kiefer Sutherland to Noah Wyle; think of the connections you could make with just those actors). I get sucked into this scene whenever I click by it on the television.

For those unfamiliar with the movie, Tom Cruise plays a lawyer tasked with defending two Marines on trial for murder, and Jack Nicholson stands in the way of him winning his clients’ acquittal. In the culminating scene of the movie, Tom Cruise goes on the attack against an agitated Colonel Nathan Jessep, played by Jack Nicholson. It’s a powerful exchange:

Jack Nicholson: “You want answers?”

Tom Cruise: “I think I’m entitled.”

Jack Nicholson: “You want answers?”

Tom Cruise: “I want the truth!”

Jack Nicholson: “You can’t handle the truth!”




After at least 30-40 viewings, this scene still gives me the chills.

I caught this scene just the other day, and it struck me that in the context of data warehousing, this could be a conversation overheard between business users and the data warehouse delivery team (albeit without all of the drama and murder implications). The ability to deliver accurate, timely and complete data (in essence, “the truth”) is pretty much the definition of a successful data warehouse. Unfortunately, most data warehouses fail to meet that definition. Most data warehouse deployments today cost too much, take too long to deliver and don’t scale effectively, and end users don’t trust the data they get from them.

The hard-to-handle truth is that integrating data is inherently a dirty business: data is scattered across the organization in different formats and different applications, it often has no metadata to give it any context, and a lot of the time it is only partially there. The IT landscapes of today’s corporations can be so complex that asking a simple question often brings back multiple (and conflicting) answers. Combine these factors with limited budgets, decreased staffing and still only 24 hours in the day, and it’s no wonder that data warehouse projects are so challenging for organizations.

End users want nothing more than to have confidence in their decision making, and that starts with how much they actually trust the information on which they base those decisions. If the data warehouse can’t deliver trusted data, then decision makers won’t use the warehouse as a source of information. All too often, people cut corners when building the warehouse: they don’t take the time to profile the data before it gets loaded, they don’t discover all of the relevant data domains to begin with, and they fail to put data cleansing logic in place to ensure the accuracy and completeness of the data. So, in essence, what happens to the data warehouse is:







To truly deliver value to the organization, data warehousing and data integration teams need to see data quality as an essential step in the successful deployment of any next-generation data warehouse. For an example, see how HealthNow New York recently deployed a next-generation data warehouse and delivered trusted data wherever and whenever it is needed. Sometimes it gets ugly, and sometimes it takes a lot of time to discover, profile and cleanse data. But the time spent on those tasks up front will generate greater trust in the data, ensure higher utilization rates of your data warehouse and ultimately deliver the “truth” that decision makers desire.
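To make “profiling” a little less abstract, here is a minimal sketch in Python of the kind of check a team might run on a source column before loading it. Everything in it is an assumption made for illustration: the `profile_column` helper, the `customers` sample records and the `state` column are hypothetical, and a real project would more likely use a dedicated profiling tool than a hand-rolled function.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize completeness and cardinality for one column of dict-shaped rows."""
    values = [row.get(column) for row in rows]
    # Treat None and blank strings as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    total = len(values)
    return {
        "column": column,
        "rows": total,
        "missing": total - len(present),
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(3),
    }


# Hypothetical sample records standing in for a source-system extract.
customers = [
    {"id": 1, "state": "NY"},
    {"id": 2, "state": "ny"},
    {"id": 3, "state": ""},
    {"id": 4, "state": None},
    {"id": 5, "state": "NY"},
]

report = profile_column(customers, "state")
print(report)
```

Even on five made-up rows, the report surfaces two of the problems described above: two records have no state at all, and “NY” and “ny” are counted as different values until some cleansing rule standardizes them.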


