Monthly Archives: May 2007

Customer Data Integration (CDI) – for a single view of your customer

Rick Sherman

How profitable are your customers? What have they purchased lately? Are there opportunities to up-sell and cross-sell to them? You and everyone else in your organization want to know everything possible about your customers. You want a single view of the customer that everyone across the enterprise can use.

There’s nothing new about this. Businesses have been trying to get a single view of their customers and prospects for years.

With the goal of a single view in mind, software vendors initially developed customer-oriented applications such as call center, campaign management and sales force automation (SFA) applications. Later, these vendors attempted to merge all customer-centric applications under the customer relationship management (CRM) umbrella.

The big idea was that the CRM software would provide a single view of the customer. It didn’t quite turn out that way, however. Businesses bought these applications from various vendors but also custom-built their own applications, resulting in scattered customer data across many application silos. The result: many views of the customer.

IQ in Internet and e-Business Information

Larry English

“In e-Business, the Information IS the Business”
Having just completed writing a chapter on “IQ in the Internet and e-Business Environments” in my forthcoming book, Information Quality Applied: Best Practices for Improving Business Information, Processes and Systems (John Wiley & Sons), I wanted to share a few excerpts from this chapter. This is one of ten chapters focused on applying sound quality principles to the unique quality issues in various information value “circles”, such as “Prospect to Satisfied Customer” and the “Order to Cash” supply chain.
There are three categories of information in the Internet environment to which quality principles must be applied:
* Web-Based Documents and Web Content
* Data “Shared” by Internal Processes and Internet Processes
* Information Collected or Created in e-Commerce and e-Business value chains, including third party business partners
The major problem with IQ in the Internet is that business is conducted in “cyberspace” with no person “minding the store” or monitoring the e-Business transactions.
Here I will address some problems and improvements in the first category.

Data lineage – where did that data come from?

Rick Sherman

The word “pedigree” brings up visions of race horses and show dogs. But it should also make you think about your data. Your data has a pedigree, or lineage, too. Just as with fine animals, it shows where it came from – and probably what can be expected of it in the future, too.

When you’re working with financial data you need to know where it came from – its source systems, what systems processed it, how it was manipulated, and how it was changed. If the CFO asks you to substantiate a certain number, you sure want to know where it came from!

A former colleague of mine experienced first-hand one of the pitfalls of not knowing the lineage of a client’s data. The project was to replace a spreadsheet-based (data shadow system) budgeting, planning and forecasting system with a performance management solution at a multi-billion dollar company. He spent 10 hours in a client meeting – one of those painful marathon meetings – where they were discussing the design of the new system and poring over the data in the client’s spreadsheets to understand how the data was transformed and manipulated.

After the meeting not only ran through lunch but also dinner, a senior manager from the client said “Hey, wait a minute. This isn’t even the right set of spreadsheets!” Everything that the dozen people at the meeting – half from the client’s finance staff and the other half highly-paid consultants – had just done over the past 10 hours was a total waste of time.

The sad truth is that my colleagues and I see many clients who honestly have no idea how their data got into their reports or Microsoft PowerPoint slides. They know what enterprise application the data originated in and can get their IT staff to document how the data was loaded in their data warehouse. But then it gets fuzzy!

How many enterprises can then trace and document what happens to data as it progresses through several potential stops in data marts, cubes, Microsoft Access databases and Microsoft Excel spreadsheets? How do they know what transformations and manipulations happened to that data? When everyone is using their own spreadsheets (data shadow systems), no one knows which version is right.

I see too many enterprises that only master the first steps of the data journey – enterprise applications and data warehouses. After that it’s a “black box.” Explain that to your CFO, your stockholders or government regulators. Your enterprise should know where the information on sales, expenses, customers and employees, for example, has come from and how it’s been transformed.

Enterprise Data Management (EDM) is not an esoteric topic anymore. Data needs to be managed from the point when it enters an enterprise until it is consumed in a report or analysis. An enterprise needs to adopt a holistic enterprise-wide data-management program to enable data lineage and audits.
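
To make the idea of lineage concrete, here is a minimal Python sketch of a number that carries its own pedigree from source system to report. It is only an illustration under stated assumptions: the class names (`LineageStep`, `TrackedValue`), the systems and the transformations are all hypothetical, and a real EDM program would record this in metadata tooling rather than in application code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LineageStep:
    """One stop on the data's journey (all names here are illustrative)."""
    system: str      # e.g. "ERP", "data warehouse", "Excel"
    operation: str   # what was done to the value at this stop


@dataclass
class TrackedValue:
    """A number that carries its own pedigree along with it."""
    value: float
    lineage: List[LineageStep] = field(default_factory=list)

    def apply(self, system: str, operation: str,
              fn: Callable[[float], float]) -> "TrackedValue":
        # Return a new value with the step appended; the original is untouched.
        step = LineageStep(system, operation)
        return TrackedValue(fn(self.value), self.lineage + [step])

    def audit_trail(self) -> str:
        # Human-readable answer to "where did that number come from?"
        return " -> ".join(f"{s.system}: {s.operation}" for s in self.lineage)


# Hypothetical journey of a sales figure from source system to report.
sales = TrackedValue(1_000_000.0, [LineageStep("ERP", "extracted gross sales")])
sales = sales.apply("data warehouse", "converted to thousands", lambda v: v / 1000)
sales = sales.apply("Excel", "applied 5% forecast uplift", lambda v: v * 1.05)

print(sales.value)
print(sales.audit_trail())
```

When the CFO asks about the figure, `audit_trail()` lists every system that touched it and what each one did, which is exactly the information that gets lost once data leaves the warehouse for spreadsheets.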

EDM is not just for a competitive advantage anymore, but rather a business and financial necessity.

Automating Data Remediation is the ONLY Way to Go

Rick Sherman

I teach a graduate course on data warehousing at Northeastern University in Boston. Unlike the people I teach at clients’ sites or at conferences such as TDWI, most of my students have not actually worked in IT yet, never mind had hands-on experience with data warehousing and business intelligence.

This means I often have to go back to the basics. If I mention something like data remediation or rework, I’m sure to be asked what it is, why it matters, what causes it and what it has to do with enterprise data management (EDM).

The “what it is” is the easy part: business needs accurate information and that often requires going back to rework and fix data to eliminate data-quality issues. Data needs to be checked for completeness, conformity, consistency, duplicates, integrity, and accuracy. A recent survey of more than 1,000 middle managers of large companies in the United States and United Kingdom conducted by Accenture revealed that “Middle managers spend more than a quarter of their time searching for information necessary to their jobs, and when they do find it, it is often wrong.” (The emphasis is mine.)

The “why it matters” is tied into all the reasons why data quality matters. For my grad school students, I’d compare it to Six Sigma, something they’ve likely encountered in their management classes. The earlier in a process that you find defects, the easier and less disruptive it is to fix them. YOU want to be the one to find the problems, not your customer. With auto manufacturing, defects found in design or manufacturing can be handled internally. But defects found by the customer mean expensive, embarrassing recalls that end up on the evening news. And even worse, in the food or pharmaceutical industry, defects can be deadly to customers.

It’s the same with data – if no one internally identifies quality problems and they then are found by the customer, you find yourself faced with a fire drill to fix them. You’re scrambling, your CIO is livid, and your image is blown along with the reputation of your CPM, BI or DW program.

That’s expensive and embarrassing, but what about the bad data no one finds that ends up being used to make business decisions? That could be damaging to your business.

External pressures like government regulations can push finance departments to start data remediation projects. The business risks of not doing it include all the problems associated with poor data quality. But if they try to do it manually (which many do, surprisingly) they’re apt to miss a lot. Manual work takes longer and introduces complexities. In the scheme of EDM, data remediation should always be automated (e.g., with an ETL product) to improve accuracy and overall business operations.
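
As a rough illustration of what automating these checks can look like, the following Python sketch runs three of the checks named earlier (completeness, conformity and duplicates) over a small batch of records. Everything in it is an assumption made for the example: the records, the field names, the country code list and the `check_records` helper are hypothetical, and an ETL product would express the same rules in its own way.

```python
from typing import Dict, List

# Hypothetical customer records; field names are assumptions for illustration.
records = [
    {"id": 1, "email": "ann@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 3, "email": "bob@example", "country": "usa"},
    {"id": 1, "email": "ann@example.com", "country": "US"},
]

REQUIRED = ("id", "email", "country")
VALID_COUNTRIES = {"US", "UK"}  # assumed code list


def check_records(rows: List[dict]) -> Dict[str, List[int]]:
    """Return the row positions that fail each check."""
    issues: Dict[str, List[int]] = {
        "completeness": [], "conformity": [], "duplicates": []
    }
    seen = set()
    for pos, row in enumerate(rows):
        # Completeness: every required field is present and non-empty.
        if any(not row.get(f) for f in REQUIRED):
            issues["completeness"].append(pos)
        # Conformity: values match the expected format or code list.
        email = row.get("email", "")
        bad_email = bool(email) and "." not in email.partition("@")[2]
        bad_country = row.get("country") not in VALID_COUNTRIES
        if bad_email or bad_country:
            issues["conformity"].append(pos)
        # Duplicates: the same key has already appeared in this batch.
        if row.get("id") in seen:
            issues["duplicates"].append(pos)
        seen.add(row.get("id"))
    return issues


report = check_records(records)
print(report)
```

The point of the sketch is that the rules run the same way on every load and report every failing row, which is what a person scanning spreadsheets by hand is apt to miss.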

Enterprise Data Management (EDM) is the proactive approach to ensuring data is transformed into consistent, accurate and timely business information. And if there are data issues, you can discover them with auditing and data lineage rather than scrambling through macros in spreadsheets. A patch-work of stopgap measures is costly and you are not likely to solve data quality issues in a reactive manner, i.e. data remediation.

Let’s face it: you can’t fix a problem that you can’t find. And it’s worth noting that, with data quality, finding the problems is paramount. You might not have to fix everything you find, but you sure better find it!

Data quality isn’t so much of a problem when you know where your problems lie. The time has come for both business and IT to address these issues through an EDM program.

Seeing is Believing: Thoughts on Informatica World 2007

Don Tirsell

As mentioned in my previous post, I was at Informatica World 2007 in Orlando last week. I stand corrected: this year was the 9th official conference; my customer event in 1997 didn’t officially count. I’m sure next year will be an even bigger celebration of “Integration Everywhere”. The customer and partner interaction was quite refreshing, with ample time to network with peers and explore possible solutions provided by 3rd parties. The night life was vibrant and active.

One of the most compelling sessions I attended was on Data Integration performance optimization and a new technique available to leverage both a database engine and the traditional “ETL Server”. The technique is called a “Hybrid ETLT Approach”. Stephen Brobst, CTO of Teradata, opened with a review of scenarios where processing in the database makes the most sense and where a data integration engine is best used in the end-to-end data processing and delivery lifecycle. Stephen also showed how design really isn’t changed at all: you build the same transformations and, at run-time, the location of processing can be optimized. That way, metadata is still preserved and all the speed and compliance benefits of a visual design approach still apply.