Informatica: The Data Integration Company

April 2008 Archives

April 08, 2008

Profile Early, Profile Often

Posted by Informatica in: Data Quality

Dr. Claudia Imhoff, President & Founder, Intelligent Solutions and Ed Lindsey, National Product Specialist, Informatica answer some questions that were raised during our recent web seminar. If you missed the web seminar, you can listen to it by clicking the following link:

An Eye-opener for Your Business: How Data Profiling Can Build Support for Data Quality within Data Management Projects


Q: Who in the organization should be responsible for data quality? And who should sign off on the scope document?


Claudia Imhoff, Intelligent Solutions:
1. There are several organizations responsible for data quality – data stewards, database administrators, and data administrators. I suggest that you look into creating a data quality program in which all these groups have representation. The program manager should report to the CFO or COO.

2. The scope document should be signed by the IT and business sponsors. If there are high level influencers (like a VP of Sales, CFO or COO) then they should sign as well.

Q: Last year my company identified that we needed a data governance organization and policy. The challenge is in identifying data stewards as roles and responsibilities change through company reorganization, or through people changing roles. What is the best way to identify data stewards and manage the changes over time?

Claudia Imhoff, Intelligent Solutions: Data Stewards should come from the business. Generally they are the few people who demonstrate a true interest in the data and information you are generating. I suggest you appoint (either formally or informally) one person per subject area to be the main steward for that subject area. They may choose to have others in their committee but they remain responsible for any issues within their area.

Q: What tools/methods are available to come up with ‘effort estimates’ for source system analysis?

Claudia Imhoff, Intelligent Solutions: I suggest using data profiling to give you an idea of the effort it will take to do the source system analysis. The profiling results give you the right information to let you know what to expect from the SSA.

Q: Can I do Data Profiling on a table (DB2, Oracle, SQL Server)? Or does it have to be a flat file?

Ed Lindsey, Informatica: Yes, we profile data in an RDBMS via ODBC or native connectivity for DB2 load format, Informix, Oracle, Sybase and UDB.
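
To make "profiling a table" concrete, here is a minimal sketch of the kind of column statistics a profiler gathers from a relational table. It is not Informatica's implementation: it uses Python's built-in sqlite3 module as a stand-in for an ODBC or native connection, and the `customer` table, its columns, and the `profile_table` helper are hypothetical names chosen for illustration.

```python
import sqlite3

def profile_table(conn, table, columns):
    """Run one aggregate query per column: row count, NULL count, distinct count."""
    results = {}
    for col in columns:
        # Identifiers are interpolated only because this sketch uses
        # trusted, hard-coded table and column names.
        query = (
            f'SELECT COUNT(*), '
            f'SUM(CASE WHEN "{col}" IS NULL THEN 1 ELSE 0 END), '
            f'COUNT(DISTINCT "{col}") FROM "{table}"'
        )
        total, nulls, distinct = conn.execute(query).fetchone()
        # SUM returns NULL on an empty table, so fall back to 0.
        results[col] = {"rows": total, "nulls": nulls or 0, "distinct": distinct}
    return results

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "a@example.com")],
)
table_profile = profile_table(conn, "customer", ["id", "email"])
```

In this sample the `email` column shows one NULL and only one distinct value across three rows, the sort of finding (missing and duplicated contact data) that profiling is meant to surface early.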

Q: You talk about doing analysis on a database; can your tool also do analysis on files outside of a database? e.g. Excel, text files, etc.

Ed Lindsey, Informatica: Yes, we profile files in DB2 load format, delimited or fixed-length format, and XML format.
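
The same idea applies to flat files. The following is a rough sketch, using only the Python standard library, of profiling a delimited file: per column it counts rows, empty values, distinct values, and the longest value. The `profile_delimited` helper and the sample data are assumptions made for illustration and do not represent how the Informatica tools work internally.

```python
import csv
import io

def profile_delimited(text, delimiter=","):
    """Return per-column row, empty-value, and distinct-value counts plus max length."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    stats = {
        name: {"rows": 0, "empty": 0, "distinct": set(), "max_len": 0}
        for name in reader.fieldnames
    }
    for row in reader:
        for name in reader.fieldnames:
            # Short rows yield None for missing fields; treat them as empty.
            value = (row[name] or "").strip()
            col = stats[name]
            col["rows"] += 1
            if value == "":
                col["empty"] += 1
            else:
                col["distinct"].add(value)
                col["max_len"] = max(col["max_len"], len(value))
    return {
        name: {
            "rows": c["rows"],
            "empty": c["empty"],
            "distinct": len(c["distinct"]),
            "max_len": c["max_len"],
        }
        for name, c in stats.items()
    }

# Hypothetical sample file contents.
sample = "id,state,zip\n1,CA,94063\n2,,94063\n3,ca,9406\n"
file_profile = profile_delimited(sample)
```

Here the `state` column reports one empty value and two distinct values ("CA" and "ca"), which hints at inconsistent casing, while `zip` has two distinct values of differing lengths, which hints at a truncated code.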


Q: Can data profiling be done on mainframes (IMS)?

Ed Lindsey, Informatica: Normally we extract the IMS data to a flat file or stage it in an RDBMS and profile it there.

Q: How well does Informatica's offering support injecting business rules from Data Quality into the ETL process if PowerCenter is used?

Ed Lindsey, Informatica: PowerCenter can embed a Data Quality plan in a mapping and run it on the PowerCenter server. This way a business user can work on the business rules, and the same rules are executed in the DQ scorecard, the DQ process and the ETL process.
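
The principle of defining rules once and executing them in several places can be sketched generically. The example below is not PowerCenter or Data Quality code; `RULES`, `scorecard`, and `etl_step` are hypothetical names, and the two rules are invented for illustration. It only shows one rule set driving both a scorecard and an ETL filter.

```python
# Business rules defined once; each takes a record (dict) and returns True if it passes.
RULES = {
    "state_is_two_letters": lambda r: len(r.get("state", "")) == 2 and r["state"].isalpha(),
    "zip_is_five_digits": lambda r: len(r.get("zip", "")) == 5 and r["zip"].isdigit(),
}

def scorecard(records):
    """Percentage of records passing each rule."""
    total = len(records)
    if total == 0:
        return {name: 0.0 for name in RULES}
    return {
        name: 100.0 * sum(1 for r in records if rule(r)) / total
        for name, rule in RULES.items()
    }

def etl_step(records):
    """Split records into (clean, rejected) using the same rules as the scorecard."""
    clean, rejected = [], []
    for r in records:
        failed = [name for name, rule in RULES.items() if not rule(r)]
        if failed:
            rejected.append((r, failed))
        else:
            clean.append(r)
    return clean, rejected

# Hypothetical sample records.
records = [
    {"state": "CA", "zip": "94063"},
    {"state": "C", "zip": "94063"},
    {"state": "NY", "zip": "1000"},
    {"state": "TX", "zip": "73301"},
]
scores = scorecard(records)
clean, rejected = etl_step(records)
```

Because both functions read the same `RULES` dictionary, a change to a rule shows up in the scorecard and in the load step at the same time, which is the benefit described above.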

Q: What are best ways to fix identified DQ issues? When would you choose one over the other?

Ed Lindsey, Informatica: The DQ process is different from customer to customer. The best place to fix a data quality problem is at the source. However, many customers will not allow modifications of the data at the source for fear of breaking the original system. Also, a lot of the data is not under the control of the department using the feeds because it comes from outside the company or business unit. Most of the time the data is corrected as it enters the business unit as part of an operational data store, data warehouse or enterprise application. As DQ issues are corrected downstream, the data customer gives feedback to the provider, who hopefully initiates their own DQ process so that over time quality ultimately finds its way back to the source system.

Q: Would you suggest doing preliminary data profiling across data sets that collect the same subject data into many different data warehouses when you need to move toward an Enterprise Data Warehouse?

Ed Lindsey, Informatica: I would profile the data from any source system before it is loaded into the Data Warehouse. The closer to the source system the better. However, one difference here is you must make sure that other downstream systems are not injecting DQ issues into the process. And yes, we have customers that profile the data before and after the ETL process to make sure that that process has not injected issues as well. As I said during our recent web seminar: Profile early, Profile often.
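
Profiling before and after the ETL process amounts to comparing two profiles of the same data. The sketch below shows one simple way to do that comparison in plain Python. The `column_profile` and `injected_issues` helpers and the sample rows are hypothetical, and a real profile would track many more measures than row and missing-value counts.

```python
def column_profile(rows, columns):
    """Count rows, missing values, and distinct values for each column."""
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        profile[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return profile

def injected_issues(before, after):
    """List columns that were dropped, changed row count, or gained missing values."""
    findings = []
    for col, b in before.items():
        a = after.get(col)
        if a is None:
            findings.append((col, "column dropped"))
            continue
        if a["rows"] != b["rows"]:
            findings.append((col, f"row count {b['rows']} -> {a['rows']}"))
        if a["missing"] > b["missing"]:
            findings.append((col, f"missing values {b['missing']} -> {a['missing']}"))
    return findings

# Hypothetical data: the load step blanks out one country value.
source_rows = [{"id": 1, "country": "US"}, {"id": 2, "country": "DE"}]
loaded_rows = [{"id": 1, "country": "US"}, {"id": 2, "country": ""}]
columns = ["id", "country"]
findings = injected_issues(
    column_profile(source_rows, columns),
    column_profile(loaded_rows, columns),
)
```

An empty `findings` list suggests the load did not introduce the issues this check looks for; any entry points to a column worth investigating before the data reaches downstream users.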