Tag Archives: data profiling
First off, let me get one thing off my chest. If you don’t pay close attention to your data throughout the application consolidation or migration process, you are almost guaranteed to face delays and budget overruns. Data consolidation and migration account for at least 30-40% of the effort required to take an application live. We have learned this by helping customers deliver over 1,500 projects of this type. What’s worse, if you are not meticulous about your data, you can be sure of unhappy business stakeholders at the end of this treacherous journey. The users of your new application expect all their business-critical data to be there at the end of the road. All the bells and whistles in your new application will count for naught if the data falls apart. Imagine, if you will, students’ transcripts gone missing, or your frequent-flyer balance 100,000 miles short! Need I say more? Now, you may already be guessing where I am going with this. That’s right: we are talking about the myths and realities related to your data! Let’s explore a few of them.
Myth #1: All my data is there.
Reality #1: It may be there… but can you get it? If you want to find, access and move all the data out of your legacy systems, you need a good set of connectivity tools that can automatically find, access and extract the data from your source systems. You don’t want to hand-code this for each source. Ouch!
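In practice this is the job of connectivity tools. As a minimal sketch of the metadata-driven idea, assuming a SQLite database stands in for a single legacy source, the extractor below discovers tables from the database catalog instead of hard-coding a query per table. The function names `list_tables` and `extract_all` are hypothetical and not part of any product API.

```python
import sqlite3
from typing import Dict, List


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Discover user tables from the SQLite catalog instead of hard-coding names."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def extract_all(conn: sqlite3.Connection) -> Dict[str, List[Dict]]:
    """Extract every row of every discovered table as column -> value dicts."""
    extracted = {}
    for table in list_tables(conn):
        # Table names come from the catalog; quoting handles spaces and reserved words.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [description[0] for description in cursor.description]
        extracted[table] = [dict(zip(columns, row)) for row in cursor]
    return extracted


# Usage: a throwaway in-memory database standing in for a legacy source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.5);
""")
data = extract_all(conn)
print({table: len(rows) for table, rows in data.items()})
```

Adding a new table to the source requires no new extraction code, which is the point the paragraph makes about not hand-coding each source.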
Myth #2: I can just move my data from point A to point B.
Reality #2: You can try that approach if you want. However, you might not be happy with the results. The reality is that there can be significant gaps and format mismatches between the data in your legacy system and the data your new application requires. In addition, you will likely need to assemble data from disparate systems. You need sophisticated tools to profile, assemble and transform your legacy data so that it is fit for purpose in your new application.
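To make "gaps and format mismatches" concrete, here is a small sketch that assembles records from two hypothetical legacy extracts (a CRM and a billing system) and maps them to a new target schema. All field names and formats are illustrative assumptions, not taken from any real system.

```python
from datetime import datetime
from decimal import Decimal
from typing import Dict, List, Optional

# Hypothetical extracts from two disparate legacy systems.
crm_rows = [
    {"cust_no": "00042", "full_name": "Lovelace, Ada", "signup": "03/10/2015"},
    {"cust_no": "00043", "full_name": "Hopper, Grace", "signup": "12/09/2014"},
]
billing_rows = [
    {"customer": 42, "balance_cents": 125050},
]


def to_target(crm: Dict, billing: Optional[Dict]) -> Dict:
    """Map one legacy CRM record, plus any matching billing record, to the new schema."""
    last, first = [part.strip() for part in crm["full_name"].split(",", 1)]
    return {
        "customer_id": int(crm["cust_no"]),  # zero-padded text -> integer key
        "first_name": first,
        "last_name": last,
        # Legacy MM/DD/YYYY text -> ISO 8601 date string
        "signup_date": datetime.strptime(crm["signup"], "%m/%d/%Y").date().isoformat(),
        # Integer cents -> exact decimal amount; None marks a gap to follow up on
        "balance": Decimal(billing["balance_cents"]) / 100 if billing else None,
    }


def assemble(crm: List[Dict], billing: List[Dict]) -> List[Dict]:
    """Join the two sources on customer number, then map each record."""
    billing_by_id = {row["customer"]: row for row in billing}
    return [to_target(row, billing_by_id.get(int(row["cust_no"]))) for row in crm]


for record in assemble(crm_rows, billing_rows):
    print(record)
```

The second customer has no billing record, so its balance comes through as `None`: exactly the kind of gap that is better surfaced during the migration than discovered by a user afterwards.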
Myth #3: All my data is clean.
Reality #3: It’s not. Here is a tip: profile, scrub and cleanse your data before you migrate it. You don’t want to put a shiny new application on top of questionable data. In other words, give your new application a fresh start with clean data!
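A minimal cleansing pass might look like the sketch below, assuming hypothetical customer records with `name` and `email` fields. It trims and normalizes values, routes records that fail basic checks to a reject list for review, and drops duplicates that only become visible after normalization. The email pattern is deliberately simple, not a full address validator.

```python
import re
from typing import Dict, List, Tuple

# Deliberately simple pattern: something@something.tld with no whitespace.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def cleanse(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Trim, normalize and de-duplicate records; return (clean, rejected)."""
    clean, rejected, seen = [], [], set()
    for record in records:
        name = " ".join((record.get("name") or "").split())  # trim, collapse spaces
        email = (record.get("email") or "").strip().lower()
        if not name or not EMAIL_PATTERN.fullmatch(email):
            rejected.append(record)  # send to a review queue rather than migrate it
            continue
        if email in seen:
            continue  # duplicate once normalized; keep the first occurrence
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean, rejected


records = [
    {"name": "  Ada   Lovelace ", "email": "ADA@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Grace Hopper", "email": "not-an-email"},
]
clean, rejected = cleanse(records)
print(clean)
print(len(rejected), "record(s) need review")
```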
Myth #4: All my data will move over as expected.
Reality #4: It will not. Any time you move and transform large sets of data, there is room for logical or operational errors and surprises. The best way to catch them is to automatically validate that your data has moved over as intended.
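One common way to automate that validation is reconciliation: compare row counts, key sets and a per-row fingerprint between source and target. The sketch below assumes both sides have already been normalized to the same column names and value types (otherwise every row would show as mismatched); the function names and sample rows are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(row: Dict, columns: List[str]) -> str:
    """Hash selected columns in a fixed order so rows can be compared across systems."""
    canonical = "\x1f".join(repr(row.get(column)) for column in columns)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: List[Dict], target: List[Dict], key: str, columns: List[str]) -> Dict:
    """Compare counts, keys and per-row fingerprints between source and target."""
    src = {row[key]: fingerprint(row, columns) for row in source}
    tgt = {row[key]: fingerprint(row, columns) for row in target}
    shared = src.keys() & tgt.keys()
    return {
        "source_count": len(source),
        "target_count": len(target),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in shared if src[k] != tgt[k]),
    }


# A frequent-flyer balance that lost 100,000 miles, plus a member that never arrived.
source = [{"id": 1, "miles": 120000}, {"id": 2, "miles": 5000}, {"id": 3, "miles": 700}]
target = [{"id": 1, "miles": 20000}, {"id": 2, "miles": 5000}]
report = reconcile(source, target, key="id", columns=["id", "miles"])
print(report)
```

Hashing rows keeps the comparison cheap when the two systems are far apart; on a mismatch you would then pull the specific rows to see which column changed.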
Myth #5: It’s a one-time effort.
Reality #5: ‘Load and explode’ is a formula for disaster. Our proven methodology recommends that you first prototype your migration path and move over only a small subset of the data. Then test it, tweak your model, try again and gradually expand. More importantly, your application architecture should not be a one-time effort either. It is a work in progress and really an ongoing journey. Regardless of where you are on this journey, we recommend paying close attention to managing your application’s data foundation.
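The prototype-then-expand idea can be sketched as a loop over progressively larger batches, where each phase must pass validation before the next one starts. This is only one way to structure it: the geometric growth (10, 100, 1,000 rows with the defaults below) and the `migrate_batch` and `validate` callables are hypothetical stand-ins for whatever load and check steps a project actually uses.

```python
from typing import Callable, List, Sequence, Tuple


def phased_migration(
    rows: Sequence,
    migrate_batch: Callable[[List], List],
    validate: Callable[[List, List], bool],
    first_batch: int = 10,
    growth: int = 10,
) -> Tuple[List, List[int]]:
    """Move rows in progressively larger batches, stopping at the first failed check.

    Returns the migrated rows and the size of each phase.
    """
    if first_batch < 1 or growth < 1:
        raise ValueError("first_batch and growth must be at least 1")
    migrated, phases = [], []
    offset, size = 0, first_batch
    while offset < len(rows):
        batch = list(rows[offset:offset + size])
        loaded = migrate_batch(batch)
        if not validate(batch, loaded):
            raise RuntimeError(
                f"Validation failed for rows {offset}-{offset + len(batch) - 1}; "
                "fix the mapping and rerun this phase"
            )
        migrated.extend(loaded)
        phases.append(len(batch))
        offset += len(batch)
        size *= growth  # expand only after the previous phase has passed
    return migrated, phases


rows = list(range(1111))
migrated, phases = phased_migration(
    rows, migrate_batch=list, validate=lambda src, dst: src == dst
)
print(phases)
```

Failing fast on the first small phase is what keeps a mapping mistake from being multiplied across the full data set.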
As you can see, a multitude of data issues can plague an application consolidation or migration project and doom it. These potential challenges are not always recognized and understood early on, and this perception gap is a root cause of project failure. This is why we are excited to host Philip Russom of TDWI in our upcoming webinar to discuss data management best practices and methodologies for application consolidation and migration. If you are undertaking any IT modernization or rationalization project, such as consolidating applications or migrating legacy applications to the cloud or to on-premises applications such as SAP, this webinar is a must-see.
So what’s your reality going to be like? Will your project run like a dream or will it escalate into a scary nightmare? Here’s hoping for the former. And also hoping you can join us for this upcoming webinar to learn more:
Webinar with TDWI:
Successful Application Consolidation & Migration: Data Management Best Practices.
Date: Tuesday, March 10, 10 am PT / 1 pm ET
Don’t miss out. Register today!
The Easy Button
Basic data profiling automates much of the process, but performing the detailed analysis is still manually intensive. For example, you normally have to select the table or tables that will participate in the analysis, configure the profile, run the profile, and so on. At Informatica, we are going beyond basic data profiling in other ways. The first is what I jokingly refer to as the Staples™ easy button: a new feature built on our advanced profiling, called Enterprise Discovery. This feature lets you point at one or more schemas and selectively run column profiling, primary key profiling, foreign key profiling and data domain discovery. So with a few clicks of the mouse, you can run all your profiling requirements in one step against hundreds or thousands of tables.
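Enterprise Discovery’s internals are not described here, so the following is only a rough sketch of the “point at a schema and run it in one step” idea. Assuming a SQLite database and three hypothetical domain patterns, it runs one of the listed analyses, data domain discovery, over every column of every table in a single call, tagging a column when at least 80% of its non-null values (an assumed threshold) fully match a domain.

```python
import re
import sqlite3
from typing import Dict

# Hypothetical data domains: a column belongs to a domain when enough values fully match.
DOMAINS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_zip": re.compile(r"\d{5}(?:-\d{4})?"),
    "us_phone": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}"),
}


def discover_domains(conn: sqlite3.Connection, threshold: float = 0.8) -> Dict[str, Dict[str, str]]:
    """Scan every column of every table and tag columns whose values match a domain."""
    findings: Dict[str, Dict[str, str]] = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )]
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [description[0] for description in cursor.description]
        rows = cursor.fetchall()
        for index, column in enumerate(columns):
            values = [str(row[index]) for row in rows if row[index] is not None]
            if not values:
                continue
            for domain, pattern in DOMAINS.items():
                share = sum(1 for v in values if pattern.fullmatch(v)) / len(values)
                if share >= threshold:
                    findings.setdefault(table, {})[column] = domain
                    break  # first matching domain wins in this simple sketch
    return findings


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER, email TEXT, zip TEXT);
    INSERT INTO contacts VALUES (1, 'ada@example.com', '10001'),
                                (2, 'grace@example.org', '94105-1234'),
                                (3, NULL, '60601');
    CREATE TABLE notes (id INTEGER, body TEXT);
    INSERT INTO notes VALUES (1, 'call back tomorrow');
""")
print(discover_domains(conn))
```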
This post discusses going beyond basic data profiling. But for those of you who don’t know what basic data profiling is, let me summarize quickly. Basic data profiling is what I call three-dimensional analysis, which I discuss in some depth in my book, “Three Dimensional Analysis Data Profiling Techniques.”
Basic profiling includes column profiling, table profiling and cross-table profiling. Column profiling is the automated discovery of the true metadata of your data: identifying, column by column, the accurate data type and precision, the minimum and maximum values, the number and percentage of nulls, and more. Table profiling attempts to infer a primary key from the data; you can also infer functional dependencies within each table. Cross-table profiling is about finding primary key/foreign key relationships between tables, as well as overlap analysis. This technology has been around since the late 1990s. It boggles my mind that many data-related projects still do not perform basic profiling before they begin. But that, as they say, is another story.
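The three dimensions can be illustrated with a small sketch over in-memory rows of text values, as you might get from a legacy flat file. It is a simplified stand-in for a profiling tool, under stated assumptions: type inference only distinguishes integer, decimal and text; blank strings count as nulls; and key inference checks single columns only (no composite keys or functional dependencies). All function names are hypothetical.

```python
from typing import Dict, List, Optional


def infer_type(text: str) -> str:
    """Return the narrowest type that can hold a text value."""
    for name, cast in (("integer", int), ("decimal", float)):
        try:
            cast(text)
            return name
        except ValueError:
            continue
    return "text"


def profile_column(values: List[Optional[str]]) -> Dict:
    """Column profiling: inferred type, min, max, null count and percent null."""
    present = [v for v in values if v not in (None, "")]  # blanks treated as nulls
    kinds = {infer_type(v) for v in present}
    if present and kinds <= {"integer"}:
        kind, cast = "integer", int
    elif present and kinds <= {"integer", "decimal"}:
        kind, cast = "decimal", float
    else:
        kind, cast = ("text" if present else "unknown"), str
    typed = [cast(v) for v in present]
    nulls = len(values) - len(present)
    return {
        "type": kind,
        "min": min(typed, default=None),
        "max": max(typed, default=None),
        "nulls": nulls,
        "percent_null": round(100 * nulls / len(values), 1) if values else 0.0,
    }


def candidate_keys(table: List[Dict]) -> List[str]:
    """Table profiling: columns whose values are all present and unique."""
    columns = list(table[0]) if table else []
    return [
        column for column in columns
        if all(row[column] not in (None, "") for row in table)
        and len({row[column] for row in table}) == len(table)
    ]


def overlap(child: List, parent: List) -> float:
    """Cross-table profiling: share of distinct child values found in the parent."""
    child_values = {v for v in child if v not in (None, "")}
    return len(child_values & set(parent)) / len(child_values) if child_values else 0.0


customers = [
    {"cust_id": "1", "name": "Ada", "credit": "1500.50"},
    {"cust_id": "2", "name": "Grace", "credit": ""},
    {"cust_id": "3", "name": "Alan", "credit": "900"},
]
orders = [
    {"order_id": "10", "cust_id": "1"},
    {"order_id": "11", "cust_id": "3"},
    {"order_id": "12", "cust_id": "3"},
]
print(profile_column([row["credit"] for row in customers]))
print(candidate_keys(customers), candidate_keys(orders))
print(overlap([r["cust_id"] for r in orders], [r["cust_id"] for r in customers]))
```

Note that `name` also qualifies as a candidate key in this tiny sample. Inferred keys are candidates for a person to confirm, and a full overlap between `orders.cust_id` and a candidate key in `customers` is what suggests a primary key/foreign key relationship.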
So goes the line in the 1999 Oliver Stone film, Any Given Sunday. In the film, Al Pacino plays Tony D’Amato, a “been there, done that” football coach who, faced with a new set of challenges, has to re-evaluate his tried-and-true assumptions about everything he has learned over his career. In an attempt to rally his troops, D’Amato delivers a wonderful stump speech challenging them to look for ways to move the ball forward, treating every inch of the field as something sacred and encouraging them to think differently about how to do so.
In the world of big data, getting access to data and making sense of it is often a more important consideration than managing sheer volume. Companies that succeed in unlocking true value from big data open themselves up to a world of insight into things like customer preferences, satisfaction and regional purchasing differences. Doing this is often harder than it seems, because the variety of the information leads to standardization and duplication issues. Ownership is often an issue as well, with departmental lines being the most common barrier to sharing important data across the enterprise.
Following up on my previous post on 2011 reflections, it’s now time to look at the year ahead and consider which key trends will likely affect the world of data quality as we know it. As I mentioned in my previous post, we saw continued interest in data quality across all industries, and I expect that trend to keep gaining momentum in 2012. Here are three areas in particular that I foresee rising to the surface:
With just a few days remaining in what has been an eventful year, I thought I’d take some time to reflect on the world of data quality as I’ve observed it over the past twelve months. While the idea of data quality improvement didn’t change much, the way companies view and approach it most certainly has. Here are three areas that came up quite frequently:
Data governance awareness grew
In thinking about all the customer interactions I was involved in throughout the year, it’s hard to come up with one where the topic of data governance didn’t surface. Whereas data governance once seemed to come up only for companies with more mature data management organizations, now it seems everyone is looking to build a governance framework alongside their data quality efforts. Furthermore, while the conversation was previously driven largely by IT, now both IT and business stakeholders are looking for answers to how data governance can help them drive better business outcomes. In increasingly competitive market conditions, we can only expect this trend to continue. Whether the focus is on increasing revenue, driving out cost or managing risk and compliance, companies of all sizes are turning to data quality combined with data governance to create and sustain a differentiated edge. Trends like big data will only make this need more acute.
Informatica supports Agile Data Integration for Agile BI with best practices that encourage good data governance, facilitate business-IT collaboration, promote reuse & flexibility through data virtualization, and enable rapid prototyping and test-driven development. Organizations that want to successfully adopt Agile Data Integration should standardize on the following best practices and leverage Informatica 9.1 to streamline the data integration process, improve data governance, and provide a flexible data virtualization architecture.
1. The business and IT work efficiently and effectively to translate requirements and specifications into data services
I recently had the opportunity to meet with the board of directors of a large distribution company here in the U.S. On the table for discussion were data quality and data governance, and how a focus on both could help the organization gain competitive advantage in the market. While I was happy to see that this company had tied data quality and data governance to its corporate objectives, that’s not what caught my attention. What impressed me most was how the data quality and data governance champion had helped the rest of the board see that there WAS a direct link, and that with careful focus they could drive better business outcomes than they could without any focus on data at all. As it turns out, the champion’s path to success was to articulate the link between trusted, effectively governed data and the company’s ability to excel financially, manage costs, limit its risk exposure and maintain its customers’ trust.