Tag Archives: data profiling
The Easy Button
Basic data profiling automates much of the process, but the detailed analysis is still a manually intensive effort. For example, you normally have to select the table or tables that you want to include in the analysis, configure the profile, run the profile, and so on. At Informatica, we are going beyond basic data profiling in other ways. The first is what I jokingly refer to as the Staples™ easy button. A new feature built on our advanced profiling is called Enterprise Discovery. This feature allows you to point at a schema or schemas and selectively run column profiling, primary key profiling, foreign key profiling and data domain discovery. So with a few clicks of the mouse you can run all your profiling requirements in one step against hundreds or thousands of tables. (more…)
This blog discusses going beyond basic data profiling. But for those of you who don’t know what basic data profiling is, let me summarize quickly. Basic data profiling is what I call three-dimensional analysis. I discuss this in some depth in my book, “Three Dimensional Analysis Data Profiling Techniques.”
Basic profiling includes column profiling, table profiling, and cross-table profiling. Column profiling is the automated discovery of the true metadata of your data. This is the process of identifying the accurate data type and precision, minimum value, maximum value, number of nulls, percent null and more on a column-by-column basis. Table profiling attempts to infer a primary key based upon the data. You can also infer functional dependencies within each table. Cross-table profiling is about finding primary key / foreign key relationships between the tables, as well as overlap analysis. This technology has been around since the late 1990s. It boggles my mind that many data-related projects still do not perform basic profiling before they embark on the project. But that, as they say, is another story. (more…)
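To make the three dimensions concrete, here is a minimal sketch in plain Python. It is not how any Informatica product works; the function names (`infer_type`, `profile_column`, `candidate_keys`, `foreign_key_overlap`) are hypothetical, and it assumes each table is a small in-memory list of rows whose raw values are strings, with `None` standing in for a null.

```python
from typing import Any, Callable, Optional

# Assumption: raw values arrive as strings; None marks a null.
Row = dict[str, Optional[str]]


def infer_type(values: list[str]) -> tuple[str, Callable[[str], Any]]:
    """Return the narrowest type (integer, decimal, string) that fits every value."""
    if not values:
        return "unknown", str
    for name, cast in (("integer", int), ("decimal", float)):
        try:
            for value in values:
                cast(value)
            return name, cast
        except ValueError:
            continue  # at least one value does not fit; try the next type
    return "string", str


def profile_column(rows: list[Row], column: str) -> dict[str, Any]:
    """Column profiling: inferred type, min, max, null count and percent null."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    type_name, cast = infer_type(present)
    typed = [cast(v) for v in present]  # compare as numbers, not as text
    null_count = len(values) - len(present)
    return {
        "inferred_type": type_name,
        "min": min(typed) if typed else None,
        "max": max(typed) if typed else None,
        "null_count": null_count,
        "percent_null": 100.0 * null_count / len(values) if values else 0.0,
    }


def candidate_keys(rows: list[Row], columns: list[str]) -> list[str]:
    """Table profiling: single columns that are fully populated and unique."""
    keys = []
    for column in columns:
        values = [row.get(column) for row in rows]
        if values and None not in values and len(set(values)) == len(values):
            keys.append(column)
    return keys


def foreign_key_overlap(child_rows: list[Row], child_col: str,
                        parent_rows: list[Row], parent_col: str) -> float:
    """Cross-table profiling: share of distinct child values found in the parent."""
    child = {row.get(child_col) for row in child_rows} - {None}
    parent = {row.get(parent_col) for row in parent_rows} - {None}
    return len(child & parent) / len(child) if child else 0.0


if __name__ == "__main__":
    customers = [{"id": "1", "name": "Ann"}, {"id": "2", "name": None}]
    orders = [{"order_id": "10", "customer_id": "1"},
              {"order_id": "11", "customer_id": "9"}]
    print(profile_column(customers, "name"))
    print(candidate_keys(customers, ["id", "name"]))
    print(foreign_key_overlap(orders, "customer_id", customers, "id"))
```

An overlap of 1.0 suggests a clean foreign key candidate; anything lower points to orphaned child values worth investigating. A real profiler would also test composite keys and functional dependencies, which this sketch leaves out.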
So goes the line in the 1999 Oliver Stone film, Any Given Sunday. In the film, Al Pacino plays Tony D’Amato, a “been there, done that” football coach who, faced with a new set of challenges, has to re-evaluate his tried and true assumptions about everything he had learned through his career. In an attempt to rally his troops, D’Amato delivers a wonderful stump speech challenging them to look for ways to move the ball forward, treating every inch of the field as something sacred and encouraging them to think differently about how to do so.
In the world of big data, getting access to data and making sense of it is often a more important consideration than managing sheer volume. Companies that are successful in unlocking true value from big data open themselves up to a world of insight for better understanding of things like customer preferences, satisfaction and regional purchasing differences. Doing this is often harder than it seems because of the variety of the information itself, which leads to standardization and duplication issues. Ownership is often an issue as well, with departmental lines being the most common constraint on sharing important data across the enterprise. (more…)
Following up from my previous post on 2011 reflections, it’s now time to take a look at the year ahead and consider what key trends will likely impact the world of data quality as we know it. As I mentioned in my previous post, we saw continued interest in data quality across all industries and I expect that trend to only continue to pick up steam in 2012. Here are three areas in particular that I foresee will rise to the surface: (more…)
With just a few days remaining in what has been an eventful year, I thought I’d take some time to reflect on the world of data quality as I’ve observed it over the past twelve months. While the idea of data quality improvement in general didn’t change much, the way that companies are viewing and approaching it most certainly has. Here are three areas that seemed to come up quite frequently:
Data governance awareness grew
In thinking about all the customer interactions that I was involved in throughout the year, it’s hard to come up with one where the topic of data governance didn’t surface. Whereas before, the topic of data governance only seemed to come up for companies with more mature data management organizations, now it seems everyone is looking to build a governance framework in conjunction with their data quality efforts. Furthermore, while previously the conversation was largely driven by IT, now it’s both IT and business stakeholders that are looking for answers to how data governance can help them drive better business outcomes. In increasingly competitive market conditions, we can only expect this trend to continue. Whether it’s focused on increasing revenue, driving out cost or managing risk and compliance, data quality with data governance is where companies of all sizes are turning to create and sustain a differentiated edge. Trends like big data will only make this need more acute. (more…)
Informatica supports Agile Data Integration for Agile BI with best practices that encourage good data governance, facilitate business-IT collaboration, promote reuse & flexibility through data virtualization, and enable rapid prototyping and test-driven development. Organizations that want to successfully adopt Agile Data Integration should standardize on the following best practices and leverage Informatica 9.1 to streamline the data integration process, improve data governance, and provide a flexible data virtualization architecture.
1. The business and IT work efficiently and effectively to translate requirements and specifications into data services (more…)
I recently had the opportunity to meet with the board of directors for a large distribution company here in the U.S. On the table for discussion were data quality and data governance, and how a focus on both could help the organization gain competitive advantage in the market. While I was happy to see that this company had tied data quality and data governance to help meet their corporate objectives, that’s not what caught my attention. Instead, what impressed me the most was how the data quality and data governance champion had effectively helped the rest of the board see that there WAS a direct link, and that with careful focus they could drive better business outcomes than they could without a focus on data at all. As it turns out, the path to success for the champion was to focus on articulating the link between trusted data — governed effectively — and the company’s ability to excel financially, manage costs, limit its risk exposure and maintain trust with its customers. (more…)
Gartner recently released their 2011 Magic Quadrant for Data Quality Tools and I’m happy to announce that Informatica is positioned in the Leaders’ quadrant. We believe our position is a testament to the fact that customers like Station Casinos and U.S. Xpress continue to turn to Informatica to solve their most critical data quality challenges.
The publishing of the Magic Quadrant is often a great opportunity to reflect on the state of the data quality market. It should come as no surprise that data quality as a business imperative isn’t going away any time soon. We are continuing to see customers looking for help and expertise in solving a wide range of data quality problems, largely associated with data governance initiatives, master data management (MDM), business intelligence and application modernization. And the association of data quality in these areas is only getting stronger. (more…)