If you’ve spent some time studying and practicing data governance, you would agree that data governance is a challenging yet rewarding endeavor. Across industries, a growing number of organizations have put data governance programs in place so they can more effectively manage their data to drive the business value. But the reality is, data governance is a complex process, and most companies practicing data governance today are still at the early phase of this very long journey. In fact, according to the result from over 240 completed data governance assessments on http://governyourdata.com/, a community website dedicated to everything data governance, the average score for data governance maturity is only 1.6 out of 5. It’s no surprise that data governance was a hot topic at last week’s Informatica World 2015. Over a dozen presentations and panel discussions on data governance were delivered; practitioners across various industries shared their real-world stories on topics ranging from how to kick-start a data governance program, how to build business cases for data governance, frameworks and stewardship management, to the choice of technologies. For me, the key takeaways are:
- Old but still true – To do data governance the right way, you must start small and focus on achieving tangible results. Leverage the small victories to advance to the next phase.
- Be prepared to fail more than once while building a data governance program. But don’t quit, because your data will not.
- One-size doesn’t fit all when it comes to building a data governance framework, which is a challenge for organizations, as there is no magic formula that companies can immediately adopt. Should you build a centralized or federated data governance operation? Well, that really depends on what works within your existing environment.
In fact, when asked “what’s the most challenging area for your data governance effort” in our recent survey conducted at Informatica World 2015, “Identify roles and responsibilities” got the most mentions. Basic principle? – Choose a framework that blends well with your company‘s culture.
- Let’s face it, data governance is not an IT project, nor is it about fixing data problems. It is a business function that calls for people, process and technology working together to obtain the most value from your data. Our seasoned practitioners recommend a systematic approach: Your first priority should be people gathering – identifying the right people with the right skills and most importantly, those who have a passion for data; next is figuring out the process. Things to consider include: What’s the requirement for data quality? What metrics and measurements should be used for examining the data; how to handle exceptions and remediate data issues? How to quickly identify and apply security measures to the various data sets? Third priority is selecting the right technologies to implement and facilitate those processes to transform the data so it can be used to help meet business goals.
- “Engage your business early on” is another important tip from our customers who have achieved early success with their data governance program. A data governance program will not be sustainable without participation from the business. The reason is simple – the business owns the data, they are the consumers of the data and have specific requirements for the data they want to use. IT needs to work collaboratively with business to meet those requirements so the data is fit for use, and provides good value for the business.
- Scalability, flexibility and interoperability should be the key considerations when it comes to selecting data governance technologies. Your technology platform should be able to easily adapt to the new requirements arising from the changes in your data environment. A Big Data project, for example, introduces new data types, increased data speed and volume. Your data management solution should be agile enough to address those new challenges with minimum disruption to your workflow.
Data governance is HOT! The well-attended sessions at Informatica World, as well as some of our previously hosted webinars is testimony of the enthusiasm among our customers, partners, and our own employees on this topic. It’s an exciting time for us at Informatica because we are in a great position to help companies build an effective data governance program. In fact, many of our customers have been relying on our industry-leading data management tools to support their data governance program, and have achieved results in many business areas such as meeting compliance requirements, improving customer centricity and enabling advanced analytics projects. To continue the dialogue and facilitate further learning, I’d like to invite you to an upcoming webinar on May 28, to hear some insightful, pragmatic tips and tricks for building a holistic data governance program from industry expert David Loshin, Principal at Knowledge Integrity, Inc, and Informatica’s own data governance guru Rob Karel.
“Better data is everyone’s job” – well said by Terri Mikol, director of Data Governance at University of Pittsburgh Medical Center. For companies striving to leverage data to deliver business value, everyone within the company should treat data as a strategic asset and take on responsibilities for delivering clean, connected and safe data. Only then can your organization be considered truly “Data Ready”.
Let’s face it, building a Data Governance program is no overnight task. As one CDO puts it: ”data governance is a marathon, not a sprint”. Why? Because data governance is a complex business function that encompasses technology, people and process, all of which have to work together effectively to ensure the success of the initiative. Because of the scope of the program, Data Governance often calls for participants from different business units within an organization, and it can be disruptive at first.
Why bother then? Given that data governance is complex, disruptive, and could potentially introduce additional cost to a company? Well, the drivers for data governance can vary for different organizations. Let’s take a close look at some of the motivations behind data governance program.
For companies in heavily regulated industries, establishing a formal data governance program is a mandate. When a company is not compliant, consequences can be severe. Penalties could include hefty fines, brand damage, loss in revenue, and even potential jail time for the person who is held accountable for being noncompliance. In order to meet the on-going regulatory requirements, adhere to data security policies and standards, companies need to rely on clean, connected and trusted data to enable transparency, auditability in their reporting to meet mandatory requirements and answer critical questions from auditors. Without a dedicated data governance program in place, the compliance initiative could become an on-going nightmare for companies in the regulated industry.
A data governance program can also be established to support customer centricity initiative. To make effective cross-sells and ups-sells to your customers and grow your business, you need clear visibility into customer purchasing behaviors across multiple shopping channels and touch points. Customer’s shopping behaviors and their attributes are captured by the data, therefore, to gain thorough understanding of your customers and boost your sales, a holistic Data Governance program is essential.
Other reasons for companies to start a data governance program include improving efficiency and reducing operational cost, supporting better analytics and driving more innovations. As long as it’s a business critical area and data is at the core of the process, and the business case is loud and sound, then there is a compelling reason for launching a data governance program.
Now that we have identified the drivers for data governance, how do we start? This rather loaded question really gets into the details of the implementation. A few critical elements come to consideration including: identifying and establishing various task forces such as steering committee, data governance team and business sponsors; identifying roles and responsibilities for the stakeholders involved in the program; defining metrics for tracking the results. And soon you will find that on top of everything, communications, communications and more communications is probably the most important tactic of all for driving the initial success of the program.
A rule of thumb? Start small, take one-step at a time and focus on producing something tangible.
Sounds easy, right? Well, let’s hear what the real-world practitioners have to say. Join us at this Informatica webinar to hear Michael Wodzinski, Director of Information Architecture, Lisa Bemis, Director of Master Data, Fabian Torres, Director of Project Management from Houghton Mifflin Harcourt, global leader in publishing, as well as David Lyle, VP of product strategy from Informatica to discuss how to implement a successful data governance practice that brings business impact to an enterprise organization.
If you are currently kicking the tires on setting up data governance practice in your organization, I’d like to invite you to visit a member-only website dedicated to Data Governance: http://governyourdata.com/. This site currently has over 1,000 members and is designed to foster open communications on everything data governance. There you will find conversations on best practices, methodologies, frame works, tools and metrics. I would also encourage you to take a data governance maturity assessment to see where you currently stand on the data governance maturity curve, and compare the result against industry benchmark. More than 200 members have taken the assessment to gain better understanding of their current data governance program, so why not give it a shot?
Data Governance is a journey, likely a never-ending one. We wish you best of the luck on this effort and a joyful ride! We love to hear your stories.
Time to Celebrate! Informatica is Once Again Positioned as a Leader in Gartner’s Magic Quadrant for Data Quality Tools!
It’s holiday season once again at Informatica and this one feels particularly special because we just received an early present from Gartner: Informatica has just been positioned as a leader in Gartner’s Magic Quadrant for Data Quality Tools report for 2014! Click here to download the full report.
And as it turns out, this is a gift that keeps on giving. For eight years in a row, Informatica has been ranked as a leader in Gartner’s Magic Quadrant for Data Quality Tools. In fact, for the past two years running, Informatica has been positioned highest and best for ability to execute and completeness of vision, the two dimensions Gartner measures in their report. These results once again validate our operational excellence as well as our prescience with our data quality products offerings. Yes folks, some days it’s hard to be humble.
Consistency and leadership are becoming hallmarks for Informatica in these and other analyst reports, and it’s hardly an accident. Those milestones are the result of our deep understanding of the market, continued innovation in product design, seamless execution on sales and marketing, and relentless dedication to customer success. Our customer loyalty has never been stronger with those essential elements in place. However, while celebrating our achievements, we are equally excited about the success our customers have achieved using our data quality products.
Managing and producing quality data is indispensable in today’s data-centric world. Gaining access to clean, trusted information should be one of a company’s most important tasks, and has previously been shown to be directly linked to growth and continued innovation.
We are truly living in a digital world – a world revolving around the Internet, gadgets and apps – all of which generate data, and lots of it. Should your organization take advantage of its increasing masses of data? You bet. But remember: only clean, trusted data has real value. Informatica’s mission is to help you excel by turning your data into valuable information assets that you can put to good use.
To see for yourself what the industry leading data quality tool can do, click here.
And from all of our team at Informatica, Happy holidays to you and yours.
In my last blog I promised I would report back my experience on using Informatica Data Quality, a software tool that helps automate the hectic, tedious data plumbing task, a task that routinely consumes more than 80% of the analyst time. Today, I am happy to share what I’ve learned in the past couple of months.
But first, let me confess something. The reason it took me so long to get here was that I was dreaded by trying the software. Never a savvy computer programmer, I was convinced that I would not be technical enough to master the tool and it would turn into a lengthy learning experience. The mental barrier dragged me down for a couple of months and I finally bit the bullet and got my hands on the software. I am happy to report that my fear was truly unnecessary – It took me one half day to get a good handle on most features in the Analyst Tool, a component of the Data Quality designed for analyst and business users, then I spent 3 days trying to figure out how to maneuver the Developer Tool, another key piece of the Data Quality offering mostly used by – you guessed it, developers and technical users. I have to admit that I am no master of the Developer Tool after 3 days of wrestling with it, but, I got the basics and more importantly, my hands-on interaction with the entire software helped me understand the logic behind the overall design, and see for myself how analyst and business user can easily collaborate with their IT counterpart within our Data Quality environment.
To break it all down, first comes to Profiling. As analyst we understand too well the importance of profiling as it provides an anatomy of the raw data we collected. In many cases, it is a must have first step in data preparation (especially when our raw data came from different places and can also carry different formats). A heavy user of Excel, I used to rely on all the tricks available in the spreadsheet to gain visibility of my data. I would filter, sort, build pivot table, make charts to learn what’s in my raw data. Depending on how many columns in my data set, it could take hours, sometimes days just to figure out whether the data I received was any good at all, and how good it was.
Switching to the Analyst Tool in Data Quality, learning my raw data becomes a task of a few clicks – maximum 6 if I am picky about how I want it to be done. Basically I load my data, click on a couple of options, and let the software do the rest. A few seconds later I am able to visualize the statistics of the data fields I choose to examine, I can also measure the quality of the raw data by using Scorecard feature in the software. No more fiddling with spreadsheet and staring at busy rows and columns. Take a look at the above screenshots and let me know your preference?
Once I decide that my raw data is adequate enough to use after the profiling, I still need to clean up the nonsense in it before performing any analysis work, otherwise bad things can happen — we call it garbage in garbage out. Again, to clean and standardize my data, Excel came to rescue in the past. I would play with different functions and learn new ones, write macro or simply do it by hand. It was tedious but worked if I worked on static data set. Problem however, was when I needed to incorporate new data sources in a different format, many of the previously built formula would break loose and become inapplicable. I would have to start all over again. Spreadsheet tricks simply don’t scale in those situation.
With Data Quality Analyst Tool, I can use the Rule Builder to create a set of logical rules in hierarchical manner based on my objectives, and test those rules to see the immediate results. The nice thing is, those rules are not subject to data format, location, or size, so I can reuse them when the new data comes in. Profiling can be done at any time so I can re-examine my data after applying the rules, as many times as I like. Once I am satisfied with the rules, they will be passed on to my peers in IT so they can create executable rules based on the logic I create and run them automatically in production. No more worrying about the difference in format, volume or other discrepancies in the data sets, all the complexity is taken care of by the software, and all I need to do is to build meaningful rules to transform the data to the appropriate condition so I can have good quality data to work with for my analysis. Best part? I can do all of the above without hassling my IT – feeling empowered is awesome!
Use the right tool for the right job will improve our results, save us time, and make our jobs much more enjoyable. For me, no more Excel for data cleansing after trying our Data Quality software, because now I can get a more done in less time, and I am no longer stressed out by the lengthy process.
I encourage my analyst friends to try Informatica Data Quality, or at least the Analyst Tool in it. If you are like me, feeling weary about the steep learning curve then fear no more. Besides, if Data Quality can cut down your data cleansing time by half (mind you our customers have reported higher numbers), how many more predictive models you can build, how much you will learn, and how much faster you can build your reports in Tableau, with more confidence?
In my last blog, I talked about the dreadful experience of cleaning raw data by hand as a former analyst a few years back. Well, the truth is, I was not alone. At a recent data mining Meetup event in San Francisco bay area, I asked a few analysts: “How much time do you spend on cleaning your data at work?” “More than 80% of my time” and “most my days” said the analysts, and “they are not fun”.
But check this out: There are over a dozen Meetup groups focused on data science and data mining here in the bay area I live. Those groups put on events multiple times a month, with topics often around hot, emerging technologies such as machine learning, graph analysis, real-time analytics, new algorithm on analyzing social media data, and of course, anything Big Data. Cools BI tools, new programming models and algorithms for better analysis are a big draw to data practitioners these days.
That got me thinking… if what analysts said to me is true, i.e., they spent 80% of their time on data prepping and 1/4 of that time analyzing the data and visualizing the results, which BTW, “is actually fun”, quoting a data analyst, then why are they drawn to the events focused on discussing the tools that can only help them 20% of the time? Why wouldn’t they want to explore technologies that can help address the dreadful 80% of the data scrubbing task they complain about?
Having been there myself, I thought perhaps a little self-reflection would help answer the question.
As a student of math, I love data and am fascinated about good stories I can discover from them. My two-year math program in graduate school was primarily focused on learning how to build fabulous math models to simulate the real events, and use those formula to predict the future, or look for meaningful patterns.
I used BI and statistical analysis tools while at school, and continued to use them at work after I graduated. Those software were great in that they helped me get to the results and see what’s in my data, and I can develop conclusions and make recommendations based on those insights for my clients. Without BI and visualization tools, I would not have delivered any results.
That was fun and glamorous part of my job as an analyst, but when I was not creating nice charts and presentations to tell the stories in my data, I was spending time, great amount of time, sometimes up to the wee hours cleaning and verifying my data, I was convinced that was part of my job and I just had to suck it up.
It was only a few months ago that I stumbled upon data quality software – it happened when I joined Informatica. At first I thought they were talking to the wrong person when they started pitching me data quality solutions.
Turns out, the concept of data quality automation is a highly relevant and extremely intuitive subject to me, and for anyone who is dealing with data on the regular basis. Data quality software offers an automated process for data cleansing and is much faster and delivers more accurate results than manual process. To put that in math context, if a data quality tool can reduce the data cleansing effort from 80% to 40% (btw, this is hardly a random number, some of our customers have reported much better results), that means analysts can now free up 40% of their time from scrubbing data, and use that times to do the things they like – playing with data in BI tools, building new models or running more scenarios, producing different views of the data and discovering things they may not be able to before, and do all of that with clean, trusted data. No more bored to death experience, what they are left with are improved productivity, more accurate and consistent results, compelling stories about data, and most important, they can focus on doing the things they like! Not too shabby right?
I am excited about trying out the data quality tools we have here at Informtica, my fellow analysts, you should start looking into them also. And I will check back in soon with more stories to share..