Tag Archives: Data Governance
Is this how you think about IT? Or do you think of IT in terms of the technology it deploys instead? I recently was interviewing a CIO at Fortune 50 Company about the changing role of CIOs. When I asked him about which technology issues were most important, this CIO said something that surprised me.
He said, “IT is all about data. Think about it. What we do in IT is all about the intake of data, the processing of data, the store of data, and the analyzing of data. And we need, from data, to increasingly provide the intelligence to make better decisions”.
How many view the function of the IT organization with such clarity?
This was the question that I had after hearing this CIO. And how many IT organizations view IT as really a data system? It must not be very many. Jeanne Ross from MIT CISR contends in her book that company data “one of its most important assets, is patchy, error-prone, and not up-to-date” (Enterprise Architecture as Strategy, Jeanne Ross, page 7). Jeanne contends as well that companies having a data centric view “have higher profitability, experience faster time to market, and get more value from their IT investments” (Enterprise Architecture as Strategy, Jeanne Ross, page 2).
What then do you need to do to get your data house in order?
What then should IT organizations do to move their data from something that is “patchy, error-prone, and not up-to-date” to something that is trustworthy and timely? I would contend our CIO friend had it right. We need to manage all four elements of our data process better.
1. Input Data Correctly
You need start by making sure that the data you produced is done so consistently and correctly. I liken the need here to a problem that I had with my electronic bill pay a few years ago. My bank when it changed bill payment service providers started sending my payments to a set of out of date payee addresses. This caused me to receive late fees and for my credit score to actually go down. The same kind of thing can happen to a business when there are duplicate customers or customer addresses are entered incorrectly. So much of marketing today is about increasing customer intimacy. It is hard to improve customer intimacy when you bug the same customer too much or never connect with a customer because you had a bad address for them.
2. Process Data to Produce Meaningful Results
You need to collect and manipulate data to derive meaningful information. This is largely about processing data so it results produce meaningful analysis. To do this well, you need to take out the data quality issues from the data that is produced. We want, in this step, to make data is “trustworthy” to business users.
With this, data can be consolidated into a single view of customer, financial account, etc. A CFO explained the importance of this step by saying the following:
“We often have redundancies in each system and within the chart of accounts the names and numbers can differ from system to system. And as you establish a bigger and bigger set of systems, you need to, in accounting parlance, to roll-up the charter of accounts”.
Once data is consistently put together, then you need to consolidate it so that it can be used by business users. This means that aggregates need to be created for business analysis. These should support dimensional analysis so that business users can truly answer why something happened. For finance organizations, timely aggregated data with supporting dimensional analysis enables them to establish themselves as “a business person versus a bean counting historically oriented CPA”. Having this data answers questions like the following:
- Why are sales not being achieved? Which regions or products are failing to be delivered?
- Or why is the projected income statement not in conformance with plan? Which expense categories should be we cut in order to ensure the income statement is in line with business expectation?
3. Store Data Where it is Most Appropriate
Data storage needs to be able to occur today in many ways. It can be in applications, a data warehouse, or even, a Hadoop cluster. You need here to have an overriding data architecture that considers the entire lifecycle of data. A key element of doing this well involves archiving data as it becomes inactive and protecting data across its entire lifecycle. The former can involve as well the disposing of information. And the latter requires the ability to audit, block, and dynamically mask sensitive production data to prevent unauthorized access.
4. Enable analysis including the discovery, testing, and putting of data together
Analysis today is not just about the analysis tools. It is about enabling users to discover, test, and put data together. CFOs that we have talked to say they want analysis to expose earlier potential business problems. They want, for example, to know about metrics like average selling price and gross margins by account or by product. They want as well to see when they have seasonality affects.
Increasingly, CFOs need to use this information to help predict what the future of their business will look like. CFOs say that they want at the same time to help their businesses make better decisions from data. Limiting them today from doing this are disparate systems that cannot talk to each other. CFOs complain about their enterprise hodgepodge of systems that do not talk to one another. And yet to report, CFOs need to traverse between front office to back office systems.
One CIO said to us that the end of any analysis layer should be the ability to trust data and make dependable business decisions. And once dependable data exists, business users say that they want “access to data when they need it. They want to get data when and where you need it”. One CIO likened what is need here to orchestration when he said:
“Users want to be able to self-service. They want to be able to assembly data and put it together and do it from different sources at different times. I want them to be able to have no preconceived process. I want them to be able discover data across all sources”.
So as we said at the beginning of this post, IT is all about the data. And with mobile systems of engagement, IT’s customers are wanting increasingly their data at their fingertips. This means that business users need to be able to trust that the data they use for business analysis is timely and accurate. This demands that IT organizations get better at managing their core function—data.
Competing on Analytics
The CFO Viewpoint of Data
Is Big Data Destined To Become Small And Vertical?
Big Data Why?
The Business Case for Better Data Connectivity
What is big data and why should your business care?
A few months ago, while addressing a room full of IT and business professional at an Information Governance conference, a CFO said – “… if we designed our systems today from scratch, they will look nothing like the environment we own.” He went on to elaborate that they arrived there by layering thousands of good and valid decisions on top of one another.
Similarly, Information Governance has also evolved out of the good work that was done by those who preceded us. These items evolve into something that only a few can envision today. Along the way, technology evolved and changed the way we interact with data to manage our daily tasks. What started as good engineering practices for mainframes gave way to data management.
Then, with technological advances, we encountered new problems, introduced new tasks and disciplines, and created Information Governance in the process. We were standing on the shoulders of data management, armed with new solutions to new problems. Now we face the four Vs of big data and each of those new data system characteristics have introduced a new set of challenges driving the need for Big Data Information Governance as a response to changing velocity, volume, veracity, and variety.
Before I answer this question, I must ask you “How comprehensive is the framework you are using today and how well does it scale to address the new challenges?”
While there are several frameworks out in the marketplace to choose from. In this blog, I will tell you what questions you need to ask yourself before replacing your old framework with a new one:
Q. Is it nimble?
The focus of data governance practices must allow for nimble responses to changes in technology, customer needs, and internal processes. The organization must be able to respond to emergent technology.
Q. Will it enable you to apply policies and regulations to data brought into the organization by a person or process?
- Public company: Meet the obligation to protect the investment of the shareholders and manage risk while creating value.
- Private company: Meet privacy laws even if financial regulations are not applicable.
- Fulfill the obligations of external regulations from international, national, regional, and local governments.
Q. How does it Manage quality?
For big data, the data must be fit for purpose; context might need to be hypothesized for evaluation. Quality does not imply cleansing activities, which might mask the results.
Q. Does it understanding your complete business and information flow?
Attribution and lineage are very important in big data. Knowing what is the source and what is the destination is crucial in validating analytics results as fit for purpose.
Q. How does it understanding the language that you use, and can the framework manage it actively to reduce ambiguity, redundancy, and inconsistency?
Big data might not have a logical data model, so any structured data should be mapped to the enterprise model. Big data still has context and thus modeling becomes increasingly important to creating knowledge and understanding. The definitions evolve over time and the enterprise must plan to manage the shifting meaning.
Q. Does it manage classification?
It is critical for the business/steward to classify the overall source and the contents within as soon as it is brought in by its owner to support of information lifecycle management, access control, and regulatory compliance.
Q. How does it protect data quality and access?
Your information protection must not be compromised for the sake of expediency, convenience, or deadlines. Protect not just what you bring in, but what you join/link it to, and what you derive. Your customers will fault you for failing to protect them from malicious links. The enterprise must formulate the strategy to deal with more data, longer retention periods, more data subject to experimentation, and less process around it, all while trying to derive more value over longer periods.
Q. Does it foster stewardship?
Ensuring the appropriate use and reuse of data requires the action of an employee. E.g., this role cannot be automated, and it requires the active involvement of a member of the business organization to serve as the steward over the data element or source.
Q. Does it manage long-term requirements?
Policies and standards are the mechanism by which management communicates their long-range business requirements. They are essential to an effective governance program.
Q. How does it manage feedback?
As a companion to policies and standards, an escalation and exception process enables communication throughout the organization when policies and standards conflict with new business requirements. It forms the core process to drive improvements to the policy and standard documents.
Q. Does it Foster innovation?
Governance must not squelch innovation. Governance can and should make accommodations for new ideas and growth. This is managed through management of the infrastructure environments as part of the architecture.
Q. How does it control third-party content?
Third-party data plays an expanding role in big data. There are three types and governance controls must be adequate for the circumstances. They must consider applicable regulations for the operating geographic regions; therefore, you must understand and manage those obligations.
Do We Really Need Another Information Framework?
The EIM Consortium is a group of nine companies that formed this year with the mission to:
“Promote the adoption of Enterprise Information Management as a business function by establishing an open industry reference architecture in order to protect and optimize the business value derived from data assets.”
That sounds nice, but we do really need another framework for EIM or Data Governance? Yes we do, and here’s why. (more…)
Recently, I had the opportunity to talk to a number of CFOs about their technology priorities. These discussions represent an opportunity for CIOs to hear what their most critical stakeholder considers important. The CFOs did not hesitate or need to think much about this question. They said three things make their priority list. They are better financial system reliability, better application integration, and better data security and governance. The top two match well with a recent KPMG study which found the biggest improvement finance executives want to see—cited by 91% of survey respondents—is in the quality of financial and performance insight obtained from the data they produce, followed closely by the finance and accounting organization’s ability to proactively analyze that information before it is stale or out of date”
CFOs want to know that their systems work and are reliable. They want the data collected from their systems to be analyzed in a timely fashion. Importantly, CFOs say they are worried not only about the timeliness of accounting and financial data. This is because they increasingly need to manage upward with information. For this reason, they want timely, accurate information produced for financial and business decision makers. Their goal is to drive out better enterprise decision making.
In manufacturing, for example, CFOs say they want data to span from the manufacturing systems to the distribution system. They want to be able to push a button and get a report. These CFOs complain today about the need to manually massage and integrate data from system after system before they get what they and their business decision makers want and need.
CFOs really feel the pain of systems not talking to each other. CFOs know firsthand that they have “disparate systems” and that too much manual integration is going on. For them, they see firsthand the difficulties in connecting data from the frontend to backend systems. They personally feel the large number of manual steps required to pull data. They want their consolidation of account information to be less manual and to be more timely. One CFO said that “he wants the integration of the right systems to provide the right information to be done so they have the right information to manage and make decisions at the right time”.
Data Security and Governance
CFOs, at the same time, say they have become more worried about data security and governance. Even though CFOs believe that security is the job of the CIO and their CISO, they have an important role to play in data governance. CFOs say they are really worried about getting hacked. One CFO told me that he needs to know that systems are always working properly. Security of data matters today to CFOs for two reasons. First, data has a clear material impact. Just take a look at the out of pocket and revenue losses coming from the breach at Target. Second, CFOs, which were already being audited for technology and system compliance, feel that their audit firms will be obligated to extend what they were doing in security and governance and go as a part of regular compliance audits. One CFO put it this way. “This is a whole new direction for us. Target scared a lot of folks and will be to many respects a watershed event for CFOs”.
So the message here is that CFOs prioritize three technology objectives for their CIOs– better IT reliability, better application integration, and improved data security and governance. Each of these represents an opportunity to make the CFOs life easier but more important to enable them to take on a more strategic role. The CFOs, that we talked to, want to become one of the top three decision makers in the enterprise. Fixing these things for CFOs will enable CIOs to build a closer CFO and business relationships.
Solution Brief: The Intelligent Data Platform
Solution Brief: Secure at Source
In my last blog I promised I would report back my experience on using Informatica Data Quality, a software tool that helps automate the hectic, tedious data plumbing task, a task that routinely consumes more than 80% of the analyst time. Today, I am happy to share what I’ve learned in the past couple of months.
But first, let me confess something. The reason it took me so long to get here was that I was dreaded by trying the software. Never a savvy computer programmer, I was convinced that I would not be technical enough to master the tool and it would turn into a lengthy learning experience. The mental barrier dragged me down for a couple of months and I finally bit the bullet and got my hands on the software. I am happy to report that my fear was truly unnecessary – It took me one half day to get a good handle on most features in the Analyst Tool, a component of the Data Quality designed for analyst and business users, then I spent 3 days trying to figure out how to maneuver the Developer Tool, another key piece of the Data Quality offering mostly used by – you guessed it, developers and technical users. I have to admit that I am no master of the Developer Tool after 3 days of wrestling with it, but, I got the basics and more importantly, my hands-on interaction with the entire software helped me understand the logic behind the overall design, and see for myself how analyst and business user can easily collaborate with their IT counterpart within our Data Quality environment.
To break it all down, first comes to Profiling. As analyst we understand too well the importance of profiling as it provides an anatomy of the raw data we collected. In many cases, it is a must have first step in data preparation (especially when our raw data came from different places and can also carry different formats). A heavy user of Excel, I used to rely on all the tricks available in the spreadsheet to gain visibility of my data. I would filter, sort, build pivot table, make charts to learn what’s in my raw data. Depending on how many columns in my data set, it could take hours, sometimes days just to figure out whether the data I received was any good at all, and how good it was.
Switching to the Analyst Tool in Data Quality, learning my raw data becomes a task of a few clicks – maximum 6 if I am picky about how I want it to be done. Basically I load my data, click on a couple of options, and let the software do the rest. A few seconds later I am able to visualize the statistics of the data fields I choose to examine, I can also measure the quality of the raw data by using Scorecard feature in the software. No more fiddling with spreadsheet and staring at busy rows and columns. Take a look at the above screenshots and let me know your preference?
Once I decide that my raw data is adequate enough to use after the profiling, I still need to clean up the nonsense in it before performing any analysis work, otherwise bad things can happen — we call it garbage in garbage out. Again, to clean and standardize my data, Excel came to rescue in the past. I would play with different functions and learn new ones, write macro or simply do it by hand. It was tedious but worked if I worked on static data set. Problem however, was when I needed to incorporate new data sources in a different format, many of the previously built formula would break loose and become inapplicable. I would have to start all over again. Spreadsheet tricks simply don’t scale in those situation.
With Data Quality Analyst Tool, I can use the Rule Builder to create a set of logical rules in hierarchical manner based on my objectives, and test those rules to see the immediate results. The nice thing is, those rules are not subject to data format, location, or size, so I can reuse them when the new data comes in. Profiling can be done at any time so I can re-examine my data after applying the rules, as many times as I like. Once I am satisfied with the rules, they will be passed on to my peers in IT so they can create executable rules based on the logic I create and run them automatically in production. No more worrying about the difference in format, volume or other discrepancies in the data sets, all the complexity is taken care of by the software, and all I need to do is to build meaningful rules to transform the data to the appropriate condition so I can have good quality data to work with for my analysis. Best part? I can do all of the above without hassling my IT – feeling empowered is awesome!
Use the right tool for the right job will improve our results, save us time, and make our jobs much more enjoyable. For me, no more Excel for data cleansing after trying our Data Quality software, because now I can get a more done in less time, and I am no longer stressed out by the lengthy process.
I encourage my analyst friends to try Informatica Data Quality, or at least the Analyst Tool in it. If you are like me, feeling weary about the steep learning curve then fear no more. Besides, if Data Quality can cut down your data cleansing time by half (mind you our customers have reported higher numbers), how many more predictive models you can build, how much you will learn, and how much faster you can build your reports in Tableau, with more confidence?
That second question is a killer because most people — no matter if they’re in marketing, sales or manufacturing — rely on incomplete, inaccurate or just plain wrong information. Regardless of industry, we’ve been fixated on historic transactions because that’s what our systems are designed to provide us.
“Moneyball: The Art of Winning an Unfair Game” gives a great example of what I mean. The book (not the movie) describes Billy Beane hiring MBAs to map out the factors that would win a baseball game. They discovered something completely unexpected: That getting more batters on base would tire out pitchers. It didn’t matter if batters had multi-base hits, and it didn’t even matter if they walked. What mattered was forcing pitchers to throw ball after ball as they faced an unrelenting string of batters. Beane stopped looking at RBIs, ERAs and even home runs, and started hiring batters who consistently reached first base. To me, the book illustrates that the most useful knowledge isn’t always what we’ve been programmed to depend on or what is delivered to us via one app or another.
For years, people across industries have turned to ERP, CRM and web analytics systems to forecast sales and acquire new customers. By their nature, such systems are transactional, forcing us to rely on history as the best predictor of the future. Sure it might be helpful for retailers to identify last year’s biggest customers, but that doesn’t tell them whose blogs, posts or Tweets influenced additional sales. Isn’t it time for all businesses, regardless of industry, to adopt a different point of view — one that we at Informatica call “Data-First”? Instead of relying solely on transactions, a data-first POV shines a light on interactions. It’s like having a high knowledge IQ about relationships and connections that matter.
A data-first POV changes everything. With it, companies can unleash the killer app, the killer sales organization and the killer marketing campaign. Imagine, for example, if a sales person meeting a new customer knew that person’s concerns, interests and business connections ahead of time? Couldn’t that knowledge — gleaned from Tweets, blogs, LinkedIn connections, online posts and transactional data — provide a window into the problems the prospect wants to solve?
That’s the premise of two startups I know about, and it illustrates how a data-first POV can fuel innovation for developers and their customers. Today, we’re awash in data-fueled things that are somehow attached to the Internet. Our cars, phones, thermostats and even our wristbands are generating and gleaning data in new and exciting ways. That’s knowledge begging to be put to good use. The winners will be the ones who figure out that knowledge truly is power, and wield that power to their advantage.
But it’s not as easy as a couple of queries. The reality is that the body of knowledge in question is seldom in a shape recognizable as a ‘body’. In most corporations, the data regulators are asking for is distributed throughout the organization. Perhaps a ‘Scattering of Knowledge’ is a more appropriate metaphor.
It is time to accept that data distribution is here to stay. The idea of a single ERP has long gone. Hype around Big Data is dying down, and being replaced by a focus on all data as a valuable asset. IT architectures are becoming more complex as additional data storage and data fueled applications are introduced. In fact, the rise of Data Governance’s profile within large organizations is testament to the acceptance of data distribution, and the need to manage it. Forrester has just released their first Forrester Wave ™ on data governance. They state it is time to address governance as “Data-driven opportunities for competitive advantage abound. As a consequence, the importance of data governance — and the need for tooling to facilitate data governance —is rising.” (Informatica is recognized as a Leader)
However, Data Governance Programs are not yet as widespread as they should be. Unfortunately it is hard to directly link strong Data Governance to business value. This means trouble getting a senior exec to sponsor the investment and cultural change required for strong governance. Which brings me back to the opportunity within Regulatory Compliance. My thinking goes like this:
- Regulatory compliance is often about gathering and submitting high quality data
- This is hard as the data is distributed, and the quality may be questionable
- Tools are required to gather, cleanse, manage and submit data for compliance
- There is a high overlap of tools & processes for Data Governance and Regulatory Compliance
So – why not use Regulatory Compliance as an opportunity to pilot Data Governance tools, process and practice?
Far too often compliance is a once-off effort with a specific tool. This tool collects data from disparate sources, with unknown data quality. The underlying data processes are not addressed. Strong Governance will have a positive effect on compliance – continually increasing data access and quality, and hence reducing the cost and effort of compliance. Since the cost of non-compliance is often measured in millions, getting exec sponsorship for a compliance-based pilot may be easier than for a broader Data Governance project. Once implemented, lessons learned and benefits realized can be leveraged to expand Data Governance into other areas.
Previously I likened Regulatory Compliance as a Buy One, Get One Free opportunity: Compliance + a free performance boost. If you use your compliance budget to pilot Data Governance – the boost will be larger than simply implementing Data Quality and MDM tools. The business case shouldn’t be too hard to build. Consider that EY’s research shows that companies that successfully use data are already outperforming their peers by as much as 20%.[i]
Data Governance Benefit = (Cost of non-compliance + 20% performance boost) – compliance budget
Yes, the equation can be considered simplistic. But it is compelling.