I recently got to talk to several senior IT leaders about their views on information governance and analytics. Participating were a telecom company, a government transportation entity, a consulting company, and a major retailer. Each shared openly in what was a free flow of ideas.
The CEO and corporate culture are critical to driving a fact-based culture
I started this discussion by sharing the COBIT Information Life Cycle. Everyone agreed that the starting point for information governance needs to be business strategy and business processes. However, this sparked an extremely interesting discussion about enterprise analytics readiness. Most said that they are in the midst of leading the proverbial horse to water—in this case, the horse is the business. The CIO in the group said that he personally is all about the data and making factual decisions, but his business is not really there yet. I asked everyone at this point about the importance of culture and the CEO. Everyone agreed that the CEO is incredibly important in driving a fact-based culture. Apparently, people like the new CEO of Target are in the vanguard, not yet the mainstream.
KPIs need to be business drivers
The above CIO said that too many of his managers are focused on day-to-day operations and don’t understand the value of analytics, or of predictive analytics in particular. He said that he needs to teach the business to think analytically, to understand how analytics can help drive the business, and to know how to use Key Performance Indicators (KPIs). The enterprise architect in the group shared at this point that he had previously worked for a major healthcare organization. When that organization was asked to determine a list of KPIs, it came back with 168 of them. Obviously, this could not work, so he explained to the business that an effective KPI must be a “driver of performance”. He stressed to the healthcare organization’s leadership the importance of having fewer KPIs, and of building the ones that do get produced around business capabilities and performance drivers.
IT increasingly needs to understand its customers’ business models
I shared at this point that I visited a major Italian bank a few years ago. Its key leaders had high-definition displays that would roll past a new analytic every five minutes. Everyone laughed at the absurdity of having so many KPIs. With this said, everyone felt that they needed to get business buy-in, because only the business can derive value from acting upon the data. According to this group of IT leaders, this is increasingly pushing them to understand their customers’ business models.
Others said that they were trying to create an omni-channel view of customers. The retailer wanted to get more predictive. While Theodore Levitt said the job of marketing is to create and keep a customer, this retailer is focused on keeping customers and bringing them back more often. They want to give customers offers, informed by customer data, that increase sales—much like what I recently described happening at 58.com, eBay, and Facebook.
Most say they have limited governance maturity
We talked about where people are in their governance maturity. Even though I wanted to gloss over this topic, the group wanted to spend time here and compare notes. Most said that they were at stage 2 or 3 of a five-stage governance maturity process. One CIO asked whether anyone ever gets to level 5. Like analytics, governance was being pushed forward by IT rather than the business. Nevertheless, everyone said that they are working to get data stewards defined for each business function. At this point, I asked about the elements that COBIT 5 suggests go into good governance. I shared that it should include the following four elements: 1) clear information ownership; 2) timely, correct information; 3) clear enterprise architecture and efficiency; and 4) compliance and security. Everyone felt the definition was fine but wanted specifics for each element. I referred them, and you, to my recent article in COBIT Focus.
CIOs say they are custodians of the data only
At this point, one of the CIOs said something incredibly insightful: “We are not data stewards. Stewardship has to be done by the business—IT is the custodian of the data. More specifically, we should not manage data; we should make sure that what the business needs done with data gets done.” Everyone agreed with this point and even reused the term “data custodians” several times during the next few minutes. Debbie Lew of COBIT said the same thing just last week. According to her, “IT does not own the data. They facilitate the data”. From here, the discussion moved to security and data privacy. The retailer in the group was extremely concerned about privacy and felt that they needed masking and other data-level technologies to ensure a breach minimally impacts their customers. At this point, another IT leader in the group said that it is the job of IT leadership to make sure the business does the right things in security and compliance. I shared here that one of my CIO friends had said that “the CIOs at the retailers with breaches weren’t stupid—it is just hard to sell the business impact”. The CIO in the group said that we need to do risk assessments—also a big thing for COBIT 5—that get the business to say we have to invest to protect. “It is IT’s job to adequately explain the business risk”.
Is mobility a driver of better governance and analytics?
Several shared towards the end of the evening that mobility is an increasing impetus for better information governance and analytics. Mobility is driving business users and business customers to demand better information, and thereby better governance of information. Many said that a starting point for providing better information is data mastering. These attendees felt as well that data governance involves helping the business determine its relevant business capabilities and business processes. It seems that these should come naturally, but once again, IT for these organizations seems to be pushing the business across the finish line.
Blogs and Articles:
I recently got to meet with a very enlightened insurance company that was actively turning its SWOTT analysis (with the second T being trends) into concrete action. They shared with me that they view their go-forward “right to win” as being determined by the quality of customer experience they deliver through their traditional channels and, increasingly, through “digital channels”. One marketing leader joked early on that “it’s no longer about the money; it is about the experience”. The marketing and business leaders that I met with made it extremely clear that they have a sense of urgency to respond to what they saw as significant market changes on the horizon. What this company wanted to achieve was a single view of customer across each of its distribution channels as well as its agent population. Typical of many businesses today, they had determined that they needed an automated, holistic view into things like customer history. Smartly, this business wanted to put together its existing customer data with its customer leads.
Using Data to Increase the Percentage of Customers that are Cross-Sold
Taking this step was seen as allowing them to understand when an existing customer is also a lead for another product. With this knowledge, they wanted to provide such customers with special offers to accelerate their conversion from lead to customer with more than one product. What they wanted to do here reminded me of the capabilities of 58.com, eBay, and other Internet pure plays. The reason for doing this well was described recently by Gartner, which suggests that business success is increasingly determined by what it calls “business moments”. Without a first-rate experience that builds upon what it already knows about its customers, this insurance company worries it could be increasingly at risk from Internet pure plays. As important, the degree of cross-sell is for many businesses a major determinant of whether a customer is profitable or not.
Getting Customer Data Right is Key to Developing a Winning Digital Experience
To drive a first-rate digital experience, this insurance company wanted to apply advanced analytics to a single view of customer and prospect data. This would allow them to do things like conduct nearest-neighbor predictive analysis and modeling. In this form of analysis, “the goal is to predict whether a new customer will respond to an offer based on how other similar customers have responded” (Data Science for Business, Foster Provost, O’Reilly, 2013, page 147).
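To make the idea concrete, here is a minimal, pure-Python sketch of nearest-neighbor prediction in the spirit of Provost’s description. The customer features and response history are entirely hypothetical, and a real project would use a library such as scikit-learn rather than hand-rolled distance code.

```python
from collections import Counter
import math

def knn_predict(history, new_customer, k=3):
    """Predict offer response for a new customer from the k most
    similar historical customers (Euclidean distance, majority vote)."""
    ranked = sorted(
        history,
        key=lambda rec: math.dist(rec["features"], new_customer),
    )
    votes = Counter(rec["responded"] for rec in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features: (age, policies held, years as customer)
history = [
    {"features": (34, 1, 2), "responded": True},
    {"features": (36, 2, 3), "responded": True},
    {"features": (61, 1, 20), "responded": False},
    {"features": (58, 3, 15), "responded": False},
    {"features": (30, 1, 1), "responded": True},
]

# A 33-year-old with one policy looks like the young responders.
prediction = knn_predict(history, (33, 1, 2), k=3)
```

The design choice worth noting is that k-NN makes no modeling assumptions at all; it simply asserts that similar customers behave similarly, which is exactly the intuition in the quoted passage.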
What has been limiting this business, like so many others, is that its customer data is scattered across many enterprise systems. For just one division, they have more than one Salesforce instance. Yet this company’s marketing team knew that to keep its customers, it needed to be able to service them omnichannel and establish a single, unified customer experience. To make this happen, they needed, for the first time, to share holistic customer information across their ecosystems. At the same time, they knew that they would need to protect their customers’ privacy—i.e., only certain people would be able to see certain information. They wanted the ability to selectively mask data by role, protecting consumers in particular by allowing only certain users (those, in defense parlance, with a need to know) to see a subset of the holistic set of information collected. When asked about the need for a single view of customer, the digital marketing folks openly shared that they perceived the potential for external digital market entrants, à la Porter’s five forces of competition. This firm saw such entrants as either taking market share from them or effectively disintermediating them from their customers over time, as more and more customers move their purchasing of insurance to the Web. Given the risk, their competitive advantage needed to come from knowing their customers better and being able to respond to them better on the Web. This clearly included, in the language of Theodore Levitt, creating new customers.
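The role-based, need-to-know masking described above can be sketched in a few lines. This is only an illustration of the pattern, not any vendor’s product; the roles, field names, and policy are hypothetical, and real dynamic masking happens in the data layer rather than application code.

```python
def mask_field(value, keep_last=4):
    """Mask all but the last few characters of a sensitive value."""
    s = str(value)
    return "*" * max(len(s) - keep_last, 0) + s[-keep_last:]

# Hypothetical policy: which fields each role may see in the clear.
VISIBLE_FIELDS = {
    "claims_adjuster": {"name", "policy_id"},
    "marketing_analyst": {"policy_id"},
}

def view_customer(record, role):
    """Return a copy of the record with fields masked per role."""
    allowed = VISIBLE_FIELDS.get(role, set())
    return {
        field: value if field in allowed else mask_field(value)
        for field, value in record.items()
    }

customer = {"name": "Jane Doe", "policy_id": "POL-88231", "ssn": "123-45-6789"}
analyst_view = view_customer(customer, "marketing_analyst")
```

An unknown role falls through to an empty allow-list, so the default is to mask everything, which is the safe failure mode for a need-to-know design.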
Competing on Customer Experience
In sum, this insurance company smartly felt that it needed to “compete on customer experience”, a new phrase for me, and this required superior knowledge of existing and new customers. This means they needed as complete and correct a view of customers as possible, including addresses, connection preferences, and, increasingly, social media responses. It also means competitively responding to those that have honed their skills in web design, social presence, and advanced analytics. To do this, they will create predictive capabilities that make use of their superior customer data. Clearly, with this prescience of thinking, this moment need not be like the strategic collision of Starbucks and the fast food vendors, where the desire to grow forced competition between the existing player and new entrants wanting to claim a portion of the existing player’s business.
As I have shared within the posts of this series, businesses are using analytics to improve their internal and external facing business processes and to strengthen their “right to win” within the markets in which they operate. Like healthcare institutions across the country, UPMC is striving to improve its quality of care and business profitability. One educational healthcare CEO put it to me this way: “if we can improve our quality of service, we can reduce costs while we increase our pricing power”. In UPMC’s case, they believe that the vast majority of their costs are in a fraction of their patients, but they want to prove this with real data and then use this information to drive their go-forward business strategies.
Getting more predictive to improve outcomes and reduce cost
Armed with this knowledge, UPMC’s leadership wanted to use advanced analytics and predictive modeling to improve clinical and financial decision making. Taking this action was seen as producing better patient outcomes and reducing costs. A focus area for analysis involved creating “longitudinal records” for the complete cost of providing particular types of care. For those that aren’t versed in time series analysis, longitudinal analysis uses a series of observations obtained from many respondents over time to derive a relevant business insight. When I was involved in healthcare, I used this type of analysis to interrelate employee and patient engagement results with healthcare outcomes. In UPMC’s case, they wanted to use this type of analysis to understand, for example, the total end-to-end cost of a spinal surgery. UPMC wanted to look beyond the cost of surgery and account for pre-surgery care and recovery-related costs. To do this for the entire hospital, however, meant bringing together data from hundreds of sources across UPMC and outside entities, including labs and pharmacies. By having this information, UPMC’s leadership saw the potential to create an accurate and comprehensive view which could be used to benchmark future procedures. Additionally, UPMC saw the potential to automate the creation of patient problem lists and to examine clinical practice variations. But like the other case studies that we have reviewed, these steps required trustworthy and authoritative data that could be accessed with agility and ease.
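The end-to-end cost idea reduces to rolling up cost events from many source systems into one total per patient episode. Here is a minimal sketch with hypothetical patients, phases, and dollar amounts; UPMC’s actual pipeline obviously spans hundreds of integrated sources, not an in-memory list.

```python
from collections import defaultdict

# Hypothetical cost events from several source systems, keyed by patient
# and phase of a spinal-surgery episode of care.
cost_events = [
    ("patient_1", "pre_surgery", 1_200.0),   # imaging, labs
    ("patient_1", "surgery",     24_500.0),
    ("patient_1", "recovery",    3_800.0),   # physical therapy
    ("patient_2", "pre_surgery", 900.0),
    ("patient_2", "surgery",     26_100.0),
    ("patient_2", "recovery",    6_200.0),
]

def episode_costs(events):
    """Roll up per-patient costs across all phases of an episode."""
    totals = defaultdict(float)
    for patient, _phase, amount in events:
        totals[patient] += amount
    return dict(totals)

totals = episode_costs(cost_events)
```

The hard part in practice is not this aggregation but agreeing on consistent patient, provider, and procedure identifiers across sources, which is exactly why the master data management work described below matters.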
UPMC starts with a large, multiyear investment
In October 2012, UPMC made a $100 million investment to establish an enterprise analytics initiative that would bring together, for the first time, clinical, financial, administrative, genomic, and other information in one place. Tom Davenport, the author of Competing on Analytics, suggests in his writing that establishing an enterprise analytics capability represents a major step forward because it allows enterprises to answer the big questions, to better tie strategy and analytics, and to finally rationalize application interconnect and business intelligence spending. As UPMC put its plan together, it realized that it needed to impact more than 1,200 applications. It also realized that it needed one system to manage data integration, master data management, and eventually complex event processing capabilities. At the same time, it addressed the people side of things by creating a governance team to manage data integrity improvements, ensuring that trusted data populates enterprise analytics and providing transparency into data integrity challenges. One of UPMC’s goals was to provide self-service capabilities. According to Terri Mikol, a project leader, “We can’t have people coming to IT for every information request. We’re never going to cure cancer that way.” Here is an example of the promise realized within the first eight months of this project: researchers were able to integrate, for the first time ever, clinical and genomic information on 140 patients previously treated for breast cancer. Traditionally, these data have resided in separate information systems, making it difficult, if not impossible, to integrate and analyze dozens of variables. The researchers found intriguing molecular differences in the makeup of pre-menopausal vs. post-menopausal breast cancer, findings which will be explored further. For UPMC, this initial cancer insight is just the starting point of their efforts to mine massive amounts of data in the pursuit of smarter medicines.
Building the UPMC Enterprise Analytics Capability
To create their enterprise analytics platform, UPMC determined it was critical to establish “a single, unified platform for data integration, data governance, and master data management,” according to Terri Mikol. The solution required a number of key building blocks. The first was data integration: collecting and cleansing data from hundreds of sources and organizing it into repositories that enable fast, easy analysis and reporting by and for end users.
Specifically, the UPMC enterprise analytics capability pulls clinical and operational data from a broad range of sources, including systems for managing hospital admissions, emergency room operations, patient claims, health plans, and electronic health records, as well as external databases that hold registries of genomic and epidemiological data needed for crafting personalized and translational medicine therapies. UPMC has integrated quality-checked source data in accordance with industry-standard healthcare information models. This effort included putting together capabilities around data integration, data quality, and master data management to manage transformations and enforce consistent definitions of patients, providers, facilities, and medical terminology.
As said, the cleansed and harmonized data is organized into specialized genomics databases, multidimensional warehouses, and data marts. The approach makes use of traditional data warehousing approaches as well as big data capabilities to handle unstructured data and natural language processing. UPMC has also deployed analytical tools that allow end users to exploit the data enabled by the Enterprise Analytics platform. The tools drive everything from predictive analytics and cohort tracking to business and compliance reporting. And UPMC did not stop here. If their data had value, then it needed to be secured. UPMC created data audits and data governance practices. As well, they implemented a dynamic data masking solution that ensures data security and privacy.
As I have discussed, many firms are pushing point silo solutions into their environments, but as UPMC shows, this limits their ability to ask the bigger business questions or, in UPMC’s case, to discover things that can change people’s lives. Analytics are more and more a business enabler when they are organized as an enterprise analytics capability. As well, I have come to believe that analytics have become a foundational capability for every firm’s right to win. Analytics informs a coherent set of capabilities and establishes a firm’s go-forward right to win. For this, UPMC is a shining example of getting things right.
Author Twitter: @MylesSuer
Recently, I got to attend the Predictive Analytics Summit in San Diego. It felt great to be in a room full of data scientists from around the world—all my hidden statistics, operations research, and even modeling background came back to me instantly. I was most interested to learn what this vanguard was doing, as well as any lessons learned that could be shared with the broader analytics audience. Presenters ranged from Internet leaders to more traditional companies like Scotts Miracle Gro. Brendan Hodge of Scotts Miracle Gro in fact said that, as a 125-year-old company, he feels like “a dinosaur at a mammal convention”. So in the space that follows, I will share my key takeaways from some of the presenters.
Fei Long from 58.com
58.com is the Craigslist, Yelp, and Monster of China. Fei shared that 58.com is using predictive analytics to recommend resumes to employers and to drive more intelligent real-time bidding for its products. Fei said that 58.com has 300 million users—about the number of people in the United States. Most interestingly, Fei said that predictive analytics has driven a 10-20% increase in 58.com’s click-through rate.
Ian Zhao from eBay
Ian said that eBay is starting to increase the footprint of its data science projects. He said that historically the focus of eBay’s data science was marketing, but today eBay is applying data science to sales and HR. Provost and Fawcett agree in “Data Science for Business”, saying that “the widest applications of data mining techniques are in marketing for tasks such as target marketing, online advertising, and recommendations for cross-selling”.
Ian said that in the non-marketing areas, they are finding a lot less data. The data is scattered across data sources and requires a lot more cleansing. Ian is using things like time series analysis and ARIMA to look at employee attrition. One thing Ian found particularly interesting is a strong correlation between attrition and bonus payouts. Ian said it is critical to leave ample time for data prep, and that it is important to start the data prep process with data exploration and discovery. This includes confirming that data is available for hypothesis testing. Sometimes, Ian said, the data prep process can include imputing data that is not available in the data set and validating data summary statistics. With this, Ian said that data scientists need to dedicate time and resources to determining which things are drivers. He said that with the business, data scientists should talk about likelihood, because business people in general do not understand statistics. It is important as well that data scientists ask business people the “so what” questions. Data scientists should narrow things down to a dollar impact.
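The attrition-versus-bonus relationship Ian described is, at its simplest, a correlation question. A minimal sketch over hypothetical quarterly figures (the numbers below are invented for illustration, not eBay data) shows the kind of signal an analyst would look for before reaching for ARIMA or other time series machinery:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical quarterly series: bonus payout (percent of target) and
# attrition in the following quarter (percent of headcount).
bonus_pct = [105, 98, 80, 60, 95, 70]
attrition_pct = [2.1, 2.4, 4.0, 5.5, 2.6, 4.8]

r = pearson_r(bonus_pct, attrition_pct)  # strongly negative here
```

In line with Ian’s advice about talking likelihood rather than statistics, the business framing of a strong negative r is simply: quarters with weak bonus payouts tend to be followed by noticeably higher attrition.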
Barkha Saxena from Poshmark
Barkha is trying to model the value of user growth. Barkha said that this matters because Poshmark wants to be the #1 community-driven marketplace. They want to use data to create a “personal boutique experience”. With 700,000 transactions a day, they are trying to measure customer lifetime value by implementing a cohort analysis. What was most interesting in Barkha’s data is that she discovered repeatable performance across cohorts. In their analysis, different models work better depending on the data, so a lot of time goes into procedurally determining the best model fit.
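The core of a cohort analysis like Barkha’s is grouping users by when they joined and counting who is still active N months later. Here is a small sketch over hypothetical transactions; a real pipeline would compute this in SQL or pandas over millions of rows rather than a Python list.

```python
from collections import defaultdict

# Hypothetical transactions: (user_id, signup_month, purchase_month).
transactions = [
    ("u1", 0, 0), ("u1", 0, 1), ("u1", 0, 2),
    ("u2", 0, 0), ("u2", 0, 2),
    ("u3", 1, 1), ("u3", 1, 2),
    ("u4", 1, 1),
]

def cohort_activity(txns):
    """For each signup cohort, count distinct active users by months
    since signup (the raw material for retention and LTV curves)."""
    active = defaultdict(set)
    for user, signup, purchase in txns:
        active[(signup, purchase - signup)].add(user)
    return {key: len(users) for key, users in active.items()}

activity = cohort_activity(transactions)
```

Comparing these counts across cohorts is exactly how one would spot the “repeatable performance across cohorts” Barkha mentioned: if month-1 retention looks the same for every signup cohort, early behavior predicts lifetime value.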
Meagan Huth from Google
Meagan said that Google is creating something they call People Analytics. They are trying to make all people decisions with science and data. They want to make it cheaper and easier to work at Google. They have found through their research that good managers lower turnover, increase performance, and increase workplace happiness. The most interesting thing she says they have found is that the best predictor of being a good manager is being a good coach. They have developed predictive models around text threads, including those that occur in employee surveys, to ensure they have the data needed to improve.
Hobson Lane from Sharp Labs
Hobson reminded everyone of the importance of the Nyquist criterion (you need to sample data at least twice as fast as the fastest data event). This is especially important for organizations moving to the so-called Internet of Things, as many of these devices have extremely high data event rates. Hobson also discussed the importance of looking at variance around the line that gets drawn in a regression analysis; sometimes, multiple lines can be drawn. He also discussed the problem of not having enough data to support the complexity of the decision that needs to be made.
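Hobson’s Nyquist point is easy to demonstrate numerically. The sketch below, with arbitrarily chosen frequencies, samples a 5 Hz sine at only 8 Hz (below its 10 Hz Nyquist rate) and shows that the samples are indistinguishable from a 3 Hz sine with inverted phase, which is precisely the aliasing that corrupts undersampled sensor data.

```python
import math

def sample(freq_hz, rate_hz, n=8):
    """Sample a unit-amplitude sine of the given frequency at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

# 5 Hz sampled at 8 Hz violates Nyquist (needs > 10 Hz). Because
# 5/8 = 1 - 3/8, its samples equal those of a phase-inverted 3 Hz sine.
undersampled = sample(5, 8)
slow_alias = sample(3, 8)
aliased = all(
    math.isclose(a, -b, abs_tol=1e-9)
    for a, b in zip(slow_alias, undersampled)
)
```

Once the samples are recorded, no amount of downstream analytics can tell the two frequencies apart, which is why the sampling rate has to be decided correctly at the device.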
Ravi Iyer from Ranker
Ravi started by saying Ranker is a Yelp for everyone else. He then discussed the importance of having systematic data. A nice quote from him: “better data = better predictions”. Ravi discussed as well the topic of response bias. He said that asking about Coke in general can lead to a different answer than asking about Coke at a movie. He discussed, interestingly, how their research shows that millennials are really all about “the best”. I see this happening every time I take my children out to dinner—there is no longer a cheap dinner out.
Ranjan Sinha from eBay
Ranjan discussed the importance of customer-centric commerce and creating predictive models around it. At eBay, they want to optimize the customer experience and improve their ability to make recommendations. eBay is finding customer expectations are changing. For this reason, they want customer context to be modeled by looking at transactions, engagement, intent, account, and inferred social behaviors. With modeling completed, they are using complex event processing to drive a more automated response to data. An amazing example given was for Valentine’s Day, where they use a man’s partner’s data to predict the items that the man should get for his significant other.
Andrew Ahn from LinkedIn
Andrew is using analytics to create what he calls an economic graph and to make professionals more productive. One area where he personally is applying predictive analytics is LinkedIn’s sales solutions. In LinkedIn Sales Navigator, they display potential customers based upon the salesperson’s demographic data—effectively, the system makes lead recommendations. However, they want to de-risk this potential interaction for sales professionals and potential customers. Andrew says at the same time that they have found through data analysis that small changes in a LinkedIn profile can lead to big changes. To put this together, they have created something they call the Social Selling Index (SSI). It looks at predictors they have determined are statistically relevant, including member demographics, site engagement, and social network. The SSI score is viewed as a predictive index. Andrew says that they are trying to go from serendipity to data science.
Robert Wilde from Slacker Radio
Robert discussed the importance of simplicity and elegance in model building. He then went through a set of modeling issues to avoid. He said that modelers need to own the discussion of causality and cause and effect, and of how these can bias data interpretation. In addition, he stressed looking at data variance: what does one do when a line doesn’t have a single point fall on it? Additionally, Robert discussed what to do when correlation is strong, weak, or mistaken. Is it X or Y that has the relationship? Or, worse yet, what do you do when there is coincidental correlation? This led to a discussion of forward and reverse causal inference. For this reason, Robert argued strongly for principal component analysis, which he said eliminates regression causal bias. At the same time, he suggested that models should be valued by complexity versus error rates.
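For readers unfamiliar with principal component analysis, here is a minimal two-dimensional sketch using only the covariance matrix; the data points are hypothetical, and real work would use numpy or scikit-learn rather than the closed-form 2x2 eigenvector below. Unlike regressing y on x, PCA treats neither variable as the cause: it just finds the direction of maximum variance.

```python
import math

def principal_axis(points):
    """First principal component of 2-D points: the unit direction of
    maximum variance, from the covariance matrix's top eigenvector."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Top eigenvalue of [[sxx, sxy], [sxy, syy]] via the quadratic formula.
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    vx, vy = lam - syy, sxy          # corresponding (unnormalized) eigenvector
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Hypothetical points scattered around the line y = 2x.
points = [(0, 0.1), (1, 1.9), (2, 4.2), (3, 5.8), (4, 8.1)]
ax, ay = principal_axis(points)
slope = ay / ax  # should recover roughly 2
```

Because the principal axis is symmetric in x and y, it sidesteps the "is it X or Y that has the relationship" question Robert raised, though it still says nothing about causation.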
Parsa Bakhtary from Facebook
Parsa has been looking at which games generate revenue for Facebook and which do not—Facebook amazingly has over 1,000 revenue-bearing games. For this reason, Facebook wants to look at the lifetime value of customers for Facebook games—the dollar value of a relationship. Parsa said, however, that there is a problem: only 20% of players pay for their games. Parsa argued that customer lifetime value (which was developed in the 1950s) doesn’t really work for apps, where everyone’s lifetime is not the same. Additionally, social and mobile gamers are not particularly loyal. He says that he, therefore, has to model individual games for their first 90 days across all periods of joining and then look at the cumulative revenue curves.
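The cumulative revenue curve Parsa described is just a running sum of average per-player revenue by days since install. The figures below are hypothetical, shortened to a week for readability, and stand in for the 90-day windows he actually models.

```python
from itertools import accumulate

# Hypothetical average daily revenue per player for one game's first
# days after install; the running sum approximates early lifetime value.
daily_revenue = [0.40, 0.25, 0.15, 0.10, 0.08, 0.05, 0.04]

cumulative = list(accumulate(daily_revenue))
day7_ltv = round(cumulative[-1], 2)
```

Plotting these curves for every game, aligned on install day rather than calendar date, is what lets games with very different player lifetimes be compared on equal footing.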
So we have seen here a wide variety of predictive analytics techniques being used by today’s data scientists. To me this says that predictive analytical approaches are alive and kicking. This is good news and shows that data scientists are trying to enable businesses to make better use of their data. Clearly, a key step that holds data scientists back today is data prep. While it is critical to leave ample time for data prep, it is also essential to get quality data to ensure models are working appropriately. At the same time, data prep needs to support imputing data that is not available within the original data set.
According to Strategy and Business, the “CFO role is expanding to include being the company’s premier champion of strategic discipline.” It is no wonder that financial transformations are so much in vogue these days. According to The Conference Board, 81% of the companies that it surveyed are involved in a major financial transformation initiative. However, only 27% of these firms claim to have achieved the benefits defined within their business case. Of the reasons for failure, the most interesting is thinking the transformation would be some kind of big bang. The problem is that this type of thinking is unrealistic for today’s hyper-competitive business environment. Financial strategy today needs to be an enabler of business strategy. This means that it needs to be able to support the increasingly shorter duration of business strategy.
Financial Transformation needs to Enable Business Agility
I have discovered the same thing in my discussions with IT organizations. In other words, enabling business strategies increasingly need to be built with a notion of agility. For financial strategies, this means that they need, first and foremost, to make organizations more agile and enable more continuous business change. Think about the impact on an organization that has inorganic acquisition as part of its strategy. All of this means that thinking a multi-year ERP implementation will on its own deliver financial transformation is unrealistic.
While it is absolutely fair to determine what manual tasks financial teams can eliminate, it does not make sense to think that they are done once an ERP implementation is completed. Recently, I was talking with a large accounting consulting and integration firm. They let me know that they really liked doing large ERP implementations and re-implementations, but they also knew that these would soon break under the weight of financial and business change unless flexibility was built in from the start. Financial transformation must start by creating the business flexibility and agility needed to work in today’s business environment.
Does Your Close Process Get in the Way?
But achieving better financial agility and profitability improvement capabilities is often limited by the timeliness and trustworthiness of data. This is why CFOs say that they spend so much of their time on the close process. According to the MIT CFO Summit Survey, nearly half of the organizations surveyed are feeling pressure from senior leadership to become more data-driven and analytical. Data clearly limits the finance function’s ability to guide corporate executives, business-unit managers, and sales and marketing functions in ways that ensure profitable business growth.
Financial Transformations Need to Fit Business Strategy
At the same time, it cannot be stressed enough that successful financial transformations need to be designed to fit with the company’s larger business strategy. The Conference Board suggests financial organizations should put real emphasis upon transformations that grow the business. Jonathan Brynes at the MIT Sloan School has suggested that “the most important issue facing most managers…is making more money from their existing businesses without costly new initiatives”. In Brynes’ cross-industry research, he found that 30% or more of each company’s businesses are unprofitable. Brynes claims these losses are offset by what are “islands of high profitability”. The root cause of this issue, he asserts, is the inability of current financial and management control systems to surface profitability problems and opportunities for investment to accelerate growth. For this reason, financial transformations should have, as a business goal, making it easier to evaluate business profitability.
A survey from CFO magazine found that nearly all respondents said their companies are striving to improve profitability over the next year; 87% said their companies needed to analyze financial and performance data much more quickly if they were to meet business targets. However, only 12% said their finance organizations can respond to requests for financial reports and analysis from business managers in real or near-real time. At the same time, business managers are expecting finance staff to be able to tell the story behind the numbers—to integrate financial and operational data in ways that get at the drivers of improvement.
We Are Talking About More than Financial Decision Making
This means not just worrying about financial decision making, but ensuring that the right questions and the right insights are being provided for the business. As Geoffrey Moore has indicated, economies of scale and market clout are no longer the formidable barriers to entry that they once were. The allocation of resources must be focused on a company’s most distinctive business capabilities—those things that provide the enterprise its “right to win”. To be strategic, CFOs need to become critical champions of the capabilities system, making sure it gets the investment and consideration it needs. This accentuates the CFO’s ongoing role as a voice of reason in M&A—favoring acquisitions that fit well with the company’s capabilities system, and recommending divestitures of products and services that don’t.
Today, the CFO role is being transformed into a catalyst for change. This increasingly involves helping companies focus on the business capabilities that drive value. CFOs are uniquely positioned to take on this challenge: they are the company leaders who combine strategic insight with a line of sight into business execution. Moreover, unlike other change agents, CFOs have the power of the purse. To do this, however, their financial transformations need to ensure business agility and improve both their own and the business’s ability to get and use data.
Today, I am going to share what few others have so far been willing to share in public regarding big data. Before doing so, I need to bring you up to speed on what I have already written on the topic. I shared previously that the term big data is probably not very useful or descriptive. Thomas Davenport, the author of Competing on Analytics, has likewise found that “over 80 percent of the executives surveyed thought the term was overstated, confusing, or misleading”. At the same time, the CIOs I have been talking to have suggested that I tell our reps never to open a meeting with the big data topic, but instead to talk about the opportunity to relate the volumes of structured data to even larger volumes of unstructured data.
I believe that the objective of these discussions should increasingly be to discover how to solve even bigger and meatier business problems. It is important, as I said recently, to take a systems view of big data, and this includes recognizing that business stakeholders will not use any data unless it is trustworthy, regardless of cost. Having made these points previously, I would like to bring to the forefront another set of issues that businesses should consider before beginning a big data implementation. I have come to my point of view here by listening to the big data vanguard. Many of these early adopters have told me that they jumped onto the big data bandwagon because they heard big data would be cheaper than traditional business intelligence implementations.
However, these enterprises soon discovered that they couldn’t lure away from their Silicon Valley jobs those practiced in the fine arts of Hadoop and MapReduce. They found as well that hand-coding approaches and the primitive data collection tools provided by the Hadoop vendors were not ready for prime time and did not by themselves save cost. These early pioneers found that they needed a way to automate the movement and modification of data for analysis. What they determined was needed is an automated, non-script-based way to pull and transform the data that populates their Hadoop or other big data systems. This includes real-time solutions like HANA and Vertica.
A new architecture for business intelligence
But as I have looked further into the needs of these early adopters, it became clear that they needed an architecture that could truly manage their end-to-end business intelligence requirements: one that would handle the entire data use lifecycle, spanning the collection, inspection, connection, perfection, and protection of data.
Architecture requires a Data Lake
Obviously, I have already been discussing what could be called the collection phase. But to be clear, big data should be just one element of a larger collection scheme. No one is suggesting, for example, that existing business intelligence systems be replaced wholesale by the so-called newer approaches. Given this, the business architecture needs to start by establishing a data lake approach that overarches the new data storage approaches and effectively sits side by side with existing business intelligence assets.
Data discovery starts by testing data relationships
Once new forms of data are collected using Hadoop or other forms of big data storage within an overarching data lake, users and analysts need to inspect the data collected as a whole and surface interrelationships between new and existing forms of data. What is needed in addition to data movement is a lake approach to deploying data and evaluating data relationships. Today, this involves enabling business intelligence users to self-serve. One CIO who heard about this lit up and said, “this is like orchestration. Users can assemble data and put it together and do it from different sources at different times. It doesn’t just have to be a preconceived process.”
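To make this concrete, here is a minimal sketch of how an analyst might join newly collected data to existing data and test the strength of a candidate relationship. The table names, columns, and figures are all invented for illustration; a real data lake would hold far larger and messier datasets.

```python
import pandas as pd

# Hypothetical example: data from an existing warehouse (sales) and
# newly collected data landed in the lake (web logs).
sales = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "monthly_spend": [120.0, 80.0, 200.0, 50.0],
})
web_logs = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "site_visits": [30, 12, 55, 8],
})

# Join the new data to the existing data on a shared key...
combined = sales.merge(web_logs, on="customer_id")

# ...and test the strength of the candidate relationship.
correlation = combined["monthly_spend"].corr(combined["site_visits"])
print(f"spend vs. visits correlation: {correlation:.2f}")
```

The point is not the statistic itself but the workflow: data from different sources, assembled at different times, can be inspected for interesting relationships before anyone commits to a preconceived model.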
Data Enrichment enables business decision making
Historically, users needed to know what data they wanted to analyze prior to building a business intelligence system. An advantage of Hadoop plus an overarching data lake is that you can put data in place before knowing whether it has an interesting business use case. Once data is captured, tooling is needed to evaluate and combine data and to test the strength of potential data relationships. This includes enabling business users to evaluate the types of analytics that could potentially have value to them. I shared recently just how important it is to visualize data in a way that fits the culture and derives the most potential business value.
Once data has been evaluated and relevant data relationships have been determined, it is important to have a way to siphon off the data judged to have potential business interest and do what you have always done with such data. This includes adding meaningful additional structure and relationships to the data and fixing the quality of the data that needs to be related and created within an analytic, which can include things like data mastering. In this data perfection stage, data relationships are extended and data quality and consistency are improved. For finance, this can mean integrating and then consolidating data for a total view of the financial picture. For marketing, it can involve creating an integrated customer record, fusing existing customer master data with external customer datasets to improve cross-sell and customer service. With this accomplished, it becomes an analysis decision and a cost decision whether the data continues to be housed in Hadoop or is managed in an existing traditional data warehouse structure.
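As an illustration of the marketing example, the sketch below fuses a hypothetical customer master table with an external dataset, preferring the mastered value and falling back to the external one where a field is missing. Every name and value here is invented; real mastering tools apply far richer matching and survivorship rules.

```python
import pandas as pd

# Hypothetical existing customer master data, with a gap in "segment".
master = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Acme Corp", "Birch LLC", "Cedar Inc"],
    "segment": ["enterprise", None, "smb"],
})

# Hypothetical external customer dataset with extra attributes.
external = pd.DataFrame({
    "customer_id": [2, 3],
    "segment": ["midmarket", "smb"],
    "industry": ["retail", "energy"],
})

# Fuse the two sources into one integrated customer record.
enriched = master.merge(external, on="customer_id", how="left",
                        suffixes=("", "_ext"))

# Survivorship rule: keep the mastered value, fall back to external.
enriched["segment"] = enriched["segment"].fillna(enriched["segment_ext"])
enriched = enriched.drop(columns="segment_ext")
print(enriched)
```

The enriched record now carries both the trusted master attributes and the new external ones, which is exactly the kind of consolidated view that improves cross-sell and customer service.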
Valuable data needs to be protected
Once data that can be used for business decision making has been created, we need to take the final step of protecting data that often costs millions of dollars to create, refine, and analyze. Recently, I was with a CIO and asked about the various hacks that have captured so much media attention. This CIO said that the CIOs at the companies that had been hacked were not stupid; it is simply hard to justify the business value of protecting the data that has been created. It seems clear, to me at least, that we need to protect data as an asset, and external access to it as well, given the brand and business impacts of being hacked.
It seems clear that we need an architecture that is built to last and to deliver sustained value to the business. So here is the cycle again: collection, inspection, connection, perfection, and protection of data. Each step matters to big data, but also to the broader data architecture that big data extends.
Author Twitter: @MylesSuer
In my discussions with CIOs, opinions differ widely about the future of the CIO role. While most feel the CIO role will remain an important function, they also feel a sea change is in process. According to Tim Crawford, a former CIO and strategic advisor to CIOs, “CIOs are getting out of the data center business”. In my discussions, not all yet see the complete demise of their data centers. However, it is becoming more common for CIOs to see themselves “becoming an orchestrator of business services versus a builder of new operational services”. One CIO put it this way: “the building stuff is now really table stakes. Cloud and loosely oriented partnerships are bringing vendor management to the forefront”.
As more and more of the service portfolio is provided by third parties in either infrastructure as a service (IaaS) or software as a service (SaaS) modes, the CIO needs to take on what will become an increasingly important role: the service broker. An element of the service broker role that will have increasing importance is the ability to glue together business systems whether they are on premise, cloud managed (IaaS), or software as a service (SaaS). Regardless of who creates or manages the applications of the enterprise, it is important to remember that integration is to a large degree the nervous system that connects applications into business capabilities. As such, the CIO’s team has a critical and continuing role in managing this linkage. For example, spaghetti-code integrations can easily touch 20 or more systems for ERP or expense management systems.
Brokering integration services
As CIOs start to consider the move to cloud, they need to determine how this nervous system is connected, maintained, and improved. In particular, they need to determine, maybe for the first time, how to integrate their cloud systems with the rest of their enterprise systems. They can clearly continue to do so by building and maintaining hand-coded integrations or by using their existing ETL tools. This can work where one takes on an infrastructure as a service model, but it falls apart when looking at the total cost of ownership of managing change in a SaaS model. This raises an interesting question: shouldn’t the advantages of SaaS apply to integration as well? Shouldn’t there be Cloud Data Management (integration as a service) options? The answer is yes. Instead of investing in maintaining integrations to SaaS systems, which because of agile methodologies can change more frequently than traditional software, why not have someone else manage this mess for you?
The advantage of the SaaS model is lower total cost of ownership and faster time to value. Instead of your managing the integration between SaaS and historical environments, the integration between SaaS applications and historical applications can be maintained by the Cloud Data Management vendor. This saves both cost and time. As well, it frees you to focus your team’s energy on cleaning up the integrations between historical systems. This is a big advantage for organizations trying to get on the SaaS bandwagon without incurring significantly increased costs as a result.
- Infrastructure as a Service (IaaS)—provides processors, databases, etc. remotely, but you control and maintain what runs on them
- Software as a Service (SaaS)—provides software applications and the underlying infrastructure as a service
- Cloud Data Management—provides integration of applications, in particular SaaS applications, as a service
CIOs are embarking upon big changes. Building stuff is becoming less and less relevant. However, even as more and more services are managed remotely, even by other parties, it remains critical that CIOs and their teams manage the glue between applications. With SaaS applications in particular, this is where Cloud Data Management can really help you control integrations with less time and cost.
A month ago, I shared that Frank Friedman believes CFOs are “the logical choice to own analytics and put them to work to serve the organization’s needs”. Even though many CFOs are increasingly taking on what could be considered an internal CEO or COO role, many readers protested my post reviewing Frank Friedman’s argument. At the same time, CIOs have been very clear with me that they do not want to personally become their company’s data steward. So the question becomes: should companies create a CDO or CAO role to lead this important function? And if so, how common are these two roles anyway?
Regardless of eventual ownership, extracting value from data is becoming a critical business capability. It is clear that data scientists should not be shoehorned into the traditional business analyst role. Data scientists have the unique ability to derive mathematical models “for the extraction of knowledge from data” (Data Science for Business, Foster Provost, 2013, pg 2). For this reason, Thomas Davenport claims that data scientists need to be able to network across an entire business and work at the intersection of business goals, constraints, processes, available data, and analytical possibilities. Given this, many organizations today are starting to experiment with the notion of having either a chief data officer (CDO) or a chief analytics officer (CAO). The open questions are: should an enterprise have a CDO, a CAO, or both? And, just as important, where should each of these roles report in the organization?
Data policy versus business questions
In my opinion, it is critical to first look into the substance of each role before making a decision on the above question. The CDO should be about ensuring that information is properly secured, stored, transmitted, or destroyed. This includes, according to COBIT 5, ensuring that there are effective security and controls over information systems. To do this, procedures need to be defined and implemented to ensure the integrity and consistency of information stored in databases, data warehouses, and data archives. According to COBIT 5, data governance requires the following four elements:
- Clear information ownership
- Timely, correct information
- Clear enterprise architecture and efficiency
- Compliance and security
To me, these four elements should be the essence of the CDO role. Having said this, the CAO role is related but very different in terms of its nature and the business skills required. The CRISP model points out just how different the two roles are. According to CRISP, the CAO role should be focused upon business understanding, data understanding, data preparation, data modeling, and data evaluation. As such, the CAO is focused upon using data to solve business problems, while the CDO is about protecting data as a business-critical asset. I was living in Silicon Valley during the “Internet Bust”. I remember seeing very few job descriptions, and the few that existed said they wanted a developer who could also act as a product manager and do some marketing as a part-time activity. This of course made no sense. I feel the same way about the idea of combining the CDO and CAO. One is about compliance and protecting data; the other is about solving business problems with data. Peanut butter and chocolate may work in a Reese’s cup, but it will not work here: the orientations are too different.
So which business leader should own the CDO and CAO?
Clearly, having two more C’s in the C-suite creates a more crowded list of corporate officers. Some have even said that this will extend what is called senior executive bloat. And, of course, how do these new roles work with and impact the CIO? The answer depends on the organization’s culture. However, where there isn’t an executive staff office, I suggest that these roles go to different places. Many companies already have their CIO function reporting to finance. Where this is the case, it is important to determine whether a COO function is in place. The COO could clearly own the CDO and CAO functions because they have a significant role in improving business processes and capabilities. Where there isn’t a COO function and the CIO reports to the CEO, I think you could have the CDO report to the CIO, even though CIOs say they do not want to be data stewards. This could be a third function in parallel with the VP of Ops and the VP of Apps. In this case, I would have the CAO report to one of the following: the CFO, Strategy, or IT. Again, this all depends on current organizational structure and corporate culture. Regardless of where it reports, the important thing is to focus the CAO on an enterprise analytics capability.
According to Michelle Fox of CNBC and Stephen Schork, the oil industry is in ‘dire straits’. U.S. crude posted its ninth straight weekly loss this week, landing under $50 a barrel. The news is bad enough that it is now expected to lead to major job losses. The Dallas Federal Reserve anticipates that Texas could lose about 125,000 jobs by the end of June. Patrick Jankowski, an economist and vice president of research at the Greater Houston Partnership, expects exploration budgets will be cut 30-35 percent, which will result in approximately 9,000 fewer wells being drilled. The problem is that “if oil prices keep falling, at some point it’s not profitable to pull it out of the ground” (“When, and where, oil is too cheap to be profitable”, CNBC, John W. Schoen).
This means that a portion of the world’s oil supply will become unprofitable to produce. According to Wood Mackenzie, “once the oil price reaches these levels, producers have a sometimes complex decision to continue producing, losing money on every barrel produced, or to halt production, which will reduce supply”. The question is: are these the only answers?
Major Oil Company Uses Analytics to Gain Business Advantage
A major oil company that we are working with has determined that data is a success enabler for its business. It is demonstrating what we at Informatica like to call a “data ready business”: a business that is ready for any change in market conditions. This company is using next-generation analytics to ensure its business’s survival and to make sure it does not become what Jim Cramer likes to call a “marginal producer”. The company has told us that its success is based upon being able to extract oil more efficiently than its competitors.
Historically data analysis was pretty simple
Traditionally, oil producers would get oil by drilling a new hole in the ground, and within six months they would have oil flowing commercially and be in business. This meant it would typically take them six months or longer before they could get any meaningful results, including data that could be used to make broader production decisions.
Drilling from data
Today, oil is also produced from shale using fracking techniques. This process can take only 30-60 days before oil producers start seeing results. It is based not just on innovation in the refining of oil, but also on innovation in the refining of data from which operational business decisions can be made. The benefits of this approach include the following:
Improved fracking process efficiency
Fracking is a very technical process. Producers can have two wells on the same field that are performing at very different levels of efficiency. To address this issue, the oil company we have been discussing throughout this piece is using real-time data to optimize its oil extraction across an entire oil field or region. Insights derived from these data allow it to compare wells in the same region for efficiency or productivity, and even to switch off certain wells if the oil price drops below profitability thresholds. This ability is especially important as the price of oil continues to drop. At $70/barrel, many operators go into the red, while more efficient, data-driven operators can remain profitable at $40/barrel. So efficiency is critical across a system of wells.
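The switch-off logic described above can be sketched very simply. The well names, production rates, and costs below are invented for illustration; a real system would derive these figures from streaming field data rather than hard-coded values.

```python
# Hypothetical per-well production rates and daily operating costs.
wells = {
    "well_A": {"barrels_per_day": 400, "daily_cost": 14000.0},  # $35/bbl
    "well_B": {"barrels_per_day": 150, "daily_cost": 9000.0},   # $60/bbl
}

def unprofitable_wells(wells, price_per_barrel):
    """Return the names of wells that lose money at the given oil price."""
    flagged = []
    for name, well in wells.items():
        cost_per_barrel = well["daily_cost"] / well["barrels_per_day"]
        if cost_per_barrel > price_per_barrel:
            flagged.append(name)
    return flagged

# At $50/barrel, only the less efficient well is a candidate to switch off.
print(unprofitable_wells(wells, price_per_barrel=50.0))
```

The same comparison, run continuously against real-time well data, is what lets a data-driven operator keep the efficient wells producing while idling the ones that fall below the profitability threshold.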
Using data to decide where to build wells in the first place
When constructing a fracking or sands well, you need more information on trends and formulas to extract oil from the ground. On a site with 100+ wells, for example, each one is slightly different because of water tables, ground structure, and the details of the geography. You need the right data, the right formula, and the right method to extract the oil at the best price without impacting the environment.
The right technology delivers the needed business advantage
Of course, technology has never been simple to implement. The company we are discussing has 1.2 petabytes of data to process, and this volume is only increasing. It is running fiber optic cables down into wells to gather data in real time. As a result, it is receiving vast amounts of real-time data but cannot store and analyze that volume efficiently in conventional systems. Meanwhile, the time to aggregate data and run reports can miss the window of opportunity while increasing cost. Making matters worse, this company had many different varieties of data. It also turns out that quite a bit of the useful information in its data sets was in the comments section of the source application, so traditional data warehousing would not help extract the information it really needed. The company decided to move to new technology: Hadoop. But even seemingly simple problems, like getting access to data, were an issue within Hadoop. If you didn’t know the right data analyst, you might not get the data you needed in a timely fashion. Compounding things, a lack of Hadoop skills in Oklahoma proved to be a real problem.
The right technology delivers the right capability
The company had been using a traditional data warehousing environment for years, but it needed help dealing with its Hadoop environment. This meant dealing with the volume, variety, and quality of its source well data. It needed a safe, efficient way to integrate all types of data on Hadoop at any scale without having to learn the internals of Hadoop. Early adopters of Hadoop and other big data technologies have had no choice but to hand-code using Java or scripting languages such as Pig or Hive. Hiring and retaining big data experts proved time consuming and costly, because data scientists and analysts can spend only 20 percent of their time on data analysis and the rest on the tedious mechanics of data integration such as accessing, parsing, and managing data. Fortunately for this oil producer, it didn’t have to be this way. It was able to avoid the specialized coding otherwise required to scale performance on distributed computing platforms like Hadoop. Additionally, it was able to “Map Once, Deploy Anywhere”, knowing that even as technologies change it can run data integration jobs without having to rebuild data processing flows.
It seems clear that we live in an era where data is at the center of just about every business. Data-ready enterprises are able to adapt and win regardless of changing market conditions, because they invested in building their enterprise analytics capability before market conditions changed. In this case, these oil producers will be able to produce oil at lower costs than others within their industry. Analytics provides three benefits to oil producers:
- Better margins and lower costs from operations
- Lower risk of environmental impact
- Less time to build a successful well
In essence, those that build analytics as a core enterprise capability will continue to have a right to win within a dynamic oil pricing environment.
Competing on Analytics: A Follow Up to Thomas H. Davenport’s Post in HBR
Several months ago, I was talking to some CIOs about their business problems. During these conversations, I asked them about their interest in Big Data. One sophisticated CIO recoiled almost immediately, saying that he believes most vendors have a real problem discussing “Big Data” with customers like him. It would be so much easier, he said, if vendors would just talk to him about helping his company with its structured and unstructured data. At the same time, Gartner has found that 64% of enterprises surveyed say they’re deploying or planning to deploy a Big Data project. The problem is that 56% of those surveyed by Gartner are still struggling to determine how to get value out of big data projects, and 23% are struggling with the definition of what is and is not Big Data.
Clearly, this says the term does not work with market and industry participants. To me, this raises a question about the continued efficacy of the term. And now Thomas Davenport, the author of “Competing on Analytics”, has suggested that we retire the term altogether. Tom says that in his research “nobody likes the term”. He claims in particular that executives yearn for a better way to communicate what they are doing with data and analytics.
Tom suggests in particular that “Big Data” has five significant flaws:
1) Big is relative. What is big today will not be so large tomorrow. Will we have to call the future version Big Big Data?
2) Big is only one aspect of what is distinctive about the data in big data. Like my CIO friend said it is not as much about the size of data as it is about the nature of the data. Tom says bigness demands more powerful services, but a lack of structure demands different approaches to process the data.
3) Big data is defined as having volume, variety, and velocity. But what do you call data that has variety and velocity when the data set is not “big”?
4) What do you call the opposite of big data? Is it small data? Nobody likes this term either.
5) Too many people are using “big data” incorrectly to mean any use of analytics, reporting, or conventional business intelligence.
Tom goes on to say that, as he saw recently, “over 80 percent of the executives surveyed thought the term was overstated, confusing, or misleading”. So, Tom asks, why don’t we just stop using it? In the end, Tom struggles with ceasing his own use of the term, because the world noticed the name Big Data in a way it has not noticed other technological terms. Tom has even written a book on the subject, “Big Data at Work”. The question I have is whether we in the IT industry really want to lose all that attention. It feels great to be in the cool crowd. However, CIOs I have talked to say they are really worried about what will happen if their teams oversell Big Data and do not deliver tangible business outcomes. Tom says it would be more helpful, instead of saying “we are cool and we are working on big data”, to say “we’re extracting customer transaction data from our log files in order to help marketing understand the factors leading to customer attrition”. I tend to agree with this thought, but I would like to hear what you think. Should we as an industry retire the term Big Data?