Tag Archives: Analytics
As we head into Strata + Hadoop World San Jose, Pivotal has made some interesting announcements that are sure to be the talk of the show. Pivotal’s moves to open-source some of its advanced products (and to form a new organization to foster Hadoop community cooperation) are signs of the dynamism and momentum of the Big Data market.
Informatica applauds these initiatives by Pivotal and we hope that they will contribute to the accelerating maturity of Hadoop and its expansion beyond early adopters into mainstream industry adoption. By contributing HAWQ, GemFire and the Greenplum Database to the open source community, Pivotal creates further open options in the evolving Hadoop data infrastructure technology. We expect this to be well received by the open source community.
Because Informatica has long served as the industry’s neutral data connector for more than 5,500 customers and has developed a rich set of capabilities for Hadoop, we are also excited to see efforts to reduce fragmentation in the Hadoop community.
Even before the new company Pivotal was formed, Informatica had a long history working with the Greenplum team to ensure that joint customers could confidently use Informatica tools to include the Greenplum Database in their enterprise data pipelines. Informatica has mature and high-performance native connectivity to load data in and out of Greenplum reliably using Informatica’s codeless, visual data pipelining tools. In 2014, Informatica expanded our Hadoop support to include Pivotal HD Hadoop, and we have joint customers using Informatica to do data profiling, transformation, parsing and cleansing using Informatica Big Data Edition running on Pivotal HD Hadoop.
We expect these innovative developments driven by Pivotal in the Big Data technology landscape to help to move the industry forward and contribute to Pivotal’s market progress. We look forward to continuing to support Pivotal technology and to an ever increasing number of successful joint customers. Please reach out to us if you have any questions about how Informatica and Pivotal can help your organization to put Big Data into production. We want to ensure that we can help you answer the question … Are you Big Data Ready?
Talking to architects about analytics at a recent event, I kept hearing a familiar theme: data scientists are spending 80% of their time on “data wrangling,” leaving only 20% for delivering the business insights that will drive the company’s innovation. It was clear to everybody I spoke to that the situation will only worsen. The coming growth everybody sees in data volume and complexity will only lengthen the time to value.
Gartner recently predicted that:
“by 2015, 50% of organizations will give up on managing growth and will redirect funds to improve classification and analytics.”
Some of the details of this study are interesting. In the end, many organizations are coming to two conclusions:
- It’s risky to delete data, so they keep it around as insurance.
- All data has potential business value, so more organizations are keeping it around for potential analytical purposes.
The other mega-trend here is that more and more organizations are looking to compete on analytics – and they need data to do it, both internal data and external data.
From an architect’s perspective, here are several observations:
- The floodgates are open and analytics is a top priority. Given that, the emphasis should be on architecting to manage the dramatic increases in both data quantity and data complexity rather than on trying to stop it.
- The immediate architectural priority has to be on simplifying and streamlining your current enterprise data architecture. Break down those data silos and standardize your enterprise data management tools and processes as much as possible. As discussed in other blogs, data integration is becoming the biggest bottleneck to business value delivery in your environment. Gartner has projected that “by 2018, more than half the cost of implementing new large systems will be spent on integration.” The more standardized your enterprise data management architecture is, the more efficient it will be.
- With each new data type, new data tool (Hive, Pig, etc.), and new data storage technology (Hadoop, NoSQL, etc.) ask first if your existing enterprise data management tools can handle the task before people go out and create a new “data silo” based on the cool, new technologies. Sometimes it will be necessary, but not always.
- The focus needs to be on speeding value delivery for the business. And the key bottleneck is highly likely to be your enterprise data architecture.
Rather than focusing on managing data growth, the priority should be on managing it in the most standardized and efficient way possible. It is time to think about enterprise data management as a function with standard processes, skills and tools (just like Finance, Marketing or Procurement.)
Several of our leading customers have built or are building a central “Data as a Service” platform within their organizations. This is a single, central place where all developers and analysts can go to get trustworthy data that is managed by IT through a standard architecture and served up for use by all.
For more information, see “The Big Big Data Workbook”
*Gartner Predicts 2015: Managing ‘Data Lakes’ of Unprecedented Enormity, December 2014 http://www.gartner.com/document/2934417#
The signs that healthcare is becoming a more consumer (think patients, members, providers) driven industry are evident all around us. I see provider and payer organizations clamoring for more data, specifically data that is actionable, relatable and has integrity. Armed with this data, healthcare organizations are able to differentiate around a member/patient-centric view.
These consumer-centric views convey the total value of the relationships healthcare organizations have with consumers. Knowing the total value creates a more comprehensive understanding of consumers because these views deliver a complete picture of an individual’s critical relationships, including patient to primary care provider, member to household, provider to network and even member to legacy plan. This is the type of knowledge that informs new treatments, targets preventative care programs and improves outcomes.
Payer organizations are collecting and analyzing data to identify opportunities for more informed care management and segmentation to reach new, high value customers in individual markets. By segmenting and targeting messaging to specific populations, health plans generate increased member satisfaction and cost-effectively expand and manage provider networks.
How will they accomplish this? Enabling members to interact in health and wellness forums, analyzing member behavior and trends and informing care management programs with a 360 view of members… to name a few. Payers will also drive new member programs, member retention and member engagement marketing and sales programs by investigating complete views of member households and market segments.
In the provider space, this relationship building can be a little more challenging because often consumers as patients do not interact with their doctor unless they are sick, creating gaps in data. When provider organizations have a better understanding of their patients and providers, they can increase patient satisfaction and proactively offer preventative care to the sickest (and most likely to engage) of patients before an episode occurs. These activities result in increased market share and improved outcomes.
Where can providers start? By creating a 360 view of the patient, organizations can now improve care coordination, open new patient service centers and develop patient engagement programs.
Analyzing populations of patients, and fostering patient engagement based on Meaningful Use requirements or Accountable Care requirements, building out referral networks and developing physician relationships are essential ingredients in consumer engagement. Knowing your patients and providing a better patient experience than your competition will differentiate provider organizations.
You may say “This all sounds great, but how does it work?” An essential ingredient is clean, safe and connected data. Clean, safe and connected data requires an investment in data as an asset… just like you invest in real estate and human capital, you must invest in the accessibility and quality of your data. To be successful, arm your team with tools to govern data: tools that ensure the ongoing integrity and quality of data, remove duplicate records and dynamically incorporate data validation/quality rules. These tools include master data management, data quality and metadata management, and they are focused on information quality. Tools focused on information quality support a total customer relationship view of members, patients and providers.
You Say Big Dayta, I say Big Dahta
Some say Big Data is a great challenge while others say Big Data creates new opportunities. Where do you stand? For most companies concerned with their Big Data challenges, it shouldn’t be so difficult – at least on paper. Computing costs (both hardware and software) have vastly shrunk. Databases and storage techniques have become more sophisticated and scale massively, and companies such as Informatica have made connecting and integrating all the “big” and disparate data sources much easier and have helped companies achieve a sort of “big data synchronicity”. As it is.
In the process of creating solutions to Big Data problems, humans (and the supra-species known as IT Sapiens) have a tendency to use theories based on linear thinking and the scientific method. There is data as our systems know it and data as our systems don’t. The reality, in my opinion, is that “Really Big Data” problems now and in the future will have complex correlations and unintuitive relationships that need to utilize mathematical disciplines, data models and algorithms that haven’t even been discovered or invented yet and when eventually discovered, will make current database science positively primordial.
At some point in the future, machines will be able to predict, based on big, perhaps unknown data types, when someone is having a bad day or a good day, or more importantly whether a person may behave in a good or bad way. Many people do this now when they take a glance at someone across a room and infer how that person is feeling or what they will do next. They see eyes that are shiny or dull, crinkles around eyes or sides of mouths, then hear the “tone” in a voice, and then their neurons put it all together and conclude that this is a person who is having a bad day and needs a hug. Quickly. No one knows exactly how the human brain does this, but it does what it does and we go with it and we are usually right.
And some day, Big Data will be able to derive this and it will be an evolution point and it will also be a big business opportunity. Through bigger and better data ingestion and integration techniques and more sophisticated math and data models, a machine will do this fast and relatively speaking, cheaply. The vast majority won’t understand why or how it’s done, but it will work and it will be fairly accurate.
And my question to you all is this.
Do you see any alternate scenarios regarding the future of big data? Is contextual computing an important evolution, and will big data integration be more or less of a problem in the future?
PS. Oh yeah, one last thing to chew on concerning Big Data… If Big Data becomes big enough, does that spell the end of modelling as we know it?
I’ve been having some interesting conversations with work colleagues recently about the Big Data hubbub and I’ve come to the conclusion that “Big Data” as hyped is neither, really. In fact, both terms are relative. “Big” 20 years ago to many may have been 1 terabyte. “Data” 20 years ago may have been Flat files, Sybase, Oracle, Informix, SQL Server or DB2 tables. Fast forward to today and “Big” is now Exabytes (or millions of terabytes). “Data” are now expanded to include events, sensors, messages, RFID, telemetry, GPS, accelerometers, magnetometers, IoT / M2M and other new and evolving data classifications.
And then there’s social and search data.
Surely you would classify Google data as really really big data – I can tell when I do a search, and get 487,464,685 answers within fractions of a second that they appear to have gotten a handle on their big data speeds and feeds. However, it’s also telling that nearly all of those bazillion results are actually not relevant to what I am searching for.
My conclusion is that if you have the right algorithms, invest in and use the right hardware and software technology and make sure to measure the pertinent data sources, harnessing big data can yield speedy and “big” results.
So what’s the rub then?
It usually boils down to having larger and more sophisticated data stores and still not understanding their structure, OR the data can’t be integrated into cohesive formats, OR there is important hidden meaning in the data that we don’t have the wherewithal to derive, see or understand a la Google. So how DO you find the timely and important information out of your company’s big data (AKA the needle in the haystack)?
More to the point, how do you better ingest, integrate, parse, analyze, prepare, and cleanse your data to get the speed, but also the relevancy in a Big Data world?
Hadoop related tools are one of the current technologies of choice when it comes to solving Big Data related problems, and as an Informatica customer, you can leverage these tools, regardless of whether it’s Big Data or Not So Big Data, fast data or slow data. In fact, it actually astounds me that many IT professionals would want to go back to hand coding with a Hadoop tool just because they don’t know that the tools to do so are right under their nose, installed and running in their familiar Informatica User Interface (AND that work with Hadoop right out of the box.)
So what does your company get out of using Informatica in conjunction with Hadoop tools? Namely, better customer service and responsiveness, better operational efficiencies, more effective supply chains, better governance, service assurance, and the ability to discover previously unknown opportunities as well as stopping problems when they are an issue – not after the fact. In other words, Big Data done right can be a great advantage to many of today’s organizations.
Much more to say on this subject as I delve into the future of Big Data. For more, see Part 2.
In a previous life, I was a pastry chef in a now-defunct restaurant. One of the things I noticed while working there (and frankly while cooking at home) is that the better the ingredients, the better the final result. If we used poor quality apples in the apple tart, we ended up with a soupy, flavorless mess with a chewy crust.
The same analogy can be applied to Data Analytics. With poor quality data, you get poor results from your analytics projects. We all know that companies that can implement fantastic analytic solutions that can provide near real-time access to consumer trends are the same companies that can do successful targeted marketing campaigns that are of the minute. The Data Warehousing Institute estimates that data quality problems cost U.S. businesses more than $600 billion a year.
The business impact of poor data quality should not be underestimated. If not identified and corrected early on, defective data can contaminate all downstream systems and information assets, jacking up costs, jeopardizing customer relationships, and causing imprecise forecasts and poor decisions.
- To help you quantify: Let’s say your company receives 2 million claims per month with 377 data elements per claim. Even at an error rate of .001, the claims data contains more than 754,000 errors per month and more than 9.04 million errors per year! If you determine that 10 percent of the data elements are critical to your business decisions and processes, you still must fix almost 1 million errors each year!
- What is your exposure to these errors? Let’s estimate the risk at $10 per error (including staff time required to fix the error downstream after a customer discovers it, the loss of customer trust and loyalty, and erroneous payouts). Your company’s risk exposure to poor quality claims data is about $10 million a year.
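The arithmetic in the two points above can be reproduced with a short sketch. The variable names are illustrative only and do not come from any Informatica product; the inputs are the example figures quoted above, so substitute your own claim volumes and cost estimates.

```python
# Back-of-the-envelope estimate of claims data errors and risk exposure.
# All inputs are the illustrative figures from the example above.
claims_per_month = 2_000_000
elements_per_claim = 377
error_rate = 0.001       # one error per 1,000 data elements
critical_share = 0.10    # share of data elements critical to decisions
cost_per_error = 10      # estimated dollars of risk per critical error

elements_per_month = claims_per_month * elements_per_claim
errors_per_month = elements_per_month * error_rate
errors_per_year = errors_per_month * 12
critical_errors_per_year = errors_per_year * critical_share
annual_exposure = critical_errors_per_year * cost_per_error

print(f"Errors per month:         {errors_per_month:,.0f}")
print(f"Errors per year:          {errors_per_year:,.0f}")
print(f"Critical errors per year: {critical_errors_per_year:,.0f}")
print(f"Annual risk exposure:     ${annual_exposure:,.0f}")
```

With these inputs the exposure works out to a little over $9 million a year, which the example rounds up to $10 million by treating the critical errors as roughly 1 million.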
Once your company values quality data as a critical resource – it is much easier to perform high-value analytics that have an impact on your bottom line. Start with creation of a Data Quality program. Data is a critical asset in the information economy, and the quality of a company’s data is a good predictor of its future success.
How Do You Like It? How Do You Like It? More, More More!
Chiefmartec came out with their 2015 Marketing Technology Landscape, and if there’s one word that comes to mind, it’s MORE. 1,876 corporate logos dot the page, up from 947 in 2014. That’s definitely more, more, more – just about double, in fact. I’m honestly not sure it’s possible to squeeze any more into a single image.
But it’s strangely fitting, because this is the reality that we marketers live in. There are an infinite number of new technologies, approaches, social media platforms, operations tools, and vendors that we have to figure out. New, critical categories of technology roll out constantly. New vendors enter and exit the landscape. As Chiefmartec says “at least on the immediate horizon, I don’t think we’re going to see a dramatic shrinking of this landscape. The landscape will change, for sure. What qualifies as “marketing” and “technology” under the umbrella of marketing technology will undoubtedly morph. But if mere quantity is the metric we’re measuring, I think it’s going to be a world of 1,000+ marketing technology companies — perhaps even a world of 2,000+ of them — for some time to come.”
Middleware: I’m Coming Up So You’d Better Get This Party Started!
One thing you’ll notice if you look carefully between last year’s and this year’s version, is the arrival of the middleware layer. Chiefmartec spends quite a bit of time talking about middleware, pointing out that great tools in the category are making the marketing technology landscape easier to manage – particularly those that handle a hybrid of on premise and cloud.
Marketers have long cared about the things on the top – the red “Marketing Experiences” and the orange “Marketing Operations”. They’ve also put a lot of focus on the dark gray/black/blue layer, “Backbone Platforms” like marketing automation & e-commerce. But only recently has that yellow middleware layer become front and center for marketers. Data integration, data management platforms, connectivity, data quality, and APIs are definitely not new to the technology landscape, and have been a critical domain of IT for decades. But as marketers are becoming more and more skilled in and reliant on analytics and focused customer experience management, data is entering the forefront.
Marketers cannot focus exclusively on their Salesforce CRM, their Marketo automation, or their Adobe Experience Manager web management. Data Ready marketers realize that each of these applications can no longer be run in a silo; they need to be looked at collectively as a powerful set of tools designed to engage the customer and push them through the buying cycle, as critical pieces to the same puzzle. And to do that, they need to be looking at connecting their data sources, powering them with great data, analyzing and measuring their results, and then deciding what to do.
If you squint, you can see Informatica in the yellow Middleware layer. (I could argue that it belongs in several of these yellow boxes, not just Cloud integration, but I’ll save that for another blog!) Some might say that’s not very exciting, but I would argue that Informatica is in a tremendous place to help marketers succeed with great data. And it all comes down to two words… complexity and change.
Why You Have to Go and Make Things So Complicated?
Ok, admittedly terrible grammar, but you get the picture. Marketers live in a tremendously complex world. Sure, you don’t have all 1,876 of the logos on the Technology Landscape in house. You probably don’t even have one from each of the 43 categories. But you definitely have a lot of different technology solutions that you rely upon on a day-to-day basis. According to a September article by Chiefmartec, most marketers already regularly rely on more than 100 software programs.
Data ready marketers realize that their environments are complicated, and that they need a foundation. They need a platform of great data that all of their various applications and tools can leverage, and that can actually connect all of their various applications and tools together. They need to be able to connect to just about anything from just about anything. They need a complete view of all of their interactions with their customers. In short, they need to make their extremely complicated world more simple, streamlined, and complete.
Ch-Ch-Ch-Ch-Changes. Turn and Face the Strange!
I have a tendency to misunderstand lyrics, so I have to confess that until I looked up this song today, I thought the lyric was “time to face the pain” (Bowie fans, I hang my head in shame!). But quite honestly, “turn and face the strange” illustrates my point just as well!
There is no question that marketing has changed dramatically in the past few years. Your most critical marketing tools and processes two years ago are almost certainly different than those this year, and will almost certainly be different from what you see two years from now. Marketers realize this. The Marketing Technology Landscape illustrates this every year!
The data ready marketer understands that their toolbox will change, but that their data will be the foundation for whatever new piece of the technology puzzle they embrace or get rid of. Building a foundation of great data will power any technology solution or new approach.
Data ready marketers also work with their IT counterparts to engineer for change. They make sure that no matter what technology or data source they want to add – no matter how strange or unthinkable it is today – they never have to start from scratch. They can connect to what they want, when they want, leveraging great data, and ultimately making great decisions.
Get Ready ‘Cause Here I Come. The Era of the Data Ready Marketer is Here
Now that you have a few catchy tunes stuck in your head, it’s time to ask yourself, are you data ready? Are you ready to embrace the complexity of the marketing technology landscape? Are you ready to think about change as a competitive weapon?
I encourage you to take our survey about data ready marketing. The results are coming out soon so don’t miss your chance to be a part. You can find the link here.
Also, follow me on twitter – The Data Ready Marketer (@StephanieABest) for some of the latest & greatest news and insights on the world of data ready marketing.
And stay tuned because we have several new Data Ready Marketing pieces coming out soon – InfoGraphics, eBooks, SlideShares, and more!
To level set, let’s make sure you understand my definition of dark data. I prefer using visualizations when I can so, picture this: the end of the first Indiana Jones movie, Raiders of the Lost Ark. In this scene, we see the Ark of the Covenant, stored in a generic container, being moved down the aisle in a massive warehouse full of other generic containers. What’s in all those containers? It’s pretty much anyone’s guess. There may be a record somewhere, but, for all intents and purposes, the materials stored in those boxes are useless.
Applying this to data, once a piece of data gets shoved into some generic container and is stored away, just like the Ark, the data becomes essentially worthless. This is dark data.
Opening up a government agency to all its dark data can have significant impacts, both positive and negative. Here are a few initial tips to get you thinking in the right direction:
- Begin with the end in mind – identify quantitative business benefits of exposing certain dark data.
- Determine what’s truly available – perform a discovery project – seek out data hidden in the corners of your agency – databases, documents, operational systems, live streams, logs, etc.
- Create an extraction plan – determine how you will get access to the data, how often the data updates, and how you will handle varied formats.
- Ingest the data – transform the data if needed, integrate if needed, capture as much metadata as possible (never assume you won’t need a metadata field, that’s just about the time you will be proven wrong).
- Govern the data – establish standards for quality, access controls, security protections, semantic consistency, etc. – don’t skimp here, the impact of bad data can never really be quantified.
- Store it – it’s interesting how often agencies think this is the first step.
- Get the data ready to be useful to people, tools and applications – think about how to minimize the need for users to manipulate data – reformatting, parsing, filtering, etc. – to better enable self-service.
- Make it available – at this point, the data should be easily accessible, easily discoverable, easily used by people, tools and applications.
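To make the discovery and metadata-capture tips concrete, here is a minimal sketch that walks a folder and records basic metadata for every file it finds. It is an illustration under simple assumptions (files on a local or mounted file system, small enough to hash in memory); the function name `catalog_files` and the record fields are hypothetical, not part of any particular product.

```python
import hashlib
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def catalog_files(root):
    """Walk `root` and return one metadata record per file found."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        guessed_type, _ = mimetypes.guess_type(path.name)
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        records.append({
            "path": str(path),
            "size_bytes": stat.st_size,
            "modified_utc": modified.isoformat(),
            "guessed_type": guessed_type or "unknown",
            # A content hash helps spot duplicate copies of the same data.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return records


if __name__ == "__main__":
    # Demo against a throwaway folder so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "report.csv").write_bytes(b"id,name\n1,example\n")
        for record in catalog_files(tmp):
            print(record)
```

A real discovery project would also need to cover databases, operational systems, streams and logs, which require their own connectors; the point here is only that metadata is cheapest to capture at the moment the data is first touched.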
Clearly, there’s more to shining the light on dark data than I can offer in this post. If you’d like to take the next step to learning what is possible, I suggest you download the eBook, The Dark Data Imperative.
A month ago, I shared that Frank Friedman believes CFOs are “the logical choice to own analytics and put them to work to serve the organization’s needs”. Even though many CFOs are increasingly taking on what could be considered an internal CEO or COO role, many readers protested my post which focused on reviewing Frank Friedman’s argument. At the same time, CIOs have been very clear with me that they do not want to personally become their company’s data steward. So the question becomes should companies be creating a CDO or CAO role to lead this important function? And if yes, how common are these two roles anyway?
Regardless of eventual ownership, extracting value out of data is becoming a critical business capability. It is clear that data scientists should not be shoehorned into the traditional business analyst role. Data scientists have the unique ability to derive mathematical models “for the extraction of knowledge from data” (Data Science for Business, Foster Provost, 2013, pg 2). For this reason, Thomas Davenport claims that data scientists need to be able to network across an entire business and be able to work at the intersection of business goals, constraints, processes, available data and analytical possibilities. Given this, many organizations today are starting to experiment with the notion of having either a chief data officer (CDO) or a chief analytics officer (CAO). The open question is whether an enterprise should have a CDO, a CAO or both. Just as important, where should each of these roles report in the organization?
Data policy versus business questions
In my opinion, it is critical to first look into the substance of each role before making a decision with regard to the above question. The CDO should be about ensuring that information is properly secured, stored, transmitted or destroyed. This includes, according to COBIT 5, ensuring that there are effective security and controls over information systems. To do this, procedures need to be defined and implemented to ensure the integrity and consistency of information stored in databases, data warehouses, and data archives. According to COBIT 5, data governance requires the following four elements:
- Clear information ownership
- Timely, correct information
- Clear enterprise architecture and efficiency
- Compliance and security
To me, these four elements should be the essence of the CDO role. Having said this, the CAO is related but very different in terms of the nature of the role and the business skills required. The CRISP model points out just how different the two roles are. According to CRISP, the CAO role should be focused upon business understanding, data understanding, data preparation, data modeling, and data evaluation. As such, the CAO is focused upon using data to solve business problems while the CDO is about protecting data as a business-critical asset. I was living in Silicon Valley during the “Internet Bust”. I remember seeing very few job descriptions, and the few that existed said that they wanted a developer who could also act as a product manager and do some marketing as a part-time activity. This of course made no sense. I feel the same way about the idea of combining the CDO and CAO. One is about compliance and protecting data and the other is about solving business problems with data. Peanut butter and chocolate may work in a Reese’s cup but it will not work here: the orientations are too different.
So which business leader should own the CDO and CAO?
Clearly, having two more C’s in the C-Suite creates a more crowded list of corporate officers. Some have even said that this will extend what is called senior executive bloat. And, of course, how do these new roles work with and impact the CIO? The answer depends on the organization’s culture. However, where there isn’t an executive staff office, I suggest that these roles go to different places. Clearly, many companies already have their CIO function reporting to finance. Where this is the case, it is important to determine whether a COO function is in place. The COO clearly could own the CDO and CAO functions because they have a significant role in improving processes and capabilities. Where there isn’t a COO function and the CIO reports to the CEO, I think you could have the CDO report to the CIO even though CIOs say they do not want to be a data steward. This could be a third function in parallel with the VP of Ops and VP of Apps. And in this case, I would have the CAO report to one of the following: the CFO, Strategy, or IT. Again, this all depends on current organizational structure and corporate culture. Regardless of where it reports, the important thing is to focus the CAO on an enterprise analytics capability.
Author Twitter: @MylesSuer
According to Michelle Fox of CNBC and Stephen Schork, the oil industry is in ‘dire straits’. U.S. crude posted its ninth-straight weekly loss this week, landing under $50 a barrel. The news is bad enough that it is now expected to lead to major job losses. The Dallas Federal Reserve anticipates that Texas could lose about 125,000 jobs by the end of June. Patrick Jankowski, an economist and vice president of research at the Greater Houston Partnership, expects exploration budgets will be cut 30-35 percent, which will result in approximately 9,000 fewer wells being drilled. The problem is “if oil prices keep falling, at some point it’s not profitable to pull it out of the ground” (“When, and where, oil is too cheap to be profitable”, CNBC, John W. Schoen).
This means that a portion of the world’s oil supply will become unprofitable to produce. According to Wood Mackenzie, “once the oil price reaches these levels, producers have a sometimes complex decision to continue producing, losing money on every barrel produced, or to halt production, which will reduce supply”. The question is: are these the only answers?
Major Oil Company Uses Analytics to Gain Business Advantage
A major oil company that we are working with has determined that data is a success enabler for their business. They are demonstrating what we at Informatica like to call a “data ready business”, a business that is ready for any change in market conditions. This company is using next generation analytics to ensure their business’s survival and to make sure they do not become what Jim Cramer likes to call a “marginal producer”. This company has said to us that their success is based upon being able to extract oil more efficiently than its competitors.
Historically data analysis was pretty simple
Traditionally, oil producers would get oil by drilling a new hole in the ground. In 6 months they would start getting the oil flowing commercially and be in business. This meant it would typically take them 6 months or longer before they could get any meaningful results, including data that could be used to make broader production decisions.
Drilling from data
Today, oil is also produced from shale using fracking techniques. This process can take only 30-60 days before oil producers start seeing results. It is based not just on innovation in the refining of oil, but also on innovation in the refining of data from which operational business decisions can be made. The benefits of this approach include the following:
Improved fracking process efficiency
Fracking is a very technical process. Producers can have two wells on the same field that are performing at very different levels of efficiency. To address this issue, the oil company that we have been discussing throughout this piece is using real-time data to optimize its oil extraction across an entire oil field or region. Insights derived from this data allow them to compare wells in the same region for efficiency or productivity and even switch off certain wells if the oil price drops below profitability thresholds. This ability is especially important as the price of oil continues to drop. At $70/barrel, many operators go into the red, while more efficient data-driven operators can remain profitable at $40/barrel. So efficiency is critical across a system of wells.
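As a rough illustration of the well comparison described above, here is a minimal Python sketch. It is not the oil company's actual system: the `Well` record, the `wells_to_switch_off` and `rank_by_efficiency` helpers, and all of the production and cost figures are hypothetical, and the sketch assumes a well's break-even price can be approximated as daily operating cost divided by daily output.

```python
from dataclasses import dataclass


@dataclass
class Well:
    """Hypothetical per-well record; all figures are illustrative only."""
    name: str
    barrels_per_day: float
    cost_per_day: float  # assumed daily operating cost in dollars

    @property
    def breakeven_price(self) -> float:
        # Assumption: break-even price per barrel = daily cost / daily output.
        return self.cost_per_day / self.barrels_per_day


def rank_by_efficiency(wells):
    """Order wells from lowest to highest break-even price."""
    return sorted(wells, key=lambda w: w.breakeven_price)


def wells_to_switch_off(wells, oil_price):
    """Names of wells whose break-even price exceeds the current oil price."""
    return [w.name for w in wells if w.breakeven_price > oil_price]


# Made-up wells on the same field with very different efficiency.
wells = [
    Well("A", barrels_per_day=500.0, cost_per_day=15000.0),
    Well("B", barrels_per_day=200.0, cost_per_day=13000.0),
    Well("C", barrels_per_day=400.0, cost_per_day=18000.0),
]

for price in (70.0, 50.0, 40.0):
    print(price, wells_to_switch_off(wells, price))
```

With these made-up numbers, all three wells stay on at $70/barrel, while the least efficient wells drop out as the price falls. A real system would refresh these figures from live well data instead of fixed values.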
Using data to decide where to build wells in the first place
When constructing a fracking or sands well, you need more information on trends and formulas to extract oil from the ground. On a site with 100+ wells, for example, each one is slightly different because of water tables, ground structure, and the details of the geography. You need the right data, the right formula, and the right method to extract the oil at the best price without impacting the environment.
The right technology delivers the needed business advantage
Of course, technology has never been simple to implement. The company we are discussing was processing 1.2 petabytes of data, and this volume is only increasing. They are running fiber optic cables down into wells to gather data in real time. As a result, they are receiving vast amounts of real-time data but cannot store and analyze that volume of data efficiently in conventional systems. Meanwhile, the time needed to aggregate data and run reports can cause them to miss the window of opportunity while increasing cost. Making matters worse, this company had many different varieties of data. It also turns out that quite a bit of the useful information in their data sets was in the comments section of their source application. So traditional data warehousing would not help them extract the information they really needed. They decided to move to a new technology, Hadoop. But even seemingly simple problems, like getting access to data, were an issue within Hadoop. If you didn’t know the right data analyst, you might not get the data you needed in a timely fashion. Compounding things, a lack of Hadoop skills in Oklahoma proved to be a real problem.
The right technology delivers the right capability
The company had been using a traditional data warehousing environment for years. But they needed help to deal with their Hadoop environment. This meant dealing with the volume, variety and quality of their source well data. They needed a safe, efficient way to integrate all types of data on Hadoop at any scale without having to learn the internals of Hadoop. Early adopters of Hadoop and other Big Data technologies have had no choice but to hand-code using Java or scripting languages such as Pig or Hive. Hiring and retaining big data experts proved time consuming and costly. This is because data scientists and analysts can spend only 20 percent of their time on data analysis and the rest on the tedious mechanics of data integration such as accessing, parsing, and managing data. Fortunately for this oil producer, it didn’t have to be this way. They were able to avoid the specialized coding normally required to scale performance on distributed computing platforms like Hadoop. Additionally, they were able to “Map Once, Deploy Anywhere,” knowing that even as technologies change they can run data integration jobs without having to rebuild data processing flows.
It seems clear that we live in an era where data is at the center of just about every business. Data-ready enterprises are able to adapt and win regardless of changing market conditions. These businesses invested in building their enterprise analytics capability before market conditions changed. In this case, these oil producers will be able to produce oil at lower costs than others within their industry. Analytics provides three benefits to oil producers:
- Better margins and lower costs from operations
- Lower risk of environmental impact
- Less time to build a successful well
In essence, those who build analytics as a core enterprise capability will continue to have a right to win within a dynamic oil pricing environment.
Analytics Stories: A Banking Case Study
Analytics Stories: A Financial Services Case Study
Analytics Stories: A Healthcare Case Study
Who Owns Enterprise Analytics and Data?
Competing on Analytics: A Follow Up to Thomas H. Davenport’s Post in HBR
Thomas Davenport Book “Competing On Analytics”