Category Archives: Business Impact / Benefits
Get connected. Be connected. Make connections. Find connections. The Internet of Things (IoT) is all about connecting people, processes, data and, as the name suggests, things. The recent social media frenzy surrounding the ALS Ice Bucket Challenge has certainly reminded everyone of the power of social media, the Internet and a willingness to answer a challenge. Fueled by personal and professional connections, the craze has transformed fundraising for at least one charity. Similarly, IoT may prove transformational to the business of the public sector, should government step up to the challenge.
Government is struggling with the concept and reality of how IoT really relates to the business of government, and perhaps rightfully so. For commercial enterprises, IoT is far more tangible and simply more fun. Gaming consoles, televisions, watches, Google Glass, smartphones and tablets are all about delivering over-the-top, new and exciting consumer experiences. Industry is delivering transformational innovations that connect people to places, data and other people at a record pace.
It’s time to accept the challenge. Government agencies need to keep pace with their commercial counterparts and harness the power of the Internet of Things. The end game is not to deliver new, faster, smaller, cooler electronics; the end game is to create solutions that let Internet-connected devices interact and share data – regardless of location, manufacturer or format – and to make or surface connections that were previously undetectable. For some, this concept is as foreign or scary as pouring ice water over their heads. For others, the new opportunity to transform policy, service delivery, leadership, legislation and regulation is fueling a transformation in government. And it starts with one connection.
One way to start could be linking previously siloed systems together or creating a golden record of all citizen interactions through a Master Data Management (MDM) initiative. It could start with a big data and analytics project to determine and mitigate risk factors in education or linking sensor data across multiple networks to increase intelligence about potential hacking or breaches. Agencies could stop waste, fraud and abuse before it happens by linking critical payment, procurement and geospatial data together in real time.
This is the Internet of Things for government. This is the challenge. This is transformation.
You probably know this already, but I’m going to say it anyway: It’s time you changed your infrastructure. I say this because most companies are still running infrastructure optimized for ERP, CRM and other transactional systems. That’s all well and good for running IT-intensive, back-office tasks. Unfortunately, this sort of infrastructure isn’t great for today’s business imperatives of mobility, cloud computing and Big Data analytics.
Virtually all of these imperatives are fueled by information gleaned from potentially dozens of sources to reveal our users’ and customers’ activities, relationships and likes. Forward-thinking companies are using such data to find new customers, retain existing ones and increase their market share. The trick lies in translating all this disparate data into useful meaning. And to do that, IT needs to move beyond focusing solely on transactions, and instead shine a light on the interactions that matter to their customers, their products and their business processes.
They need what we at Informatica call a “Data First” perspective. You can check out my first blog post about being Data First here.
A Data First POV changes everything from product development, to business processes, to how IT organizes itself and – most especially – the impact IT has on your company’s business. That’s because cloud computing, Big Data and mobile app development shift IT’s responsibilities away from running and administering equipment, onto aggregating, organizing and improving myriad data types pulled in from internal and external databases, online posts and public sources. And that shift makes IT a more empowering force for business change. Think about it: The ability to connect and relate the dots across data from multiple sources finally gives you real power to improve entire business processes, departments and organizations.
I like to say that the role of IT is now “big I, little t,” with that lowercase “t” representing both technology and transactions. But that role requires a new set of priorities. They are:
- Think about information infrastructure first and application infrastructure second.
- Create great data by design. Architect for connectivity, cleanliness and security. Check out the eBook Data Integration for Dummies.
- Optimize for speed and ease of use – SaaS and mobile applications change often. Click here to try Informatica Cloud for free for 30 days.
- Make data a team sport. Get tools into your users’ hands so they can prepare and interact with it.
I never said this would be easy, and there’s no blueprint for how to go about doing it. Still, I recognize that a little guidance will be helpful. In a few weeks, Informatica’s CIO Eric Johnson and I will talk about how we at Informatica practice what we preach.
Malcolm Gladwell wrote an article in The New Yorker in January 2007 entitled “Open Secrets.” In it, he pointed out that a national-security expert had famously drawn a distinction between puzzles and mysteries.
Osama bin Laden’s whereabouts were, for many years, a puzzle. We couldn’t find him because we didn’t have enough information. The key to the puzzle, it was assumed, would eventually come from someone close to bin Laden, and until we could find that source, bin Laden would remain at large. In fact, that’s precisely what happened. Al-Qaida’s No. 3 leader, Khalid Sheikh Mohammed, gave authorities the nickname of one of bin Laden’s couriers, who then became the linchpin of the CIA’s efforts to locate bin Laden.
By contrast, the problem of what would happen in Iraq after the toppling of Saddam Hussein was a mystery. It wasn’t a question that had a simple, factual answer. Mysteries require judgments and the assessment of uncertainty, and the hard part is not that we have too little information but that we have too much.
This was written before “Big Data” was a household word, and it raises the very interesting question of whether organizations and corporations that are, by anyone’s standards, totally deluged with data, are facing puzzles or mysteries. Consider the amount of data that a company like Western Union deals with.
Western Union is a 160-year-old company. Having built scale in the money transfer business, the company is in the process of evolving its business model by enabling the expansion of digital products, growth of web and mobile channels, and a more personalized online customer experience. Sounds good – but get this: the company processes more than 29 transactions per second on average. That’s 242 million consumer-to-consumer transactions and 459 million business payments in a year. Nearly a billion transactions – a billion! As my six-year-old might say, that number is big enough “to go to the moon and back.” Layer on top of that the fact that the company operates in 200+ countries and territories, and conducts business in 120+ currencies. Senior Director and Head of Engineering Abhishek Banerjee has said, “The data is speaking to us. We just need to react to it.” That implies a puzzle, not a mystery – but only if data scientists are able to conduct statistical modeling and predictive analysis, systematically noting trends in sending and receiving behaviors. Check out what Banerjee and Western Union CTO Sanjay Saraf have to say about it here.
Or consider General Electric’s aggressive and pioneering move into what’s dubbed the Industrial Internet. In a white paper entitled “The Case for an Industrial Big Data Platform: Laying the Groundwork for the New Industrial Age,” GE reveals some of the staggering statistics related to the industrial equipment that it manufactures and supports (services comprise 75% of GE’s bottom line):
- A modern wind turbine contains approximately 50 sensors and control loops which collect data every 40 milliseconds.
- A farm controller then receives more than 30 signals from each turbine at 160-millisecond intervals.
- Every second, the farm monitoring software processes 200 raw sensor data points, with various associated properties, from each turbine.
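Taken together, those figures imply a steady per-turbine and per-farm data rate. A rough sanity check of the arithmetic (note: the 100-turbine farm size is an assumption for illustration, not a figure from the paper):

```python
# Data-rate arithmetic from GE's published figures.
# TURBINES is an assumed farm size, for illustration only.
SENSORS_PER_TURBINE = 50     # sensors and control loops per turbine
SAMPLE_INTERVAL_S = 0.040    # each collects data every 40 ms
SIGNALS_TO_FARM = 30         # signals sent to the farm controller
FARM_INTERVAL_S = 0.160      # at 160-millisecond intervals
TURBINES = 100               # assumed farm size

readings_per_turbine_per_s = SENSORS_PER_TURBINE / SAMPLE_INTERVAL_S
signals_at_controller_per_s = TURBINES * SIGNALS_TO_FARM / FARM_INTERVAL_S

print(f"{readings_per_turbine_per_s:,.0f} readings/s per turbine")        # 1,250
print(f"{signals_at_controller_per_s:,.0f} signals/s at the controller")  # 18,750
```

Even at this assumed modest farm size, the controller is fielding tens of thousands of signals per second before any analytics run.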
Phew! I’m no electricity operations expert, and you probably aren’t either. Most of us will get no further than simply wrapping our heads around the simple fact that GE turbines are collecting a LOT of data. But what the paper goes on to say should grab your attention in a big way: “The key to success for this wind farm lies in the ability to collect and deliver the right data, at the right velocity, and in the right quantities to a wide set of well-orchestrated analytics.” The paper goes on to recommend that anyone involved in the Industrial Internet revolution strongly consider its talent requirements, with the suggestion that Chief Data Officers and/or Data Scientists may be the next critical hires.
Which brings us back to Malcolm Gladwell. In the aforementioned article, Gladwell goes on to pull apart the Enron debacle, and argues that it was a prime example of the perils of too much information. “If you sat through the trial of (former CEO) Jeffrey Skilling, you’d think that the Enron scandal was a puzzle. The company, the prosecution said, conducted shady side deals that no one quite understood. Senior executives withheld critical information from investors…We were not told enough—the classic puzzle premise—was the central assumption of the Enron prosecution.” But in fact, that was not true. Enron employed complicated – but perfectly legal – accounting techniques used by companies that engage in complicated financial trading. Many journalists and professors have gone back and looked at the firm’s regulatory filings, and have come to the conclusion that, while complex and difficult to identify, all of the company’s shenanigans were right there in plain view. Enron cannot be blamed for covering up the existence of its side deals. It didn’t; it disclosed them. As Gladwell summarizes:
“Puzzles are ‘transmitter-dependent’; they turn on what we are told. Mysteries are ‘receiver dependent’; they turn on the skills of the listener.”
I would argue that this extremely complex, fast-moving and seismic shift that we call Big Data will favor those who have developed the ability to attune, to listen and to make sense of the data. Winners in this new world will recognize what looks like an overwhelming and intractable mystery, break it down into small, manageable chunks, and demystify the landscape to uncover the important nuggets of truth and significance.
A mid-sized insurer recently approached our team for help. They wanted to understand why they had fallen short in making their case to their executives. Specifically, they had proposed that fixing their customer data was key to supporting the executive team’s highly aggressive three-year growth plan (3x today’s revenue). Given this core organizational mission – aside from being a warm and fuzzy place to work that supports its local community – the slam-dunk solution here is simple. Just reducing the data migration effort around the next acquisition, or avoiding the ritual annual one-off data clean-up project, already pays for any tool set enhancing data acquisition, integration and hygiene. Will it get you to 3x today’s revenue? Probably not. What will help are the following:
Hard cost avoidance via software maintenance or consulting elimination is the easy part of the exercise. That is why CFOs love it and focus so much on it. It is easy to grasp and immediate (aka next quarter).
Soft cost reductions, like staff redundancies, are a bit harder. Despite being viable, in my experience very few decision makers want to work on a business case to lay off staff; my team has seen only one so far. They prefer to look at these savings as freed-up capacity, which can be redeployed more productively. Productivity is also a bit harder to quantify, as you typically have to understand how data travels and gets worked on between departments.
However, revenue effects are even harder to pin down, and esoteric to many people, because they include projections. They are often considered “soft” benefits, although they outweigh the other areas by two to three times in terms of impact. Ultimately, every organization runs its strategy based on projections (see the insurer in my first paragraph).
The hardest to quantify is risk. Not only is it based on projections – often from a third party (Moody’s, TransUnion, etc.) – but few people understand it. More often than not, clients won’t even accept you investigating this area unless you have an advanced degree in insurance math. Nevertheless, risk can generate extra “soft” cost avoidance (a beefed-up reserve account balance creates opportunity cost) but also revenue (realizing a risk premium previously ignored). Often risk profiles change due to relationships, which can be links to new “horizontal” information (transactional attributes) or “vertical” (hierarchical) information from an entity’s parent-child relationships and the parent’s or children’s transactions.
Given the above, my initial advice to the insurer would be to look at the heartache of their last acquisition, apply a benchmark for IT productivity gains from improved data management capabilities (typically 20-26% – Yankee Group), and there you go. This is just the IT side, so consider increasing the upper range by 1.4x (Harvard Business School), as every attribute change (last mobile view date) requires additional meetings at the manager, director and VP levels, and these people’s time gets increasingly expensive. You could also use Aberdeen’s benchmark of 13 hours per average master data attribute fix instead.
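That back-of-the-envelope math can be sketched as follows. The annual IT data management cost here is a placeholder; only the 20-26% range (Yankee Group) and the 1.4x upper-range multiplier (Harvard Business School) come from the benchmarks above:

```python
# Illustrative sketch; only the benchmark figures come from the text.
it_data_mgmt_cost = 2_000_000            # annual IT data management spend (assumed)
it_gain_low, it_gain_high = 0.20, 0.26   # Yankee Group productivity benchmark
upper_range_multiplier = 1.4             # HBS: business-side effect on the upper range

low = it_data_mgmt_cost * it_gain_low
high = it_data_mgmt_cost * it_gain_high * upper_range_multiplier

print(f"Estimated annual benefit: ${low:,.0f} - ${high:,.0f}")
# -> Estimated annual benefit: $400,000 - $728,000
```

The point is not the specific numbers but that a defensible range falls out of two published benchmarks and one number you already know.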
You can also look at productivity areas, which are typically heavily measured. Let’s assume a call center rep spends 20% of the average call time of 12 minutes (depending on the call type – account or bill inquiry, dispute, etc.) understanding:
- Who the customer is
- What he bought online and in-store
- If he tried to resolve his issue on the website or store
- How he uses equipment
- What he cares about
- If he prefers call backs, SMS or email confirmations
- His response rate to offers
- His/her value to the company
If he spends that 20% of every call stringing together insights from five applications and twelve screens, instead of seeing the same information in a single frame within seconds, you have just freed up 20% of his hourly compensation.
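A minimal sketch of that freed-capacity math. The headcount, call volume and fully loaded hourly rate are assumptions for illustration; only the 12-minute call and the 20% share come from the example above:

```python
# All volume and cost inputs are assumed, for illustration only.
reps = 50                    # call-center headcount (assumed)
calls_per_rep_per_day = 30   # average handled calls per rep (assumed)
hourly_cost = 30.0           # fully loaded cost per rep-hour (assumed)
avg_call_min = 12.0          # average call time, from the example above
lookup_share = 0.20          # share of each call spent stitching data together

freed_hours_per_day = reps * calls_per_rep_per_day * avg_call_min * lookup_share / 60
annual_savings = freed_hours_per_day * hourly_cost * 250   # ~250 working days/year

print(f"{freed_hours_per_day:.0f} rep-hours/day freed -> ${annual_savings:,.0f}/year")
# -> 60 rep-hours/day freed -> $450,000/year
```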
Then look at the software, hardware, maintenance and ongoing management costs of the likely customer record sources (pick the worst- and best-quality ones based on your current understanding), which will end up in a centrally governed instance. Per DAMA, every duplicate record will cost you between $0.45 (party) and $0.85 (product) per transaction (edit touch). At the very least, each record will be touched once a year (more likely 3-5 times), so multiply your duplicate record count by that and you have your savings from de-duplication alone. You can also use Aberdeen’s benchmark of 71 serious errors per 1,000 records, which means the chance of transactional failure, and the effort required to fix it (a percentage of one or more FTEs’ daily workday), is high. If this does not work for you, run a data profile with one of the many tools out there.
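Sketched in code, with the duplicate count and touch frequency as placeholders and the $0.45-$0.85 per-touch costs taken from the DAMA figures above:

```python
# DAMA per-touch costs from the text; volumes are assumed placeholders.
duplicates = 100_000                     # duplicate records found by profiling (assumed)
touches_per_year = 3                     # each record touched 3-5 times/yr; low end
cost_party, cost_product = 0.45, 0.85    # DAMA cost per transaction (edit touch)

low = duplicates * touches_per_year * cost_party
high = duplicates * touches_per_year * cost_product

print(f"De-duplication savings: ${low:,.0f} - ${high:,.0f} per year")
# -> De-duplication savings: $135,000 - $255,000 per year
```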
If standardization of records (zip codes, billing codes, currency, etc.) is the problem, ask your business partner how many customer contacts (calls, mailings, emails, orders, invoices or account statements) fail outright and/or require validation because of these attributes. Once again, if you apply the productivity gains mentioned earlier, there are your savings. If you look at the number of orders whose payment or revenue recognition gets delayed by a week or a month, together with the average order amount, you have just quantified how much profit (multiply by operating margin) you would be able to pull into the current financial year from the next one.
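That profit pull-forward can be sketched as follows; every input here is an assumed placeholder:

```python
# All inputs are assumptions, for illustration only.
delayed_orders = 1_200     # orders whose recognition slips past year-end (assumed)
avg_order = 5_000.0        # average order amount (assumed)
operating_margin = 0.12    # company operating margin (assumed)

profit_pulled_forward = delayed_orders * avg_order * operating_margin
print(f"Profit moved into the current year: ${profit_pulled_forward:,.0f}")
# -> Profit moved into the current year: $720,000
```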
The same is true for speeding up the introduction of a new product, or a change to one, generating profits earlier. Note that the time value of funds realized earlier is too small to matter in most instances, especially in the current interest environment.
If emails bounce back or snail mail gets returned (no such address, no such name at this address, no such domain, no such user at this domain), (e)mail verification tools can help reduce the bounces. If every mail piece (forget email, due to its minuscule cost) costs $1.25 – and this will vary by type of mailing (catalog, promotional post card, statement letter) – incorrect or incomplete records are wasted cost. If you can, use fully loaded print cost including third-party data prep and returns handling. You will never capture all cost inputs, but take a conservative stab.
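A conservative stab at that wasted mailing cost. The $1.25 per piece comes from the text; the mailing volume and bad-record rate are assumptions:

```python
# Cost per piece from the text; volume and bad-record rate assumed.
pieces_mailed = 500_000    # pieces per mail drop (assumed)
bad_record_rate = 0.013    # share of records with address issues (assumed)
cost_per_piece = 1.25      # fully loaded print/mail cost, from the text

wasted = pieces_mailed * bad_record_rate * cost_per_piece
print(f"Wasted mailing cost: ${wasted:,.0f} per drop")
# -> Wasted mailing cost: $8,125 per drop
```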
If it was an offer, reduced bounces should also improve your response rate (also true for email now). Prospect mail response rates are typically around 1.2% (Direct Marketing Association), whereas phone response rates are around 8.2%. If you know that your current response rate is half that (for argument’s sake) and you send out 100,000 emails, of which 1.3% (Silverpop) have customer data issues, then fixing 81-93% of them (our experience) will drop the bounce rate to under 0.3%, meaning more emails will arrive and be relevant. Multiply this by a standard conversion rate of 3% (MarketingSherpa; industry- and channel-specific) and the average order (your data), then by operating margin, and you get a benefit value for revenue.
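Chaining those rates together: the 1.3% bad-data rate (Silverpop), the 81-93% fix rate and the 3% conversion rate (MarketingSherpa) are the benchmarks above, while the order economics are assumed:

```python
# Benchmark rates from the text; order value and margin are assumed.
emails_sent = 100_000
bad_data_rate = 0.013      # Silverpop: records with customer data issues
fix_rate = 0.85            # 81-93% of issues fixable; mid-range chosen here
conversion_rate = 0.03     # MarketingSherpa standard conversion rate
avg_order = 150.0          # average order value (assumed)
operating_margin = 0.12    # operating margin (assumed)

recovered = emails_sent * bad_data_rate * fix_rate     # emails that now arrive
benefit = recovered * conversion_rate * avg_order * operating_margin

print(f"{recovered:,.0f} recovered emails -> ${benefit:,.2f} margin benefit")
# -> 1,105 recovered emails -> $596.70 margin benefit
```

A single campaign yields a modest figure; multiplied across a year of campaigns, it becomes a line item worth defending.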
If product data and inventory carrying cost or supplier spend are your issue, find out how many supplier shipments you receive every month and the average cost of a part (or cost range), then apply the Aberdeen master data failure rate (71 in 1,000) to use cases around missing or incorrect supersession or alternate-part data to assess the value of a single shipment’s overspend. You can also just take the ending inventory amount from the 10-K report and apply a 3-10% improvement (Aberdeen) in a top-down approach. Alternatively, apply 3.2-4.9% to your annual supplier spend (KPMG).
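The top-down version of that estimate. The 3-10% (Aberdeen) and 3.2-4.9% (KPMG) improvement ranges come from the text; the inventory and spend figures are assumptions:

```python
# Benchmark ranges from the text; company figures are assumed.
ending_inventory = 40_000_000         # from the 10-K report (assumed)
annual_supplier_spend = 120_000_000   # annual supplier spend (assumed)

inv_low = ending_inventory * 0.03          # Aberdeen, low end
inv_high = ending_inventory * 0.10         # Aberdeen, high end
spend_low = annual_supplier_spend * 0.032  # KPMG, low end
spend_high = annual_supplier_spend * 0.049 # KPMG, high end

print(f"Inventory improvement: ${inv_low:,.0f} - ${inv_high:,.0f}")
print(f"Supplier spend improvement: ${spend_low:,.0f} - ${spend_high:,.0f}")
```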
You could also investigate the expediting or return cost of shipments in a period due to incorrectly aggregated customer forecasts, wrong or incomplete product information or wrong shipment instructions in a product or location profile. Apply Aberdeen’s 5% improvement rate and there you go.
Consider that a North American utility told us that just fixing their 200 Tier 1 suppliers’ product information increased discounts from $14 million to $120 million. They also found that fixing one basic attribute out of sixty in one part category saves them over $200,000 annually.
So what ROI percentages would you find tolerable or justifiable for, say, an EDW project, a CRM project or a new claims system? What annual savings or new revenue would you be comfortable with? What is the craziest improvement you have seen come to fruition that nobody expected?
Next time, I will add some more “use cases” to the list and look at some philosophical implications of averages.
This post was written by guest author Dale Kim, Director of Industry Solutions at MapR Technologies, a valued Informatica partner that provides a distribution for Apache Hadoop that ensures production success for its customers.
Apache Hadoop is growing in popularity as the foundation for an enterprise data hub. An Enterprise Data Hub (EDH) extends and optimizes the traditional data warehouse model by adding complementary big data technologies. It focuses your data warehouse on high value data by reallocating less frequently used data to an alternative platform. It also aggregates data from previously untapped sources to give you a more complete picture of data.
So you have your data, your warehouses, your analytical tools, your Informatica products, and you want to deploy an EDH… now what about Hadoop?
Requirements for Hadoop in an Enterprise Data Hub
Let’s look at the characteristics required to meet your EDH needs in a production environment: enterprise-grade reliability, interoperability, multi-tenancy, security, and operational performance. You already expect these from your existing enterprise deployments. Shouldn’t you hold Hadoop to the same standards? Let’s discuss each topic:
Enterprise-grade is about the features that keep a system running, i.e., high availability (HA), disaster recovery (DR), and data protection. HA helps a system run even when components (e.g., computers, routers, power supplies) fail. In Hadoop, this means no downtime and no data loss, but also no work loss. If a node fails, you still want jobs to run to completion. DR with remote replication or mirroring guards against site-wide disasters. Mirroring needs to be consistent to ensure recovery to a known state. Using file copy tools won’t cut it. And data protection, using snapshots, lets you recover from data corruption, especially from user or application errors. As with DR replicas, snapshots must be consistent, in that they must reflect the state of the data at the time the snapshot was taken. Not all Hadoop distributions can offer this guarantee.
Hadoop interoperability is an obvious necessity. Features like a POSIX-compliant, NFS-accessible file system let you reuse existing, file system-based applications on Hadoop data. Support for existing tools lets your developers get up to speed quickly. And integration with REST APIs enables easy, open connectivity with other systems.
You should be able to logically divide clusters to support different use cases, job types, user groups, and administrators as needed. To avoid a complex, multi-cluster setup, choose a Hadoop distribution with multi-tenancy capabilities to simplify the architecture. This gives you less risk of error and no duplication of data or effort.
Security should be a priority to protect against the exposure of confidential data. You should assess how you’ll handle authentication (with or without Kerberos), authorization (access controls), over-the-network encryption, and auditing. Many of these features should be native to your Hadoop distribution, and there are also strong security vendors that provide technologies for securing Hadoop.
Any large scale deployment needs fast read, write, and update capabilities. Hadoop can support the operational requirements of an EDH with integrated, in-Hadoop databases like Apache HBase™ and Accumulo™, as well as MapR-DB (the MapR NoSQL database). This in-Hadoop model helps to simplify the overall EDH architecture.
Using Hadoop as a foundation for an EDH is a powerful option for businesses. Choosing the correct Hadoop distribution is the key to deploying a successful EDH. Be sure not to take shortcuts – especially in a production environment – as you will want to hold your Hadoop platform to the same high expectations you have of your existing enterprise systems.
“Inaccurate, inconsistent and disconnected supplier information prohibits us from doing accurate supplier spend analysis, leveraging discounts, comparing and choosing the best prices, and enforcing corporate standards.”
This is a quotation from a manufacturing company executive. It illustrates the negative impact that poorly managed supplier information can have on a company’s ability to cut costs and achieve revenue targets.
Many supply chain and procurement teams at large companies struggle to see the total relationship they have with suppliers across product lines, business units and regions. Why? Supplier information is scattered across dozens or hundreds of Enterprise Resource Planning (ERP) and Accounts Payable (AP) applications. Too much valuable time is spent manually reconciling inaccurate, inconsistent and disconnected supplier information in an effort to see the big picture. All this manual effort results in back office administrative costs that are higher than they should be.
Do these quotations from supply chain leaders and their teams sound familiar?
“We have 500,000 suppliers. 15-20% of our supplier records are duplicates. 5% are inaccurate.”
“I get 100 e-mails a day questioning which supplier to use.”
“To consolidate vendor reporting for a single supplier between divisions is really just a guess.”
“Every year 1099 tax mailings get returned to us because of invalid addresses, and we pay a lot of Schedule B fines to the IRS.”
“Two years ago we spent a significant amount of time and money cleansing supplier data. Now we are back where we started.”
Please join me and Naveen Sharma, Director of the Master Data Management (MDM) Practice at Cognizant for a Webinar, Supercharge Your Supply Chain Applications with Better Supplier Information, on Tuesday, July 29th at 11 am PT.
During the Webinar, we’ll explain how better managing supplier information can help you achieve the following goals:
- Accelerate supplier onboarding
- Mitigate the risk of supply disruption
- Better manage supplier performance
- Streamline billing and payment processes
- Improve supplier relationship management and collaboration
- Make it easier to evaluate non-compliance with Service Level Agreements (SLAs)
- Decrease costs by negotiating favorable payment terms and SLAs
I hope you can join us for this upcoming Webinar!
By now, the business benefits of effectively leveraging big data have become well known. Enhanced analytical capabilities, greater understanding of customers, and ability to predict trends before they happen are just some of the advantages. But big data doesn’t just appear and present itself. It needs to be made tangible to the business. All too often, executives are intimidated by the concept of big data, thinking the only way to work with it is to have an advanced degree in statistics.
There are ways to make big data more than an abstract concept that can only be loved by data scientists. Four of these ways were recently covered in a report by David Stodder, director of business intelligence research for TDWI, as part of TDWI’s special report on What Works in Big Data.
Experiment with real-time analytics

The time is ripe for experimentation with real-time, interactive analytics technologies, Stodder says. The next major step in the movement toward big data is enabling real-time or near-real-time delivery of information. Real-time data has been a challenge with BI data for years, with limited success, Stodder says. The good news is that the Hadoop framework, originally built for batch processing, now includes interactive querying and streaming applications, he reports. This opens the way for real-time processing of big data.
Design for self-service
Interest in self-service access to analytical data continues to grow. “Increasing users’ self-reliance and reducing their dependence on IT are broadly shared goals,” Stodder says. “Nontechnical users—those not well versed in writing queries or navigating data schemas—are requesting to do more on their own.” There is an impressive array of self-service tools and platforms now appearing on the market. “Many tools automate steps for underlying data access and integration, enabling users to do more source selection and transformation on their own, including for data from Hadoop files,” he says. “In addition, new tools are hitting the market that put greater emphasis on exploratory analytics over traditional BI reporting; these are aimed at the needs of users who want to access raw big data files, perform ad-hoc requests routinely, and invoke transformations after data extraction and loading (that is, ELT) rather than before.”
Leverage data visualization

Nothing gets a point across faster than having data points visually displayed – decision-makers can draw inferences within seconds. “Data visualization has been an important component of BI and analytics for a long time, but it takes on added significance in the era of big data,” Stodder says. “As expressions of meaning, visualizations are becoming a critical way for users to collaborate on data; users can share visualizations linked to text annotations as well as other types of content, such as pictures, audio files, and maps to put together comprehensive, shared views.”
Unify views of data
Users are working with many different data types these days, and are looking to bring this information into a single view – “rather than having to move from one interface to another to view data in disparate silos,” says Stodder. Unstructured data – graphics and video files – can also provide a fuller context to reports, he adds.
1) It’s difficult to find and retain resource skills to staff big data projects
The biggest challenge by far that we see with Big Data is that it is difficult to find and retain the resource skills to staff Big Data projects. The fact that “Its expertise is scarce and expensive” is the #1 concern about using Big Data, according to an InformationWeek survey of 541 business technology professionals (1). And according to Gartner, by 2015 only a third of the 4.4 million Big Data-related jobs will be filled (5).
2) It takes too long to deploy Big Data projects from ‘proof-of-concept’ to production
At Hadoop Summit in June 2014, one of the largest Big Data conferences in the world, Gartner stated in their keynote that only about 30% of Hadoop implementations are in production (4). This observation highlights the second challenge which is that it takes too long to deploy Big Data projects from the ‘proof-of-concept’ phase into production.
3) Big data technologies are evolving too quickly to adapt
With the related market projected to grow from $28.5 billion in 2014 to $50.1 billion by 2017, according to Wikibon (6), Big Data technologies are emerging and evolving extremely fast. This in turn becomes a barrier to innovation, since these technologies evolve much too quickly for most organizations to adopt them before the next big thing comes along.
4) Big Data projects fail to deliver the expected value
Too many Big Data projects start off as science experiments and fail to deliver the expected value, primarily because of inaccurate scope: they underestimate what it takes to integrate, operationalize, and deliver actionable information at production scale. According to an Infochimps survey of 300 IT professionals, “55% of big data projects don’t get completed and many others fall short of their objectives” (3).
5) It’s difficult to make Big Data fit-for-purpose, assess trust, and ensure security
Uncertainty is inherent to Big Data when dealing with a wide variety of large data sets coming from external sources such as social, mobile, and sensor devices. Organizations therefore often struggle to make their data fit-for-purpose, to assess its level of trustworthiness, and to ensure data-level security. According to Gartner, “Business leaders recognize that big data can help deliver better business results through valuable insights. Without an understanding of the trust implicit in the big data (and applying information trust models), organizations may be taking risks that undermine the value they seek.” (2)
For more information on “How Informatica Tackles the Top 5 Big Data Challenges,” see the blog post here.
1. InformationWeek 2013 Analytics, Business Intelligence and Information Management Survey of 541 business technology professionals
2. Big Data Governance: From Truth to Trust, Gartner Research Note, July 2013
3. “CIOs & Big Data: What Your IT Team Wants You to Know” – Infochimps survey of 300 IT staffers, conducted with assistance from enterprise software community site SSWUG.ORG. http://visual.ly/cios-big-data
4. Gartner presentation, Hadoop Summit 2014
5. Predicts 2013: Big Data and Information Infrastructure, Gartner, November 2012
6. Wikibon Big Data Vendor Revenue and Market Forecast 2013-2017
“Not only do we underestimate the cost for projects up to 150%, but we overestimate the revenue it will generate.” This quotation from an Energy & Petroleum (E&P) company executive illustrates the negative impact of inaccurate, inconsistent and disconnected well data and asset data on revenue potential.
“Operational Excellence” is a common goal of many E&P company executives pursuing higher growth targets. But inaccurate, inconsistent and disconnected well data and asset data may be holding them back. These data issues obscure the complete picture of the well information lifecycle, making it difficult to maximize production efficiency, reduce Non-Productive Time (NPT), streamline the oilfield supply chain, calculate well-by-well profitability, and mitigate risk.
To explain how E&P companies can better manage well data and asset data, we hosted a webinar, “Attention E&P Executives: Streamlining the Well Information Lifecycle.” Our well data experts, Stephanie Wilkin, Senior Principal Consultant at Noah Consulting, and Stephan Zoder, Director of Value Engineering at Informatica, shared some advice: E&P companies should reevaluate “throwing more bodies at a data cleanup project twice a year,” an approach that does not support the pursuit of operational excellence.
In this interview, Stephanie shares details about the award-winning collaboration between Noah Consulting and Devon Energy to create a single trusted source of well data, which is standardized and mastered.
Q. Congratulations on winning the 2014 Innovation Award, Stephanie!
A. Thanks, Jakki. It was really exciting working with Devon Energy. Together we put the technology and processes in place to manage and master well data in a central location and share it with downstream systems on an ongoing basis. We were proud to win the 2014 Innovation Award for Best Enterprise Data Platform.
Q. What was the business need for mastering well data?
A. As E&P companies grow so do their needs for business-critical well data. All departments need clean, consistent and connected well data to fuel their applications. We implemented a master data management (MDM) solution for well data with the goals of improving information management, business productivity, organizational efficiency, and reporting.
Q. How long did it take to implement the MDM solution for well data?
A. The Devon Energy project kicked off in May of 2012. Within five months we built the complete solution from gathering business requirements to development and testing.
Q. What were the steps in implementing the MDM solution?
A. The first and most important step was securing buy-in on a common definition for master well data, or Unique Well Identifier (UWI). The key was to create a definition that would meet the needs of various business functions. Then we built the well master, which would be consistent across various systems, such as G&G, Drilling, Production, Finance, etc. We used the Professional Petroleum Data Management Association (PPDM) data model and created more than 70 unique attributes for the well, including Lahee Class, Fluid Direction, Trajectory, Role and Business Interest.
As part of the original go-live, we had three source systems of well data and two target systems connected to the MDM solution. Over the course of the next year, we added three additional source systems and four additional target systems. We did a cross-system analysis to make sure every department has the right wells and the right data about those wells. Now the company uses MDM as the single trusted source of well data, which is standardized and mastered, to do analysis and build reports.
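The mastering-and-cross-reference pattern Stephanie describes can be sketched in a few lines of code. This is a hypothetical illustration only: the class, attribute names and IDs below are invented for the example and do not reflect the actual PPDM schema or Informatica MDM’s API.

```python
from dataclasses import dataclass, field

@dataclass
class WellMaster:
    """One mastered well record, shared by all downstream systems."""
    uwi: str                 # Unique Well Identifier agreed on by all departments
    lahee_class: str         # a few of the 70+ attributes mentioned above
    fluid_direction: str
    trajectory: str
    xrefs: dict = field(default_factory=dict)  # source system -> local well ID

def register_source(master: WellMaster, system: str, local_id: str) -> None:
    """Keep a cross-reference so we always know where the data came from."""
    master.xrefs[system] = local_id

# One well, mastered from three source systems like those named in the interview
well = WellMaster(uwi="UWI-0001", lahee_class="Development",
                  fluid_direction="Producing", trajectory="Horizontal")
register_source(well, "G&G", "GG-123")
register_source(well, "Drilling", "DRL-77")
register_source(well, "Finance", "FIN-9001")

print(well.xrefs)  # {'G&G': 'GG-123', 'Drilling': 'DRL-77', 'Finance': 'FIN-9001'}
```

Each target system then reads from this single record instead of keeping its own copy, which is what makes the cross-system analysis described above possible.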
Q. What’s been the traditional approach for managing well data?
A. Typically, when a new well is created, employees spend time entering well data into their own systems. For example, one person enters well data into the G&G application. Another person enters the same well data into the Drilling application. A third person enters the same well data into the Finance application. On average, it takes about 30 minutes to enter a single well into a particular financial application.
So imagine you need to add 500 new wells to your systems, as is common after a merger or acquisition. At 30 minutes per well, that translates to roughly 250 hours, or 6.25 weeks, of employee time that can be saved on the well-create process! By automating across systems, you not only save time, you eliminate redundant data entry and the errors that come with it.
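As a quick sanity check of those numbers (assuming a 40-hour work week, which the 6.25-weeks figure implies):

```python
wells_to_add = 500       # e.g. wells gained in a merger or acquisition
minutes_per_entry = 30   # time to key one well into one system, per the interview
hours = wells_to_add * minutes_per_entry / 60
weeks = hours / 40       # assuming a 40-hour work week
print(hours, weeks)      # 250.0 6.25
```

And that is per system; re-keying the same wells into several applications multiplies the cost accordingly.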
Q. That sounds like a painfully slow and error-prone process.
A. It is! But that’s only half the problem. Without a single trusted source of well data, how do you get a complete picture of your wells? When you compare the well data in the G&G system to the well data in the Drilling or Finance systems, it’s typically inconsistent and difficult to reconcile. This leads to the question, “Which one of these systems has the best version of the truth?” Employees spend too much time manually reconciling well data for reporting and decision-making.
Q. So there is a lot to be gained by better managing well data.
A. That’s right. The CFO typically loves the ROI on a master well data project. It’s a huge opportunity to save time and money, boost productivity and get more accurate reporting.
Q. What were some of the business requirements for the MDM solution?
A. We couldn’t build a solution that was narrowly focused on meeting the company’s needs today; we had to keep the future in mind. Our goal was to build a framework that would remain scalable and supportable as the company’s business environment changed. This allows the company to add additional data domains or attributes to the well data model at any time.
Q. Why did you choose Informatica MDM?
A. The decision to use Informatica MDM for the MDM Trust Framework came down to the following capabilities:
- Match and Merge: With Informatica, we get a lot of flexibility. Some systems carry the API number or government well ID, but some don’t. We can match and merge records differently based on the system.
- X-References: We keep a cross-reference between all the systems. We can go back to the master well data and find out where that data came from and when. We can see where changes have occurred because Informatica MDM tracks the history and lineage.
- Scalability: This was a key requirement. While we went live after only five months, we’ve been continually building out the well master based on the requirements of the target systems.
- Flexibility: Down the road, if we want to add an additional facet or classification to the well master, the framework allows for that.
- Simple Integration: Instead of building point-to-point integrations, we use the hub model.
In addition to Informatica MDM, our Noah Consulting MDM Trust Framework includes Informatica PowerCenter for data integration, Informatica Data Quality for data cleansing and Informatica Data Virtualization.
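The “hub model” point above is easy to quantify. A rough sketch (the numbers are illustrative; the interview mentions six source and six target systems, i.e. twelve in all):

```python
def point_to_point_links(n: int) -> int:
    """Direct links needed if every system integrates with every other one."""
    return n * (n - 1) // 2

def hub_links(n: int) -> int:
    """Links needed when each system connects only to a central hub."""
    return n

for n in (4, 8, 12):
    print(f"{n} systems: {point_to_point_links(n)} point-to-point links "
          f"vs {hub_links(n)} hub links")
```

With twelve systems, full point-to-point wiring could require up to 66 links, while the hub needs only twelve; each new system then adds one link instead of up to eleven.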
Q. Can you give some examples of the business value gained by mastering well data?
A. One person said to me, “I’m so overwhelmed! We’ve never had one place to look at this well data before.” With MDM centrally managing master well data and fueling key business applications, many upstream processes can be optimized to achieve their full potential value.
People spend less time entering well data on the front end and reconciling it on the back end. Well data is entered once and automatically shared with every system that needs it, so people can trust that it’s consistent across systems. And because the data across systems is now tied together, it enables business value that couldn’t be realized before, such as predictive analytics.
Q. What’s next?
A. There’s a lot of insight to be gained by understanding the relationships between the well and the people, equipment and facilities associated with it. Next, we’re planning to add the operational hierarchy. For example, we’ll be able to identify which production engineer, reservoir engineer and foreman are working on a particular well.
We’ve also started gathering business requirements for tying equipment and facilities to each well. There’s a lot more business value on the horizon as the company streamlines its well information lifecycle and the valuable relationships around the well.
If you missed the webinar, you can watch the replay now: Attention E&P Executives: Streamlining the Well Information Lifecycle.
In my last blog, I talked about the dreadful experience of cleaning raw data by hand during my days as an analyst a few years back. Well, the truth is, I was not alone. At a recent data mining Meetup event in the San Francisco Bay Area, I asked a few analysts, “How much time do you spend on cleaning your data at work?” “More than 80% of my time” and “most of my days,” said the analysts, and “they are not fun.”
But check this out: there are over a dozen Meetup groups focused on data science and data mining here in the Bay Area where I live. Those groups put on events multiple times a month, with topics often around hot, emerging technologies such as machine learning, graph analysis, real-time analytics, new algorithms for analyzing social media data and, of course, anything Big Data. Cool BI tools, new programming models and algorithms for better analysis are a big draw for data practitioners these days.
That got me thinking… if what the analysts said to me is true, i.e., they spend 80% of their time prepping data and only the remaining 20% analyzing it and visualizing the results (which, BTW, “is actually fun,” quoting one data analyst), then why are they drawn to events focused on tools that can only help them 20% of the time? Why wouldn’t they want to explore technologies that can help address the dreadful 80% of data scrubbing they complain about?
Having been there myself, I thought perhaps a little self-reflection would help answer the question.
As a student of math, I love data and am fascinated by the good stories I can discover in it. My two-year math program in graduate school was primarily focused on learning how to build fabulous math models to simulate real events, and to use those formulas to predict the future or look for meaningful patterns.
I used BI and statistical analysis tools while at school, and continued to use them at work after I graduated. Those tools were great in that they helped me get to results and see what was in my data, so I could develop conclusions and make recommendations based on those insights for my clients. Without BI and visualization tools, I would not have delivered any results.
That was the fun and glamorous part of my job as an analyst. But when I was not creating nice charts and presentations to tell the stories in my data, I was spending time, a great amount of time, sometimes up to the wee hours, cleaning and verifying my data. I was convinced that was part of my job and I just had to suck it up.
It was only a few months ago that I stumbled upon data quality software – it happened when I joined Informatica. At first I thought they were talking to the wrong person when they started pitching me data quality solutions.
Turns out, the concept of data quality automation is highly relevant and extremely intuitive to me, and to anyone who deals with data on a regular basis. Data quality software automates the data cleansing process; it is much faster and delivers more accurate results than a manual process. To put that in math context: if a data quality tool can reduce the data cleansing effort from 80% to 40% of an analyst’s time (BTW, this is hardly a random number; some of our customers have reported much better results), analysts can free up 40% of their time from scrubbing data and use it to do the things they like: playing with data in BI tools, building new models or running more scenarios, producing different views of the data, and discovering things they may not have been able to before, all with clean, trusted data. No more bored-to-death experiences. What they are left with is improved productivity, more accurate and consistent results, compelling stories about data and, most important, the chance to focus on doing the things they like! Not too shabby, right?
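The arithmetic in the paragraph above, sketched out (the 40% figure is illustrative, as noted):

```python
cleansing_before = 80   # % of time spent cleansing, per the analysts quoted earlier
cleansing_after = 40    # illustrative post-automation figure, not a guarantee
freed = cleansing_before - cleansing_after
analysis_before = 100 - cleansing_before
analysis_after = analysis_before + freed
print(freed, analysis_before, analysis_after)  # 40 20 60
```

In other words, halving the cleansing burden does not just save time; it triples the share of the week available for actual analysis, from 20% to 60%.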
I am excited about trying out the data quality tools we have here at Informatica. My fellow analysts, you should start looking into them too. And I will check back in soon with more stories to share.