Category Archives: Data Aggregation
That tag line got your attention – did it not? Last week I talked about how companies are trying to squeeze more value out of their asset data (e.g. equipment of any kind) and the systems that house it. I also highlighted the fact that IT departments in many companies with physical asset-heavy business models have tried (and often failed) to create a consistent view of asset data in a new ERP or data warehouse application. These environments are neither equipped to deal with all life cycle aspects of asset information, nor are they fixing the root of the data problem in the sources, i.e. where the stuff is and what it looks like. It is like a teenager whose parents have spent thousands of dollars on buying him the latest garments but he always wears the same three outfits because he cannot find the other ones in the pile he hoards under his bed. And now they bought him a smartphone to fix it. So before you buy him the next black designer shirt, maybe it would be good to find out how many of the same designer shirts he already has, what state they are in and where they are.
Recently, I had the chance to work on a similar problem with a large overseas oil & gas company and a North American utility. Both are by definition asset heavy, very conservative in their business practices, highly regulated, very much dependent on outside market forces such as the oil price and geographically very dispersed; and thus, by default, a classic system integration spaghetti dish.
My challenge was to find out where the biggest opportunities were in terms of harnessing data for financial benefit.
The initial sense in oil & gas was that most of the financial opportunity hidden in asset data was in G&G (geophysical & geological) and the least on the retail side (lubricants and gas for sale at operated gas stations). On the utility side, the go-to area for opportunity appeared to be maintenance operations. Let’s say that I was about right with these assertions but that there were a lot more skeletons in the closet with diamond rings on their fingers than I anticipated.
After talking extensively with a number of department heads in the oil company, starting with the IT folks running half of the 400 G&G applications, the ERP instances (turns out there were 5, not 1) and the data warehouses (3), I queried the people in charge of lubricant and crude plant operations, hydrocarbon trading, finance (tax, insurance, treasury) as well as supply chain, production management, land management and HSE (health, safety, environmental).
The net-net was that the production management people said there was no issue as they had already cleaned up the ERP instance around customer and asset (well) information. The supply chain folks also indicated that they had used another vendor’s MDM application to clean up their vendor data, which funnily enough was not put back into the procurement system responsible for ordering parts. The data warehouse/BI team was comfortable that they cleaned up any information for supply chain, production and finance reports before dimension and fact tables were populated for any data marts.
All of this was pretty much a series of denial sessions on your 12-step road to recovery as the IT folks had very little interaction with the business to get any sense of how relevant, correct, timely and useful these actions are for the end consumer of the information. They also had to run and adjust fixes every month or quarter as source systems changed, new legislation dictated adjustments and new executive guidelines were announced.
While every department tried to run semi-automated, monthly clean-up jobs with scripts and some off-the-shelf software to fix their particular situation, the corporate (holding) company and any downstream consumers had no consistency to make sensible decisions on where and how to invest without throwing another legion of bodies (by now over 100 FTEs in total) at the same problem.
So at every stage of the data flow from sources to the ERP to the operational BI and lastly the finance BI environment, people repeated the same tasks: profile, understand, move, aggregate, enrich, format and load.
Despite the departmental clean-up efforts, areas like production operations did not know with certainty (even after their clean-up) how many well heads and bores they had, where they were downhole and who last changed a characteristic as mundane as the well name and why (governance, location match).
Marketing (Trading) was surprisingly open about their issues. They could not process incoming, anchored crude shipments into inventory or determine who owned the counterparty they sold to and what payment terms were appropriate given the associated credit or concentration risk (reference data, hierarchy mgmt.). As a consequence, operating cash accuracy was low despite ongoing improvements in the process, which in turn incurred opportunity cost.
Operational assets like rig equipment had excess insurance coverage (location, operational data linkage), and fines paid to local governments for incorrectly filing or not renewing work visas were not returned for up to two years, incurring opportunity cost (employee reference data).
A big chunk of savings was locked up in unplanned NPT (non-productive time) because inconsistent, incorrect well data triggered incorrect maintenance intervals. Similarly, OEM-specific DCS (drill control system) component software was lacking a central reference data store, so no alerts were triggered before components failed. If you add on top a lack of linkage of data served by thousands of sensors via well logs and PI historians and their ever-changing roll-up for operations and finance, the resulting chaos is complete.
One approach we employed around NPT improvements was to take the revenue-from-production figure from their 10-K and combine it with the industry benchmark for the number of NPT days per 100 days of production (typically about 30% across average depth and onshore and offshore types). Then you overlay a benchmark (if they don’t know their own figure) for how many of these NPT days were due to bad data rather than equipment failure or the like. Even if you fix just a portion of that, you are getting big numbers.
When I sat back and looked at all the potential, it came to more than $200 million in savings over 5 years, and this before any sensor data from rig equipment, like the myriad of siloed applications running within a drill control system, is integrated and leveraged via a Hadoop cluster to influence operational decisions like drill string configuration or azimuth.
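To make the arithmetic behind that approach concrete, here is a minimal sketch in Python. It assumes that revenue lost to NPT scales linearly with the NPT benchmark, which is a simplification. The function name, parameter names and the input figures are hypothetical placeholders of mine, not numbers from the engagement; only the 30% NPT benchmark comes from the text above.

```python
def estimate_npt_savings(production_revenue, npt_share, bad_data_share, fixable_share):
    """Rough annual savings from fixing data-driven non-productive time (NPT).

    production_revenue: annual revenue from production (e.g. from the 10-K)
    npt_share:          NPT benchmark as a fraction (0.30 = 30 NPT days per 100)
    bad_data_share:     fraction of NPT attributed to bad data (assumed benchmark)
    fixable_share:      fraction of that bad-data NPT you expect to fix (assumption)
    """
    for name, value in (
        ("npt_share", npt_share),
        ("bad_data_share", bad_data_share),
        ("fixable_share", fixable_share),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Simplifying assumption: lost revenue is proportional to NPT days.
    revenue_lost_to_npt = production_revenue * npt_share
    lost_to_bad_data = revenue_lost_to_npt * bad_data_share
    return lost_to_bad_data * fixable_share


# Made-up inputs purely for illustration.
annual = estimate_npt_savings(
    production_revenue=1_000_000_000,  # hypothetical revenue figure
    npt_share=0.30,                    # benchmark quoted in the text
    bad_data_share=0.10,               # hypothetical
    fixable_share=0.25,                # hypothetical
)
print(f"Estimated annual savings: ${annual:,.0f}")
```

Multiplying the annual figure by the number of years in the business case gives the multi-year view. Every input apart from the revenue figure is a benchmark or a judgment call, so the result is only as good as those assumptions.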
Next time I’ll share some insight into the results of my most recent utility engagement but I would love to hear from you what your experience is in these two or other similar industries.
Recommendations contained in this post are estimates only and are based entirely upon information provided by the prospective customer and on our observations. While we believe our recommendations and estimates to be sound, the degree of success achieved by the prospective customer is dependent upon a variety of factors, many of which are not under Informatica’s control and nothing in this post shall be relied upon as representative of the degree of success that may, in fact, be realized and no warranty or representation of success, either express or implied, is made.
As a Tesla owner, I recently had the experience of calling Tesla service after a yellow warning message appeared on the center console of my car: “Check tire pressure system. Call Tesla Service.” While still on the freeway, I voice dialed Tesla with my iPhone and was in touch with a service representative within minutes.
Me: A yellow warning message just appeared on my dash and also the center console.
Tesla rep: Yes, I see – is it the tire pressure warning?
Me: Yes – do I need to pull into a gas station? I haven’t had to visit a gas station since I purchased the car.
Tesla rep: Well, I also see that you are traveling on a freeway that has some steep elevation – it’s possible the higher altitude is affecting your car’s tires temporarily until the pressure equalizes. Let me check your tire pressure monitoring sensor in a half hour. If the sensor still detects a problem, I will call you and give further instructions.
As it turned out, the warning message disappeared after ten minutes and everything was fine for the rest of the trip. However, the episode served as a reminder that the world will be much different with the advent of the Internet of Things. Just as humans connected with mobile phones become more productive, machines and devices connected to the network become more useful. In this case, a connected automobile allowed the service rep to remotely access vehicle data, read the tire pressure sensor as well as the vehicle location/elevation, and suggest a course of action. This example is fairly basic compared to the opportunities afforded by networked devices/machines.
In addition to remote servicing, there are several other use case categories that offer great potential, including:
- Preventative Maintenance – monitor usage data and increase the overall uptime for machines/devices while decreasing the cost of upkeep. e.g., Tesla runs remote diagnostics on vehicles and has the ability to identify vehicle problems before they occur.
- Realtime Product Enhancements – analyze product usage data and deliver improvements quickly in response. e.g., Tesla delivers software updates that improve the usability of the vehicle based on analysis of owner usage.
- Higher Efficiency in Business Operations – analyze consolidated enterprise transaction data with machine data to identify opportunities to achieve greater operational efficiency. e.g., Tesla deployed waves of new fast charging stations (known as superchargers) based upon analyzing the travel patterns of its vehicle owners.
- Differentiated Product/Service Offerings – deliver a new class of applications that operate on correlated data across a broad spectrum of sources (HINT for Tesla: a trip planning application that estimates energy consumption and recommends charging stops would be really cool…)
In each case, machine data is integrated with other data (traditional enterprise data, vehicle owner registration data, etc.) to create business value. Just as important as the connectivity of the devices and machines is the ability to integrate the data. Several Informatica customers have begun investing in M2M (aka Internet of Things) infrastructure and Informatica technology has been critical to their efforts. US Xpress utilizes mobile sensors on its vast fleet of trucks and Informatica delivers the ability to consolidate, cleanse and integrate the data they collect.
My recent episode with Tesla service was a simple, yet eye-opening experience. With more and more machines and devices getting wirelessly connected and the ability to integrate the tremendous volumes of data being generated, this example is only a small hint of more interesting things to come.
The term “big data” has been bandied around so much in recent months that arguably, it’s lost a lot of meaning in the IT industry. Typically, IT teams have heard the phrase, and know they need to be doing something, but that something isn’t being done. As IDC pointed out last year, there is a concerning shortage of trained big data technology experts, and failure to recognise the implications that not managing big data can have on the business is dangerous.
In today’s information economy, as increasingly digital consumers, customers, employees and social networkers we’re handing over more and more personal information for businesses and third parties to collate, manage and analyse. On top of the growth in digital data, emerging trends such as cloud computing are having a huge impact on the amount of information businesses are required to handle and store on behalf of their customers. Furthermore, it’s not just the amount of information that’s spiralling out of control: it’s also the way in which it is structured and used. There has been a dramatic rise in the amount of unstructured data, such as photos, videos and social media, which presents businesses with new challenges as to how to collate, handle and analyse it.
As a result, information is growing exponentially. Experts now predict a staggering 4300% increase in annual data generation by 2020. Unless businesses put policies in place to manage this wealth of information, it will become worthless, and due to the often extortionate costs to store the data, it will instead end up having a huge impact on the business’ bottom line.
Maxed out data centres
Many businesses have limited resource to invest in physical servers and storage and so are increasingly looking to data centres to store their information in. As a result, data centres across Europe are quickly filling up.
Due to European data retention regulations, which dictate that information is generally stored for longer periods than in other regions such as the US, businesses across Europe have to wait a very long time to archive their data. For instance, under EU law, telecommunications service and network providers are obliged to retain certain categories of data for a specific period of time (typically between six months and two years) and to make that information available to law enforcement where needed. With this in mind, it’s no surprise that investment in high performance storage capacity has become a key priority for many.
Time for a clear out
So how can organisations deal with these storage issues? They can upgrade or replace their servers, parting with lots of capital expenditure to bring in more power or more memory for Central Processing Units (CPUs). An alternative solution would be to “spring clean” their information. Smart partitioning allows businesses to spend just one tenth of the amount required to purchase new servers and storage capacity, and actually refocus how they’re organising their information. With smart partitioning capabilities, businesses can get all the benefits of archiving the information that’s not necessarily eligible for archiving (due to EU retention regulations). Furthermore, application retirement frees up floor space, drives the modernisation initiative, allows mainframe systems and older platforms to be replaced and legacy data to be migrated to virtual archives.
Before IT professionals go out and buy big data systems, they need to spring clean their information and make room for big data. Poor economic conditions across Europe have stifled innovation for a lot of organisations, as they have been forced to focus on staying alive rather than putting investment into R&D to help improve operational efficiencies. They are, therefore, looking for ways to squeeze more out of their already shrinking budgets.
The likes of smart partitioning and application retirement offer businesses a real solution to the growing big data conundrum. So maybe it’s time you got your feather duster out, and gave your information a good clean out this spring?
I told the head of the Enterprise Data Warehouse at a large bank, “you don’t have a data warehouse, you have 50,000 tables.” The issue is that the bank built the EDW without the necessary fundamentals in place. It wasn’t for lack of money; in fact the EDW was one of the biggest “money sinks” in the bank. The problem is that it was sitting on a sinking foundation.
One version of the truth isn’t achieved by putting all your data in one big system or one big database – that’s impossible. An enterprise data warehouse is indeed part of the solution, but it needs to be built on a solid foundation. What does a solid foundation look like? Here are five pillars for one version of the truth. (more…)
Remote Data Collection and Transformation – with Ultra Messaging Cache Option and B2B Data Transformation
Sometimes when I drive past an electronic tollway collection sensor, I wonder about the amount of data it must generate. I’m no expert on such technology, but at a minimum, the RFID sensor has to read the chip in your car, and log the date and time plus your RFID info, and then a camera takes a picture to catch any potential violators. Now multiply that data times the hundreds of thousands of cars that drive such roads every day, times the number of sensors they pass, and I’m quite sure this number exceeds several million messages per day. (more…)
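As a back-of-the-envelope check on that estimate, here is a small Python sketch. The traffic figures are purely hypothetical placeholders (I have no real tollway numbers), and the function names are my own. It assumes each sensor pass produces a fixed number of messages, for example one RFID read and one camera capture.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def messages_per_day(cars_per_day, sensors_passed_per_car, messages_per_read):
    """Total daily messages: cars x sensors each car passes x messages per pass."""
    return cars_per_day * sensors_passed_per_car * messages_per_read


def average_messages_per_second(daily_messages):
    """Average rate over a full day; real traffic peaks would be much higher."""
    return daily_messages / SECONDS_PER_DAY


# Hypothetical inputs: 300,000 cars, 10 sensors per trip,
# 2 messages per pass (RFID read plus camera capture).
daily = messages_per_day(300_000, 10, 2)
print(f"{daily:,} messages per day")
print(f"{average_messages_per_second(daily):.1f} messages per second on average")
```

Even with modest inputs the daily total lands in the millions, which is consistent with the estimate above. The average rate hides rush-hour peaks, so any collection layer would need headroom well beyond it.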
This fall, I have the fantastic privilege of moderating a series of informative Webcasts, called “Hadoop Tuesdays,” co-sponsored by Informatica and Cloudera, on the phenomenon sweeping the data management space known as Hadoop.
Big Data may be the problem, but Hadoop is the answer. Hadoop is an open-source software framework that enables applications to run across large arrays of nodes, accessing petabytes’ worth of data. It was originally created by Doug Cutting to support the open-source Nutch search engine project, which is now part of the Apache Lucene text-search library. ‘Hadoop’ was actually named after Cutting’s son’s toy elephant – a fitting analogy for the Big Data challenges that lie ahead.
The series kicks off on September 22nd with a “TweetJam” over the Twitter network – simply check in at Noon Eastern Time that day with hashtags #Hadoop or #infatj. (more…)
As we discussed in my last blog post, when building a SOA, data abstraction is the single most important approach and enabling technology when it comes to managing data within a SOA.
“Data abstraction is the key. It allows you to fix issues with the existing physical databases within the data service itself. Moreover, you can combine many different databases, and even unstructured information, into a single unified view of the data that is more representative of the business.”
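To illustrate the idea in that quote, here is a minimal Python sketch of a data service that presents one unified customer view over two physical stores with inconsistent layouts. Everything in it is hypothetical: the store contents, the field names and the `CustomerDataService` class are made up for illustration and do not represent any particular product’s API.

```python
from dataclasses import dataclass
from typing import Optional

# Two hypothetical physical stores with inconsistent layouts and quality.
CRM_ROWS = {"C-1": {"cust_nm": "ACME CORP ", "phone_no": "555-0100"}}
BILLING_ROWS = {"C-1": {"customer": "Acme Corp", "balance_cents": 125000}}


@dataclass(frozen=True)
class CustomerView:
    """The business-facing shape consumers see, independent of storage."""
    customer_id: str
    name: str
    phone: Optional[str]
    balance: float


class CustomerDataService:
    """Abstraction layer: consumers bind to this, not to the physical tables."""

    def __init__(self, crm, billing):
        self._crm = crm
        self._billing = billing

    def get_customer(self, customer_id):
        crm = self._crm.get(customer_id, {})
        bill = self._billing.get(customer_id, {})
        if not crm and not bill:
            return None
        # Fix data issues inside the service: prefer the billing name,
        # fall back to the CRM name, then normalize whitespace and casing.
        raw_name = bill.get("customer") or crm.get("cust_nm", "")
        return CustomerView(
            customer_id=customer_id,
            name=raw_name.strip().title(),
            phone=crm.get("phone_no"),
            balance=bill.get("balance_cents", 0) / 100,  # cents to currency units
        )


service = CustomerDataService(CRM_ROWS, BILLING_ROWS)
print(service.get_customer("C-1"))
```

If the CRM table is later restructured or replaced, only the mapping inside `get_customer` changes; callers that depend on `CustomerView` are untouched.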
Let’s walk further down this road. When using the approach of data abstraction, and data abstraction using data services, you’re able to emulate the desired data and data structure without having to recreate and restructure the physical databases, nor having to statically bind the services to the physical data. The core value of this is agility, but there are many other advantages as well, including: (more…)