Category Archives: Data Aggregation
Maybe the word “death” is a bit strong, so let’s say “demise” instead. Recently I read an article in the Harvard Business Review around how Big Data and Data Scientists will rule the world of the 21st century corporation and how they have to operate for maximum value. The thing I found rather disturbing was that it takes a PhD – probably a few of them – in a variety of math areas to give executives the necessary insight to make better decisions ranging from what product to develop next to who to sell it to and where.
Don’t get me wrong – this is mixed news for any enterprise software firm helping businesses locate, acquire, contextually link, understand and distribute high-quality data. The existence of such a high-value role validates product development but it also limits adoption. It is also great news that data has finally gathered the attention it deserves. But I am starting to ask myself why it always takes individuals with a “one-in-a-million” skill set to add value. What happened to the democratization of software? Why is the design starting point for enterprise software not always similar to B2C applications, like an iPhone app, i.e. simpler is better? Why is it always such a gradual “Cold War” evolution instead of a near-instant French Revolution?
Why do development environments for Big Data not accommodate limited or existing skills but always accommodate the most complex scenarios? Well, the answer could be that the first customers will be very large, very complex organizations with super complex problems, which they were unable to solve so far. If analytical apps have become a self-service proposition for business users, data integration should be as well. So why does access to a lot of fast moving and diverse data require scarce PIG or Cassandra developers to get the data into an analyzable shape and a PhD to query and interpret patterns?
I realize new technologies start with a foundation and as they spread supply will attempt to catch up to create an equilibrium. However, this is about a problem, which has existed for decades in many industries, such as the oil & gas, telecommunication, public and retail sector. Whenever I talk to architects and business leaders in these industries, they chuckle at “Big Data” and tell me “yes, we got that – and by the way, we have been dealing with this reality for a long time”. By now I would have expected that the skill (cost) side of turning data into a meaningful insight would have been driven down more significantly.
Informatica has made a tremendous push in this regard with its “Map Once, Deploy Anywhere” paradigm. I cannot wait to see what’s next – and I just saw something recently that got me very excited. Why you ask? Because at some point I would like to have at least a business-super user pummel terabytes of transaction and interaction data into an environment (Hadoop cluster, in memory DB…) and massage it so that his self-created dashboard gets him/her where (s)he needs to go. This should include concepts like; “where is the data I need for this insight?’, “what is missing and how do I get to that piece in the best way?”, “how do I want it to look to share it?” All that is required should be a semi-experienced knowledge of Excel and PowerPoint to get your hands on advanced Big Data analytics. Don’t you think? Do you believe that this role will disappear as quickly as it has surfaced?
Murphy’s First Law of Bad Data – If You Make A Small Change Without Involving Your Client – You Will Waste Heaps Of Money
I have not used my personal encounter with bad data management for over a year but a couple of weeks ago I was compelled to revive it. Why you ask? Well, a complete stranger started to receive one of my friend’s text messages – including mine – and it took days for him to detect it and a week later nobody at this North American wireless operator had been able to fix it. This coincided with a meeting I had with a European telco’s enterprise architecture team. There was no better way to illustrate to them how a customer reacts and the risk to their operations, when communication breaks down due to just one tiny thing changing – say, his address (or in the SMS case, some random SIM mapping – another type of address).
In my case, I moved about 250 miles within the United States a couple of years ago and this seemingly common experience triggered a plethora of communication screw ups across every merchant a residential household engages with frequently, e.g. your bank, your insurer, your wireless carrier, your average retail clothing store, etc.
For more than two full years after my move to a new state, the following things continued to pop up on a monthly basis due to my incorrect customer data:
- In case of my old satellite TV provider they got to me (correct person) but with a misspelled last name at my correct, new address.
- My bank put me in a bit of a pickle as they sent “important tax documentation”, which I did not want to open as my new tenants’ names (in the house I just vacated) was on the letter but with my new home’s address.
- My mortgage lender sends me a refinancing offer to my new address (right person & right address) but with my wife’s as well as my name completely butchered.
- My wife’s airline, where she enjoys the highest level of frequent flyer status, continually mails her offers duplicating her last name as her first name.
- A high-end furniture retailer sends two 100-page glossy catalogs probably costing $80 each to our address – one for me, one for her.
- A national health insurer sends “sensitive health information” (disclosed on envelope) to my new residence’s address but for the prior owner.
- My legacy operator turns on the wrong premium channels on half my set-top boxes.
- The same operator sends me a SMS the next day thanking me for switching to electronic billing as part of my move, which I did not sign up for, followed by payment notices (as I did not get my invoice in the mail). When I called this error out for the next three months by calling their contact center and indicating how much revenue I generate for them across all services, they counter with “sorry, we don’t have access to the wireless account data”, “you will see it change on the next bill cycle” and “you show as paper billing in our system today”.
Ignoring the potential for data privacy law suits, you start wondering how long you have to be a customer and how much money you need to spend with a merchant (and they need to waste) for them to take changes to your data more seriously. And this are not even merchants to whom I am brand new – these guys have known me and taken my money for years!
One thing I nearly forgot…these mailings all happened at least once a month on average, sometimes twice over 2 years. If I do some pigeon math here, I would have estimated the postage and production cost alone to run in the hundreds of dollars.
However, the most egregious trespass though belonged to my home owner’s insurance carrier (HOI), who was also my mortgage broker. They had a double whammy in store for me. First, I received a cancellation notice from the HOI for my old residence indicating they had cancelled my policy as the last payment was not received and that any claims will be denied as a consequence. Then, my new residence’s HOI advised they added my old home’s HOI to my account.
After wondering what I could have possibly done to trigger this, I called all four parties (not three as the mortgage firm did not share data with the insurance broker side – surprise, surprise) to find out what had happened.
It turns out that I had to explain and prove to all of them how one party’s data change during my move erroneously exposed me to liability. It felt like the old days, when seedy telco sales people needed only your name and phone number and associate it with some sort of promotion (back of a raffle card to win a new car), you never took part in, to switch your long distance carrier and present you with a $400 bill the coming month. Yes, that also happened to me…many years ago. Here again, the consumer had to do all the legwork when someone (not an automatic process!) switched some entry without any oversight or review triggering hours of wasted effort on their and my side.
We can argue all day long if these screw ups are due to bad processes or bad data, but in all reality, even processes are triggered from some sort of underlying event, which is something as mundane as a database field’s flag being updated when your last purchase puts you in a new marketing segment.
Now imagine you get married and you wife changes her name. With all these company internal (CRM, Billing, ERP), free public (property tax), commercial (credit bureaus, mailing lists) and social media data sources out there, you would think such everyday changes could get picked up quicker and automatically. If not automatically, then should there not be some sort of trigger to kick off a “governance” process; something along the lines of “email/call the customer if attribute X has changed” or “please log into your account and update your information – we heard you moved”. If American Express was able to detect ten years ago that someone purchased $500 worth of product with your credit card at a gas station or some lingerie website, known for fraudulent activity, why not your bank or insurer, who know even more about you? And yes, that happened to me as well.
Tell me about one of your “data-driven” horror scenarios?
That tag line got your attention – did it not? Last week I talked about how companies are trying to squeeze more value out of their asset data (e.g. equipment of any kind) and the systems that house it. I also highlighted the fact that IT departments in many companies with physical asset-heavy business models have tried (and often failed) to create a consistent view of asset data in a new ERP or data warehouse application. These environments are neither equipped to deal with all life cycle aspects of asset information, nor are they fixing the root of the data problem in the sources, i.e. where the stuff is and what it look like. It is like a teenager whose parents have spent thousands of dollars on buying him the latest garments but he always wears the same three outfits because he cannot find the other ones in the pile he hoardes under her bed. And now they bought him a smart phone to fix it. So before you buy him the next black designer shirt, maybe it would be good to find out how many of the same designer shirts he already has, what state they are in and where they are.
Recently, I had the chance to work on a like problem with a large overseas oil & gas company and a North American utility. Both are by definition asset heavy, very conservative in their business practices, highly regulated, very much dependent on outside market forces such as the oil price and geographically very dispersed; and thus, by default a classic system integration spaghetti dish.
My challenge was to find out where the biggest opportunities were in terms of harnessing data for financial benefit.
The initial sense in oil & gas was that most of the financial opportunity hidden in asset data was in G&G (geophysical & geological) and the least on the retail side (lubricants and gas for sale at operated gas stations). On the utility side, the go to area for opportunity appeared to be maintenance operations. Let’s say that I was about right with these assertions but that there were a lot more skeletons in the closet with diamond rings on their fingers than I anticipated.
After talking extensively with a number of department heads in the oil company; starting with the IT folks running half of the 400 G&G applications, the ERP instances (turns out there were 5, not 1) and the data warehouses (3), I queried the people in charge of lubricant and crude plant operations, hydrocarbon trading, finance (tax, insurance, treasury) as well as supply chain, production management, land management and HSE (health, safety, environmental).
The net-net was that the production management people said that there is no issue as they already cleaned up the ERP instance around customer and asset (well) information. The supply chain folks also indicated that they have used another vendor’s MDM application to clean up their vendor data, which funnily enough was not put back into the procurement system responsible for ordering parts. The data warehouse/BI team was comfortable that they cleaned up any information for supply chain, production and finance reports before dimension and fact tables were populated for any data marts.
All of this was pretty much a series of denial sessions on your 12-step road to recovery as the IT folks had very little interaction with the business to get any sense of how relevant, correct, timely and useful these actions are for the end consumer of the information. They also had to run and adjust fixes every month or quarter as source systems changed, new legislation dictated adjustments and new executive guidelines were announced.
While every department tried to run semi-automated and monthly clean up jobs with scripts and some off-the-shelve software to fix their particular situation, the corporate (holding) company and any downstream consumers had no consistency to make sensible decisions on where and how to invest without throwing another legion of bodies (by now over 100 FTEs in total) at the same problem.
So at every stage of the data flow from sources to the ERP to the operational BI and lastly the finance BI environment, people repeated the same tasks: profile, understand, move, aggregate, enrich, format and load.
Despite the departmental clean-up efforts, areas like production operations did not know with certainty (even after their clean up) how many well heads and bores they had, where they were downhole and who changed a characteristic as mundane as the well name last and why (governance, location match).
Marketing (Trading) was surprisingly open about their issues. They could not process incoming, anchored crude shipments into inventory or assess who the counterparty they sold to was owned by and what payment terms were appropriate given the credit or concentration risk associated (reference data, hierarchy mgmt.). As a consequence, operating cash accuracy was low despite ongoing improvements in the process and thus, incurred opportunity cost.
Operational assets like rig equipment had excess insurance coverage (location, operational data linkage) and fines paid to local governments for incorrectly filing or not renewing work visas was not returned for up to two years incurring opportunity cost (employee reference data).
A big chunk of savings was locked up in unplanned NPT (non-production time) because inconsistent, incorrect well data triggered incorrect maintenance intervals. Similarly, OEM specific DCS (drill control system) component software was lacking a central reference data store, which did not trigger alerts before components failed. If you add on top a lack of linkage of data served by thousands of sensors via well logs and Pi historians and their ever changing roll-up for operations and finance, the resulting chaos is complete.
One approach we employed around NPT improvements was to take the revenue from production figure from their 10k and combine it with the industry benchmark related to number of NPT days per 100 day of production (typically about 30% across avg depth on & offshore types). Then you overlay it with a benchmark (if they don’t know) how many of these NPT days were due to bad data, not equipment failure or alike, and just fix a portion of that, you are getting big numbers.
When I sat back and looked at all the potential it came to more than $200 million in savings over 5 years and this before any sensor data from rig equipment, like the myriad of siloed applications running within a drill control system, are integrated and leveraged via a Hadoop cluster to influence operational decisions like drill string configuration or asmyth.
Next time I’ll share some insight into the results of my most recent utility engagement but I would love to hear from you what your experience is in these two or other similar industries.
Recommendations contained in this post are estimates only and are based entirely upon information provided by the prospective customer and on our observations. While we believe our recommendations and estimates to be sound, the degree of success achieved by the prospective customer is dependent upon a variety of factors, many of which are not under Informatica’s control and nothing in this post shall be relied upon as representative of the degree of success that may, in fact, be realized and no warrantee or representation of success, either express or implied, is made.
As a Tesla owner, I recently had the experience of calling Tesla service after a yellow warning message appeared on the center console of my car.” Check tire pressure system. Call Tesla Service.” While still on the freeway, I voice dialed Tesla with my iPhone and was in touch with a service representative within minutes.
|Me: A yellow warning message just appeared on my dash and also the center console.
Tesla rep: Yes, I see – is it the tire pressure warning?
Me: Yes – do I need to pull into a gas station? I haven’t had to visit a gas station since I purchased the car.
Tesla rep: Well, I also see that you are traveling on a freeway that has some steep elevation – it’s possible the higher altitude is affecting your car’s tires temporarily until the pressure equalizes. Let me check your tire pressure monitoring sensor in a half hour. If the sensor still detects a problem, I will call you and give further instructions.
As it turned out, the warning message disappeared after ten minutes and everything was fine for the rest of the trip. However, the episode served as a reminder that the world will be much different with the advent of the Internet of Things. Just as humans connected with mobile phones become more productive, machines and devices connected to the network become more useful. In this case, a connected automobile allowed the remote service rep to remotely access vehicle data, read the tire pressure sensor as well as the vehicle location/elevation and was able to suggest a course of action. This example is fairly basic compared to the opportunities afforded by networked devices/machines.
In addition to remote servicing, there are several other use case categories that offer great potential, including:
- Preventative Maintenance – monitor usage data and increase the overall uptime for machines/devices while decreasing the cost of upkeep. e.g., Tesla runs remote diagnostics on vehicles and has the ability to identify vehicle problems before they occur.
- Realtime Product Enhancements – analyze product usage data and deliver improvements quickly in response. e.g., Tesla delivers software updates that improve the usability of the vehicle based on analysis of owner usage.
- Higher Efficiency in Business Operations – analyze consolidated enterprise transaction data with machine data to identify opportunities to achieve greater operational efficiency. e.g., Tesla deployed waves of new fast charging stations (known as superchargers) based upon analyzing the travel patterns of its vehicle owners.
- Differentiated Product/Service Offerings – deliver new class of applications that operate on correlated data across a broad spectrum of sources (HINT for Tesla: a trip planning application that estimates energy consumption and recommends charging stops would be really cool…)
In each case, machine data is integrated with other data (traditional enterprise data, vehicle owner registration data, etc.) to create business value. Just as important to the connectivity of the devices and machines is the ability to integrate the data. Several Informatica customers have begun investing in M2M (aka Internet of Things) infrastructure and Informatica technology has been critical to their efforts. US Xpress utilizes mobile censors on its vast fleet of trucks and Informatica delivers the ability to consolidate, cleanse and integrate the data they collect.
My recent episode with Tesla service was a simple, yet eye-opening experience. With increasingly more machines and devices getting wireless connected and the ability to integrate the tremendous volumes of data being generated, this example is only a small hint of more interesting things to come.
The term “big data” has been bandied around so much in recent months that arguably, it’s lost a lot of meaning in the IT industry. Typically, IT teams have heard the phrase, and know they need to be doing something, but that something isn’t being done. As IDC pointed out last year, there is a concerning shortage of trained big data technology experts, and failure to recognise the implications that not managing big data can have on the business is dangerous. In today’s information economy, as increasingly digital consumers, customers, employees and social networkers we’re handing over more and more personal information for businesses and third parties to collate, manage and analyse. On top of the growth in digital data, emerging trends such as cloud computing are having a huge impact on the amount of information businesses are required to handle and store on behalf of their customers. Furthermore, it’s not just the amount of information that’s spiralling out of control: it’s also the way in which it is structured and used. There has been a dramatic rise in the amount of unstructured data, such as photos, videos and social media, which presents businesses with new challenges as to how to collate, handle and analyse it. As a result, information is growing exponentially. Experts now predict a staggering 4300% increase in annual data generation by 2020. Unless businesses put policies in place to manage this wealth of information, it will become worthless, and due to the often extortionate costs to store the data, it will instead end up having a huge impact on the business’ bottom line. Maxed out data centres Many businesses have limited resource to invest in physical servers and storage and so are increasingly looking to data centres to store their information in. As a result, data centres across Europe are quickly filling up. Due to European data retention regulations, which dictate that information is generally stored for longer periods than in other regions such as the US, businesses across Europe have to wait a very long time to archive their data. For instance, under EU law, telecommunications service and network providers are obliged to retain certain categories of data for a specific period of time (typically between six months and two years) and to make that information available to law enforcement where needed. With this in mind, it’s no surprise that investment in high performance storage capacity has become a key priority for many. Time for a clear out So how can organisations deal with these storage issues? They can upgrade or replace their servers, parting with lots of capital expenditure to bring in more power or more memory for Central Processing Units (CPUs). An alternative solution would be to “spring clean” their information. Smart partitioning allows businesses to spend just one tenth of the amount required to purchase new servers and storage capacity, and actually refocus how they’re organising their information. With smart partitioning capabilities, businesses can get all the benefits of archiving the information that’s not necessarily eligible for archiving (due to EU retention regulations). Furthermore, application retirement frees up floor space, drives the modernisation initiative, allows mainframe systems and older platforms to be replaced and legacy data to be migrated to virtual archives. Before IT professionals go out and buy big data systems, they need to spring clean their information and make room for big data. Poor economic conditions across Europe have stifled innovation for a lot of organisations, as they have been forced to focus on staying alive rather than putting investment into R&D to help improve operational efficiencies. They are, therefore, looking for ways to squeeze more out of their already shrinking budgets. The likes of smart partitioning and application retirement offer businesses a real solution to the growing big data conundrum. So maybe it’s time you got your feather duster out, and gave your information a good clean out this spring?
I told the head of the Enterprise Data Warehouse at a large bank, “you don’t have a data warehouse, you have 50,000 tables.” The issue is that the bank built the EDW without the necessary fundamentals in place. It wasn’t for lack of money; in fact the EDW was one of the biggest “money sinks” in the bank. The problem is that it was sitting on a sinking foundation.
One version of the truth isn’t achieved by putting all your data in one big system or one big database – that’s impossible. An enterprise data warehouse is indeed part of the solution, but it needs to be built on a solid foundation. What does a solid foundation look like? Here are five pillars for one version of the truth. (more…)
Remote Data Collection and Transformation – with Ultra Messaging Cache Option and B2B Data Transformation
Sometimes when I drive past an electronic tollway collection sensor, I wonder about the amount of data it must generate. I’m no expert on such technology, but at a minimum, the RFID sensor has to read the chip in your car, and log the date and time plus your RFID info, and then a camera takes a picture to catch any potential violators. Now multiply that data times the hundreds of thousands of cars that drive such roads every day, times the number of sensors they pass, and I’m quite sure this number exceeds several million messages per day. (more…)
This fall, I have the fantastic privilege of moderating a series of informative Webcasts, called “Hadoop Tuesdays,” co-sponsored by Informatica and Cloudera, on the phenomenon sweeping the data management space known as Hadoop.
Big Data may be the problem, but Hadoop is the answer. Hadoop is an open-source software framework that enables applications to run across large arrays of nodes, accessing petabytes’ worth of data. It was originally created by Doug Cutting to support the open-source Nutch search engine project, which is now part of the Apache Lucene text-search library. ‘Hadoop’ was actually named after Cutting’s son’s toy elephant – a fitting analogy for the Big Data challenges that lie ahead.
The series kicks off on September 22nd with a “TweetJam” over the Twitter network – simply check in at Noon Eastern Time that day with hashtags #Hadoop or #infatj. (more…)