Category Archives: Data Integration
Some of you “old timers” in the IT industry will remember the days when we used to hand-code our own database management systems. Of course, today we just go out and buy a general-purpose DBMS like MySQL, Oracle, dBASE, or IBM DB2, to name a few. Or, if we wind the clock back further, there was a time when we used to write our own operating systems. Today the OS comes with the hardware, or we can buy one like UNIX, iOS, Linux, OS X, Windows, or IBM z/OS. And I can still remember hand-coding network protocols in the days before TCP/IP became ubiquitous. Today we select from UDP, HTTP, POP3, FTP, IMAP, RMI, SOAP and others.
Do you know what year the first steam engine locomotive was invented? 1804. It traveled 9 miles in two hours. Now, you and I would be pretty upset if we boarded a train and it took 2 hours to go 9 miles. But, 200 years ago, this was a huge innovation and led to the invention of the modern-day train and railway.
Tremendous Growth In Demand for Rail Travel Puts Pressure on Rail Infrastructure
Today, Britain is experiencing tremendous growth in demand for rail travel. One million more trains run and 500 million more passengers travel by train than just 5 years ago. Over the next 30 years, passenger demand for rail will more than double and freight demand is expected to go up by 140%. This puts tremendous pressure on the rail infrastructure.
Network Rail is in the modern-day rail business. Employees work day and night running, maintaining and updating Britain’s rail infrastructure, including millions of assets, such as 22,000 miles of track, 6,500 crossings, 43,000 bridges, viaducts and tunnels. Improving the rail network provides faster, more frequent and more reliable journeys between Britain’s towns and cities.
Network Rail is investing more in the rail infrastructure than in Victorian times. In the last six months, they spent about $25 million a day! In a recent news release, Patrick Bucher, group finance director, said, “We continue to invest record amounts to deliver a bigger, better railway for passengers and businesses across Britain. We are also driving down the cost of running Britain’s railway to help make it more affordable in the years ahead.”
Employees Need to Trust Asset Information to Pinpoint and Fix Problems Quickly
To pinpoint and fix problems quickly, keep their operating costs low and maintain a strong safety record, Network Rail’s employees need to trust their mission-critical asset information, such as:
- What is the problem?
- Where is it?
- What equipment, tools and skills are needed to fix it?
- Who is closest to the problem that could fix it?
Difficult to Make Sense of Asset Information Scattered across Applications
Similar to many companies their size, Network Rail’s mission-critical asset information was scattered across many applications, which made it difficult for employees to make sense of asset information and the interaction between assets.
The asset information team recognized the limitations of employees depending on an application-centric view of their business. To operate more efficiently and effectively, they needed clean asset information, consistent asset information, and connected asset information.
Investing in Rail Infrastructure AND the Information Infrastructure to Support It
Network Rail now uses a combination of data integration, data quality, and master data management (MDM) to manage their mission-critical asset information in a central location on an ongoing basis, to:
- make sense of asset information,
- understand the relationships between assets, and
- track changes to asset information.
In a news release, Patrick Bossert, Director of Network Rail’s Asset Information services business, said, “With more accurate and reliable information about assets and their condition our team can make better business decisions, enable innovation in our asset management policy, planning and execution, and improve rail-system-wide investment decisions that benefit the rail industry as a whole.”
If you work for a company that revolves around mission-critical asset information, ask yourself these questions:
- Can our employees make sense of our asset information?
- Can they easily see relationships between assets and how they interact?
- Can they see the history of changes to asset information over time?
Or are they limited by an application-centric view of the business because asset information is scattered across multiple systems?
Have a similar story about how you are managing your mission-critical asset information? Please share it in the comments below.
The challenge for supermarkets today is balancing the needs of the customer against their ability to serve those needs. How are supermarkets and food manufacturers preparing their business for e-readiness? What about more customer centricity?
Currently, brands are not particularly good at serving consistent product information across in-store and online environments, leading to lower conversions and poor customer satisfaction. This shortfall is also preventing these brands from moving forward and innovating with new technologies. As a result, Product Information Management (PIM) is becoming a significant focus in effective omnichannel initiatives.
Consider the large range of products that can be seen at the average grocery store. The sheer number of categories is staggering, before you even consider the quantity of items in each category. It’s little wonder that local brands are struggling to replicate this level of product data anywhere else but on their store shelves.
Furthermore, consider the various kinds of information supermarkets are expected to include. Then, add to this the kinds of information supermarkets could include in order to present a competitive advantage over and above the rest. Information types currently possible are: ingredients, additives, images and videos, marketing copy, gene manipulation information, references, product seals, allergens, nutritional facts, translations, product categories, expiration/use-by dates, variants, region-specific information, GDSN information and more.
Ultimately, supermarkets are already on the path of improving consumers’ shopping experience and a few of the emerging technologies indicate the way this industry will continue to evolve.
6 Examples of food retail and supermarket trends
The below six examples demonstrate an emerging trend in grocery shopping, while also highlighting the need for accurate product information creation, curation and distribution.
- Ready-to-cook product bundles: A nice and very customer-facing concept comes from German food retailer www.kochhaus.de (the name means “house of cooking”). They only offer product bundles containing all the ingredients required to cook a certain meal for the required number of guests. It can be seen as the equivalent of the look books that are well established in the sales strategy of fashion brands and retailers.
- Self-checkout Systems – More supermarkets are beginning to include self-checkouts. American and UK companies lead, while Germany and Australia are behind. But there is the same risk of cart abandonment here as there is online, so providing a comprehensive and rich suite of product information at these POS systems is crucial.
- In-store Information Kiosks – Some supermarkets are beginning to include interactive displays in-store, with some even providing tablets mounted onto shopping trolleys. These displays serve in place of an in-store sales assistant, providing consumers with directions, promotions and complete access to product information (such as stock levels) on any item in the store.
- Supermarket Pop-ups – Food retailers are increasingly experimenting and improving the traditional shopping experience. One example that has turned the bricks-and-mortar concept on its head is electronic shopping ‘walls’, where products are prominently displayed in a high-traffic area. Consumers are able to access product details and make purchases by scanning a code presented alongside the image of a given product.
- Store-to-door Delivery Services – These are starting to become commonplace. Not only are supermarkets offering same-day delivery services, the major brands are also experimenting with click-and-collect services. These supermarkets are moving toward websites that are just as busy as their bricks-and-mortar outlets and provide as much relevant content, if not more.
- App Commerce: Companies like German food retailer Edeka offer an app for push marketing, or one that helps match customers’ dietary or allergy profiles against QR-code-scanned products on the shopping list within the supermarket app.
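As a rough illustration of the profile matching described in the last example, here is a minimal Python sketch. It assumes each product record already carries a clean set of allergens, which is exactly the kind of product information discussed above. The `Product` class, the `flag_conflicts` function and the sample items are hypothetical and are not taken from any retailer’s app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical product record; a real PIM record has far more attributes."""
    name: str
    allergens: frozenset = frozenset()


def flag_conflicts(shopping_list, customer_allergies):
    """Return {product name: sorted list of allergens the customer must avoid}."""
    avoid = {a.lower() for a in customer_allergies}
    conflicts = {}
    for product in shopping_list:
        # Compare case-insensitively so "Nuts" and "nuts" match.
        hits = sorted(avoid & {a.lower() for a in product.allergens})
        if hits:
            conflicts[product.name] = hits
    return conflicts


# Made-up shopping list for a customer with a nut allergy.
shopping_list = [
    Product("Pesto", frozenset({"nuts", "milk"})),
    Product("Spaghetti", frozenset({"gluten"})),
    Product("Tomatoes"),
]
print(flag_conflicts(shopping_list, {"Nuts"}))
```

The check is only as good as the allergen data behind it: a product with a missing or misspelled allergen entry would pass silently, which is why accurate product information matters for this kind of feature.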
What is next?
The supermarket of the future:
Reviving Customer Loyalty by leveraging information potential
Due to the increased transparency brought on by the ‘Google Era’, retailers have experienced a marked decline in customer loyalty. This concept of omnichannel shopping behaviour has led previously loyal customers to shop elsewhere.
Putting customers in the centre of all retail activities may not be a new trend, but in order to achieve it, retailers must foster more intelligent touch points. The supermarkets of the future will combine both product and customer data in such a way that every touch point presents a uniquely personalised experience for the customer, and a single, 360-degree view of the customer to the retailer.
The major supermarket brands already have comprehensive customer loyalty programs and they’re building on these with added products, such as consumer insurance packages. However, these initiatives haven’t necessarily led to an increase in loyalty.
Instead, the imperative to create a personal, intimate connection with consumers will eventually lead to a return in loyalty. The supermarket of the future will be able to send recipe and shopping list recommendations directly to the shopper’s preferred device, taking into account any allergies or delivery preferences.
Gamification as a tool for loyalty?
Moreover, this evolution will slowly lead into another phase of loyalty marketing: gamification. Comprehensive and detailed product data will form the basis of a loyalty program that includes targets, goals and rewards for loyal customers. The more comprehensive and engaging these shopping ‘games’ become, the more successful they will be from a marketing and loyalty perspective. However, the demands for detailed, accurate product information will also increase accordingly.
Private side note: My wife likes the simple Edeka app game, where users need to cut slices of sausage. The challenge is to hit exactly the weight the customer requires, just like the in-store associate does.
Those supermarkets that can deploy these initiatives first – and continue to innovate beyond this point – will have a bright future. Those that lag behind when it comes to leveraging their information and real time process might quickly begin to fade away.
What can I cook from my fridge leftovers?
I have been working all week long on next year’s planning, so my fridge was not well fed this week. With it being almost empty, the questions are:
- What products are left?
- When do they expire?
- What can I cook from my fridge leftovers? (recipes)
- Where do I get the missing items for dinner with my wife, and at what price?
- Do they all match my dietary needs, here an allergy to nuts?
- Can I order online?
- When will they get delivered?
- What things can make our evening a success? The right wine recommendation? Two candles?
Well, it is up to your imagination which additional products can be sold to make the customer happy and create a nice candlelight dinner… But it is at least a good reason to increase the assortment.
Unlike some of my friends, I truly enjoyed history as a subject in high school and college. I particularly appreciated biographies of favorite historical figures because they painted a human face and gave meaning and color to the past. I also vowed at that time to navigate my life and future under the principle attributed to Harvard professor Jorge Agustín Nicolás Ruiz de Santayana y Borrás that goes, “Those who cannot remember the past are condemned to repeat it.”
So that’s a little ditty regarding my history regarding history.
Forwarding now to the present in which I have carved out my career in technology, and in particular, enterprise software, I’m afforded a great platform where I talk to lots of IT and business leaders. When I do, I usually ask them, “How are you implementing advanced projects that help the business become more agile or effective or opportunistically proactive?” They usually answer something along the lines of “this is the age and renaissance of data science and analytics” and then end up talking exclusively about their meat and potatoes business intelligence software projects and how 300 reports now run their business.
Then when I probe and hear their answer more in depth, I am once again reminded of THE history quote and think to myself there’s an amusing irony at play here. When I think about the Business Intelligence systems of today, most are designed to “remember” and report on the historical past through large data warehouses of a gazillion transactions, along with basic, but numerous shipping and billing histories and maybe assorted support records.
But when it comes right down to it, business intelligence “history” is still just that. Nothing is really learned and applied right when and where it counted – AND when it would have made all the difference had the company been able to react in time.
So, in essence, by using standalone BI systems as they are designed today, companies are indeed condemned to repeat what they have already learned because they are too late – so the same mistakes will be repeated again and again.
This means the challenge for BI is to reduce latency, measure the pertinent data / sensors / events, and get scalable – extremely scalable and flexible enough to handle the volume and variety of the forthcoming data onslaught.
There’s a part 2 to this story so keep an eye out for my next blog post History Repeats Itself (Part 2)
That tag line got your attention – did it not? Last week I talked about how companies are trying to squeeze more value out of their asset data (e.g. equipment of any kind) and the systems that house it. I also highlighted the fact that IT departments in many companies with physical asset-heavy business models have tried (and often failed) to create a consistent view of asset data in a new ERP or data warehouse application. These environments are neither equipped to deal with all life cycle aspects of asset information, nor are they fixing the root of the data problem in the sources, i.e. where the stuff is and what it looks like. It is like a teenager whose parents have spent thousands of dollars on buying him the latest garments but who always wears the same three outfits because he cannot find the other ones in the pile he hoards under his bed. And now they bought him a smart phone to fix it. So before you buy him the next black designer shirt, maybe it would be good to find out how many of the same designer shirts he already has, what state they are in and where they are.
Recently, I had the chance to work on a like problem with a large overseas oil & gas company and a North American utility. Both are by definition asset heavy, very conservative in their business practices, highly regulated, very much dependent on outside market forces such as the oil price and geographically very dispersed; and thus, by default a classic system integration spaghetti dish.
My challenge was to find out where the biggest opportunities were in terms of harnessing data for financial benefit.
The initial sense in oil & gas was that most of the financial opportunity hidden in asset data was in G&G (geophysical & geological) and the least on the retail side (lubricants and gas for sale at operated gas stations). On the utility side, the go-to area for opportunity appeared to be maintenance operations. Let’s say that I was about right with these assertions but that there were a lot more skeletons in the closet with diamond rings on their fingers than I anticipated.
After talking extensively with a number of department heads in the oil company; starting with the IT folks running half of the 400 G&G applications, the ERP instances (turns out there were 5, not 1) and the data warehouses (3), I queried the people in charge of lubricant and crude plant operations, hydrocarbon trading, finance (tax, insurance, treasury) as well as supply chain, production management, land management and HSE (health, safety, environmental).
The net-net was that the production management people said that there is no issue as they already cleaned up the ERP instance around customer and asset (well) information. The supply chain folks also indicated that they have used another vendor’s MDM application to clean up their vendor data, which funnily enough was not put back into the procurement system responsible for ordering parts. The data warehouse/BI team was comfortable that they cleaned up any information for supply chain, production and finance reports before dimension and fact tables were populated for any data marts.
All of this was pretty much a series of denial sessions on your 12-step road to recovery as the IT folks had very little interaction with the business to get any sense of how relevant, correct, timely and useful these actions are for the end consumer of the information. They also had to run and adjust fixes every month or quarter as source systems changed, new legislation dictated adjustments and new executive guidelines were announced.
While every department tried to run semi-automated, monthly clean-up jobs with scripts and some off-the-shelf software to fix their particular situation, the corporate (holding) company and any downstream consumers had no consistency to make sensible decisions on where and how to invest without throwing another legion of bodies (by now over 100 FTEs in total) at the same problem.
So at every stage of the data flow from sources to the ERP to the operational BI and lastly the finance BI environment, people repeated the same tasks: profile, understand, move, aggregate, enrich, format and load.
Despite the departmental clean-up efforts, areas like production operations did not know with certainty (even after their clean up) how many well heads and bores they had, where they were downhole and who changed a characteristic as mundane as the well name last and why (governance, location match).
Marketing (Trading) was surprisingly open about their issues. They could not process incoming, anchored crude shipments into inventory or assess who the counterparty they sold to was owned by and what payment terms were appropriate given the credit or concentration risk associated (reference data, hierarchy mgmt.). As a consequence, operating cash accuracy was low despite ongoing improvements in the process and thus, incurred opportunity cost.
Operational assets like rig equipment had excess insurance coverage (location, operational data linkage), and fines paid to local governments for incorrectly filing or not renewing work visas were not returned for up to two years, incurring opportunity cost (employee reference data).
A big chunk of savings was locked up in unplanned NPT (non-production time) because inconsistent, incorrect well data triggered incorrect maintenance intervals. Similarly, OEM-specific DCS (drill control system) component software was lacking a central reference data store, so alerts were not triggered before components failed. If you add on top a lack of linkage of data served by thousands of sensors via well logs and PI historians and their ever-changing roll-up for operations and finance, the resulting chaos is complete.
One approach we employed around NPT improvements was to take the revenue-from-production figure from their 10-K and combine it with the industry benchmark for the number of NPT days per 100 days of production (typically about 30% across average depth and onshore and offshore types). Then you overlay a benchmark (if they don’t know the figure themselves) of how many of these NPT days were due to bad data rather than equipment failure or the like. If you fix just a portion of that, you are getting big numbers.
When I sat back and looked at all the potential, it came to more than $200 million in savings over 5 years, and this is before any sensor data from rig equipment, like the myriad of siloed applications running within a drill control system, is integrated and leveraged via a Hadoop cluster to influence operational decisions like drill string configuration or azimuth.
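The back-of-the-envelope NPT calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it values each NPT day at the average revenue of a production day, and every input except the roughly 30-per-100 NPT benchmark mentioned above is a made-up placeholder, not a figure from the engagement. The function name `npt_data_savings` is hypothetical.

```python
def npt_data_savings(production_revenue, npt_days_per_100,
                     bad_data_share, fixable_share):
    """Rough annual value of NPT that could be avoided by fixing bad data.

    production_revenue: annual revenue from production (e.g. from the 10-K)
    npt_days_per_100:   benchmark NPT days per 100 days of production
    bad_data_share:     assumed fraction of NPT days caused by bad data
    fixable_share:      assumed fraction of those days that can be fixed
    """
    # Assumption: an NPT day costs the average revenue of a production day,
    # so the value of all NPT is revenue * (NPT days / production days).
    npt_value = production_revenue * npt_days_per_100 / 100
    return npt_value * bad_data_share * fixable_share


# Placeholder inputs for illustration only.
estimate = npt_data_savings(
    production_revenue=1_000_000_000,  # hypothetical revenue figure
    npt_days_per_100=30,               # benchmark quoted in the text
    bad_data_share=0.10,               # hypothetical
    fixable_share=0.25,                # hypothetical
)
print(f"Estimated annual savings: ${estimate:,.0f}")
```

With these placeholder inputs the estimate comes to about $7.5 million a year. The point of the exercise is the structure of the calculation, not the specific shares, which would have to come from benchmarks or from the customer.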
Next time I’ll share some insight into the results of my most recent utility engagement but I would love to hear from you what your experience is in these two or other similar industries.
Recommendations contained in this post are estimates only and are based entirely upon information provided by the prospective customer and on our observations. While we believe our recommendations and estimates to be sound, the degree of success achieved by the prospective customer is dependent upon a variety of factors, many of which are not under Informatica’s control and nothing in this post shall be relied upon as representative of the degree of success that may, in fact, be realized and no warranty or representation of success, either express or implied, is made.
I had a disturbing conversation at Dreamforce. Long story short, thousands of highly skilled and highly paid financial advisors (read sales reps) at a large financial services company are spending most of their day pulling together information about their clients in a spreadsheet, leaving only a few hours to engage with clients and generate revenue.
Not all valuable customer information is in Salesforce
Why? They don’t have a 360-degree customer view within Salesforce.
Why not? Not all client information that’s valuable to the financial advisors is in Salesforce. Important client information is in other applications too, such as:
- Marketing automation application
- Customer support application
- Account management applications
- Finance applications
- Business intelligence applications
Are you in sales? Do you work for a company that has multiple products or lines of business? Then you can probably relate. In my 15 years of experience working with sales, I’ve found this to be a harsh reality. You have to manually pull together customer information, which is a time-consuming process that doesn’t boost job satisfaction.
Stop building 360-degree customer views in spreadsheets
So what can you do about it? Stop building 360-degree customer views in spreadsheets. There is a better way and your sales operations leader can help.
One of my favorite customer success stories is about one of the world’s leading wealth management companies, with 16,000 financial advisors globally. Like most companies, their goal is to increase revenue by understanding their customers’ needs and making relevant cross-sell and up-sell offers.
But, the financial advisors needed an up-to-date view of the “total customer relationship” with the bank before they talked to their high net-worth clients. They wanted to appear knowledgeable and offer a product the client might actually want.
Can you guess what was holding them back? The bank operated in an account-centric world. Each line of business had its own account management application. To get a 360-degree customer view, the financial advisors spent 70% of their time pulling important client information from different applications into spreadsheets. Sound familiar?
Once the head of sales realized this, he decided to invest in information management technology that provides clean, consistent and connected customer information and delivers a 360-degree customer view within Salesforce.
The result? They’ve had a $50 million impact annually and a 30% increase in productivity. In fact, word spread to other banks and the 360-degree customer view in Salesforce became an incentive to attract top talent in the industry.
Ask sales operations to give you 360-degree customer views within Salesforce
I urge you to take action. In particular, talk to your sales operations leader if he or she is at all interested in improving performance and productivity, acquiring and retaining top sales talent, and cutting costs.
Want to see how you can get 360-degree customer views in Salesforce? Check out this demo: Enrich Customer Data in Your CRM Application with MDM. Then schedule a meeting with your sales operations leader.
Have a similar experience to share? Please share it in the comments below.
Last year around this time, I wrote a blog about how the death of ETL was exaggerated. Time to revisit the topic briefly given a couple of interesting events that happened in the past few weeks.
First, one of the companies that had a senior executive who had claimed that ETL and the data integration layer were dead came by to visit. It turns out that the bold executive who claimed that everything they were doing had been migrated to Hadoop is no longer with that company. In addition, the thing they wanted to talk to us about was how they can more effectively build out the data warehouse and pull in mainframe, that’s right, mainframe data. It seems that old data sources never die, and they don’t even just fade away either. In fact, very little of what this company was doing was actually happening on Hadoop. Like I noted in my last blog, Hadoop is a lot like teenage sex.
Second, I gave a talk at a trade show on how new companies like Informatica were going to fill the ease-of-use gap on top of Hadoop by providing tooling so less skilled developers could also take advantage of Hadoop (for more on this topic, please check back on my blog titled “Dinner with my French Neighbor”). After my talk, a gentleman in his late 20s came up to me and told me that he used to work for Aster Data, which was subsequently bought by Teradata. He had recently left to join a new startup. He used to think that the data integration layer would die away because you could easily use something like Aster to handle both the analytics queries and the data integration. Then after Aster was acquired by Teradata, he got to see an Informatica PowerCenter mapping that brought in a number of data sources, cleaned and integrated the data before moving it into Teradata. He told me that he hadn’t realized how complex real customer environments were and that there was no way that they could have done all of that integration in Aster. This is pretty typical of people who are new to the data space or who are building out Hadoop-based startups. They don’t have to deal with legacy environments so they have no idea how messy they are until they finally see them first hand.
Third and last, someone from a startup company that I had talked to last year which has a visual data preparation and analytics environment on top of Hadoop sent me an email after Strata. I wasn’t at Strata, but he got my email address from one of my employees. He wanted to talk about partnering with us because their customers need to be able to handle more sophisticated data integration jobs (connecting, cleansing, integrating, transforming, parsing, etc.) before their users can make use of the data. Only last year, this same company said that they were competing with Informatica because underneath their visualization layer, they had basic data integration transformation tools. As it turns out, basic wasn’t anywhere near enough so they are back talking to us about a partnership.
The point is that just because we can now dump all of our data into Hadoop, doesn’t mean it is integrated. If you take 10 legacy data sources plus internet data and sensor data and so on, and just dump it into Hadoop, it doesn’t make it integrated. It just makes it collocated. So while “ETL” in the classic sense will definitely change, the idea that there won’t be a data integration layer that exists to simplify and manage the integration of all of the old and new sources of data is just silly. That layer will continue to exist, it just might use a variety of technologies, including Hadoop, underneath as a storage and processing engine.
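To make the difference between collocated and integrated data concrete, here is a small Python sketch. The two “sources”, the field names and the match rule (a trimmed, lower-cased email address) are all hypothetical; real integration jobs use far richer matching and cleansing logic, and nothing here reflects how any particular product does it.

```python
def collocate(*sources):
    """Dump every record into one place without touching it."""
    return [record for source in sources for record in source]


def integrate(*sources):
    """Match records on a normalized email and merge them into one view."""
    merged = {}
    for source in sources:
        for record in source:
            key = record["email"].strip().lower()  # simplistic match rule
            entry = merged.setdefault(key, {})
            for field, value in record.items():
                if value not in (None, ""):
                    # The first non-empty value seen for a field wins.
                    entry.setdefault(field, value)
            entry["email"] = key  # store the cleaned key, not the raw value
    return list(merged.values())


# Two made-up sources describing overlapping customers.
crm = [{"email": "Ann@Example.com ", "name": "Ann Lee", "phone": ""}]
billing = [
    {"email": "ann@example.com", "name": "A. Lee", "phone": "555-0100"},
    {"email": "bob@example.com", "name": "Bob Roy", "phone": None},
]

dumped = collocate(crm, billing)
unified = integrate(crm, billing)
print(len(dumped), "records collocated;", len(unified), "customers integrated")
```

The dumped list still contains Ann twice, with conflicting values and an unusable email key, whereas the integrated view holds one record per customer. Where the data is stored does not change that; the matching and merging step has to happen somewhere.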
Regardless, I am happy to see that more and more companies are realizing that today’s data world is actually getting more complicated, not less complicated. The result: data fragmentation is only getting worse, so the future for data integration is only looking brighter.
I missed Strata this year, so I can only report back what I heard from my team. I was out on the road talking with customers while the gang was at Strata, talking to customers and prospective customers. That said, the conversations they had with cool new Hadoop companies and my conversations were quite similar. Lots of talk about trials on Hadoop, but outside of the big internet firms, some startups that are focused on solving “big data” problems and some Wall Street firms, most companies are still kicking the Hadoop tires.
Which reminds me of a picture my neighbor took of a presentation that he saw on Hadoop. The presenter had a slide with a rehash of an old joke that went something like this (I am paraphrasing here as I don’t have the exact quote):
“Hadoop is a lot like teenage sex. Everyone says they do it, but most are not. And for those who are doing it, most of them aren’t very good at it yet.”
So if you haven’t gotten started on your Hadoop project, don’t worry, you aren’t as far behind as you think.
My wife invited my new neighbors over for dinner this past Saturday night. They are a French couple with a super cute 5-year-old son. Dinner was nice, and like most ex-pats in the San Francisco Bay Area, he is in high tech. His company is a successful internet company in Europe, but has had a hard time penetrating the U.S. market, which is why they moved to the Bay Area. He is starting up a satellite engineering organization in Palo Alto and he asked me where he can find good “big data” engineers. He is having a hard time finding people.
This is a story that I am hearing quite a bit with customers that I have been talking to as well. They want to start up big data teams, but can’t find enough skilled engineers who understand how to develop in Pig or Hive or YARN or whatever is coming next in the Hadoop/MapReduce world.
This reminds me of when I used to work in the telecom software business 20 years ago and everyone was looking at technologies like DCE and CORBA to build out distributed computing environments to solve complex problems that couldn’t be solved easily on a single computing system. If you don’t know what DCE or CORBA are/were, that’s OK. It is kind of the point. They are distributed computing development platforms that failed because they were too damn hard and there just weren’t enough people who could understand how to use them effectively. Now DCE and CORBA were not trying to solve the same problems as Hadoop, but the basic point still stands, they were damn hard and the reality is that programming on a Hadoop platform is damn hard as well.
So could Hadoop fail, just like CORBA and DCE? I doubt it, for a few key reasons. First, there is a considerable amount of venture and industrial investment going into Hadoop to make it work. Not since Java has there been such a concerted effort by the industry to try to make a new technology successful. Second, much of that investment is in providing graphical development environments and applications that use the storage and compute power of Hadoop, but hide its complexity. That is what Informatica is doing with PowerCenter Big Data Edition. We are making it possible for data integration developers to parse, cleanse, transform and integrate data using Hadoop as the underlying storage and engine. But the developer doesn’t have to know anything about Hadoop. The same thing is happening at the analytics layer, at the data prep layer and at the visualization layer.
Bit by bit, software vendors are hiding the underlying complexity of Hadoop so organizations won’t have to hire an army of big data scientists to solve interesting problems. They will still need a few of them, but not so many that Hadoop will end up like those other technologies that most Hadoop developers have never even heard of.
Power to the elephant. And more later about my dinner guest and his super cute 5-year-old son.