Category Archives: Data Aggregation
I ended my previous blog wondering if awareness of Data Gravity should change our behavior. While Data Gravity adds Value to Big Data, I find that the application of the Value is under explained.
Exponential growth of data has naturally led us to want to categorize it into facts, relationships, entities, etc. This sounds very elementary. While this happens so quickly in our subconscious minds as humans, it takes significant effort to teach this to a machine.
A friend tweeted this to me last week: I paddled out today, now I look like a lobster. Since this tweet, Twitter has inundated my friend and me with promotions from Red Lobster. It is because the machine deconstructed the tweet: paddled <PROPULSION>, today <TIME>, like <PREFERENCE> and lobster <CRUSTACEANS>. While putting these together, the machine decided that the keyword was lobster. You and I both know that my friend was not talking about lobsters.
You may think that this maybe just a funny edge case. You can confuse any computer system if you try hard enough, right? Unfortunately, this isn’t an edge case. 140 characters has not just changed people’s tweets, it has changed how people talk on the web. More and more information is communicated in smaller and smaller amounts of language, and this trend is only going to continue.
When will the machine understand that “I look like a lobster” means I am sunburned?
I believe the reason that there are not hundreds of companies exploiting machine-learning techniques to generate a truly semantic web, is the lack of weighted edges in publicly available ontologies. Keep reading, it will all make sense in about 5 sentences. Lobster and Sunscreen are 7 hops away from each other in dbPedia – way too many to draw any correlation between the two. For that matter, any article in Wikipedia is connected to any other article within about 14 hops, and that’s the extreme. Completed unrelated concepts are often just a few hops from each other.
But by analyzing massive amounts of both written and spoken English text from articles, books, social media, and television, it is possible for a machine to automatically draw a correlation and create a weighted edge between the Lobsters and Sunscreen nodes that effectively short circuits the 7 hops necessary. Many organizations are dumping massive amounts of facts without weights into our repositories of total human knowledge because they are naïvely attempting to categorize everything without realizing that the repositories of human knowledge need to mimic how humans use knowledge.
For example – if you hear the name Babe Ruth, what is the first thing that pops to mind? Roman Catholics from Maryland born in the 1800s or Famous Baseball Player?
If you look in Wikipedia today, he is categorized under 28 categories in Wikipedia, each of them with the same level of attachment. 1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore | Maryland | Vaudeville performers.
Now imagine how confused a machine would get when the distance of unweighted edges between nodes is used as a scoring mechanism for relevancy.
If I were to design an algorithm that uses weighted edges (on a scale of 1-5, with 5 being the highest), the same search would yield a much more obvious result.
1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore | Maryland | Vaudeville performers .
Now the machine starts to think more like a human. The above example forces us to ask ourselves the relevancy a.k.a. Value of the response. This is where I think Data Gravity’s becomes relevant.
You can contact me on twitter @bigdatabeat with your comments.
I recently participated on an EDM Council panel on BCBS 239 earlier this month in London and New York. The panel consisted of Chief Risk Officers, Chief Data Officers, and information management experts from the financial industry. BCBS 239 set out 14 key principles requiring banks aggregate their risk data to allow banking regulators to avoid another 2008 crisis, with a deadline of Jan 1, 2016. Earlier this year, the Basel Committee on Banking Supervision released the findings from a self-assessment from the Globally Systemically Important Banks (GISB’s) in their readiness to 11 out of the 14 principles related to BCBS 239.
Given all of the investments made by the banking industry to improve data management and governance practices to improve ongoing risk measurement and management, I was expecting to hear signs of significant process. Unfortunately, there is still much work to be done to satisfy BCBS 239 as evidenced from my findings. Here is what we discussed in London and New York.
- It was clear that the “Data Agenda” has shifted quite considerably from IT to the Business as evidenced by the number of risk, compliance, and data governance executives in the room. Though it’s a good sign that business is taking more ownership of data requirements, there was limited discussions on the importance of having capable data management technology, infrastructure, and architecture to support a successful data governance practice. Specifically capable data integration, data quality and validation, master and reference data management, metadata to support data lineage and transparency, and business glossary and data ontology solutions to govern the terms and definitions of required data across the enterprise.
- With regard to accessing, aggregating, and streamlining the delivery of risk data from disparate systems across the enterprise and simplifying the complexity that exists today from point to point integrations accessing the same data from the same systems over and over again creating points of failure and increasing the maintenance costs of supporting the current state. The idea of replacing those point to point integrations via a centralized, scalable, and flexible data hub approach was clearly recognized as a need however, difficult to envision given the enormous work to modernize the current state.
- Data accuracy and integrity continues to be a concern to generate accurate and reliable risk data to meet normal and stress/crisis reporting accuracy requirements. Many in the room acknowledged heavy reliance on manual methods implemented over the years and investing in Automating data integration and onboarding risk data from disparate systems across the enterprise is important as part of Principle 3 however, much of what’s in place today was built as one off projects against the same systems accessing the same data delivering it to hundreds if not thousands of downstream applications in an inconsistent and costly way.
- Data transparency and auditability was a popular conversation point in the room as the need to provide comprehensive data lineage reports to help explain how data is captured, from where, how it’s transformed, and used remains a concern despite advancements in technical metadata solutions that are not integrated with their existing risk management data infrastructure
- Lastly, big concerns regarding the ability to capture and aggregate all material risk data across the banking group to deliver data by business line, legal entity, asset type, industry, region and other groupings, to support identifying and reporting risk exposures, concentrations and emerging risks. This master and reference data challenge unfortunately cannot be solved by external data utility providers due to the fact the banks have legal entity, client, counterparty, and securities instrument data residing in existing systems that require the ability to cross reference any external identifier for consistent reporting and risk measurement.
To sum it up, most banks admit they have a lot of work to do. Specifically, they must work to address gaps across their data governance and technology infrastructure.BCBS 239 is the latest and biggest data challenge facing the banking industry and not just for the GSIB’s but also for the next level down as mid-size firms will also be required to provide similar transparency to regional regulators who are adopting BCBS 239 as a framework for their local markets. BCBS 239 is not just a deadline but the principles set forth are a key requirement for banks to ensure they have the right data to manage risk and ensure transparency to industry regulators to monitor system risk across the global markets. How ready are you?
With that said, the basic approaches to consider are from the top-down, or the bottom-up. You can be successful with either approach. However, there are certain efficiencies you’ll gain with a specific choice, and it could significantly reduce the risk and cost. Let’s explore the pros and cons of each approach.
Approaching data integration from the top-down means moving from the high level integration flows, down to the data semantics. Thus, you an approach, perhaps even a tool-set (using requirements), and then define the flows that are decomposed down to the raw data.
The advantages of this approach include:
The ability to spend time defining the higher levels of abstraction without being limited by the underlying integration details. This typically means that those charged with designing the integration flows are more concerned with how they have to deal with the underlying source and target, and this approach means that they don’t have to deal with that issue until later, as they break down the flows.
The disadvantages of this approach include:
The data integration architect does not consider the specific needs of the source or target systems, in many instances, and thus some rework around the higher level flows may have to occur later. That causes inefficiencies, and could add risk and cost to the final design and implementation.
For the most part, this is the approach that most choose for data integration. Indeed, I use this approach about 75 percent of the time. The process is to start from the native data in the sources and targets, and work your way up to the integration flows. This typically means that those charged with designing the integration flows are more concerned with the underlying data semantic mediation than the flows.
The advantages of this approach include:
It’s typically a more natural and traditional way of approaching data integration. Called “data-driven” integration design in many circles, this initially deals with the details, so by the time you get up to the integration flows there are few surprises, and there’s not much rework to be done. It’s a bit less risky and less expensive, in most cases.
The disadvantages of this approach include:
Starting with the details means that you could get so involved in the details that you miss the larger picture, and the end state of your architecture appears to be poorly planned, when all is said and done. Of course, that depends on the types of data integration problems you’re looking to solve.
No matter which approach you leverage, with some planning and some strategic thinking, you’ll be fine. However, there are different paths to the same destination, and some paths are longer and less efficient than others. As you pick an approach, learn as you go, and adjust as needed.
When it comes to cloud-based data analytics, a recent study by Ventana Research (as found in Loraine Lawson’s recent blog post) provides a few interesting data points. The study reveals that 40 percent of respondents cited lowered costs as a top benefit, improved efficiency was a close second at 39 percent, and better communication and knowledge sharing also ranked highly at 34 percent.
Ventana Research also found that organizations cite a unique and more complex reason to avoid cloud analytics and BI. Legacy integration work can be a major hindrance, particularly when BI tools are already integrated with other applications. In other words, it’s the same old story:
The ability to deal with existing legacy systems when moving to concepts such as big data or cloud-based analytics is critical to the success of any enterprise data analytics strategy. However, most enterprises don’t focus on data integration as much as they should, and hope that they can solve the problems using ad-hoc approaches.
You can’t make sense of data that you can’t see.
These approaches rarely work as well a they should, if at all. Thus, any investment made in data analytics technology is often diminished because the BI tools or applications that leverage analytics can’t see all of the relevant data. As a result, only part of the story is told by the available data, and those who leverage data analytics don’t rely on the information, and that means failure.
What’s frustrating to me about this issue is that the problem is easily solved. Those in the enterprise charged with standing up data analytics should put a plan in place to integrate new and legacy systems. As part of that plan, there should be a common understanding around business concepts/entities of a customer, sale, inventory, etc., and all of the data related to these concepts/entities must be visible to the data analytics engines and tools. This requires a data integration strategy, and technology.
As enterprises embark on a new day of more advanced and valuable data analytics technology, largely built upon the cloud and big data, the data integration strategy should be systemic. This means mapping a path for the data from the source legacy systems, to the views that the data analytics systems should include. What’s more, this data should be in real operational time because data analytics loses value as the data becomes older and out-of-date. We operate a in a real-time world now.
So, the work ahead requires planning to occur at both the conceptual and physical levels to define how data analytics will work for your enterprise. This includes what you need to see, when you need to see it, and then mapping a path for the data back to the business-critical and, typically, legacy systems. Data integration should be first and foremost when planning the strategy, technology, and deployments.
The way I see it, the biggest impact of the Apple Watch will come from how it will finally make data fashionable. For starters, the three Apple Watch models and interchangeable bands will actually make it hip to wear a watch again. But I think the ramifications of this genuinely good-looking watch go well beyond the skin deep. The Cupertino company has engineered its watch and its mobile software to recognize related data and seamlessly share it across relevant apps. And those capabilities allow it to, for instance, monitor our fitness and health, show us where we parked the car, open the door to our hotel room and control our entertainment centers.
Think what this could mean for any company with a Data-First point of view. I like to say that a data-first POV changes everything. With it, companies can unleash the killer app, killer marketing campaign and killer sales organization.The Apple Watch finally gives people a reason to have that killer app with them at all times, wherever they are and whatever they’re doing. Looked at a different way, it could unleash a new culture of Data-Only consumers: People who rely on being told what they need to know, in the right context.
But while Apple may the first to push this Data-First POV in unexpected ways, history has shown they won’t be the last. It’s time for every company to tap into the newest fashion accessory, and make data their first priority.
That second question is a killer because most people — no matter if they’re in marketing, sales or manufacturing — rely on incomplete, inaccurate or just plain wrong information. Regardless of industry, we’ve been fixated on historic transactions because that’s what our systems are designed to provide us.
“Moneyball: The Art of Winning an Unfair Game” gives a great example of what I mean. The book (not the movie) describes Billy Beane hiring MBAs to map out the factors that would win a baseball game. They discovered something completely unexpected: That getting more batters on base would tire out pitchers. It didn’t matter if batters had multi-base hits, and it didn’t even matter if they walked. What mattered was forcing pitchers to throw ball after ball as they faced an unrelenting string of batters. Beane stopped looking at RBIs, ERAs and even home runs, and started hiring batters who consistently reached first base. To me, the book illustrates that the most useful knowledge isn’t always what we’ve been programmed to depend on or what is delivered to us via one app or another.
For years, people across industries have turned to ERP, CRM and web analytics systems to forecast sales and acquire new customers. By their nature, such systems are transactional, forcing us to rely on history as the best predictor of the future. Sure it might be helpful for retailers to identify last year’s biggest customers, but that doesn’t tell them whose blogs, posts or Tweets influenced additional sales. Isn’t it time for all businesses, regardless of industry, to adopt a different point of view — one that we at Informatica call “Data-First”? Instead of relying solely on transactions, a data-first POV shines a light on interactions. It’s like having a high knowledge IQ about relationships and connections that matter.
A data-first POV changes everything. With it, companies can unleash the killer app, the killer sales organization and the killer marketing campaign. Imagine, for example, if a sales person meeting a new customer knew that person’s concerns, interests and business connections ahead of time? Couldn’t that knowledge — gleaned from Tweets, blogs, LinkedIn connections, online posts and transactional data — provide a window into the problems the prospect wants to solve?
That’s the premise of two startups I know about, and it illustrates how a data-first POV can fuel innovation for developers and their customers. Today, we’re awash in data-fueled things that are somehow attached to the Internet. Our cars, phones, thermostats and even our wristbands are generating and gleaning data in new and exciting ways. That’s knowledge begging to be put to good use. The winners will be the ones who figure out that knowledge truly is power, and wield that power to their advantage.
I was recently searching for fishing rods for my 5-year old son and his friends to use at our neighborhood pond. I know nothing about fishing, so I needed to get educated. First up, a Google search on my laptop at home. Then, I jostled between my phone, tablet and laptop visiting websites, reading descriptions, looking at photos and reading reviews. Offline, I talked to friends and visited local stores recently, searching for fishing rods for my 5-year old son and his friends to use at our neighborhood pond. I know nothing about fishing, so I needed to get educated. First up, a Google search on my laptop at home. Then, I jostled between my phone, tablet and laptop visiting websites, reading descriptions, looking at photos and reading reviews. Offline, I talked to friends and visited local stores.
This blog post initially appeared on CMSwire.com and is reblogged here with their consent.
The product descriptions weren’t very helpful. What is a “practice casting plug”? Turns out, this was a great feature! Instead of a hook, the rod had a rubber fish to practice casting safely. What a missed opportunity for the retailers who didn’t share this information. I bought the fishing rods from the retailer that educated me with valuable product information and offered free three to five day shipping.
What does this mean for companies who sell products across multiple channels?
Virtually everyone is a cross-channel shopper: 95 percent of consumers frequently or at least occasionally shop a retailer’s website and store, according to the “Omni-Channel Insights” study by CFI Group. In the report, “The Omnichannel Opportunity: Unlocking the Power of the Connected Customer,” Deloitte predicts more than 50 percent of in-store purchases will be influenced digitally by the end of 2014.
Because of all this crosschannel activity, a new term is trending: omnichannel
What Does Omnichannel Mean?
Let’s take a look back in time. Retailers started with one channel — the brick-and-mortar store. Then they introduced the catalog and call center. Then they built another channel — e-Commerce. Instead of making it an extension of the brick-and-mortar experience, many implemented an independent strategy, including operations, resources, technology and inventory. Retailers recently started integrating brick-and-mortar and e-Commerce channels, but it’s not always consistent. And now they are building another channel — mobile sites and apps.
Multichannel is a retailer-centric, transaction-focused view of operations. Each channel operates and aims to boost sales independently. Omnichannel is a customer-centric view. The goal is to understand through which channels customers want to engage at each stage of the shopping journey and enable a seamless, integrated and consistent brand experience across channels and devices.
Shoppers expect an omnichannel experience, but delivering it efficiently isn’t easy. Those responsible for enabling an omnichannel experience are encountering barriers. Let’s look at the three barriers most relevant for marketing, merchandising, sales, customer experience and information management leaders.
Barrier #1: Shift from product-centric to customer-centric view
Many retailers focus on how many products are sold by channel. Three key questions are:
- How can we drive store sales growth?
- How can we drive online sales growth?
- What’s our mobile strategy?
This is the old way of running a retail business. The new way is analyzing customer data to understand how they are engaging and transacting across channels.
Why is this difficult? At the Argyle eCommerce Leadership Forum, Vice President of Multichannel at GameStop Corp Jason Allen shared the $8.8 billion video game retailer’s approach to overcoming this barrier. While online represents 3 percent of sales, no one measured how much the online channel was influencing overall business.
They started by collecting customer data for analytics to find out who their customers were and how they interacted with Game Stop online and in 6,600 stores across 15 countries. The analysis revealed customers used multiple channels: 60 percent engaged on the web, and 26 percent of web visitors who didn’t buy online bought in-store within 48 hours.
This insight changed the perception of the online channel as a small contributor. Now they use two metrics to measure performance. While the online channel delivers 3 percent of sales, it influences 22 percent of overall business.
Take Action: Start collecting customer data. Analyze it. Learn who your customers are. Find out how they engage and transact with your business across channels.
Barrier #2: Shift from fragmented customer data to centralized customer data everyone can use
Nikki Baird, Managing Partner at Retail Systems Research (RSR), told me she believes the fundamentals of retail are changing from “right product, right price, right place, right time” to:
- Who is my customer?
- What are they trying to accomplish?
- How can we help?
According to RSR, creating a consistent customer experience remains the most valued capability for retailers, but 54 percent indicated their biggest inhibitor was not having a single view of the customer across channels.
Why is this difficult? A $12 billion specialty retailer known for its relentless focus on customer experience, with 200 stores and an online channel had to overcome this barrier. To deliver a high-touch omnichannel experience, they needed to replace the many views of the customer with one unified customer view. They invested in master data management (MDM) technology and competencies.
Now they bring together customer, employee and product data scattered across 30 applications (e.g., e-Commerce, POS, clienteling, customer service, order management) into a central location, where it’s managed and shared on an ongoing basis. Employees’ applications are fueled with clean, consistent and connected customer data. They are able to deliver a high-touch omnichannel experience because they can answer important questions about customers and their valuable relationships, such as:
- Who is this customer and who’s in their household?
- Who do they buy for, what do they buy, where do they buy?
- Which employees do they typically buy from in store?
Take Action: Think of the valuable information customers share when they interact with different parts of your business. Tap into it by bridging customer information silos. Bring fragmented customer information together in one central location. Make it universally accessible. Don’t let it remain locked up in departmental applications. Keep it up-to-date. Automate the process of updating customer information across departmental applications.
Barrier #3: Shift from fragmented product data to centralized product data everyone can use
Two-thirds of purchase journeys start with a Google search. To have a fighting chance, retailers need rich and high quality product information to rank higher than the competition.
Take a look at the image on the left. Would you buy this product? Probably not. One-third of shoppers who don’t make a purchase didn’t have enough information to make a purchase decision. What product information does a shopper need to convert in the moment? Rich, high quality information has conversion power.
Consumers return about 40 percent of all fashion and 15 percent of electronics purchases. That’s not good for retailers or shoppers. Minimize costly returns with complete product information so shoppers can make more informed purchase decisions. Jason Allen’s advice is, “Focus less on the cart and check out. Focus more on search, product information and your store locator. Eighty percent of customers are coming to the web for research.”
Why is this difficult? Crestline is a multichannel direct marketing firm selling promotional products through direct mail and e-Commerce. The barrier to quickly bringing products to market and updating product information across channels was fragmented and complex product information. To replace the manual, time consuming spreadsheet process to manage product information, they invested in product information management (PIM) technology.
Now Crestline’s product introduction and update process is 300 percent more efficient. Because they are 100 percent current on top products and over 50 percent current for all products, the company is boosting margins and customer service.
Take Action: Think about all the product information shoppers need to research and make a decision. Tap into it by bridging product information silos. Bring fragmented product information together in one central location. Make it universally usable, not channel-specific. Keep it up-to-date. Automate the process of publishing product information across channels, including the applications used by customer service and store associates.
Delivering an omnichannel experience efficiently isn’t easy. The Game Stop team collected and analyzed customer data to learn more about who their customers are and how they interact with the company. A specialty retailer centralized fragmented customer data. Crestline centralized product information to accelerate their ability to bring products to market and make updates across channels. Which of these barriers are holding you back from delivering an omnichannel experience?
It is troublesome to me to repeatedly get into conversations with IT managers who want to fix data “for the sake of fixing it”. While this is presumably increasingly rare, due to my department’s role, we probably see a higher occurrence than the normal software vendor employee. Given that, please excuse the inflammatory title of this post.
Nevertheless, once the deal is done, we find increasingly fewer of these instances, yet still enough, as the average implementation consultant or developer cares about this aspect even less. A few months ago a petrochemical firm’s G&G IT team lead told me that he does not believe that data quality improvements can or should be measured. He also said, “if we need another application, we buy it. End of story.” Good for software vendors, I thought, but in most organizations $1M here or there do not lay around leisurely plus decision makers want to see the – dare I say it – ROI.
However, IT and business leaders should take note that a misalignment due to lack OR disregard of communication is a critical success factor. If the business does not get what it needs and wants AND it differs what Corporate IT is envisioning and working on – and this is what I am talking about here – it makes any IT investment a risky proposition.
Let me illustrate this with 4 recent examples I ran into:
1. Potential for flawed prioritization
A retail customer’s IT department apparently knew that fixing and enriching a customer loyalty record across the enterprise is a good and financially rewarding idea. They only wanted to understand what the less-risky functional implementation choices where. They indicated that if they wanted to learn what the factual financial impact of “fixing” certain records or attributes, they would just have to look into their enterprise data warehouse. This is where the logic falls apart as the warehouse would be just as unreliable as the “compromised” applications (POS, mktg, ERP) feeding it.
Even if they massaged the data before it hit the next EDW load, there is nothing inherently real-time about this as all OLTP are running processes of incorrect (no bidirectional linkage) and stale data (since the last load).
I would question if the business is now completely aligned with what IT is continuously correcting. After all, IT may go for the “easy or obvious” fixes via a weekly or monthly recurring data scrub exercise without truly knowing, which the “biggest bang for the buck” is or what the other affected business use cases are, they may not even be aware of yet. Imagine the productivity impact of all the roundtripping and delay in reporting this creates. This example also reminds me of a telco client, I encountered during my tenure at another tech firm, which fed their customer master from their EDW and now just found out that this pattern is doomed to fail due to data staleness and performance.
2. Fix IT issues and business benefits will trickle down
Client number two is a large North American construction Company. An architect built a business case for fixing a variety of data buckets in the organization (CRM, Brand Management, Partner Onboarding, Mobility Services, Quotation & Requisitions, BI & EPM).
Grand vision documents existed and linked to the case, which stated how data would get better (like a sick patient) but there was no mention of hard facts of how each of the use cases would deliver on this. After I gave him some major counseling what to look out and how to flesh it out – radio silence. Someone got scared of the math, I guess.
3. Now that we bought it, where do we start
The third culprit was a large petrochemical firm, which apparently sat on some excess funds and thought (rightfully so) it was a good idea to fix their well attributes. More power to them. However, the IT team is now in a dreadful position having to justify to their boss and ultimately the E&P division head why they prioritized this effort so highly and spent the money. Well, they had their heart in the right place but are a tad late. Still, I consider this better late than never.
4. A senior moment
The last example comes from a South American communications provider. They seemingly did everything right given the results they achieved to date. This gets to show that misalignment of IT and business does not necessarily wreak havoc – at least initially.
However, they are now in phase 3 of their roll out and reality caught up with them. A senior moment or lapse in judgment maybe? Whatever it was; once they fixed their CRM, network and billing application data, they had to start talking to the business and financial analysts as complaints and questions started to trickle in. Once again, better late than never.
So what is the take-away from these stories. Why wait until phase 3, why have to be forced to cram some justification after the purchase? You pick, which one works best for you to fix this age-old issue. But please heed Sohaib’s words of wisdom recently broadcast on CNN Money “IT is a mature sector post bubble…..now it needs to deliver the goods”. And here is an action item for you – check out the new way for the business user to prepare their own data (30 minutes into the video!). Agreed?
Maybe the word “death” is a bit strong, so let’s say “demise” instead. Recently I read an article in the Harvard Business Review around how Big Data and Data Scientists will rule the world of the 21st century corporation and how they have to operate for maximum value. The thing I found rather disturbing was that it takes a PhD – probably a few of them – in a variety of math areas to give executives the necessary insight to make better decisions ranging from what product to develop next to who to sell it to and where.
Don’t get me wrong – this is mixed news for any enterprise software firm helping businesses locate, acquire, contextually link, understand and distribute high-quality data. The existence of such a high-value role validates product development but it also limits adoption. It is also great news that data has finally gathered the attention it deserves. But I am starting to ask myself why it always takes individuals with a “one-in-a-million” skill set to add value. What happened to the democratization of software? Why is the design starting point for enterprise software not always similar to B2C applications, like an iPhone app, i.e. simpler is better? Why is it always such a gradual “Cold War” evolution instead of a near-instant French Revolution?
Why do development environments for Big Data not accommodate limited or existing skills but always accommodate the most complex scenarios? Well, the answer could be that the first customers will be very large, very complex organizations with super complex problems, which they were unable to solve so far. If analytical apps have become a self-service proposition for business users, data integration should be as well. So why does access to a lot of fast moving and diverse data require scarce PIG or Cassandra developers to get the data into an analyzable shape and a PhD to query and interpret patterns?
I realize new technologies start with a foundation and as they spread supply will attempt to catch up to create an equilibrium. However, this is about a problem, which has existed for decades in many industries, such as the oil & gas, telecommunication, public and retail sector. Whenever I talk to architects and business leaders in these industries, they chuckle at “Big Data” and tell me “yes, we got that – and by the way, we have been dealing with this reality for a long time”. By now I would have expected that the skill (cost) side of turning data into a meaningful insight would have been driven down more significantly.
Informatica has made a tremendous push in this regard with its “Map Once, Deploy Anywhere” paradigm. I cannot wait to see what’s next – and I just saw something recently that got me very excited. Why you ask? Because at some point I would like to have at least a business-super user pummel terabytes of transaction and interaction data into an environment (Hadoop cluster, in memory DB…) and massage it so that his self-created dashboard gets him/her where (s)he needs to go. This should include concepts like; “where is the data I need for this insight?’, “what is missing and how do I get to that piece in the best way?”, “how do I want it to look to share it?” All that is required should be a semi-experienced knowledge of Excel and PowerPoint to get your hands on advanced Big Data analytics. Don’t you think? Do you believe that this role will disappear as quickly as it has surfaced?
Murphy’s First Law of Bad Data – If You Make A Small Change Without Involving Your Client – You Will Waste Heaps Of Money
I have not used my personal encounter with bad data management for over a year but a couple of weeks ago I was compelled to revive it. Why you ask? Well, a complete stranger started to receive one of my friend’s text messages – including mine – and it took days for him to detect it and a week later nobody at this North American wireless operator had been able to fix it. This coincided with a meeting I had with a European telco’s enterprise architecture team. There was no better way to illustrate to them how a customer reacts and the risk to their operations, when communication breaks down due to just one tiny thing changing – say, his address (or in the SMS case, some random SIM mapping – another type of address).
In my case, I moved about 250 miles within the United States a couple of years ago and this seemingly common experience triggered a plethora of communication screw ups across every merchant a residential household engages with frequently, e.g. your bank, your insurer, your wireless carrier, your average retail clothing store, etc.
For more than two full years after my move to a new state, the following things continued to pop up on a monthly basis due to my incorrect customer data:
- In case of my old satellite TV provider they got to me (correct person) but with a misspelled last name at my correct, new address.
- My bank put me in a bit of a pickle as they sent “important tax documentation”, which I did not want to open as my new tenants’ names (in the house I just vacated) was on the letter but with my new home’s address.
- My mortgage lender sends me a refinancing offer to my new address (right person & right address) but with my wife’s as well as my name completely butchered.
- My wife’s airline, where she enjoys the highest level of frequent flyer status, continually mails her offers duplicating her last name as her first name.
- A high-end furniture retailer sends two 100-page glossy catalogs probably costing $80 each to our address – one for me, one for her.
- A national health insurer sends “sensitive health information” (disclosed on envelope) to my new residence’s address but for the prior owner.
- My legacy operator turns on the wrong premium channels on half my set-top boxes.
- The same operator sends me a SMS the next day thanking me for switching to electronic billing as part of my move, which I did not sign up for, followed by payment notices (as I did not get my invoice in the mail). When I called this error out for the next three months by calling their contact center and indicating how much revenue I generate for them across all services, they counter with “sorry, we don’t have access to the wireless account data”, “you will see it change on the next bill cycle” and “you show as paper billing in our system today”.
Ignoring the potential for data privacy law suits, you start wondering how long you have to be a customer and how much money you need to spend with a merchant (and they need to waste) for them to take changes to your data more seriously. And this are not even merchants to whom I am brand new – these guys have known me and taken my money for years!
One thing I nearly forgot…these mailings all happened at least once a month on average, sometimes twice over 2 years. If I do some pigeon math here, I would have estimated the postage and production cost alone to run in the hundreds of dollars.
However, the most egregious trespass though belonged to my home owner’s insurance carrier (HOI), who was also my mortgage broker. They had a double whammy in store for me. First, I received a cancellation notice from the HOI for my old residence indicating they had cancelled my policy as the last payment was not received and that any claims will be denied as a consequence. Then, my new residence’s HOI advised they added my old home’s HOI to my account.
After wondering what I could have possibly done to trigger this, I called all four parties (not three as the mortgage firm did not share data with the insurance broker side – surprise, surprise) to find out what had happened.
It turns out that I had to explain and prove to all of them how one party’s data change during my move erroneously exposed me to liability. It felt like the old days, when seedy telco sales people needed only your name and phone number and associate it with some sort of promotion (back of a raffle card to win a new car), you never took part in, to switch your long distance carrier and present you with a $400 bill the coming month. Yes, that also happened to me…many years ago. Here again, the consumer had to do all the legwork when someone (not an automatic process!) switched some entry without any oversight or review triggering hours of wasted effort on their and my side.
We can argue all day long if these screw ups are due to bad processes or bad data, but in all reality, even processes are triggered from some sort of underlying event, which is something as mundane as a database field’s flag being updated when your last purchase puts you in a new marketing segment.
Now imagine you get married and you wife changes her name. With all these company internal (CRM, Billing, ERP), free public (property tax), commercial (credit bureaus, mailing lists) and social media data sources out there, you would think such everyday changes could get picked up quicker and automatically. If not automatically, then should there not be some sort of trigger to kick off a “governance” process; something along the lines of “email/call the customer if attribute X has changed” or “please log into your account and update your information – we heard you moved”. If American Express was able to detect ten years ago that someone purchased $500 worth of product with your credit card at a gas station or some lingerie website, known for fraudulent activity, why not your bank or insurer, who know even more about you? And yes, that happened to me as well.
Tell me about one of your “data-driven” horror scenarios?