Category Archives: Profiling
In my last blog post, I talked about the dreadful experience of cleaning raw data by hand as an analyst a few years back. Well, the truth is, I was not alone. At a recent data mining Meetup event in the San Francisco Bay Area, I asked a few analysts: “How much time do you spend on cleaning your data at work?” “More than 80% of my time” and “most of my days”, said the analysts, and “they are not fun”.
But check this out: there are over a dozen Meetup groups focused on data science and data mining here in the Bay Area, where I live. Those groups put on events multiple times a month, with topics often around hot, emerging technologies such as machine learning, graph analysis, real-time analytics, new algorithms for analyzing social media data, and of course, anything Big Data. Cool BI tools, new programming models and algorithms for better analysis are a big draw for data practitioners these days.
That got me thinking… if what the analysts told me is true, i.e., they spend 80% of their time on data prepping and only the remaining 20% (a quarter of the prep time) analyzing the data and visualizing the results, which BTW “is actually fun”, quoting a data analyst, then why are they drawn to events focused on tools that can only help them with 20% of their time? Why wouldn’t they want to explore technologies that can help with the dreadful 80% spent on the data scrubbing they complain about?
Having been there myself, I thought perhaps a little self-reflection would help answer the question.
As a student of math, I love data and am fascinated by the good stories I can discover in it. My two-year math program in graduate school was primarily focused on learning how to build fabulous math models to simulate real events, and on using those formulas to predict the future or look for meaningful patterns.
I used BI and statistical analysis tools while at school, and continued to use them at work after I graduated. Those tools were great in that they helped me get to the results and see what was in my data, so I could develop conclusions and make recommendations for my clients based on those insights. Without BI and visualization tools, I would not have delivered any results.
That was the fun and glamorous part of my job as an analyst. But when I was not creating nice charts and presentations to tell the stories in my data, I was spending time, a great amount of time, sometimes into the wee hours, cleaning and verifying my data. I was convinced that was part of my job and I just had to suck it up.
It was only a few months ago that I stumbled upon data quality software – it happened when I joined Informatica. At first I thought they were talking to the wrong person when they started pitching me data quality solutions.
It turns out that data quality automation is a highly relevant and extremely intuitive subject to me, and to anyone who deals with data on a regular basis. Data quality software automates data cleansing; it is much faster and delivers more accurate results than a manual process. To put that in math context, if a data quality tool can reduce the data cleansing effort from 80% to 40% of an analyst’s time (btw, this is hardly a random number; some of our customers have reported much better results), analysts free up 40% of their time from scrubbing data and can use that time to do the things they like: playing with data in BI tools, building new models or running more scenarios, producing different views of the data and discovering things they could not before, all with clean, trusted data. No more bored-to-death experience. What they are left with is improved productivity, more accurate and consistent results, compelling stories about data and, most important, the freedom to focus on the things they like! Not too shabby, right?
I am excited about trying out the data quality tools we have here at Informatica. My fellow analysts, you should start looking into them as well. I will check back in soon with more stories to share.
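The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the 40-hour week are assumptions made for the example, and the 80% and 40% figures are the ones quoted in this post, not measured values.

```python
def hours_freed(weekly_hours, cleansing_pct_before, cleansing_pct_after):
    """Hours per week released when the share of time spent cleansing drops."""
    if not 0 <= cleansing_pct_after <= cleansing_pct_before <= 100:
        raise ValueError("need 0 <= after <= before <= 100")
    return weekly_hours * (cleansing_pct_before - cleansing_pct_after) / 100


def analysis_multiplier(cleansing_pct_before, cleansing_pct_after):
    """Factor by which the time left for analysis grows."""
    if cleansing_pct_before >= 100:
        raise ValueError("no analysis time to compare against")
    return (100 - cleansing_pct_after) / (100 - cleansing_pct_before)


# Hypothetical 40-hour week, using the 80% -> 40% figures from the text.
freed = hours_freed(40, 80, 40)
growth = analysis_multiplier(80, 40)
print(f"{freed:.0f} hours freed; analysis time grows {growth:.0f}x")
```

Under these assumptions, analysis time goes from 20% to 60% of the week, so the "fun" part of the job triples even though cleansing was only halved.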
Maybe the word “death” is a bit strong, so let’s say “demise” instead. Recently I read an article in the Harvard Business Review about how Big Data and Data Scientists will rule the world of the 21st century corporation and how they have to operate for maximum value. The thing I found rather disturbing was that it takes a PhD – probably a few of them – in a variety of math areas to give executives the necessary insight to make better decisions, ranging from what product to develop next to whom to sell it to and where.
Don’t get me wrong – this is mixed news for any enterprise software firm helping businesses locate, acquire, contextually link, understand and distribute high-quality data. The existence of such a high-value role validates product development but it also limits adoption. It is also great news that data has finally gained the attention it deserves. But I am starting to ask myself why it always takes individuals with a “one-in-a-million” skill set to add value. What happened to the democratization of software? Why is the design starting point for enterprise software not always similar to B2C applications, like an iPhone app, i.e. simpler is better? Why is it always such a gradual “Cold War” evolution instead of a near-instant French Revolution?
Why do development environments for Big Data not accommodate limited or existing skills but always accommodate the most complex scenarios? Well, the answer could be that the first customers will be very large, very complex organizations with super complex problems, which they were unable to solve so far. If analytical apps have become a self-service proposition for business users, data integration should be as well. So why does access to a lot of fast moving and diverse data require scarce PIG or Cassandra developers to get the data into an analyzable shape and a PhD to query and interpret patterns?
I realize new technologies start with a foundation and, as they spread, supply will attempt to catch up to create an equilibrium. However, this is about a problem that has existed for decades in many industries, such as the oil & gas, telecommunication, public and retail sectors. Whenever I talk to architects and business leaders in these industries, they chuckle at “Big Data” and tell me “yes, we got that – and by the way, we have been dealing with this reality for a long time”. By now I would have expected the skill (cost) side of turning data into meaningful insight to have been driven down more significantly.
Informatica has made a tremendous push in this regard with its “Map Once, Deploy Anywhere” paradigm. I cannot wait to see what’s next – and I just saw something recently that got me very excited. Why, you ask? Because at some point I would like to have at least a business super-user pummel terabytes of transaction and interaction data into an environment (Hadoop cluster, in-memory DB…) and massage it so that a self-created dashboard gets him or her where (s)he needs to go. This should include questions like “where is the data I need for this insight?”, “what is missing and how do I get to that piece in the best way?” and “how do I want it to look to share it?” All that is required to get your hands on advanced Big Data analytics should be a semi-experienced knowledge of Excel and PowerPoint. Don’t you think? Do you believe that this role will disappear as quickly as it has surfaced?
I recently had a lengthy conversation with a business executive of a European telco. His biggest concern was to not only understand the motivations and related characteristics of consumers but to accomplish this insight much faster than before. Given available resources and current priorities this is something unattainable for many operators.
Unlike a few years ago – remember the time before iPad – his organization today is awash with data points from millions of devices, hundreds of device types and many applications.
One way for him to understand consumer motivation, and therefore intentions, is to get a better view of a user’s network and all related interactions and transactions. This includes the user’s family household, friends and business network (also a type of household). The purpose of householding is to capture social and commercial relationships in a grouping of individuals (or businesses, or both mixed together) in order to identify patterns (context), which can be exploited to better serve a customer with a new individual product or bundle upsell, or to push relevant apps, audio and video content.
Let’s add another layer of complexity by understanding not only who a subscriber is, who he knows and how often he interacts with these contacts and the services he has access to via one or more devices but also where he physically is at the moment he interacts. You may also combine this with customer service and (summarized) network performance data to understand who is high-value, high-overhead and/or high in customer experience. Most importantly, you will also be able to assess who will do what next and why.
Some of you may be thinking “Oh gosh, the next NSA program in the making”. Well, it may sound like it but the reality is that this data is out there today, available and interpretable if cleaned up, structured and linked and served in real time. Not only do data quality, ETL, analytical and master data systems provide the data backbone for this reality but process-based systems dealing with the systematic real-time engagement of consumers are the tool to make it actionable. If you add some sort of privacy rules using database or application-level masking technologies, most of us would feel more comfortable about this proposition.
This may feel like a massive project but, as with many things in IT life, it depends on how you scope it. I am a big fan of incrementally mastering more and more attributes of certain customer segments, business units and geographies, where lessons learnt can be replicated over and over to scale. Moreover, I am a big fan of figuring out what you are trying to achieve before even attempting to tackle it.
The beauty behind a “small” data backbone – more about “small data” in a future post – is that if a certain concept does not pan out in terms of effort or result, you have just wasted a small pile of cash instead of the $2 million for a complete throw-away. For example: if you initially decided that the central linchpin in your household hub & spoke is the person who owns the most contracts with you, rather than the person who pays the bills every month or who has the largest average monthly bill, moving to an alternative perspective does not impact all services, all departments and all clients. Nevertheless, the role of each user in the network must be defined over time to achieve context, i.e. who is a contract signee, who is a payer, who is a user, who is an influencer, who is an employer, etc.
Why is this important to a business? Because without knowing who consumes, who pays for and who influences the purchase or change of a service or product, how can one create the right offers and target them to the right individual?
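To make the hub & spoke choice concrete, here is a minimal Python sketch in which the rule for picking the household's central person is a swappable setting. The `Member` fields, the rule names and the sample household are all hypothetical; a real householding model would carry far more attributes and roles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    contracts: int            # number of contracts this person has signed
    pays_bill: bool           # whether this person pays the monthly bill
    avg_monthly_bill: float   # average monthly spend attributed to this person


# Each rule maps a member to a score; the highest-scoring member becomes the hub.
ANCHOR_RULES = {
    "most_contracts": lambda m: m.contracts,
    "bill_payer": lambda m: m.pays_bill,
    "largest_bill": lambda m: m.avg_monthly_bill,
}


def pick_anchor(household, rule):
    """Return the member at the centre of the hub & spoke under the given rule."""
    return max(household, key=ANCHOR_RULES[rule])


household = [
    Member("Alex", contracts=3, pays_bill=False, avg_monthly_bill=40.0),
    Member("Sam", contracts=1, pays_bill=True, avg_monthly_bill=25.0),
    Member("Kim", contracts=1, pays_bill=False, avg_monthly_bill=95.0),
]

for rule in ANCHOR_RULES:
    print(rule, "->", pick_anchor(household, rule).name)
```

Because the rule is just a lookup key, switching from "most contracts" to "bill payer" changes one setting for one segment rather than rippling through every service and department, which is the point of starting small.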
However, in order to make this initial call about household definition and scope or look at the options available and sensible, you have to look at social and cultural conventions, what you are trying to accomplish commercially and your current data set’s ability to achieve anything without a massive enrichment program. A couple of years ago, at a Middle Eastern operator, it was very clear that the local patriarchal society dictated that the center of this hub and spoke model was the oldest, non-retired male in the household, as all contracts down to children of cousins would typically run under his name. The goal was to capture extended family relationships more accurately and completely in order to create and sell new family-type bundles for greater market penetration and maximize usage given new bandwidth capacity.
As a parallel track aside from further rollout to other departments, customer segments and geos, you may also want to start thinking like another European operator I engaged a couple of years ago. They were trying to outsource some data validation and enrichment to their subscribers, which allowed for a more accurate and timely capture of changes, often life-style changes (moves, marriages, new job). The operator could then offer new bundles and roaming upsells. As a side effect, it also created a sense of empowerment and engagement in the client base.
I see bits and pieces of some of this being used when I switch on my home communication systems running broadband signal through my X-Box or set-top box into my TV using Netflix and Hulu and gaming. Moreover, a US cable operator actively promotes a “moving” package to help make sure you do not miss a single minute of entertainment when relocating.
Every time I switch on my TV now, I get content suggested to me. If telecommunication services were a bit more competitive in the US (an odd thing to say in every respect) and prices came down to European levels, I would actually take advantage of the offer. And then there is the log-on pop-up asking me to subscribe to (or troubleshoot) a channel I have already subscribed to. I wonder who or what automated process switched that flag.
Ultimately, there cannot be a good customer experience without understanding customer intentions. I would love to hear stories from other practitioners about what they have seen in this respect.
The challenge for supermarkets today is balancing the needs of the customer against their ability to serve those needs. How are supermarkets and food manufacturers preparing their business for e-readiness? What about more customer centricity?
Currently, brands are not particularly good at serving consistent product information across in-store and online environments, leading to lower conversions and poor customer satisfaction. This shortfall is also preventing these brands from moving forward and innovating with new technologies. As a result, Product Information Management (PIM) is becoming a significant focus in effective omnichannel initiatives.
Consider the large range of products that can be seen at the average grocery store. The sheer number of categories is staggering, before you even consider the quantity of items in each category. It’s little wonder that local brands are struggling to replicate this level of product data anywhere else but on their store shelves.
Furthermore, consider the various kinds of information supermarkets are expected to include. Then, add to this the kinds of information supermarkets could include in order to gain a competitive advantage over the rest. Information types currently possible are: ingredients, additives, images and videos, marketing copy, gene manipulation information, references, product seals, allergens, nutritional facts, translations, product categories, expiration/use-by dates, variants, region-specific information, GDSN information and more.
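As a rough illustration of what managing these information types involves, the Python sketch below checks a product record for missing attributes before it is published to a channel. The attribute names and the choice of which ones are mandatory are assumptions made for the example; they are not taken from any PIM product or from the GDSN standard.

```python
# Hypothetical set of attributes a channel might insist on before listing a product.
REQUIRED_ATTRIBUTES = (
    "ingredients",
    "allergens",
    "nutritional_facts",
    "expiration_date",
    "product_category",
)


def missing_attributes(product):
    """List required attributes that are absent, None or an empty string.

    An empty list is treated as a valid value, e.g. allergens == []
    means "explicitly no allergens", not "unknown".
    """
    return [
        attr
        for attr in REQUIRED_ATTRIBUTES
        if product.get(attr) is None or product.get(attr) == ""
    ]


def completeness(product):
    """Share of required attributes that are filled in, between 0 and 1."""
    filled = len(REQUIRED_ATTRIBUTES) - len(missing_attributes(product))
    return filled / len(REQUIRED_ATTRIBUTES)


yoghurt = {
    "ingredients": "milk, live cultures",
    "allergens": ["milk"],
    "nutritional_facts": None,  # not yet supplied by the manufacturer
}

print(missing_attributes(yoghurt))
print(f"{completeness(yoghurt):.0%} complete")
```

A check like this is only the first step; the harder part is keeping the same record consistent across the in-store and online environments.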
Ultimately, supermarkets are already on the path of improving consumers’ shopping experience and a few of the emerging technologies indicate the way this industry will continue to evolve.
6 Examples of food retail and supermarket trends
The six examples below demonstrate an emerging trend in grocery shopping, while also highlighting the need for accurate product information creation, curation and distribution.
- Ready-to-cook Product Bundles – A nice and very customer-facing concept comes from German food retailer www.kochhaus.de (the name means “house of cooking”). They only offer product bundles containing all the ingredients required to cook a certain meal for the required number of guests. It can be compared to the look books that are well established in the sales strategies of fashion brands and retailers.
- Self-checkout Systems – More supermarkets are beginning to include self-checkouts. American and UK companies lead, while Germany and Australia are behind. But there is the same risk of cart abandonment here as there is online, so providing a comprehensive and rich suite of product information at these POS systems is crucial.
- In-store Information Kiosks – Some supermarkets are beginning to include interactive displays in-store, with some even providing tablets mounted onto shopping trolleys. These displays serve in place of an in-store sales assistant, providing consumers with directions, promotions and complete access to product information (such as stock levels) on any item in the store.
- Supermarket Pop-ups – Food retailers are increasingly experimenting and improving the traditional shopping experience. One example that has turned the bricks-and-mortar concept on its head is electronic shopping ‘walls’, where products are prominently displayed in a high-traffic area. Consumers are able to access product details and make purchases by scanning a code presented alongside the image of a given product.
- Store-to-door Delivery Services – These are starting to become commonplace. Not only are supermarkets offering same-day delivery services, the major brands are also experimenting with click-and-collect services. These supermarkets are moving toward websites that are just as busy as their bricks-and-mortar outlets and provide as much relevant content, if not more.
- App Commerce – Companies like German food retailer Edeka offer an app for push marketing, or for matching a customer’s dietary or allergy profile against QR-code-scanned products on the shopping list within the supermarket app.
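The allergy matching mentioned in the last example can be sketched as a simple set intersection between a customer profile and the allergen data attached to each product. The Python below is a hypothetical illustration only: the data layout and function name are invented for the example, the QR-code scanning step is not modelled, and it says nothing about how any retailer's app actually works.

```python
def flag_conflicts(shopping_list, customer_allergens):
    """Map each conflicting product name to the allergens it shares with the profile."""
    avoid = {a.lower() for a in customer_allergens}
    conflicts = {}
    for item in shopping_list:
        hits = avoid & {a.lower() for a in item["allergens"]}
        if hits:
            conflicts[item["name"]] = sorted(hits)
    return conflicts


# Hypothetical product data, as it might arrive after scanning items in store.
shopping_list = [
    {"name": "granola", "allergens": ["Nuts", "Gluten"]},
    {"name": "milk", "allergens": ["Milk"]},
    {"name": "apples", "allergens": []},
]

print(flag_conflicts(shopping_list, ["nuts"]))
```

The logic is trivial; what makes or breaks it is whether the allergen attribute is present and accurate for every product, which is exactly the product information problem described above.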
What is next?
The supermarket of the future:
Reviving Customer Loyalty by leveraging information potential
Due to the increased transparency brought on by the ‘Google Era’, retailers have experienced a marked decline in customer loyalty. This concept of omnichannel shopping behaviour has led previously loyal customers to shop elsewhere.
Putting customers in the centre of all retail activities may not be a new trend, but in order to achieve it, retailers must foster more intelligent touch points. The supermarkets of the future will combine both product and customer data in such a way that every touch point presents a uniquely personalised experience for the customer, and a single, 360-degree view of the customer to the retailer.
The major supermarket brands already have comprehensive customer loyalty programs and they’re building on these with added products, such as consumer insurance packages. However, these initiatives haven’t necessarily led to an increase in loyalty.
Instead, the imperative to create a personal, intimate connection with consumers will eventually lead to a return in loyalty. The supermarket of the future will be able to send recipe and shopping list recommendations directly to the shopper’s preferred device, taking into account any allergies or delivery preferences.
Gamification as a tool for loyalty?
Moreover, this evolution will slowly lead into another phase of loyalty marketing: gamification. Comprehensive and detailed product data will form the basis of a loyalty program that includes targets, goals and rewards for loyal customers. The more comprehensive and engaging these shopping ‘games’ become, the more successful they will be from a marketing and loyalty perspective. However, the demands for detailed, accurate product information will also increase accordingly.
Private side note: My wife likes the simple Edeka app game, where users need to cut slices of sausage. The challenge is to hit exactly the weight the customer requires, just like the in-store associate.
Those supermarkets that can deploy these initiatives first – and continue to innovate beyond this point – will have a bright future. Those that lag behind when it comes to leveraging their information and real time process might quickly begin to fade away.
What can I cook from what remains in my fridge?
I have been working all week long on next year’s planning, so my fridge was not well fed this week. With it being almost empty, the questions are:
- What products are left?
- When do they expire?
- What can I cook from my fridge leftovers? (recipes)
- Where do I get the missing items for dinner with my wife? – And at what price?
- Do they all match my dietary needs, here an allergy to nuts?
- Can I order online?
- When will they get delivered?
- What things can make our evening a success? The right wine recommendation? Two candles?
Well, it is up to your imagination which additional products can be sold to make the customer happy and create a nice candlelight dinner… But it is at least a good reason to widen the assortment.
I believe that most people in the software business find it tough enough to calculate, and hence financially justify, the purchase or build of an application – especially middleware – to a business leader or even a CIO. Most business-centric IT initiatives involve improving processes (order, billing, service) and visualization (scorecarding, trending) so that end users can engage accounts more efficiently. Some of these have actually migrated to targeting improvements towards customers rather than their logical placeholders like accounts. Similar strides have been made in the realm of other party-type data (vendor, employee) as well as product data. They also tackle analyzing larger or smaller data sets and providing a visual set of clues on how to interpret historical or predictive trends in orders, bills, usage, clicks, conversions, etc.
If you think this is a tough enough proposition in itself, imagine the challenge of quantifying the financial benefit derived from understanding where your “hardware” is physically located, how it is configured, and who maintained it, when and how. Depending on the business model you may even have to figure out who built it or owns it. All of this has bottom-line effects on how, by whom and when expenses are paid and revenues get realized and recognized. And then there is the added complication that these dimensions of hardware are often fairly dynamic, as assets can also change ownership and/or physical location and hence tax treatment, insurance risk, etc.
Such hardware could be a pump, a valve, a compressor, a substation, a cell tower, a truck or components within these assets. Over time, with new technologies and acquisitions coming about, the systems that plan for, install and maintain these assets become very departmentalized in terms of scope and specialized in terms of function. The same application that designs an asset for department A or region B, is not the same as the one accounting for its value, which is not the same as the one reading its operational status, which is not the one scheduling maintenance, which is not the same as the one billing for any repairs or replacement. The same folks who said the Data Warehouse is the “Golden Copy” now say the “new ERP system” is the new central source for everything. Practitioners know that this is either naiveté or maliciousness. And then there are manual adjustments….
Moreover, to truly squeeze value out of these assets being installed and upgraded, the massive amounts of data they generate in a myriad of formats and intervals need to be understood, moved, formatted, fixed, interpreted at the right time and stored for future use in a cost-sensitive, easy-to-access and contextually meaningful way.
I wish I could tell you one application does it all, but the unsurprising reality is that it takes a concoction of several. Few if any legacy applications supporting the asset life cycle will be retired, as they often house data in formats commensurate with the age of the assets they were built for. It makes little financial sense to shut down these systems in a big-bang approach; it makes more sense to migrate region after region and process after process to the new system. After all, some of the assets have been in service for 50 or more years and the institutional knowledge tied to them is becoming nearly as old. Also, it is probably easier to make the manual data fixes that are often required (hopefully only for outliers) bit by bit, especially to accommodate imminent audits.
So what do you do in the meantime, until all the relevant data is in a single system, to get an enterprise-level way to fix your asset Tower of Babel and leverage the data volume rather than treat it like an unwanted stepchild? Most companies that operate asset-heavy, fixed-cost-heavy business models do not want to create a disruption but a steady tuning effect (squeezing the data orange), something rather unsexy in this internet day and age. This is especially true in “older” industries where data is still considered a necessary evil, not an opportunity ready to be exploited. The fact is, though, that in order to improve the bottom line, we had better get going, even if it is with baby steps.
If you are aware of business models that struggle to leverage data, write to me. If you know about an annoying, peculiar or esoteric data “domain” that does not lend itself to being easily leveraged, share your thoughts. Next time, I will share some examples of how certain industries try to work in this environment, what they envision and how they go about getting there.
“I know it when I see it.” So wrote Potter Stewart, Associate Justice of the Supreme Court, in his concurring opinion in Jacobellis v. Ohio (1964). He was talking about pornography. The same holds true for data. For example, most business users have a hard time describing exactly what data they need for a new BI report, including what source system to get the data from, in terms precise enough to allow designers, modelers and developers to build the report right the first time. But if you sit down with a user in front of an analyst tool and profile the potential source data, they will tell you in an instant whether it’s the right data or not.
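To give a feel for what profiling a source column means, here is a minimal Python sketch that summarizes a column's row count, null count, distinct values and most frequent values. It is a generic illustration with made-up data and a hypothetical function name, not a description of how any particular analyst tool works.

```python
from collections import Counter


def profile_column(values):
    """Summarize a column: row count, nulls, distinct values, top values."""
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }


# Hypothetical "state" column pulled from a candidate source system.
state_column = ["CA", "CA", "NY", None, "", "ca"]

for key, value in profile_column(state_column).items():
    print(f"{key}: {value}")
```

Even a summary this small lets a business user spot problems at a glance, such as the blank entries or the stray lowercase "ca" in the sample, and say whether this is the data they meant.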
I just came back from MicroStrategy World. There were many conversations about social, mobile, cloud and big data. There was strong interest in cloud, clear adoption of mobile, and some big data adoption. eHarmony had a great presentation about how they handle big data with Informatica, and how they’re starting to use Hadoop with Informatica HParser running on Hadoop for processing JSON.
But that wasn’t the number one conversation. The one topic that everyone was interested in – and I talked to nearly 100 customers and partners over four days – was creating new reports faster, or Agile BI.
Today, agility and timely visibility are critical to the business. No wonder CIO.com states that business intelligence (BI) will be the top technology priority for CIOs in 2012. However, is your data architecture agile enough to handle these exacting demands?
In his blog Top 10 Business Intelligence Predictions For 2012, Boris Evelson of Forrester Research, Inc., states that traditional BI approaches often fall short for the two following reasons (among many others):
- BI hasn’t fully empowered information workers, who still largely depend on IT
- BI platforms, tools and applications aren’t agile enough
If you haven’t already, I think you should read The Forrester Wave™: Data Virtualization, Q1 2012. For several reasons – one, to truly understand the space, and two, to understand the critical capabilities required to be a solution that solves real data integration problems.
At the very outset, let’s clearly define Data Virtualization. Simply put, Data Virtualization is foundational to Data Integration. It enables fast and direct access to the critical data and reports that the business needs and trusts. It is not to be confused with simple, traditional Data Federation. Instead, think of it as a superset which must complement existing data architectures to support BI agility, MDM and SOA.