Category Archives: Big Data
Despite spending more than $30 Billion in annual spending on Big Data, successful big data implementations elude most organizations. That’s the sobering assessment of a recent study of 226 senior executives from Capgemini, which found that only 13 percent feel they have truly have made any headway with their big data efforts.
The reasons for Big Data’s lackluster performance include the following:
- Data is in silos or legacy systems, scattered across the enterprise
- No convincing business case
- Ineffective alignment of Big Data and analytics teams across the organization
- Most data locked up in petrified, difficult to access legacy systems
- Lack of Big Data and analytics skills
Actually, there is nothing new about any of these issues – in fact, the perceived issues with Big Data initiatives so far map closely with the failed expect many other technology-driven initiatives. First, there’s the hype that tends to get way ahead of any actual well-functioning case studies. Second, there’s the notion that managers can simply take a solution of impressive magnitude and drop it on top of their organizations, expecting overnight delivery of profits and enhanced competitiveness.
Technology, and Big Data itself, is but a tool that supports the vision, well-designed plans and hard work of forward-looking organizations. Those managers seeking transformative effects need to look deep inside their organizations, at how deeply innovation is allowed to flourish, and in turn, how their employees are allowed to flourish. Think about it: if line employees suddenly have access to alternative ways of doing things, would they be allowed to run with it? If someone discovers through Big Data that customers are using a product differently than intended, do they have the latitude to promote that new use? Or do they have to go through chains of approval?
Big Data may be what everybody is after, but Big Culture is the ultimate key to success.
For its part, Capgemini provides some high-level recommendations for better baking in transformative values as part of Big Data initiatives, based on their observations of best-in-class enterprises:
The vision thing: “It all starts with vision,” says Capgemini’s Ron Tolido. “If the company executive leadership does not actively, demonstrably embrace the power of technology and data as the driver of change and future performance, nothing digitally convincing will happen. We have not even found one single exception to this rule. The CIO may live and breathe Big Data and there may even be a separate Chief Data Officer appointed – expect more of these soon – if they fail to commit their board of executives to data as the engine of success, there will be a dark void beyond the proof of concept.”
Establish a well-defined organizational structure: “Big Data initiatives are rarely, if ever, division-centric,” the Capgemini report states. “They often cut across various departments in an organization. Organizations that have clear organizational structures for managing rollout can minimize the problems of having to engage multiple stakeholders.”
Adopt a systematic implementation approach: Surprisingly, even the largest and most sophisticated organizations that do everything on process don’t necessarily approach Big Data this way, the report states. “Intuitively, it would seem that a systematic and structured approach should be the way to go in large-scale implementations. However, our survey shows that this philosophy and approach are rare. Seventy-four percent of organizations did not have well-defined criteria to identify, qualify and select Big Data use-cases. Sixty-seven percent of companies did not have clearly defined KPIs to assess initiatives. The lack of a systematic approach affects success rates.”
Adopt a “venture capitalist” approach to securing buy-in and funding: “The returns from investments in emerging digital technologies such as Big Data are often highly speculative, given the lack of historical benchmarks,” the Capgemini report points out. “Consequently, in many organizations, Big Data initiatives get stuck due to the lack of a clear and attributable business case.” To address this challenge, the report urges that Big Data leaders manage investments “by using a similar approach to venture capitalists. This involves making multiple small investments in a variety of proofs of concept, allowing rapid iteration, and then identifying PoCs that have potential and discarding those that do not.”
Leverage multiple channels to secure skills and capabilities: “The Big Data talent gap is something that organizations are increasingly coming face-to-face with. Closing this gap is a larger societal challenge. However, smart organizations realize that they need to adopt a multi-pronged strategy. They not only invest more on hiring and training, but also explore unconventional channels to source talent. Capgemini advises reaching out to partner organizations for the skills needed to develop Big Data initiatives. These can be employee exchanges, or “setting up innovation labs in high-tech hubs such as Silicon Valley.” Startups may also be another source of Big Data talent.
Over and over, when talking with people who are starting to learn Data Science, there’s a frustration that comes up: “I don’t know which programming language to start with.”
Moreover, it’s not just programming languages; it’s also software systems like Tableau, SPSS, etc. There is an ever-widening range of tools and programming languages and it’s difficult to know which one to select.
I get it. When I started focusing heavily on data science a few years ago, I reviewed all of the popular programming languages at the time: Python, R, SAS, D3, not to mention a few that in hindsight, really aren’t that great for analytics like Perl, Bash, and Java. I once read a suggestion to use arcane tools like UNIX’s AWK and SED.
There are so many suggestions, so much material, so many options; it becomes difficult to know what to learn first. There’s a mountain of content, and it’s difficult to know where to find the “gold nuggets”; the things to learn that will bring you the high return on time investment.
That’s the crux of the problem. The fact is – time is limited. Learning a new programming language is a large investment in your time, so you need to be strategic about which one you select. To be clear, some languages will yield a very high return on your investment. Other languages are purely auxiliary tools that you might use only a few times per year.
Let me make this easy for you: learn R first. Here’s why:
R is becoming the “lingua franca” of data science
R is becoming the lingua franca for data science. That’s not to say that it’s the only language, or that it’s the best tool for every job. It is, however, the most widely used and it is rising in popularity.
As I’ve noted before, O’Reilly Media conducted a survey in 2014 to understand the tools that data scientists are currently using. They found that R is the most popular programming language (if you exclude SQL as a “proper” programing language).
Looking more broadly, there are other rankings that look at programming language popularity in general. For example, Redmonk measures programming language popularity by examining discussion (on Stack Overflow) and usage (on GitHub). In their latest rankings, R placed 13th, the highest of any statistical programming language. Redmonk also noted that R has been rising in popularity over time.
A similar ranking by TIOBE, which ranks programming languages by the number of search engine searches, indicates a strong year over year rise for R.
Keep in mind that the Redmonk and TIOBE rankings are for all programming languages. When you look at these, R is now ranking among the most popular and most commonly used over all.
It’s often said that 80% of the work in data science is data manipulation. More often than not, you’ll need to spend significant amounts of your time “wrangling” your data; putting it into the shape you want. R has some of the best data management tools you’ll find.
The dplyr package in R makes data manipulation easy. It is the tool I wish I had years ago. When you “chain” the basic dplyr together, you can dramatically simplify your data manipulation workflow.
ggplot2 is one of the best data visualization tools around, as of 2015. What’s great about ggplot2 is that as you learn the syntax, you also learn how to think about data visualization.
I’ve said numerous times, that there is a deep structure to all statistical visualizations. There is a highly structured framework for thinking about and creating all data visualizations. ggplot2 is based on that framework. By learning ggplot2, you will learn how to think about visualizing data.
Finally, there’s machine learning. While I think most beginning data science students should wait to learn machine learning (it is much more important to learn data exploration first), machine learning is an important skill. When data exploration stops yielding insight, you need stronger tools.
When you’re ready to start using (and learning) machine learning, R has some of the best tools and resources.
One of the best, most referenced introductory texts on machine learning, An Introduction to Statistical Learning, teaches machine learning using the R programming language. Additionally, the Stanford Statistical Learning course uses this textbook, and teaches machine learning in R.
Summary: Learn R, and focus your efforts
Once you start to learn R, don’t get “shiny new object” syndrome.
You’re likely to see demonstrations of new techniques and tools. Just look at some of the dazzling data visualizations that people are creating.
Seeing other people create great work (and finding out that they’re using a different tool) might lead you to try something else. Trust me on this: you need to focus. Don’t get “shiny new object” syndrome. You need to be able to devote a few months (or longer) to really diving into one tool.
And as I noted above, you really want to build up your competence in skills across the data science workflow. You need to have solid skills at least in data visualization and data manipulation. You need to be able to do some serious data exploration in R before you start moving on.
Spending 100 hours on R will yield vastly better returns than spending 10 hours on 10 different tools. In the end, your time ROI will be higher by concentrating your efforts. Don’t get distracted by the “latest, sexy new thing.”
Let’s face it, building a Data Governance program is no overnight task. As one CDO puts it: ”data governance is a marathon, not a sprint”. Why? Because data governance is a complex business function that encompasses technology, people and process, all of which have to work together effectively to ensure the success of the initiative. Because of the scope of the program, Data Governance often calls for participants from different business units within an organization, and it can be disruptive at first.
Why bother then? Given that data governance is complex, disruptive, and could potentially introduce additional cost to a company? Well, the drivers for data governance can vary for different organizations. Let’s take a close look at some of the motivations behind data governance program.
For companies in heavily regulated industries, establishing a formal data governance program is a mandate. When a company is not compliant, consequences can be severe. Penalties could include hefty fines, brand damage, loss in revenue, and even potential jail time for the person who is held accountable for being noncompliance. In order to meet the on-going regulatory requirements, adhere to data security policies and standards, companies need to rely on clean, connected and trusted data to enable transparency, auditability in their reporting to meet mandatory requirements and answer critical questions from auditors. Without a dedicated data governance program in place, the compliance initiative could become an on-going nightmare for companies in the regulated industry.
A data governance program can also be established to support customer centricity initiative. To make effective cross-sells and ups-sells to your customers and grow your business, you need clear visibility into customer purchasing behaviors across multiple shopping channels and touch points. Customer’s shopping behaviors and their attributes are captured by the data, therefore, to gain thorough understanding of your customers and boost your sales, a holistic Data Governance program is essential.
Other reasons for companies to start a data governance program include improving efficiency and reducing operational cost, supporting better analytics and driving more innovations. As long as it’s a business critical area and data is at the core of the process, and the business case is loud and sound, then there is a compelling reason for launching a data governance program.
Now that we have identified the drivers for data governance, how do we start? This rather loaded question really gets into the details of the implementation. A few critical elements come to consideration including: identifying and establishing various task forces such as steering committee, data governance team and business sponsors; identifying roles and responsibilities for the stakeholders involved in the program; defining metrics for tracking the results. And soon you will find that on top of everything, communications, communications and more communications is probably the most important tactic of all for driving the initial success of the program.
A rule of thumb? Start small, take one-step at a time and focus on producing something tangible.
Sounds easy, right? Well, let’s hear what the real-world practitioners have to say. Join us at this Informatica webinar to hear Michael Wodzinski, Director of Information Architecture, Lisa Bemis, Director of Master Data, Fabian Torres, Director of Project Management from Houghton Mifflin Harcourt, global leader in publishing, as well as David Lyle, VP of product strategy from Informatica to discuss how to implement a successful data governance practice that brings business impact to an enterprise organization.
If you are currently kicking the tires on setting up data governance practice in your organization, I’d like to invite you to visit a member-only website dedicated to Data Governance: http://governyourdata.com/. This site currently has over 1,000 members and is designed to foster open communications on everything data governance. There you will find conversations on best practices, methodologies, frame works, tools and metrics. I would also encourage you to take a data governance maturity assessment to see where you currently stand on the data governance maturity curve, and compare the result against industry benchmark. More than 200 members have taken the assessment to gain better understanding of their current data governance program, so why not give it a shot?
Data Governance is a journey, likely a never-ending one. We wish you best of the luck on this effort and a joyful ride! We love to hear your stories.
2014 was a pivotal turning point for Informatica as our investments in Hadoop and efforts to innovate in big data gathered momentum and became a core part of Informatica’s business. Our Hadoop related big data revenue growth was in the ballpark of leading Hadoop startups – more than doubling over 2013.
In 2014, Informatica reached about 100 enterprise customers of our big data products with an increasing number going into production with Informatica together with Hadoop and other big data technologies. Informatica’s big data Hadoop customers include companies in financial services, insurance, telcommunications, technology, energy, life sciences, healthcare and business services. These innovative companies are leveraging Informatica to accelerate their time to production and drive greater value from their big data investments.
These customers are in-production or implementing a wide range of use cases leveraging Informatica’s great data pipeline capabilities to better put the scale, efficiency and flexibility of Hadoop to work. Many Hadoop customers start by optimizing their data warehouse environments by moving data storage, profiling, integration and cleansing to Hadoop in order to free up capacity in their traditional analytics data warehousing systems. Customers that are further along in their big data journeys have expanded to use Informatica on Hadoop for exploratory analytics of new data types, 360 degree customer analytics, fraud detection, predictive maintenance, and analysis of massive amounts of Internet of Things machine data for optimization of energy exploration, manufacturing processes, network data, security and other large scale systems initiatives.
2014 was not just a year of market momentum for Informatica, but also one of new product development innovations. We shipped enhanced functionality for entity matching and relationship building at Hadoop scale (a key part of Master Data Management), end-to-end data lineage through Hadoop, as well as high performance real-time streaming of data into Hadoop. We also launched connectors to NoSQL and analytics databases including Datastax Cassandra, MongoDB and Amazon Redshift. Informatica advanced our capabilities to curate great data for self-serve analytics with a connector to output Tableau’s data format and launched our self-service data preparation solution, Informatica Rev.
Customers can now quickly try out Informatica on Hadoop by downloading the free trials for the Big Data Edition and Vibe Data Stream that we launched in 2014. Now that Informatica supports all five of the leading Hadoop distributions, customers can build their data pipelines on Informatica with confidence that no matter how the underlying Hadoop technologies evolve, their Informatica mappings will run. Informatica provides highly scalable data processing engines that run natively in Hadoop and leverage the best of open source innovations such as YARN, MapReduce, and more. Abstracting data pipeline mappings from the underlying Hadoop technologies combined with visual tools enabling team collaboration empowers large organizations to put Hadoop into production with confidence.
As we look ahead into 2015, we have ambitious plans to continue to expand and evolve our product capabilities with enhanced productivity to help customers rapidly get more value from their data in Hadoop. Stay tuned for announcements throughout the year.
Try some of Informatica’s products for Hadoop on the Informatica Marketplace here.
At long last, the anxiously awaited rules from the FAA have brought some clarity to the world of commercial drone use. Up until now, commercial drone use has been prohibited. The new rules, of course, won’t sit well with Amazon who would like to drop merchandise on your porch at all hours. But the rules do work really well for insurers who would like to use drones to service their policyholders. So now drones, and soon to be fleets of unmanned cars will be driving the roadways in any numbers of capacities. It seems to me to be an ambulance chaser’s dream come true. I mean who wouldn’t want some seven or eight figure payday from Google for getting rear-ended?
What about “Great Data”? What does that mean in the context of unmanned vehicles, both aerial and terrestrial? Let’s talk about two aspects. First, the business benefits of great data using unmanned drones.
An insurance adjuster or catastrophe responder can leverage an aerial drone to survey large areas from a central location. They will pin point the locations needing attention for further investigation. This is a common scenario that many insurers talk about when the topic of aerial drone use comes up. Second to that is the ability to survey damage in hard to reach locations like roofs or difficult terrain (like farmland). But this is where great data comes into play. Surveying, service and use of unmanned vehicles demands that your data can answer some of the following questions for your staff operating in this new world:
Where am I?
Quality data and geocoded locations as part of that data is critical. In order to locate key risk locations, your data must be able to coordinate with the lat/long of the location recorded by your unmanned vehicles and the location of your operator. Ensure clean data through robust data quality practices.
Where are my policyholders?
Knowing the location of your policyholders not only relies on good data quality, but on knowing who they are and what risks you are there to help service. This requires a total customer relationship solution where you have a full view of not only locations, but risks, coverages and entities making up each policyholder.
What am I looking at?
Archived, current and work in process imaging is a key place where a Big Data environment can assist over time. By comparing saved images with new and processing claims, claims fraud and additional opportunities for service can be detected quickly by the drone operator.
Now that we’ve answered the business value questions and leveraged this new technology to better service policyholders and speed claims, let’s turn to how great data can be used to protect the insurer and drone operator from liability claims. This is important. The FAA has stopped short of requiring commercial drone operators to carry special liability insurance, leaving that instead to the drone operators to orchestrate with their insurer. And now we’re back to great data. As everyone knows, accidents happen. Technology, especially robotic mobile technology is not infallible. Something will crash somewhere, hopefully not causing injury or death, but sadly that too will likely happen. And there is nothing that will keep the ambulance chasers at bay more than robust great data. Any insurer offering liability cover for a drone operator should require that some of the following questions be answered by the commercial enterprise. And the interesting fact is that this information should be readily available if the business questions above have been answered.
- Where was my drone?
- What was it doing?
- Was it functioning properly?
Properly using the same data management technology as in the previous questions will provide valuable data to be used as evidence in the case of liability against a drone operator. Insurers would be wise to ask these questions of their liability policyholders who are using unmanned technology as a way to gauge liability exposure in this brave new world. The key to the assessment of risk being robust data management and great data feeding the insurer’s unmanned policyholder service workers.
Time will tell all the great and imaginative things that will take place with this new technology. One thing is for certain. Great data management is required in all aspects from amazing customer service to risk mitigation in operations. Happy flying to everyone!!
I’ve spent most of my career working with new technology, most recently helping companies make sense of mountains of incoming data. This means, as I like to tell people, that I have the sexiest job in the 21st century.
Harvard Business Review put the data scientist into the national spotlight in their publication Data Scientist: The Sexiest Job of the 21st Century. Job trends data from Indeed.com confirms the rise in popularity for the position, showing that the number of job postings for data scientist positions increased by 15,000%.
In the meantime, the role of data scientist has changed dramatically. Data used to reside on the fringes of the operation. It was usually important but seldom vital – a dreary task reserved for the geekiest of the geeks. It supported every function but never seemed to lead them. Even the executives who respected it never quite absorbed it.
For every Big Data problem, the solution often rests on the shoulders of a data scientist. The role of the data scientist is similar in responsibility to the Wall Street “quants” of the 80s and 90s – now, these data experienced are tasked with the management of databases previously thought too hard to handle, and too unstructured to derive any value.
So, is it the sexiest job of the 21st Century?
Think of a data scientist more like the business analyst-plus, part mathematician, part business strategist, these statistical savants are able to apply their background in mathematics to help companies tame their data dragons. But these individuals aren’t just math geeks, per se.
A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a renaissance individual who really wants to learn and bring change to an organization.
If this sounds like you, the good news is demand for data scientists is far outstripping supply. Nonetheless, with the rising popularity of the data scientist – not to mention the companies that are hiring for these positions – you have to be at the top of your field to get the jobs.
Companies look to build teams around data scientists that ask the most questions about:
- How the business works
- How it collects its data
- How it intends to use this data
- What it hopes to achieve from these analyses
These questions were important because data scientists will often unearth information that can “reshape an entire company.” Obtaining a better understanding of the business’ underpinnings not only directs the data scientist’s research, but helps them present the findings and communicate with the less-analytical executives within the organization.
While it’s important to understand your own business, learning about the successes of other corporations will help a data scientist in their current job–and the next.
As we head into Strata + Hadoop World San Jose, Pivotal has made some interesting announcements that are sure to be the talk of the show. Pivotal’s move to open-source some of their advanced products (and to form a new organization to foster Hadoop community cooperation) are signs of the dynamism and momentum of the Big Data market.
Informatica applauds these initiatives by Pivotal and we hope that they will contribute to the accelerating maturity of Hadoop and its expansion beyond early adopters into mainstream industry adoption. By contributing HAWQ, GemFire and the Greenplum Database to the open source community, Pivotal creates further open options in the evolving Hadoop data infrastructure technology. We expect this to be well received by the open source community.
As Informatica has long served as the industry’s neutral data connector for more than 5,500 customers and have developed a rich set of capabilities for Hadoop, we are also excited to see efforts to try to reduce fragmentation in the Hadoop community.
Even before the new company Pivotal was formed, Informatica had a long history working with the Greenplum team to ensure that joint customers could confidently use Informatica tools to include the Greenplum Database in their enterprise data pipelines. Informatica has mature and high-performance native connectivity to load data in and out of Greenplum reliably using Informatica’s codeless, visual data pipelining tools. In 2014, Informatica expanded out Hadoop support to include Pivotal HD Hadoop and we have joint customers using Informatica to do data profiling, transformation, parsing and cleansing using Informatica Big Data Edition running on Pivotal HD Hadoop.
We expect these innovative developments driven by Pivotal in the Big Data technology landscape to help to move the industry forward and contribute to Pivotal’s market progress. We look forward to continuing to support Pivotal technology and to an ever increasing number of successful joint customers. Please reach out to us if you have any questions about how Informatica and Pivotal can help your organization to put Big Data into production. We want to ensure that we can help you answer the question … Are you Big Data Ready?
In our house when we paint a room, my husband does the big rolling of the walls or ceiling, I do the cut-in work. I am good at prepping the room, taping all the trim and deliberately painting the corners. However, I am thrifty and constantly concerned that we won’t have enough paint to finish a room. My husband isn’t afraid to use enough paint and is extremely efficient at painting a wall in a single even coat. As a result, I don’t do the big rolling and he doesn’t do the cutting in. It took us awhile to figure this out, and a few rooms had to be repainted while we were figuring it out. Now we know what we are good at, and what we need help with.
Payers roles are changing. Payers were previously focused on risk assessment, setting and collecting premiums, analyzing claims and making payments – all while optimizing revenues. Payers are pretty good at selling to employers, figuring out the cost/benefit ratio from an employers perspective and ensuring a good, profitable product. With the advent of the Affordable Healthcare Act along with a much more transient insured population, payers now must focus more on the individual insured and be able to communicate with the individuals in a more nimble manner than in the past.
Individual members will shop for insurance based on consumer feedback and price. They are interested in ease of enrollment and the ability to submit and substantiate claims quickly and intuitively. Payers are discovering that they need to help manage population health at a individual member level. And population health management requires less of a business-data analytics approach and more social media and gaming-style logic to understand patients. In this way, payers can help develop interventions to sustain behavioral changes for better health.
When designing such analytics, payers should consider the following key design steps:
- Extend data warehouses to an analytics appliance
- Invest in a big data platform to absorb patients’ social data
- Build predictive analytics for patient behavior
- Bridge collaborative and behavioral analytics with claims to build revenue and profitability
Due to payers’ mature predictive analytics competencies, they will have a much easier time in the next generation of population behavior compared to their provider counterparts. As clinical content is often unstructured compared to the claims data, payers need to pay extra attention to context and semantics when deciphering clinical content submitted by providers. Payers can use help from vendors that can help them understand unstructured data, individual members. They can then use that data to create fantastic predictive analytic solutions.
Achieving and maintaining a single, semantically consistent version of master data is crucial for every organization. As many companies are moving from an account or product-centric approach to a customer-centric model, master data management is becoming an important part of their enterprise data management strategy. MDM provides the clean, consistent and connected information your organizations need for you to –
- Empower customer facing teams to capitalize on cross-sell and up-sell opportunities
- Create trusted information to improve employee productivity
- Be agile with data management so you can make confident decisions in a fast changing business landscape
- Improve information governance and be compliant with regulations
But there are challenges ahead for the organizations. As Andrew White of Gartner very aptly wrote in a blog post, we are only half pregnant with Master Data Management. Andrew in his blog post talked about increasing number of inquiries he gets from organizations that are making some pretty simple mistakes in their approach to MDM without realizing the impact of those decisions on a long run.
Over last 10 years, I have seen many organizations struggle to implement MDM in a right way. Few MDM implementations have failed and many have taken more time and incurred cost before showing value.
So, what is the secret sauce?
A key factor for a successful MDM implementation lays in mapping your business objectives to features and functionalities offered by the product you are selecting. It is a phase where you ask right questions and get them answered. There are few great ways in which organizations can get this done and talking to analysts is one of them. The other option is to attend MDM focused events that allow you to talk to experts, learn from other customer’s experience and hear about best practices.
We at Informatica have been working hard to deliver you a flexible MDM platform that provides complete capabilities out of the box. But MDM journey is more than just technology and product features as we have learnt over the years. To ensure our customer success, we are sharing knowledge and best practices we have gained with hundreds of successful MDM and PIM implementations. The Informatica MDM Day, is a great opportunity for organizations where we will –
- Share best practices and demonstrate our latest features and functionality
- Show our product capabilities which will address your current and future master data challenges
- Provide you opportunity to learn from other customer’s MDM and PIM journeys.
- Share knowledge about MDM powered applications that can help you realize early benefits
- Share our product roadmap and our vision
- Provide you an opportunity to network with other like-minded MDM, PIM experts and practitioners
So, join us by registering today for our MDM Day event in New York on 24th February. We are excited to see you all there and walk with you towards MDM Nirvana.
You might think this year’s Valentine’s Day is no different than any other. But make no mistake – Valentine’s Day has never been more powered by technology or more fueled by data.
It doesn’t get a lot of press coverage, but did you know the online dating industry is now a billion dollar industry globally? As technology empowers us all to live and work in nearly any place, we can no longer rely on geographic colocation to find our friends and soul mates. So, it’s no surprise that online dating and social networking grew in popularity as the smartphone revolution happened over the last eight years. Dating and networking are no longer just about running into people at the local bar on a Friday night. People are connecting with one another on every consumer device they have available, from their computers to their tablets to their phones. This mass consumerization of devices and the online social applications we run on them have fundamentally changed how we all connect with one another in the offline world.
There’s a lot of talk about big data in the tech industry, but there isn’t a lot of understanding of how big data actually affects the real world. Online dating serves as a fantastic example here. Did you know about 1 out of 6 people in the United States is single and that about 1 of 3 of them are members of an online dating site? With tens of millions of members just in the United States, the opportunity to meet new people online has never been more compelling. But the challenge is helping people sift through a “big data” store of millions of profiles. The solution to this has been the use of predictive recommendation engines. We are all familiar with recommendation engines on e-commerce sites that give us suggestions for new items to buy. The exact same analytics are now being applied to people and their preferences to help them find new friends and companions. Big data analytics is not just some fancy technology used by retailers and Wall Street. The proof is in the math: 1 of 18 people in the United States is using big data analytics today to fulfill the most human needs of all in finding companionship.
However, this everyday revolution of big data analytics is not just limited to online dating. US News and World Report estimates that a whopping $19 billion dollars will be spent around Valentine’s Day this year. This spending includes gifts, flowers, candy, jewelry and other forms of celebration. The online sites that sell these products are no foreigners to big data either. Organizations with e-commerce sites, many of whom are Informatica customers, are collecting real-time weblog and clickstream information to dynamically offer the best possible customer experiences. Big data is not only helping to build relationships between consumers, but is also building them between enterprises and consumers.
In an increasingly data-driven world, you cannot connect people unless you can connect data. The billion dollar online dating industry and the $19 billion dollar Valentine’s Day industry would not exist if they were not fueled by the ability to quickly derive meaning out of data assets and turn them into compelling analytical outcomes. Informatica is powering this data-driven world of connected data and connected people by making all types of data ready for analytics. Our leading technologies have helped customers collect data quickly, re-direct workloads easily, and perfect datasets completely. So on this Valentine’s Day, I invite you to connect with your customers and help them to connect with one another by first connecting with your data. There is no better way to help get ready for love than by getting big data ready for analytics!