Tag Archives: Big Data
On March 25th, Josh Lee, Global Director for Insurance Marketing at Informatica and Cindy Maike, General Manager, Insurance at Hortonworks, will be joining the Insurance Journal in a webinar on “How to Become an Analytics Ready Insurer”.
Register for the Webinar on March 25th at 10am Pacific/ 1pm Eastern
Josh and Cindy exchange perspectives on what “analytics ready” really means for insurers, and today we are sharing some of our views (join the webinar to learn more). Josh and Cindy offer perspectives on the five questions posed here. Please join Insurance Journal, Informatica and Hortonworks on March 25th for more on this exciting topic.
See the Hortonworks site for a second posting of this blog and more details on exciting innovations in Big Data.
- What makes a big data environment attractive to an insurer?
CM: Many insurance companies are using new types of data to create innovative products that better meet their customers’ risk needs. For example, we are seeing insurance for “shared vehicles” and new products for prevention services. Much of this innovation is made possible by the rapid growth in sensor and machine data, which the industry incorporates into predictive analytics for risk assessment and claims management.
Customers who buy personal lines of insurance also expect the same type of personalized service and offers they receive from retailers and telecommunication companies. They expect carriers to have a single view of their business that permeates customer experience, claims handling, pricing and product development. Big data in Hadoop makes that single view possible.
JL: Let’s face it, insurance is all about analytics. Better analytics leads to better pricing, reduced risk and better customer service. But here’s the issue. Existing data sources are costly in storing vast amounts of data and inflexible to adapt to changing needs of innovative analytics. Imagine kicking off a simulation or modeling routine one evening only to return in the morning and find it incomplete or lacking data that requires a special request of IT.
This is where big data environments are helping insurers. Larger, more flexible data sets allowing longer series of analytics to be run, generating better results. And imagine doing all that at a fraction of the cost and time of traditional data structures. Oh, and heaven forbid you ask a mainframe to do any of this.
- So we hear a lot about Big Data being great for unstructured data. What about traditional data types that have been used in insurance forever?
CM: Traditional data types are very important to the industry – it drives our regulatory reporting and much of the performance management reporting. This data will continue to play a very important role in the insurance industry and for companies.
However, big data can now enrich that traditional data with new data sources for new insights. In areas such as customer service and product personalization, it can make the difference between cross-selling the right products to meet customer needs and losing the business. For commercial and group carriers, the new data provides the ability to better analyze risk needs, price accordingly and enable superior service in a highly competitive market.
JL: Traditional data will always be around. I doubt that I will outlive a mainframe installation at an insurer; which makes me a little sad. And for many rote tasks like financial reporting, a sales report, or a commission statement, those are sufficient. However, the business of insurance is changing in leaps and bounds. Innovators in data science are interested in correlating those traditional sources to other creative data to find new products, or areas to reduce risk. There is just a lot of data that is either ignored or locked in obscure systems that needs to be brought into the light. This data could be structured or unstructured, it doesn’t matter, and Big Data can assist there.
- How does this fit into an overall data management function?
JL: At the end of the day, a Hadoop cluster is another source of data for an insurer. More flexible, more cost effective and higher speed; but yet another data source for an insurer. So that’s one more on top of relational, cubes, content repositories, mainframes and whatever else insurers have latched onto over the years. So if it wasn’t completely obvious before, it should be now. Data needs to be managed. As data moves around the organization for consumption, it is shaped, cleaned, copied and we hope there is governance in place. And the Big Data installation is not exempt from any of these routines. In fact, one could argue that it is more critical to leverage good data management practices with Big Data not only to optimize the environment but also to eventually replace traditional data structures that just aren’t working.
CM: Insurance companies are blending new and old data and looking for the best ways to leverage “all data”. We are witnessing the development of a new generation of advanced analytical applications to take advantage of the volume, velocity, and variety in big data. We can also enhance current predictive models, enriching them with the unstructured information in claim and underwriting notes or diaries along with other external data.
There will be challenges. Insurance companies will still need to make important decisions on how to incorporate the new data into existing data governance and data management processes. The Chief Data or Chief Analytics officer will need to drive this business change in close partnership with IT.
- Tell me a little bit about how Informatica and Hortonworks are working together on this?
JL: For years Informatica has been helping our clients to realize the value in their data and analytics. And while enjoying great success in partnership with our clients, unlocking the full value of data requires new structures, new storage and something that doesn’t break the bank for our clients. So Informatica and Hortonworks are on a continuing journey to show that value in analytics comes with strong relationships between the Hadoop distribution and innovative market leading data management technology. As the relationship between Informatica and Hortonworks deepens, expect to see even more vertically relevant solutions and documented ROI for the Informatica/Hortonworks solution stack.
CM: Informatica and Hortonworks optimize the entire big data supply chain on Hadoop, turning data into actionable information to drive business value. By incorporating data management services into the data lake, companies can store and process massive amounts of data across a wide variety of channels including social media, clickstream data, server logs, customer transactions and interactions, videos, and sensor data from equipment in the field.
Matching data from internal sources (e.g. very granular data about customers) with external data (e.g. weather data or driving patterns in specific geographic areas) can unlock new revenue streams.
See this video for a discussion on unlocking those new revenue streams. Sanjay Krishnamurthi, Informatica CTO, and Shaun Connolly, Hortonworks VP of Corporate Strategy, share their perspectives.
- Do you have any additional comments on the future of data in this brave new world?
CM: My perspective is that, over time, we will drop the reference to “big” or ”small” data and get back to referring simply to “Data”. The term big data has been useful to describe the growing awareness on how the new data types can help insurance companies grow.
We can no longer use “traditional” methods to gain insights from data. Insurers need a modern data architecture to store, process and analyze data—transforming it into insight.
We will see an increase in new market entrants in the insurance industry, and existing insurance companies will improve their products and services based upon the insights they have gained from their data, regardless of whether that was “big” or “small” data.
JL: I’m sure that even now there is someone locked in their mother’s basement playing video games and trying to come up with the next data storage wave. So we have that to look forward to, and I’m sure it will be cool. But, if we are honest with ourselves, we’ll admit that we really don’t know what to do with half the data that we have. So while data storage structures are critical, the future holds even greater promise for new models, better analytical tools and applications that can make sense of all of this and point insurers in new directions. The trend that won’t change anytime soon is the ongoing need for good quality data, data ready at a moment’s notice, safe and secure and governed in a way that insurers can trust what those cool analytics show them.
Please join us for an interactive discussion on March 25th at 10am Pacific Time/ 1pm Eastern Time.
Register for the Webinar on March 25th at 10am Pacific/ 1pm Eastern
Despite spending more than $30 Billion in annual spending on Big Data, successful big data implementations elude most organizations. That’s the sobering assessment of a recent study of 226 senior executives from Capgemini, which found that only 13 percent feel they have truly have made any headway with their big data efforts.
The reasons for Big Data’s lackluster performance include the following:
- Data is in silos or legacy systems, scattered across the enterprise
- No convincing business case
- Ineffective alignment of Big Data and analytics teams across the organization
- Most data locked up in petrified, difficult to access legacy systems
- Lack of Big Data and analytics skills
Actually, there is nothing new about any of these issues – in fact, the perceived issues with Big Data initiatives so far map closely with the failed expect many other technology-driven initiatives. First, there’s the hype that tends to get way ahead of any actual well-functioning case studies. Second, there’s the notion that managers can simply take a solution of impressive magnitude and drop it on top of their organizations, expecting overnight delivery of profits and enhanced competitiveness.
Technology, and Big Data itself, is but a tool that supports the vision, well-designed plans and hard work of forward-looking organizations. Those managers seeking transformative effects need to look deep inside their organizations, at how deeply innovation is allowed to flourish, and in turn, how their employees are allowed to flourish. Think about it: if line employees suddenly have access to alternative ways of doing things, would they be allowed to run with it? If someone discovers through Big Data that customers are using a product differently than intended, do they have the latitude to promote that new use? Or do they have to go through chains of approval?
Big Data may be what everybody is after, but Big Culture is the ultimate key to success.
For its part, Capgemini provides some high-level recommendations for better baking in transformative values as part of Big Data initiatives, based on their observations of best-in-class enterprises:
The vision thing: “It all starts with vision,” says Capgemini’s Ron Tolido. “If the company executive leadership does not actively, demonstrably embrace the power of technology and data as the driver of change and future performance, nothing digitally convincing will happen. We have not even found one single exception to this rule. The CIO may live and breathe Big Data and there may even be a separate Chief Data Officer appointed – expect more of these soon – if they fail to commit their board of executives to data as the engine of success, there will be a dark void beyond the proof of concept.”
Establish a well-defined organizational structure: “Big Data initiatives are rarely, if ever, division-centric,” the Capgemini report states. “They often cut across various departments in an organization. Organizations that have clear organizational structures for managing rollout can minimize the problems of having to engage multiple stakeholders.”
Adopt a systematic implementation approach: Surprisingly, even the largest and most sophisticated organizations that do everything on process don’t necessarily approach Big Data this way, the report states. “Intuitively, it would seem that a systematic and structured approach should be the way to go in large-scale implementations. However, our survey shows that this philosophy and approach are rare. Seventy-four percent of organizations did not have well-defined criteria to identify, qualify and select Big Data use-cases. Sixty-seven percent of companies did not have clearly defined KPIs to assess initiatives. The lack of a systematic approach affects success rates.”
Adopt a “venture capitalist” approach to securing buy-in and funding: “The returns from investments in emerging digital technologies such as Big Data are often highly speculative, given the lack of historical benchmarks,” the Capgemini report points out. “Consequently, in many organizations, Big Data initiatives get stuck due to the lack of a clear and attributable business case.” To address this challenge, the report urges that Big Data leaders manage investments “by using a similar approach to venture capitalists. This involves making multiple small investments in a variety of proofs of concept, allowing rapid iteration, and then identifying PoCs that have potential and discarding those that do not.”
Leverage multiple channels to secure skills and capabilities: “The Big Data talent gap is something that organizations are increasingly coming face-to-face with. Closing this gap is a larger societal challenge. However, smart organizations realize that they need to adopt a multi-pronged strategy. They not only invest more on hiring and training, but also explore unconventional channels to source talent. Capgemini advises reaching out to partner organizations for the skills needed to develop Big Data initiatives. These can be employee exchanges, or “setting up innovation labs in high-tech hubs such as Silicon Valley.” Startups may also be another source of Big Data talent.
Over and over, when talking with people who are starting to learn Data Science, there’s a frustration that comes up: “I don’t know which programming language to start with.”
Moreover, it’s not just programming languages; it’s also software systems like Tableau, SPSS, etc. There is an ever-widening range of tools and programming languages and it’s difficult to know which one to select.
I get it. When I started focusing heavily on data science a few years ago, I reviewed all of the popular programming languages at the time: Python, R, SAS, D3, not to mention a few that in hindsight, really aren’t that great for analytics like Perl, Bash, and Java. I once read a suggestion to use arcane tools like UNIX’s AWK and SED.
There are so many suggestions, so much material, so many options; it becomes difficult to know what to learn first. There’s a mountain of content, and it’s difficult to know where to find the “gold nuggets”; the things to learn that will bring you the high return on time investment.
That’s the crux of the problem. The fact is – time is limited. Learning a new programming language is a large investment in your time, so you need to be strategic about which one you select. To be clear, some languages will yield a very high return on your investment. Other languages are purely auxiliary tools that you might use only a few times per year.
Let me make this easy for you: learn R first. Here’s why:
R is becoming the “lingua franca” of data science
R is becoming the lingua franca for data science. That’s not to say that it’s the only language, or that it’s the best tool for every job. It is, however, the most widely used and it is rising in popularity.
As I’ve noted before, O’Reilly Media conducted a survey in 2014 to understand the tools that data scientists are currently using. They found that R is the most popular programming language (if you exclude SQL as a “proper” programing language).
Looking more broadly, there are other rankings that look at programming language popularity in general. For example, Redmonk measures programming language popularity by examining discussion (on Stack Overflow) and usage (on GitHub). In their latest rankings, R placed 13th, the highest of any statistical programming language. Redmonk also noted that R has been rising in popularity over time.
A similar ranking by TIOBE, which ranks programming languages by the number of search engine searches, indicates a strong year over year rise for R.
Keep in mind that the Redmonk and TIOBE rankings are for all programming languages. When you look at these, R is now ranking among the most popular and most commonly used over all.
It’s often said that 80% of the work in data science is data manipulation. More often than not, you’ll need to spend significant amounts of your time “wrangling” your data; putting it into the shape you want. R has some of the best data management tools you’ll find.
The dplyr package in R makes data manipulation easy. It is the tool I wish I had years ago. When you “chain” the basic dplyr together, you can dramatically simplify your data manipulation workflow.
ggplot2 is one of the best data visualization tools around, as of 2015. What’s great about ggplot2 is that as you learn the syntax, you also learn how to think about data visualization.
I’ve said numerous times, that there is a deep structure to all statistical visualizations. There is a highly structured framework for thinking about and creating all data visualizations. ggplot2 is based on that framework. By learning ggplot2, you will learn how to think about visualizing data.
Finally, there’s machine learning. While I think most beginning data science students should wait to learn machine learning (it is much more important to learn data exploration first), machine learning is an important skill. When data exploration stops yielding insight, you need stronger tools.
When you’re ready to start using (and learning) machine learning, R has some of the best tools and resources.
One of the best, most referenced introductory texts on machine learning, An Introduction to Statistical Learning, teaches machine learning using the R programming language. Additionally, the Stanford Statistical Learning course uses this textbook, and teaches machine learning in R.
Summary: Learn R, and focus your efforts
Once you start to learn R, don’t get “shiny new object” syndrome.
You’re likely to see demonstrations of new techniques and tools. Just look at some of the dazzling data visualizations that people are creating.
Seeing other people create great work (and finding out that they’re using a different tool) might lead you to try something else. Trust me on this: you need to focus. Don’t get “shiny new object” syndrome. You need to be able to devote a few months (or longer) to really diving into one tool.
And as I noted above, you really want to build up your competence in skills across the data science workflow. You need to have solid skills at least in data visualization and data manipulation. You need to be able to do some serious data exploration in R before you start moving on.
Spending 100 hours on R will yield vastly better returns than spending 10 hours on 10 different tools. In the end, your time ROI will be higher by concentrating your efforts. Don’t get distracted by the “latest, sexy new thing.”
Let’s face it, building a Data Governance program is no overnight task. As one CDO puts it: ”data governance is a marathon, not a sprint”. Why? Because data governance is a complex business function that encompasses technology, people and process, all of which have to work together effectively to ensure the success of the initiative. Because of the scope of the program, Data Governance often calls for participants from different business units within an organization, and it can be disruptive at first.
Why bother then? Given that data governance is complex, disruptive, and could potentially introduce additional cost to a company? Well, the drivers for data governance can vary for different organizations. Let’s take a close look at some of the motivations behind data governance program.
For companies in heavily regulated industries, establishing a formal data governance program is a mandate. When a company is not compliant, consequences can be severe. Penalties could include hefty fines, brand damage, loss in revenue, and even potential jail time for the person who is held accountable for being noncompliance. In order to meet the on-going regulatory requirements, adhere to data security policies and standards, companies need to rely on clean, connected and trusted data to enable transparency, auditability in their reporting to meet mandatory requirements and answer critical questions from auditors. Without a dedicated data governance program in place, the compliance initiative could become an on-going nightmare for companies in the regulated industry.
A data governance program can also be established to support customer centricity initiative. To make effective cross-sells and ups-sells to your customers and grow your business, you need clear visibility into customer purchasing behaviors across multiple shopping channels and touch points. Customer’s shopping behaviors and their attributes are captured by the data, therefore, to gain thorough understanding of your customers and boost your sales, a holistic Data Governance program is essential.
Other reasons for companies to start a data governance program include improving efficiency and reducing operational cost, supporting better analytics and driving more innovations. As long as it’s a business critical area and data is at the core of the process, and the business case is loud and sound, then there is a compelling reason for launching a data governance program.
Now that we have identified the drivers for data governance, how do we start? This rather loaded question really gets into the details of the implementation. A few critical elements come to consideration including: identifying and establishing various task forces such as steering committee, data governance team and business sponsors; identifying roles and responsibilities for the stakeholders involved in the program; defining metrics for tracking the results. And soon you will find that on top of everything, communications, communications and more communications is probably the most important tactic of all for driving the initial success of the program.
A rule of thumb? Start small, take one-step at a time and focus on producing something tangible.
Sounds easy, right? Well, let’s hear what the real-world practitioners have to say. Join us at this Informatica webinar to hear Michael Wodzinski, Director of Information Architecture, Lisa Bemis, Director of Master Data, Fabian Torres, Director of Project Management from Houghton Mifflin Harcourt, global leader in publishing, as well as David Lyle, VP of product strategy from Informatica to discuss how to implement a successful data governance practice that brings business impact to an enterprise organization.
If you are currently kicking the tires on setting up data governance practice in your organization, I’d like to invite you to visit a member-only website dedicated to Data Governance: http://governyourdata.com/. This site currently has over 1,000 members and is designed to foster open communications on everything data governance. There you will find conversations on best practices, methodologies, frame works, tools and metrics. I would also encourage you to take a data governance maturity assessment to see where you currently stand on the data governance maturity curve, and compare the result against industry benchmark. More than 200 members have taken the assessment to gain better understanding of their current data governance program, so why not give it a shot?
Data Governance is a journey, likely a never-ending one. We wish you best of the luck on this effort and a joyful ride! We love to hear your stories.
2014 was a pivotal turning point for Informatica as our investments in Hadoop and efforts to innovate in big data gathered momentum and became a core part of Informatica’s business. Our Hadoop related big data revenue growth was in the ballpark of leading Hadoop startups – more than doubling over 2013.
In 2014, Informatica reached about 100 enterprise customers of our big data products with an increasing number going into production with Informatica together with Hadoop and other big data technologies. Informatica’s big data Hadoop customers include companies in financial services, insurance, telcommunications, technology, energy, life sciences, healthcare and business services. These innovative companies are leveraging Informatica to accelerate their time to production and drive greater value from their big data investments.
These customers are in-production or implementing a wide range of use cases leveraging Informatica’s great data pipeline capabilities to better put the scale, efficiency and flexibility of Hadoop to work. Many Hadoop customers start by optimizing their data warehouse environments by moving data storage, profiling, integration and cleansing to Hadoop in order to free up capacity in their traditional analytics data warehousing systems. Customers that are further along in their big data journeys have expanded to use Informatica on Hadoop for exploratory analytics of new data types, 360 degree customer analytics, fraud detection, predictive maintenance, and analysis of massive amounts of Internet of Things machine data for optimization of energy exploration, manufacturing processes, network data, security and other large scale systems initiatives.
2014 was not just a year of market momentum for Informatica, but also one of new product development innovations. We shipped enhanced functionality for entity matching and relationship building at Hadoop scale (a key part of Master Data Management), end-to-end data lineage through Hadoop, as well as high performance real-time streaming of data into Hadoop. We also launched connectors to NoSQL and analytics databases including Datastax Cassandra, MongoDB and Amazon Redshift. Informatica advanced our capabilities to curate great data for self-serve analytics with a connector to output Tableau’s data format and launched our self-service data preparation solution, Informatica Rev.
Customers can now quickly try out Informatica on Hadoop by downloading the free trials for the Big Data Edition and Vibe Data Stream that we launched in 2014. Now that Informatica supports all five of the leading Hadoop distributions, customers can build their data pipelines on Informatica with confidence that no matter how the underlying Hadoop technologies evolve, their Informatica mappings will run. Informatica provides highly scalable data processing engines that run natively in Hadoop and leverage the best of open source innovations such as YARN, MapReduce, and more. Abstracting data pipeline mappings from the underlying Hadoop technologies combined with visual tools enabling team collaboration empowers large organizations to put Hadoop into production with confidence.
As we look ahead into 2015, we have ambitious plans to continue to expand and evolve our product capabilities with enhanced productivity to help customers rapidly get more value from their data in Hadoop. Stay tuned for announcements throughout the year.
Try some of Informatica’s products for Hadoop on the Informatica Marketplace here.
Strata 2015 – Making Data Work for Everyone with Cloud Integration, Cloud Data Management and Cloud Machine Learning
Are you ready to answer “Yes” to the questions:
a) “Are you Cloud Ready?”
b) “Are you Machine Learning Ready?”
I meet with hundreds of Informatica Cloud customers and prospects every year. While they are investing in Cloud, and seeing the benefits, they also know that there is more innovation out there. They’re asking me, what’s next for Cloud? And specifically, what’s next for Informatica in regards to Cloud Data Integration and Cloud Data Management? I’ll share more about my response throughout this blog post.
The spotlight will be on Big Data and Cloud at the Strata + Hadoop World conference taking place in Silicon Valley from February 17-20 with the theme “Make Data Work”. I want to focus this blog post on two topics related to making data work and business insights:
- How existing cloud technologies, innovations and partnerships can help you get ready for the new era in cloud analytics.
- How you can make data work in new and advanced ways for every user in your company.
Today, Informatica is announcing the availability of its Cloud Integration Secure Agent on Microsoft Azure and Linux Virtual Machines as well as an Informatica Cloud Connector for Microsoft Azure Storage. Users of Azure data services such as Azure HDInsight, Azure Machine Learning and Azure Data Factory can make their data work with access to the broadest set of data sources including on-premises applications, databases, cloud applications and social data. Read more from Microsoft about their news at Strata, including their relationship with Informatica, here.
“Informatica, a leader in data integration, provides a key solution with its Cloud Integration Secure Agent on Azure,” said Joseph Sirosh, Corporate Vice President, Machine Learning, Microsoft. “Today’s companies are looking to gain a competitive advantage by deriving key business insights from their largest and most complex data sets. With this collaboration, Microsoft Azure and Informatica Cloud provide a comprehensive portfolio of data services that deliver a broad set of advanced cloud analytics use cases for businesses in every industry.”
Even more exciting is how quickly any user can deploy a broad spectrum of data services for cloud analytics projects. The fully-managed cloud service for building predictive analytics solutions from Azure and the wizard-based, self-service cloud integration and data management user experience of Informatica Cloud helps overcome the challenges most users have in making their data work effectively and efficiently for analytics use cases.
The new solution enables companies to bring in data from multiple sources for use in Azure data services including Azure HDInsight, Azure Machine Learning, Azure Data Factory and others – for advanced analytics.
The broad availability of Azure data services, and Azure Machine Learning in particular, is a game changer for startups and large enterprises. Startups can now access cloud-based advanced analytics with minimal cost and complexity and large businesses can use scalable cloud analytics and machine learning models to generate faster and more accurate insights from their Big Data sources.
Success in using machine learning requires not only great analytics models, but also an end-to-end cloud integration and data management capability that brings in a wide breadth of data sources, ensures that data quality and data views match the requirements for machine learning modeling, and an ease of use that facilitates speed of iteration while providing high-performance and scalable data processing.
For example, the Informatica Cloud solution on Azure is designed to deliver on these critical requirements in a complementary approach and support advanced analytics and machine learning use cases that provide customers with key business insights from their largest and most complex data sets.
Using the Informatica Cloud solution on Azure connector with Informatica Cloud Data Integration enables optimized read-write capabilities for data to blobs in Azure Storage. Customers can use Azure Storage objects as sources, lookups, and targets in data synchronization tasks and advanced mapping configuration tasks for efficient data management using Informatica’s industry leading cloud integration solution.
As Informatica fulfills the promise of “making great data ready to use” to our 5,500 customers globally, we continue to form strategic partnerships and develop next-generation solutions to stay one step ahead of the market with our Cloud offerings.
My goal in 2015 is to help each of our customers say that they are Cloud Ready! And collaborating with solutions such as Azure ensures that our joint customers are also Machine Learning Ready!
To learn more, try our free Informatica Cloud trial for Microsoft Azure data services.
At long last, the anxiously awaited rules from the FAA have brought some clarity to the world of commercial drone use. Up until now, commercial drone use has been prohibited. The new rules, of course, won’t sit well with Amazon who would like to drop merchandise on your porch at all hours. But the rules do work really well for insurers who would like to use drones to service their policyholders. So now drones, and soon to be fleets of unmanned cars will be driving the roadways in any numbers of capacities. It seems to me to be an ambulance chaser’s dream come true. I mean who wouldn’t want some seven or eight figure payday from Google for getting rear-ended?
What about “Great Data”? What does that mean in the context of unmanned vehicles, both aerial and terrestrial? Let’s talk about two aspects. First, the business benefits of great data using unmanned drones.
An insurance adjuster or catastrophe responder can leverage an aerial drone to survey large areas from a central location. They will pin point the locations needing attention for further investigation. This is a common scenario that many insurers talk about when the topic of aerial drone use comes up. Second to that is the ability to survey damage in hard to reach locations like roofs or difficult terrain (like farmland). But this is where great data comes into play. Surveying, service and use of unmanned vehicles demands that your data can answer some of the following questions for your staff operating in this new world:
Where am I?
Quality data and geocoded locations as part of that data is critical. In order to locate key risk locations, your data must be able to coordinate with the lat/long of the location recorded by your unmanned vehicles and the location of your operator. Ensure clean data through robust data quality practices.
Where are my policyholders?
Knowing the location of your policyholders not only relies on good data quality, but on knowing who they are and what risks you are there to help service. This requires a total customer relationship solution where you have a full view of not only locations, but risks, coverages and entities making up each policyholder.
What am I looking at?
Archived, current and work in process imaging is a key place where a Big Data environment can assist over time. By comparing saved images with new and processing claims, claims fraud and additional opportunities for service can be detected quickly by the drone operator.
Now that we’ve answered the business value questions and leveraged this new technology to better service policyholders and speed claims, let’s turn to how great data can be used to protect the insurer and drone operator from liability claims. This is important. The FAA has stopped short of requiring commercial drone operators to carry special liability insurance, leaving that instead to the drone operators to orchestrate with their insurer. And now we’re back to great data. As everyone knows, accidents happen. Technology, especially robotic mobile technology is not infallible. Something will crash somewhere, hopefully not causing injury or death, but sadly that too will likely happen. And there is nothing that will keep the ambulance chasers at bay more than robust great data. Any insurer offering liability cover for a drone operator should require that some of the following questions be answered by the commercial enterprise. And the interesting fact is that this information should be readily available if the business questions above have been answered.
- Where was my drone?
- What was it doing?
- Was it functioning properly?
Properly using the same data management technology as in the previous questions will provide valuable data to be used as evidence in the case of liability against a drone operator. Insurers would be wise to ask these questions of their liability policyholders who are using unmanned technology as a way to gauge liability exposure in this brave new world. The key to the assessment of risk being robust data management and great data feeding the insurer’s unmanned policyholder service workers.
Time will tell all the great and imaginative things that will take place with this new technology. One thing is for certain. Great data management is required in all aspects from amazing customer service to risk mitigation in operations. Happy flying to everyone!!
I’ve spent most of my career working with new technology, most recently helping companies make sense of mountains of incoming data. This means, as I like to tell people, that I have the sexiest job in the 21st century.
Harvard Business Review put the data scientist into the national spotlight in their publication Data Scientist: The Sexiest Job of the 21st Century. Job trends data from Indeed.com confirms the rise in popularity for the position, showing that the number of job postings for data scientist positions increased by 15,000%.
In the meantime, the role of data scientist has changed dramatically. Data used to reside on the fringes of the operation. It was usually important but seldom vital – a dreary task reserved for the geekiest of the geeks. It supported every function but never seemed to lead them. Even the executives who respected it never quite absorbed it.
For every Big Data problem, the solution often rests on the shoulders of a data scientist. The role of the data scientist is similar in responsibility to the Wall Street “quants” of the 80s and 90s – now, these data experienced are tasked with the management of databases previously thought too hard to handle, and too unstructured to derive any value.
So, is it the sexiest job of the 21st Century?
Think of a data scientist more like the business analyst-plus, part mathematician, part business strategist, these statistical savants are able to apply their background in mathematics to help companies tame their data dragons. But these individuals aren’t just math geeks, per se.
A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a renaissance individual who really wants to learn and bring change to an organization.
If this sounds like you, the good news is demand for data scientists is far outstripping supply. Nonetheless, with the rising popularity of the data scientist – not to mention the companies that are hiring for these positions – you have to be at the top of your field to get the jobs.
Companies look to build teams around data scientists that ask the most questions about:
- How the business works
- How it collects its data
- How it intends to use this data
- What it hopes to achieve from these analyses
These questions were important because data scientists will often unearth information that can “reshape an entire company.” Obtaining a better understanding of the business’ underpinnings not only directs the data scientist’s research, but helps them present the findings and communicate with the less-analytical executives within the organization.
While it’s important to understand your own business, learning about the successes of other corporations will help a data scientist in their current job–and the next.
As we head into Strata + Hadoop World San Jose, Pivotal has made some interesting announcements that are sure to be the talk of the show. Pivotal’s move to open-source some of their advanced products (and to form a new organization to foster Hadoop community cooperation) are signs of the dynamism and momentum of the Big Data market.
Informatica applauds these initiatives by Pivotal and we hope that they will contribute to the accelerating maturity of Hadoop and its expansion beyond early adopters into mainstream industry adoption. By contributing HAWQ, GemFire and the Greenplum Database to the open source community, Pivotal creates further open options in the evolving Hadoop data infrastructure technology. We expect this to be well received by the open source community.
As Informatica has long served as the industry’s neutral data connector for more than 5,500 customers and have developed a rich set of capabilities for Hadoop, we are also excited to see efforts to try to reduce fragmentation in the Hadoop community.
Even before the new company Pivotal was formed, Informatica had a long history working with the Greenplum team to ensure that joint customers could confidently use Informatica tools to include the Greenplum Database in their enterprise data pipelines. Informatica has mature and high-performance native connectivity to load data in and out of Greenplum reliably using Informatica’s codeless, visual data pipelining tools. In 2014, Informatica expanded out Hadoop support to include Pivotal HD Hadoop and we have joint customers using Informatica to do data profiling, transformation, parsing and cleansing using Informatica Big Data Edition running on Pivotal HD Hadoop.
We expect these innovative developments driven by Pivotal in the Big Data technology landscape to help to move the industry forward and contribute to Pivotal’s market progress. We look forward to continuing to support Pivotal technology and to an ever increasing number of successful joint customers. Please reach out to us if you have any questions about how Informatica and Pivotal can help your organization to put Big Data into production. We want to ensure that we can help you answer the question … Are you Big Data Ready?
Talking to architects about analytics at a recent event, I kept hearing the familiar theme; data scientists are spending 80% of their time on “data wrangling” leaving only 20% for delivering the business insights that will drive the company’s innovation. It was clear to everybody that I spoke to that the situation will only worsen. The coming growth everybody sees in data volume and complexity, will only lengthen the time to value.
Gartner recently predicted that:
“by 2015, 50% of organizations will give up on managing growth and will redirect funds to improve classification and analytics.”
Some of the details of this study are interesting. In the end, many organizations are coming to two conclusions:
- It’s risky to delete data, so they keep it around as insurance.
- All data has potential business value, so more organizations are keeping it around for potential analytical purposes.
The other mega-trend here is that more and more organizations are looking to compete on analytics – and they need data to do it, both internal data and external data.
From an architect’s perspective, here are several observations:
- The floodgates are open and analytics is a top priority. Given that, the emphasis should be on architecting to manage the dramatic increases in both data quantity and data complexity rather than on trying to stop it.
- The immediate architectural priority has to be on simplifying and streamlining your current enterprise data architecture. Break down those data silos and standardize your enterprise data management tools and processes as much as possible. As discussed in other blogs, data integration is becoming the biggest bottleneck to business value delivery in your environment. Gartner has projected that “by 2018, more than half the cost of implementing new large systems will be spent on integration.” The more standardized your enterprise data management architecture is, the more efficient it will be.
- With each new data type, new data tool (Hive, Pig, etc.), and new data storage technology (Hadoop, NoSQL, etc.) ask first if your existing enterprise data management tools can handle the task before people go out and create a new “data silo” based on the cool, new technologies. Sometimes it will be necessary, but not always.
- The focus needs to be on speeding value delivery for the business. And the key bottleneck is highly likely to be your enterprise data architecture.
Rather than focusing on managing data growth, the priority should be on managing it in the most standardized and efficient way possible. It is time to think about enterprise data management as a function with standard processes, skills and tools (just like Finance, Marketing or Procurement.)
Several of our leading customers have built or are building a central “Data as a Service” platform within their organizations. This is a single, central place where all developers and analysts can go to get trustworthy data that is managed by IT through a standard architecture and served up for use by all.
For more information, see “The Big Big Data Workbook”
*Gartner Predicts 2015: Managing ‘Data Lakes’ of Unprecedented Enormity, December 2014 http://www.gartner.com/document/2934417#