Category Archives: Big Data
Looking for a data integration expert? Join the club. As cloud computing and big data become more desirable within the Global 2000, an abundance of data integration talent is required to make both cloud and big data work properly.
The fact of the matter is that you can’t deploy a cloud-based system without some sort of data integration as part of the solution. Whether it’s on-premises to cloud, cloud-to-cloud, or even intra-company use of private clouds, these projects need someone who knows what they are doing when it comes to data integration.
While many cloud projects were launched without a clear understanding of the role of data integration, most people understand it now. As companies become more familiar with the cloud, they learn that data integration is key to the solution. For this reason, it’s important for teams to have at least some data integration talent.
The same goes for big data projects. Massive amounts of data need to be loaded into massive databases. You can’t do these projects using ad-hoc technologies anymore. The team needs someone with integration knowledge, including what technologies to bring to the project.
Generally speaking, big data systems are built around data integration solutions. Similar to cloud, the use of data integration architectural expertise should be a core part of the project. I see big data projects succeed and fail, and the biggest cause of failure is the lack of data integration expertise.
The demand for data integration talent has exploded with the growth of both big data and cloud computing. A week does not go by that I’m not asked for the names of people who have data integration, cloud computing and big data systems skills. I know several people who fit that bill, however they all have jobs and recently got raises.
The scary thing is, if these jobs go unfilled by qualified personnel, project directors may hire individuals without the proper skills and experience. Or worse, they may not hire anyone at all. If they plod along without the expertise required, in a year they’ll wonder why the systems are not sharing data the way they should, resulting in a big failure.
So, what can organizations do? You can find or build the talent you need before starting important projects. Thus, now is the time to begin the planning process, including how to find and hire the right resources. This might even mean internal training, hiring mentors or outside consultants, or working with data integration technology providers. Do everything necessary to make sure you get data integration done right the first time.
For the past few years, the press has been buzzing about the potential value of Big Data. However, there is little coverage focusing on the data itself – how do you get it, is it accurate, and who can be trusted with it?
We are the source of the data that is so often spoken about – our children, friends and relatives, and especially the people we know on Facebook or LinkedIn. Over 40% of Big Data projects are in the sales and marketing arena – relying on personal data as a driving force. While machines have no choice but to provide data when requested, people do have a choice. We can choose not to provide data, or to purposely obscure our data, or to make it up entirely.
So, how can you ensure that your organization is receiving real information? Active participation is needed to ensure a constant flow of accurate data to feed your data-hungry algorithms and processes. While click-stream analysis does not require individual identification, follow-up sales & marketing campaigns will have limited value if the public at large is using false names and pretend information.
BCG has identified a link between trust and data sharing:
“We estimate that those that manage this issue well [creating trust] should be able to increase the amount of consumer data they can access by at least five to ten times in most countries.”[i]
With that in mind, how do you create the trust that will entice people to share data? The principles behind common data privacy laws provide guidelines. These include: accountability, purpose identification and disclosure, collection with knowledge and consent, data accuracy, individual access and correction, as well as the right to be forgotten.
But there are challenges in personal data stewardship – in part because the current world of Big Data analysis is far from stable. In the ongoing search for the value of Big Data, new technologies, tools and approaches are being piloted. Experimentation is still required which means moving data around between data storage technologies and analytical tools, and giving unprecedented access to data in terms of quantity, detail and variety to ever growing teams of analysts. This experimentation should not be discouraged, but it must not degrade the accuracy or security of your customers’ personal data.
How do you measure up? If I made contact and asked for the sum total of what you knew about me, and how my data was being used – how long would it take to provide this information? Would I be able to correct my information? How many of your analysts can view my personal data and how many copies have you distributed in your IT landscape? Are these copies even accurate?
Through our data quality, data mastering and data masking tools, Informatica can deliver a coordinated approach to managing your customers’ personal data and build trust by ensuring the safety and accuracy of that data. With Informatica managing your customers’ data, your internal team can focus their attention on analytics. Analytics from accurate data can help develop the customer loyalty and engagement that is vital to both the future security of your business and continued collection of accurate data to feed your Big Data analysis.
[i] The Trust Advantage: How to Win with Big Data; bcg.perspectives November 2013
In the other, they hear administrative talk of smaller budgets and scarcer resources.
As stringent requirements for both transparency and accountability grow, this paradox of pressure increases.
Sometimes, the best way to cope is to TALK to somebody.
What if you could ask other data technologists candid questions like:
- Do you think government regulation helps or hurts the sharing of data?
- Do you think government regulators balance the privacy needs of the public with commercial needs?
- What are the implications of big data government regulation, especially for users?
- How can businesses expedite the government adoption of the cloud?
- How can businesses aid in the government overcoming the security risks associated with the cloud?
- How should the policy frameworks for handling big data differ between the government and the private sector?
What if you could tell someone who understood? What if they had sweet suggestions, terrific tips, stellar strategies for success? We think you can. We think they will.
That’s why Twitter needs a #DataChat.
What on earth is a #DataChat?
Good question. It’s a Twitter Chat – A public dialog, at a set time, on a set topic. It’s something like a crowd-sourced discussion. Any Twitter user can participate simply by including the applicable hashtag in each tweet. Our hashtag is #DataChat. We’ll connect on Twitter, on the third Thursday of each month to share struggles, victories and advice about data governance. We’re going to begin this week, Thursday April 17, at 3:00 PM Eastern Time. For our first chat, we are going to discuss topics that relate to data technologies in government organizations.
Why don’t you join us? Tell us about it. Mark your calendar. Bring a friend.
Because, sometimes, you just need someone to talk to.
A few years back, there was a movement in some businesses to establish “data stewards” – individuals who would sit at the heart of the enterprise and make it their job to assure that data being consumed by the organization is of the highest possible quality, is secure, is contextually relevant, and is capable of interoperating across any applications that need to consume it. While the data steward concept came along when everything was relational and structured, these individuals are now earning their pay when it comes to managing the big data boom.
The rise of big data is creating more than simple headaches for data stewards, it is creating turf wars across enterprises. As pointed out in a recent article in The Wall Street Journal, there isn’t yet a lot of clarity as to who owns and cares for such data. Is it IT? Is it lines of business? Is it legal? There are arguments that can be made for all jurisdictions.
In organizations these days, for example, marketing executives are generating, storing and analyzing large volumes of their own data within content management systems and social media analysis solutions. Many marketing departments even have their own IT budgets. Along with marketing, of course, everyone else within enterprises is seeking to pursue data analytics to better run their operations as well as foresee trends.
Typically, data has been under the domain of the CIO, the person who oversaw the collection, management and storage of information. In the Wall Street Journal article, however, it’s suggested that legal departments may be the best caretakers of big data, since big data poses a “liability exposure,” and legal departments are “better positioned to understand how to use big data without violating vendor contracts and joint-venture agreements, as well as keeping trade secrets.”
However, legal being legal, it’s likely that insightful data may end up getting locked away, never to see the light of day. Others may argue that the IT department needs to retain control, but there again, IT isn’t trained to recognize information that may set the business on a new course.
Focusing on big data ownership isn’t just an academic exercise. The future of the business may depend on the ability to get on top of big data. Gartner, for one, predicts that within the next three years, at least a third of Fortune 100 organizations will experience an information crisis, “due to their inability to effectively value, govern and trust their enterprise information.”
This ability to “value, govern and trust” goes way beyond the traditional maintenance of data assets that IT has specialized in over the past few decades. As Gartner’s Andrew White put it: “Business leaders need to manage information, rather than just maintain it. When we say ‘manage,’ we mean ‘manage information for business advantage,’ as opposed to just maintaining data and its physical or virtual storage needs. In a digital economy, information is becoming the competitive asset to drive business advantage, and it is the critical connection that links the value chain of organizations.”
For starters, then, it is important that the business have full say over what data needs to be brought in, what data is important for further analysis, and what should be done with data once it gains in maturity. IT, however, needs to take a leadership role in assuring the data meets the organization’s quality standards, and that it is well-vetted so that business decision-makers can be confident in the data they are using.
The bottom line is that big data is a team effort, involving the whole enterprise. IT has a role to play, as does legal, as do the lines of business.
This year, over one dozen healthcare leaders will share their knowledge on data-driven insights at Informatica World 2014. These will be included in six tracks and over 100 breakout sessions during the conference. We are only five weeks away and I am excited that the healthcare track has grown 220% since 2013!
Join us for these healthcare sessions:
- Moving From Vision to Reality at UPMC: Structuring a Data Integration and Analytics Program: University of Pittsburgh Medical Center (UPMC) partnered with Informatica IPS to establish enterprise analytics as a core organizational competency through an Integration Competency Center engagement. Join IPS and UPMC to learn more.
- HIPAA Validation for Eligibility and Claims Status in Real Time: Healthcare reform requires healthcare payers to exchange and process HIPAA messages in less time with greater accuracy. Learn how HealthNet tackled this challenge.
- Application Retirement for Healthcare ROI: Dallas Children’s Hospital needed to retire outdated operating systems, hardware, and applications while retaining access to their legacy data for compliance purposes. Learn why application retirement is critical to the healthcare industry, how Dallas Children’s selected which applications to retire and the healthcare specific functionality that Informatica is delivering.
- UPMC’s story of implementing a Multi-Domain MDM healthcare solution in support of Data Governance: This presentation will unfold the UPMC story of implementing a Multi-Domain MDM healthcare solution as part of an overall enterprise analytics / data warehousing effort. MDM is a vital part of the overall architecture needed to support UPMC’s efforts to improve the quality of patient care and help create methods for personalized medicine. The lead MDM solution developer will discuss how the team put together the roadmap, worked with domain-specific workgroups, and created the trust matrix, and will share his lessons learned. He will also share what they have planned for their consolidated and trusted Patient, Provider and Facility master data in this changing healthcare industry. The session will also explain how the MDM program fits into the ICC (Integration Competency Center) currently implemented at UPMC.
- Enterprise Codeset Repositories for Healthcare: Controlling the Chaos: Learn the benefit of a centralized storage point to govern and manage codes (ICD-9/10, CPT, HCPCS, DRG, SNOMED, Revenue, TOS, POS, Service Category, etc.), mappings and artifacts that reference codes.
- Christus Health Roadmap to Data Driven Healthcare: To organize information and effectively deliver services in a hypercompetitive market, healthcare organizations must deliver data in an accurate, timely, efficient way while ensuring its clarity. Learn how CHRISTUS Health is developing and pursuing its vision for data management, including lessons adopted from other industries and the business case used to fund data management as a strategic initiative.
- Business Value of Data Quality: This customer panel will address why data quality is a business imperative which significantly affects business success.
- MD Anderson – Foster Business and IT Collaboration to Reveal Data Insights with Informatica: Is your integration team intimidated by the new Informatica 9.6 tools? Do your analysts and business users require faster access to data and answers about where data comes from? If so, this session is a must-attend.
- The Many Faces of the Healthcare Customer: In the healthcare industry, the customer paying for services (individuals, insurers, employers, the government) is not necessarily the decision-influencer (physicians) or even the patient — and the provider comes in just as many varieties. Learn how Quest, the world’s leading provider of diagnostic information, leverages master data management to resolve the chaos of serving 130M+ patients, 1200+ payers, and almost half of all US physicians and hospitals.
- Lessons in Healthcare Enterprise Information Management from St. Joseph Health and Sutter Health: St. Joseph Health created a business case for enterprise information management, then built a future-proofed strategy and architecture to unlock, share, and use data. Sutter Health engaged the business, established a governance structure, and freed data from silos for better organizational performance and efficiency. Come hear these leading health systems share their best practices and lessons learned in making data-driven care a reality.
- Navinet, Inc and Informatica – Delivering Network Intelligence, The Value to the Payer, Provider and Patient: Today, healthcare payers and providers must share information in unprecedented ways to reduce redundancy, cut costs, coordinate care, and drive positive outcomes. Learn how NaviNet’s vision of a “smart” communications network combines Big Data and network intelligence to share proactive real-time information between insurers and providers.
- Providence Health Services takes a progressive approach to automating ETL development and documentation: A newly organized team of BI Generalists, most of whom had no ETL experience and even fewer of whom had Informatica skills, was tasked with Informatica development when Providence migrated from Microsoft SSIS to Informatica. Learn how the team relied on Informatica to alleviate the burden of low-value tasks.
- Using IDE for Data On-boarding Framework at HMS: HMS’s core business is to onboard large amounts of external data that arrive in different formats. HMS developed a framework using IDE to standardize the on-boarding process. This tool can be used by non-IT analysts and provides standard profiling reports and reusable mapping “templates,” which have improved the hand-off to IT and significantly reduced misinterpretations and errors.
Additionally, this year’s attendees are invited to:
- Over 100 breakout sessions: Customers from other industries, including financial services, insurance, retail, manufacturing, oil and gas will share their data driven stories.
- Healthcare networking reception on Wednesday, May 14th: Join your healthcare peers and Informatica’s healthcare team on Wednesday from 6-7:30pm in the Vesper bar of the Cosmopolitan Resort for a private Healthcare networking reception. Come and hear firsthand how others are achieving a competitive advantage by maximizing return on data while enjoying hors d’oeuvres and cocktails.
- Data Driven Healthcare Roundtable Breakfast on Wednesday, May 14th. Customer led roundtable discussion.
- Personal meetings: Since most of the Informatica team will be in attendance, this is a great opportunity to meet face to face with Informatica’s product, services and solution teams.
- Informatica Pavilion and Partner Expo: Interact with the latest solutions that Informatica and our partners provide.
- An expanded “Hands-on-Lab”: Learn from real-life case studies and talk to experts about your unique environment.
The Healthcare industry is facing extraordinary changes and uncertainty — both from a business and a technology perspective. Join us to learn about key drivers for change and innovative uses of data technology solutions to discover sources for operational and process improvement. There is still time to Register now!
Which comes first: innovation or analytics?
Bain & Company released some survey findings a few months back that actually put a value on big data. Companies with advanced analytic capabilities, the consultancy finds, are twice as likely to be in the top quartile of financial performance within their industries; five times as likely to make decisions much faster than market peers; three times as likely to execute decisions as intended; and twice as likely to use data very frequently when making decisions.
This is all good stuff, and the survey, which covered the input of 400 executives, makes a direct correlation between big data analytics efforts and the business’s bottom line. However, it raises a question: How does an organization become one of these analytic leaders? And there’s a more brain-twisting question as well: would the type of organization supporting an advanced analytics culture be more likely to be ahead of its competitors because its management tends to be more forward-thinking on a lot of fronts, and not just big data?
You just can’t throw a big data or analytics program or solution set on top of the organization (or drop in a data scientist) and expect to be dazzled with sudden clarity and insight. If an organization is dysfunctional, with a lot of silos, fiefdoms, or calcified and uninspired management, all the big data in the world isn’t going to lift its intelligence quotient.
The authors of the Bain & Company study, Travis Pearson and Rasmus Wegener, point out that “big data isn’t just one more technology initiative” – “in fact, it isn’t a technology initiative at all; it’s a business program that requires technical savvy.”
Succeeding with big data analytics requires a change in the organization’s culture, and the way it approaches problems and opportunities. The enterprise needs to be open to innovation and change. And, as Pearson and Wegener point out, “you need to embed big data deeply into your organization. It’s the only way to ensure that information and insights are shared across business units and functions. This also guarantees the entire company recognizes the synergies and scale benefits that a well-conceived analytics capability can provide.”
Pearson and Wegener also point to the following common characteristics of big data leaders they have studied:
Pick the “right angle of entry”: There are many areas of the business that can benefit from big data analytics, but just a few key areas that will really impact the business. It’s important to focus big data efforts on the right things. Pearson and Wegener say there are four areas where analytics can be relevant: “improving existing products and services, improving internal processes, building new product or service offerings, and transforming business models.”
Communicate big data ambition: Make it clear that big data analytics is a strategy that has the full commitment of management, and it’s a key part of the organization’s strategy. Messages that need to be communicated: “We will embrace big data as a new way of doing business. We will incorporate advanced analytics and insights as key elements of all critical decisions.” And, the co-authors add, “the senior team must also answer the question: To what end? How is big data going to improve our performance as a business? What will the company focus on?”
Sell and evangelize: Selling big data is a long-term process, not just one or two announcements at staff meetings. “Organizations don’t change easily and the value of analytics may not be apparent to everyone, so senior leaders may have to make the case for big data in one venue after another,” the authors caution. Big data leaders, they observe, have learned to take advantage of the tools at their disposal: they “define clear owners and sponsors for analytics initiatives. They provide incentives for analytics-driven behavior, thereby ensuring that data is incorporated into processes for making key decisions. They create targets for operational or financial improvements. They work hard to trace the causal impact of big data on the achievement of these targets.”
Find an organizational “home” for big data analysis: A common trend seen among big data leaders is that they have created an organizational home for their advanced analytics capability, “often a Center of Excellence overseen by a chief analytics officer,” according to Pearson and Wegener. This is where matters such as strategy, collection and ownership of data across business functions come into play. Organizations also need to plan how to generate insights, and prioritize opportunities and the allocation of data analysts’ and data scientists’ time.
There is a hope and perception that adopting data analytics will open up new paths to innovation. But it often takes an innovative spirit to open up analytics.
“If I had my way, I’d fire the statisticians – all of them – they don’t add value”.
Surely not? Why would you fire the very people who were employed to make sense of the vast volumes of manufacturing data and guide future production? But he was right. The problem was that, at the time, data management was so poor that data was simply not available for the statisticians to analyze.
So, perhaps this title should be re-written to be:
Fire your Data Scientists – They Aren’t Able to Add Value.
Although this statement is a bit extreme, the same situation may still exist. Data scientists frequently share frustrations such as:
- “I’m told our data is 60% accurate, which means I can’t trust any of it.”
- “We achieved our goal of an answer within a week by working 24 hours a day.”
- “Each quarter we manually prepare 300 slides to anticipate all questions the CFO may ask.”
- “Fred manually audits 10% of the invoices. When he is on holiday, we just don’t do the audit.”
This is why I think the original quote is so insightful. Value from data is not automatically delivered by hiring a statistician, analyst or data scientist. Even with the latest data mining technology, one person cannot positively influence a business without the proper data to support them.
Most organizations are unfamiliar with the structure required to deliver value from their data. New storage technologies will be introduced and a variety of analytics tools will be tried and tested. This change is crucial to success. In order for statisticians to add value to a company, they must have access to high-quality data that is easily sourced and integrated. That data must be available through the latest analytics technology. This new ecosystem should provide insights that can play a role in future production. Staff will need to be trained, as this new data will be incorporated into daily decision making.
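To make a frustration like “our data is 60% accurate” actionable, quality has to be measured per field against explicit rules rather than asserted as a single number, and audits should not depend on Fred being in the office. A minimal sketch of that idea in plain Python (the field names, sample records, and validity rules are invented for illustration, not any particular product's approach):

```python
import re

# Hypothetical invoice records; fields and values are illustrative only.
records = [
    {"invoice_id": "INV-001", "amount": 125.50, "email": "pat@example.com"},
    {"invoice_id": "INV-002", "amount": None,   "email": "not-an-email"},
    {"invoice_id": "",        "amount": 88.00,  "email": "lee@example.com"},
]

# One validity rule per field: a value passes if the check returns True.
rules = {
    "invoice_id": lambda v: bool(v),
    "amount":     lambda v: isinstance(v, (int, float)) and v > 0,
    "email":      lambda v: isinstance(v, str)
                            and re.match(r"[^@]+@[^@]+\.[^@]+$", v) is not None,
}

def quality_report(records, rules):
    """Return the pass rate per field, so one vague accuracy
    percentage becomes a list of specific fields to fix."""
    report = {}
    for field, check in rules.items():
        passed = sum(1 for r in records if check(r.get(field)))
        report[field] = passed / len(records)
    return report

# Each field here passes for 2 of the 3 sample records.
print(quality_report(records, rules))
```

Run continuously, a report like this turns “I can’t trust any of it” into a ranked work queue, and replaces the manual 10% audit with a check that covers every record.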
With a rich 20-year history, Informatica understands data ecosystems. Employees become wasted investments when they do not have access to the trusted data they need in order to deliver their true value.
Who wants to spend their time recreating data sets to find a nugget of value only to discover it can’t be implemented?
Build an analytical ecosystem with a balanced focus on all aspects of data management. This will mean that value delivery is limited only by the imagination of your employees. Rather than questioning the value of an analytics team, you will attract some of the best and the brightest. Then, you will finally be able to deliver on the promised value of your data.
Let’s face it, big data – or data in any size, format or shape – is nothing more than just a bunch of digital bits that occupy space on a disk somewhere. To be useful to the business, end-users need to be able to access it, and pull out and assemble the nuggets of information they need. Data needs to be brought to life.
That’s the theme of a webcast I recently had the opportunity to co-present with Tableau Software, titled “Making Big Data User-Centric.” In fact, there’s a lot more to it than making data user-centric – big data should be a catalyst that fires peoples’ imaginations, enabling them to explore new avenues that were never opened up before.
Many organizations are beginning their journey into the new big data analytics space, and are starting to discover all the possibilities it offers. But, in an era where data is now scaling into the petabyte range, it’s more than technology. It’s a disruptive force, and with disruption comes new opportunities for growth.
Here are nine ways to make this innovative disruption possible:
1. Remember that “data” is not “information.” Too many people think that data itself is a valuable commodity. However, that is like taking oil right out of the ground and trying to sell it at gas stations – it’s not usable. It needs to be processed, refined, and packaged for delivery. It needs to be unified for eventual delivery and presentation. And, finally, to give information its value, it needs to tell a story.
2. Make data sharable across the enterprise. Big data – like all types of data – tends to naturally drift into silos within departments across enterprises. For years, people have struggled to break down these silos and provide a single view of all relevant data. Now there’s a way to do it – through a unified service layer. Think of all the enterprisey things coming to the forefront in recent years – service-oriented architecture, data virtualization, search technologies. No matter how you do it, the key is to provide a way for data to be made available across enterprise walls.
3. Use analytics to push the innovation envelope. Big data analytics enables end-users to ask questions and consider options that weren’t possible within standard, relational data environments.
4. Encourage critical thinking among data users. Business users have powerful tools at their disposal, and access to data they’ve never had before. It’s more important than ever to consider where the information came from, its context, and other potential sources that are not in the enterprise’s data stream.
5. Develop analytical skills across the board. Surveys I have conducted in partnership with Unisphere Research find that barely 10% of organizations offer self-service BI on a widespread basis. This needs to change. Everybody is working with information and data, so everyone needs to understand the implications of the information and data with which they are working.
6. Promote self-service. Analytic capabilities should be delivered on a self-service basis. End-users are accustomed to information being delivered to them at Google speeds, making the processes they deal with at work – requesting reports from their IT departments, setting up queries – seem downright antiquated, as well as frustrating.
7. Make it visual. Yes, graphical displays of data have been around for more than a couple of decades now. But now, there is an emerging class of front-end visualization tools that convert data points into visual displays – often stunning – that enable users to spot anomalies or trends within seconds.
8. Make it mobile. Just about everyone now carries mobile devices from which they can access data from any place. It’s now possible to offer mobile analytics ranging from key performance indicator monitoring to drill-down navigation, data selection, data filtering, and alerts.
9. Make it social. There are two ways to look at big data analytics and social media. First, there’s the social media data itself. BI and analytics efforts would be missing a big piece of the picture if they did not address the wealth of social media data flowing through organizations. This includes sentiment analysis and other applications to monitor interactions on external social media sites, to determine reactions to new products or predict customer needs. But there’s also the collaboration aspect, the ability to share insights and discoveries with peers and partners. Either way, it takes many minds working together to effectively pull information from all that data.
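Several of these points – spotting anomalies visually, pushing the analytics envelope – rest on the same underlying idea: separating signal from noise in a metric stream. Visualization tools do this interactively for the eye; a simple statistical rule can surface the same outliers programmatically. A minimal sketch in plain Python (the metric, values, and threshold are invented for illustration):

```python
from statistics import mean, stdev

# Hypothetical daily metric (say, orders per day); values are made up,
# with one obvious spike buried in the series.
daily_orders = [102, 98, 105, 101, 99, 240, 103, 97, 100, 104]

def flag_anomalies(series, threshold=2.0):
    """Return indices of points more than `threshold` sample standard
    deviations from the mean - the same outliers a chart makes obvious."""
    mu, sigma = mean(series), stdev(series)
    return [i for i, v in enumerate(series) if abs(v - mu) > threshold * sigma]

print(flag_anomalies(daily_orders))  # flags the day-240 spike (index 5)
```

A front-end visualization tool lets a business user see that spike in seconds; a rule like this lets a system alert on it before anyone looks at a chart. The two approaches complement each other.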
In recent times, the big Internet companies – the Googles, Yahoos and eBays – have proven that it is possible to build a sustainable business on data analytics, in which corporate decisions and actions are being seamlessly guided via an analytics culture, based on data, measurement and quantifiable results. Now, two of the top data analytics thinkers say we are reaching a point that non-tech, non-Internet companies are on their way to becoming analytics-driven organizations in a similar vein, as part of an emerging data economy.
In a report written for the International Institute for Analytics, Thomas Davenport and Jill Dyché divulge the results of their interviews with 20 large organizations, in which they find big data analytics to be well integrated into the decision-making cycle. “Large organizations across industries are joining the data economy,” they observe. “They are not keeping traditional analytics and big data separate, but are combining them to form a new synthesis.”
Davenport and Dyché call this new state of management “Analytics 3.0,” in which the concept and practices of competing on analytics are no longer confined to data management and IT departments or quants – analytics is embedded into all key organizational processes. That means major, transformative effects for organizations. “There is little doubt that analytics can transform organizations, and the firms that lead the 3.0 charge will seize the most value,” they write.
Analytics 3.0 is the latest of three distinct phases in the way data analytics has been applied to business decision making, Davenport and Dyché say. The first two “eras” looked like this:
- Analytics 1.0, prevalent between 1954 and 2009, was based on relatively small and structured data sources from internal corporate sources.
- Analytics 2.0, which arose between 2005 and 2012, saw the rise of the big Web companies – the Googles and Yahoos and eBays – which were leveraging big data stores and employing prescriptive analytics to target customers and shape offerings. This time span was also shaped by a growing interest in competing on analytics, in which data was applied to strategic business decision-making. “However, large companies often confined their analytical efforts to basic information domains like customer or product, that were highly-structured and rarely integrated with other data,” the authors write.
- In the Analytics 3.0 era, analytical efforts are being integrated with other data types, across enterprises.
This emerging environment “combines the best of 1.0 and 2.0—a blend of big data and traditional analytics that yields insights and offerings with speed and impact,” Davenport and Dyché say. The key trait of Analytics 3.0 “is that not only online firms, but virtually any type of firm in any industry, can participate in the data-driven economy. Banks, industrial manufacturers, health care providers, retailers—any company in any industry that is willing to exploit the possibilities—can all develop data-based offerings for customers, as well as supporting internal decisions with big data.”
Davenport and Dyché describe how one major trucking and transportation company has been able to implement low-cost sensors for its trucks, trailers and intermodal containers, which “monitor location, driving behaviors, fuel levels and whether a trailer/container is loaded or empty. The quality of the optimized decisions [the company] makes with the sensor data – dispatching of trucks and containers, for example – is improving substantially, and the company’s use of prescriptive analytics is changing job roles and relationships.”
New technologies and methods are helping enterprises enter the Analytics 3.0 realm, including “a variety of hardware/software architectures, including clustered parallel servers using Hadoop/MapReduce, in-memory analytics, and in-database processing,” the authors add. “All of these technologies are considerably faster than previous generations of technology for data management and analysis. Analyses that might have taken hours or days in the past can be done in seconds.”
In addition, another key characteristic of big data analytics-driven enterprises is the ability to fail fast – to deliver, with great frequency, partial outputs to project stakeholders. With the rise of new ‘agile’ analytical methods and machine learning techniques, organizations are capable of delivering “insights at a much faster rate,” and provide for “an ongoing sense of urgency.”
Perhaps most importantly, big data and analytics are integrated and embedded into corporate processes across the board. “Models in Analytics 3.0 are often being embedded into operational and decision processes, dramatically increasing their speed and impact,” Davenport and Dyché state. “Some are embedded into fully automated systems based on scoring algorithms or analytics-based rules. Some are built into consumer-oriented products and features. In any case, embedding the analytics into systems and processes not only means greater speed, but also makes it more difficult for decision-makers to avoid using analytics—usually a good thing.”
The report is available here.
People are obsessed with data. Data captured from our smartphones. Internet data showing how we shop and search — and what marketers do with that data. Big Data, which I loosely define as people throwing every conceivable data point into a giant Hadoop cluster with the hope of figuring out what it all means.
Too bad all that attention stems from fear, uncertainty and doubt about the data that defines us. I blame the technology industry, which, in the immortal words of Cool Hand Luke, has had a “failure to communicate.” For decades we’ve talked the language of IT and left it up to our direct customers to explain the proper care and feeding of data to their business users. Small wonder it’s way too hard for regular people to understand what we, as an industry, are doing. After all, how can we expect others to explain the do’s and don’ts of data management when we haven’t clearly explained it ourselves?
I say we need to start talking about the ABC’s of handling data in a way that’s easy for anyone to understand. I’m convinced we can because — if you think about it — everything you learned about data you learned in kindergarten: It has to be clean, safe and connected. Here’s what I mean:
Data cleanliness has always been important, but it assumes real urgency with the move toward Big Data. I blame Hadoop, the underlying technology that makes Big Data possible. On the plus side, Hadoop gives companies a cost-effective way to store, process and analyze petabytes of nearly every imaginable data type. But that same capacity is also the problem: companies face the enormous time suck of cataloging and organizing vast stores of data. Put bluntly, big data can be a swamp.
The question is, how to make it potable. This isn’t always easy, but it’s always, always necessary. It begins, naturally, by ensuring the data is accurate, de-duped and complete.
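Those first cleaning steps can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular tool’s implementation, and the field names (`id`, `email`, `name`) are hypothetical:

```python
# Minimal data-cleaning sketch: drop exact duplicates and route
# incomplete records aside for review. Field names are illustrative.

records = [
    {"id": 1, "email": "a@example.com", "name": "Ann"},
    {"id": 1, "email": "a@example.com", "name": "Ann"},   # exact duplicate
    {"id": 2, "email": None, "name": "Bob"},              # incomplete
]

def clean(rows, required=("id", "email", "name")):
    seen, deduped, incomplete = set(), [], []
    for row in rows:
        key = tuple(sorted(row.items()))   # stable fingerprint of the record
        if key in seen:
            continue                       # drop exact duplicates
        seen.add(key)
        if any(row.get(f) is None for f in required):
            incomplete.append(row)         # flag for manual review
        else:
            deduped.append(row)
    return deduped, incomplete

good, bad = clean(records)
print(len(good), len(bad))  # 1 1
```

Real pipelines do this at Hadoop scale with distributed joins and fuzzy matching, but the logic is the same: fingerprint, drop the repeats, quarantine the incomplete.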
Now comes the truly difficult part: knowing where that data originated, where it’s been, and how it’s related to other data – its lineage. That data provenance is absolutely vital in our hyper-connected world, where one company’s data interacts with data from suppliers, partners and customers. Someone else’s dirty data, regardless of origin, can ruin reputations and drive down sales faster than you can say “Target breach.” In fact, we now know that hackers entered Target’s point-of-sale terminals through a supplier’s project management and electronic billing system. We won’t know for a while the full extent of the damage. We do know the hack affected one-third of the entire U.S. population. Which brings us to:
Obviously, being safe means keeping data out of the hands of criminals. But it doesn’t stop there. That’s because today’s technologies make it oh so easy to misuse the data we have at our disposal. If we’re really determined to keep data safe, we have to think long and hard about responsibility and governance. We have to constantly question the data we use, and how we use it. Questions like:
- How much of our data should be accessible, and by whom?
- Do we really need to include personal information, like social security numbers or medical data, in our Hadoop clusters?
- When do we go the extra step of making that data anonymous?
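One common, if partial, answer to that last question is pseudonymization: replacing direct identifiers with keyed hashes before the data ever lands in a cluster. A minimal sketch, assuming Python’s standard library; the salt value and field names are hypothetical:

```python
import hashlib
import hmac

# Hypothetical key -- in practice this lives in a secrets store, not in code.
SECRET_SALT = b"rotate-me-regularly"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier (SSN, email, etc.) with a keyed hash.

    The same input always yields the same token, so records can still be
    joined for analytics, but the raw identifier never enters the cluster.
    """
    return hmac.new(SECRET_SALT, value.encode(), hashlib.sha256).hexdigest()

record = {"ssn": "123-45-6789", "purchase_total": 42.50}
safe_record = {**record, "ssn": pseudonymize(record["ssn"])}
print(safe_record["ssn"][:12])  # a deterministic token, not the raw SSN
```

Pseudonymization is weaker than true anonymization (tokens can sometimes be re-linked), which is exactly why the governance questions above still need asking.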
And as I think about it, I realize that everything we learned in kindergarten boils down to the ethics of data: How, for example, do we know if we’re using data for good or for evil?
That question is especially relevant for marketers, who have a tendency to use data to scare people, for crass commercialism, or to violate our privacy just because technology makes it possible. Use data ethically, and we can help change that.
In fact, I believe that the ethics of data is such an important topic that I’ve decided to make it the title of my new blog.
Stay tuned for more musings on The Ethics of Data.