Gartner's official definition of Information Governance is "…the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of information. It includes the processes, roles, standards, and metrics that ensure the effective and efficient use of information in enabling a business to achieve its goals." The discipline therefore addresses important considerations that key stakeholders within an enterprise face.
Keeping Information Governance relevant
A CIO of a large European bank once asked me, "How long do we need to keep information?"
This bank had to govern, index, search, and provide content to auditors to show it was managing data appropriately to meet Dodd-Frank regulation. In the past, this information was retrieved from a database or email. Now, however, the bank was required to produce voice recordings from phone conversations with customers, show the relevant incoming Reuters feeds, and document all appropriate IMs and social media interactions between employees.
All of these were systems the business had never considered before. These environments continued to capture and create data, and with it complex challenges. These islands of information seemingly have nothing to do with one another, yet they shape how the bank governs itself and how it retains the records associated with trading or financial information.
Coping with the sheer growth of data is one issue; what to keep and what to delete is another. There is also the issue of what to do with all the data once you have it. The data is potentially a gold mine for the business, but most businesses just store it and forget about it.
Legislation, in tandem, is becoming more rigorous, and there are potentially thousands of pieces of regulation relevant to multinational companies. Businesses operating in the EU, in particular, are affected by increasing regulation, including Solvency II, Dodd-Frank, HIPAA, the Gramm-Leach-Bliley Act (GLBA), Basel III, and new tax laws. In addition, companies face the expansion of state-regulated privacy initiatives and new rules relating to disaster recovery, transportation security, value chain transparency, consumer privacy, money laundering, and information security.
Whatever your size or type of business, there are several key processes you must undertake in order to create an effective information governance program, and three core elements an enterprise should consider before developing and implementing a policy framework. As a Business Transformation Architect, I see three foundation stones of an effective Information Governance program:
Assess Your Business Maturity
Understanding the full scope of requirements on your business is a heavy task. Assess whether your business is mature enough to embrace information governance. Many businesses in EMEA do not yet have an information governance team in place, but instead have key stakeholders with responsibility for information assets spread across their legal, security, and IT teams.
Undertake a Regulatory Compliance Review
Understanding the legal obligations on your business is critical in shaping an information governance program. Every business is subject to numerous compliance regimes managed by multiple regulatory agencies, which can differ across markets. Many compliance requirements depend on employee headcount and/or turnover reaching certain thresholds. For example, certain records may need to be stored for 6 years in Poland, yet the same records may need to be stored for only 3 years in France.
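As a concrete illustration, here is a minimal sketch of a jurisdiction-aware retention lookup. The record types and periods in the table are illustrative assumptions mirroring the Poland/France example above, not legal guidance; real schedules must come from the compliance review itself.

```python
# Minimal sketch of a jurisdiction-aware retention lookup.
# The rule table below is an illustrative assumption, not legal advice.
RETENTION_YEARS = {
    ("invoice", "PL"): 6,   # e.g., Poland: 6 years
    ("invoice", "FR"): 3,   # e.g., France: 3 years
}

def retention_years(record_type: str, country: str) -> int:
    try:
        return RETENTION_YEARS[(record_type, country)]
    except KeyError:
        raise LookupError(
            f"No retention rule for {record_type!r} in {country!r}; "
            "escalate to the information governance team."
        )

print(retention_years("invoice", "PL"))  # -> 6
```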
Establish an Information Governance Team
It is important that a core team be assigned responsibility for the implementation and success of the information governance program. This steering group and a nominated information governance lead can then drive forward operational and practical issues, including agreeing and developing a work program, developing policy and strategy, and planning communication and awareness.
The future of lighting may first be peeking through at Newark Liberty Airport in New Jersey. The airport has installed 171 new LED-based light fixtures that include a variety of sensors to detect and record what's going on in the airport, as reported by Diane Cardwell in The New York Times. Together, they form a network of devices that communicates wirelessly and allows authorities to scan the license plates of passing cars, watch out for lines and delays, and check travelers for suspicious activity.
I get the feeling that Newark's new gear will not be the last of lighting-based digital networks. Over the last few years, LED street lights have gone from a nice-to-have for cities to the sector standard. That the market has shifted so swiftly is thanks to the efforts of early movers such as the City of Los Angeles, which last year completed the world's largest LED street light replacement project, with LED fixtures installed on 150,000 streetlights.
Los Angeles is certainly not alone in making the switch to LED street lighting. In March 2013, Las Vegas outfitted 50,000 streetlights with LED fixtures. One month later, Austin, TX announced plans to install 35,000 LED street lights. Not to be outdone, New York City is planning to go all-LED by 2017, which would save $14 million and many tons of carbon emissions each year.
The impending switch to LEDs is an excellent opportunity for LED light fixture makers and Big Data software vendors like Informatica. These fixtures are made with a wide variety of sensors that can be tailored to whatever the user wants to detect, including temperature, humidity, seismic activity, radiation, audio, and video, among other things. The sensors could even detect and triangulate the source of a gunshot.
The steady stream of real-time data collected from these fixtures can be transformed into torrents of small messages and events with unprecedented agility using Informatica Vibe Data Stream. Analyzed data can then be distributed to various governmental and non-governmental agencies, such as law enforcement, environmental monitors, and retailers.
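To make the idea concrete, here is a minimal, generic sketch of turning periodic fixture readings into a stream of small JSON event messages. This is plain Python with a stand-in `publish` callback, not the actual Vibe Data Stream API, and the fixture IDs and sensor fields are assumptions for illustration.

```python
import json
import random
import time

def read_sensors(fixture_id: str) -> dict:
    # Stand-in for real fixture hardware: one reading per sensor type.
    return {
        "fixture": fixture_id,
        "ts": time.time(),
        "temperature_c": round(random.uniform(-5, 35), 1),
        "humidity_pct": round(random.uniform(20, 90), 1),
        "audio_db": round(random.uniform(30, 100), 1),
    }

def stream_events(fixture_ids, publish, interval_s=1.0, cycles=3):
    # Turn periodic readings into a stream of small JSON event messages.
    for _ in range(cycles):
        for fid in fixture_ids:
            publish(json.dumps(read_sensors(fid)))
        time.sleep(interval_s)

# Demo: "publish" just prints; a real deployment would hand the message
# to a transport such as Vibe Data Stream or a message queue.
stream_events(["EWR-001", "EWR-002"], publish=print, cycles=1)
```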
If I were to guess the number of streetlights in the world, I would say 4 billion. Upgrading them is a "once-in-a-generation opportunity" to harness "lots of data": sensory big data.
Come and get it. For developers hungry to get their hands on Informatica on Hadoop, a downloadable free trial of Informatica Big Data Edition was launched today on the Informatica Marketplace. See for yourself the power of the killer app on Hadoop from the leader in data integration and quality.
Thanks to the generous help of our partners, the Informatica Big Data team has preinstalled the Big Data Edition inside the sandbox VMs of the two leading Hadoop distributions. This empowers Hadoop and Informatica developers to easily try the codeless, GUI-driven Big Data Edition to build and execute ETL and data integration pipelines natively on Hadoop for Big Data analytics.
Informatica Big Data Edition is the most complete and powerful suite for Hadoop data pipelines and can increase productivity up to 5 times. Developers can leverage hundreds of out-of-the-box Informatica transforms and connectors for structured and unstructured data processing on Hadoop. With the Informatica Vibe Virtual Data Machine running directly on each node of the Hadoop cluster, the Big Data Edition can profile, parse, transform and cleanse data at any scale to prepare data for data science, business intelligence and operational analytics.
The Informatica Big Data Edition Trial Sandbox VMs come with a 60-day trial version of the Big Data Edition preinstalled inside a 1-node Hadoop cluster. The trials include sample data and mappings as well as getting-started documentation and videos. You can try your own data with the trials, but processing is limited to the 1-node Hadoop cluster and the machine you have it running on. Any mappings you develop in the trial can be easily moved to a production Hadoop cluster running the Big Data Edition. The Informatica Big Data Edition also supports the MapR and Pivotal Hadoop distributions; however, the trial is currently only available for Cloudera and Hortonworks.
Accelerate your ability to bring Hadoop from the sandbox into production by leveraging Informatica’s Big Data Edition. Informatica’s visual development approach means that more than one hundred thousand existing Informatica developers are now Hadoop developers without having to learn Hadoop or new hand coding techniques and languages. Informatica can help organizations easily integrate Hadoop into their enterprise data infrastructure and bring the PowerCenter data pipeline mappings running on traditional servers onto Hadoop clusters with minimal modification. Informatica Big Data Edition reduces the risk of Hadoop projects and increases agility by enabling more of your organization to interact with the data in your Hadoop cluster.
To get the Informatica Big Data Edition Trial Sandbox VMs and more information, please visit the Informatica Marketplace.
You probably know this already, but I’m going to say it anyway: It’s time you changed your infrastructure. I say this because most companies are still running infrastructure optimized for ERP, CRM and other transactional systems. That’s all well and good for running IT-intensive, back-office tasks. Unfortunately, this sort of infrastructure isn’t great for today’s business imperatives of mobility, cloud computing and Big Data analytics.
Virtually all of these imperatives are fueled by information gleaned from potentially dozens of sources to reveal our users’ and customers’ activities, relationships and likes. Forward-thinking companies are using such data to find new customers, retain existing ones and increase their market share. The trick lies in translating all this disparate data into useful meaning. And to do that, IT needs to move beyond focusing solely on transactions, and instead shine a light on the interactions that matter to their customers, their products and their business processes.
They need what we at Informatica call a "Data First" perspective. You can check out my first blog post about being Data First here.
A Data First POV changes everything from product development, to business processes, to how IT organizes itself and, most especially, the impact IT has on your company's business. That's because cloud computing, Big Data and mobile app development shift IT's responsibilities away from running and administering equipment, onto aggregating, organizing and improving myriad data types pulled in from internal and external databases, online posts and public sources. And that shift makes IT a more empowering force for business change. Think about it: the ability to connect and relate the dots across data from multiple sources finally gives you real power to improve entire business processes, departments and organizations.
I like to say that the role of IT is now “big I, little t,” with that lowercase “t” representing both technology and transactions. But that role requires a new set of priorities. They are:
- Think about information infrastructure first and application infrastructure second.
- Create great data by design. Architect for connectivity, cleanliness and security. Check out the eBook Data Integration for Dummies.
- Optimize for speed and ease of use – SaaS and mobile applications change often. Click here to try Informatica Cloud for free for 30 days.
- Make data a team sport. Get tools into your users' hands so they can prepare and interact with data themselves.
I never said this would be easy, and there’s no blueprint for how to go about doing it. Still, I recognize that a little guidance will be helpful. In a few weeks, Informatica’s CIO Eric Johnson and I will talk about how we at Informatica practice what we preach.
Malcolm Gladwell wrote an article in The New Yorker magazine in January 2007 entitled "Open Secrets." In the article, he pointed out that a national-security expert had famously made a distinction between puzzles and mysteries.
Osama bin Laden's whereabouts were, for many years, a puzzle. We couldn't find him because we didn't have enough information. The key to the puzzle, it was assumed, would eventually come from someone close to bin Laden, and until we could find that source, bin Laden would remain at large. In fact, that's precisely what happened. Al-Qaida's No. 3 leader, Khalid Sheikh Mohammed, gave authorities the nickname of one of bin Laden's couriers, who then became the linchpin of the CIA's efforts to locate bin Laden.
By contrast, the problem of what would happen in Iraq after the toppling of Saddam Hussein was a mystery. It wasn’t a question that had a simple, factual answer. Mysteries require judgments and the assessment of uncertainty, and the hard part is not that we have too little information but that we have too much.
This was written before "Big Data" was a household term, and it raises the very interesting question of whether organizations and corporations that are, by anyone's standards, totally deluged with data are facing puzzles or mysteries. Consider the amount of data that a company like Western Union deals with.
Western Union is a 160-year-old company. Having built scale in the money transfer business, the company is in the process of evolving its business model by enabling the expansion of digital products, growth of web and mobile channels, and a more personalized online customer experience. Sounds good, but get this: the company processes more than 29 transactions per second on average. That's 242 million consumer-to-consumer transactions and 459 million business payments in a year. Nearly a billion transactions – a billion! As my six-year-old might say, that number is big enough "to go to the moon and back." Layer on top of that the fact that the company operates in 200+ countries and territories, and conducts business in 120+ currencies. Senior Director and Head of Engineering Abhishek Banerjee has said, "The data is speaking to us. We just need to react to it." That implies a puzzle, not a mystery – but only if data scientists are able to conduct statistical modeling and predictive analysis, systematically noting trends in sending and receiving behaviors. Check out what Banerjee and Western Union CTO Sanjay Saraf have to say about it here.
Or consider General Electric's aggressive and pioneering move into what's been dubbed the industrial internet. In a white paper entitled "The Case for an Industrial Big Data Platform: Laying the Groundwork for the New Industrial Age," GE reveals some of the staggering statistics related to the industrial equipment that it manufactures and supports (services comprise 75% of GE's bottom line):
- A modern wind turbine contains approximately 50 sensors and control loops which collect data every 40 milliseconds.
- A farm controller then receives more than 30 signals from each turbine at 160-millisecond intervals.
- Every second, the farm monitoring software processes 200 raw sensor data points, with various associated properties, from each turbine.
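Taken together, those figures imply a serious data rate. Here is a quick back-of-the-envelope check using only the numbers above; the 100-turbine farm size is an assumption for illustration.

```python
# Back-of-the-envelope data rates implied by the figures above.
sensors_per_turbine = 50
sample_interval_s = 0.040          # one reading every 40 ms
readings_per_turbine_per_s = sensors_per_turbine / sample_interval_s
print(readings_per_turbine_per_s)  # 1250 readings/second per turbine

signals_per_turbine = 30
signal_interval_s = 0.160          # farm controller polls every 160 ms
signals_per_turbine_per_s = signals_per_turbine / signal_interval_s
print(signals_per_turbine_per_s)   # 187.5 signals/second per turbine

# For a hypothetical 100-turbine farm, raw sensor readings alone:
print(readings_per_turbine_per_s * 100)  # 125,000 readings/second
```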
Phew! I'm no electricity operations expert, and you probably aren't either. And most of us will get no further than simply wrapping our heads around the simple fact that GE turbines are collecting a LOT of data. But what the paper goes on to say should grab your attention in a big way: "The key to success for this wind farm lies in the ability to collect and deliver the right data, at the right velocity, and in the right quantities to a wide set of well-orchestrated analytics." The paper goes on to recommend that anyone involved in the Industrial Internet revolution strongly consider its talent requirements, with the suggestion that Chief Data Officers and/or Data Scientists may be the next critical hires.
Which brings us back to Malcolm Gladwell. In the aforementioned article, Gladwell goes on to pull apart the Enron debacle, arguing that it was a prime example of the perils of too much information. "If you sat through the trial of (former CEO) Jeffrey Skilling, you'd think that the Enron scandal was a puzzle. The company, the prosecution said, conducted shady side deals that no one quite understood. Senior executives withheld critical information from investors…We were not told enough—the classic puzzle premise—was the central assumption of the Enron prosecution." But in fact, that was not true. Enron employed complicated, but perfectly legal, accounting techniques used by companies that engage in complicated financial trading. Many journalists and professors have gone back and looked at the firm's regulatory filings and have come to the conclusion that, while complex and difficult to identify, all of the company's shenanigans were right there in plain view. Enron cannot be blamed for covering up the existence of its side deals. It didn't; it disclosed them. As Gladwell summarizes:
"Puzzles are 'transmitter-dependent'; they turn on what we are told. Mysteries are 'receiver-dependent'; they turn on the skills of the listener."
I would argue that this extremely complex, fast-moving and seismic shift that we call Big Data will favor those who have developed the ability to attune: to listen and make sense of the data. Winners in this new world will recognize what looks like an overwhelming and intractable mystery, break it down into small and manageable chunks, and demystify the landscape to uncover the important nuggets of truth and significance.
Several years ago, I got to participate in one of the first neural net conferences. At the time, I thought it was amazing just to be there. There were chip and software vendors galore. Many even claimed to be the next Intel or the next Microsoft. Years later I joined a complex event processing vendor. Again, I felt the same sense of excitement. In both cases, the market participants moved from large horizontal market plays to smaller and more vertical solutions.
Now, to be clear, it is not my goal today to pop anyone's big data balloon. But as I have gotten more excited about big data, I have also gotten an increasingly eerie sense of déjà vu. The fact is, the more I dig into big data and hear customers' stories about what they are trying to do with it, the more concerned I become about the similarities between big data and neural nets or complex event processing.
Clearly, big data does offer some interesting new features. And big data does take advantage of other market trends, including virtualization and cloud. By doing so, big data achieves orders of magnitude more scalability than traditional business intelligence processing and storage. At the same time, big data offers the potential for lowering cost, but I should take a moment to stress the word potential. The reason I do this is that while a myriad of processing approaches have been developed, no standard has yet emerged. Early adopters complain about the difficulty of hiring big data MapReduce programmers. And just like neural nets, the programming that needs to be done is turning out to be application-specific.
With this said, it should be clear that big data does offer the potential to test datasets and to discover new and sometimes unexpected data relationships. This is a real positive. However, like its predecessors, this work is application-specific, and the data being related is of differing quality and detail. This means that the best big data can do as a technology movement is discover potential data relationships. Once this is done, meaning can only be created by establishing detailed data relationships and dealing with the varying quality of data sets within the big data cluster.
Big Data will become small for management analysis
This means that big data must become small in order to really solve customer problems. Judith Hurwitz puts it this way: "big data analysis is really about small data. Small data, therefore, is the product of big data analysis. Big data will become small so that it is easier to comprehend." What is "more necessary than ever is the capability to analyze the right data in a timely enough fashion to make decisions and take actions." Judith says that in the end what is needed is quality data that is consistent, accurate, reliable, complete, timely, reasonable, and valid. The critical point is that whether you use MapReduce processing or traditional BI means, you shouldn't throw out your data integration and quality tools. As big data becomes smaller, these will in reality become increasingly important.
So how does Judith see big data evolving? Judith sees big data propelling a lot of new small data. She believes that "getting the right perspective on data quality can be very challenging in the world of big data. With a majority of big data sources, you need to assume that you are working with data that is not clean." Judith says that we need to accept the fact that a lot of noise will exist in data, and that "it is by searching and pattern matching that you will be able to find some sparks of truth in the midst of some very dirty data." Judith suggests, therefore, a two-phase approach: 1) look for patterns in big data without concern for data quality; and 2) after you locate your patterns, apply the same data quality standards that have been applied to traditional data sources.
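Here is a minimal sketch of that two-phase approach; the dataset, the pattern, and the cleansing rules are all hypothetical.

```python
import re

# Phase 1: scan raw, noisy records for a pattern without cleansing first.
raw_records = [
    "  ORD-1001 | acme corp | 2014-03-02 ",
    "ORD-1002|ACME CORP|2014-03-03",
    "garbage row ###",
    "ORD-1003 | Acme Corp. | 2014-03-05",
]

pattern = re.compile(r"ORD-(\d+)")
hits = [r for r in raw_records if pattern.search(r)]
print(f"Phase 1: {len(hits)} of {len(raw_records)} rows match the pattern")

# Phase 2: apply traditional quality rules only to the rows that matched.
def cleanse(row: str) -> dict:
    order_id, customer, date = [field.strip() for field in row.split("|")]
    return {
        "order_id": order_id,
        "customer": customer.title().rstrip("."),  # normalize casing/punctuation
        "date": date,
    }

clean = [cleanse(r) for r in hits]
print(clean)
```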
For this reason, I believe that history will, to a degree, repeat itself. Clearly, the big data emperor does have his clothes on, but big data will become smaller and more vertical. Big data will become about relationship discovery, and small data will become about quality analysis of data sources. In sum, this means that small data analysis is focused and provides the data for business decision making, while big data analysis is broad and is about discovering what data potentially relates to what data. I know this is a bit different from the hype, but it is realistic and makes sense. Remember, in the end, you will still need what business intelligence has refined.
Business leaders share with Fortune Magazine their view of Big Data
Fortune Magazine recently asked a number of business leaders what Big Data means to them. These leaders provide three great stories for the meaning of Big Data. Phil McAveety at Starwood Hotels talked about their oldest hotel having a tunnel between the general manager's office and the front desk, so the general manager could see and hear new arrivals and greet each like an old friend. Phil sees Big Data as a 21st-century version of this tunnel, enabling us to know our guests and send them offers that matter to them. Jamie Miller at GE says Big Data is about transforming how they service their customers while simplifying the way they run their company. Finally, Ellen Richey at Visa says that Big Data holds the promise of making new connections between disparate bits of information, creating value.
Everyone is doing it, but nobody really knows why
I find all of these definitions interesting, but they are all very different and application-specific. This isn't encouraging. The message from Gartner is even less so: they find that "everyone is doing it but nobody really knows why". According to Matt Asay, "the gravitational pull of Big Data is now so strong that even people who haven't a clue as to what it's all about report that they are running Big Data projects". Gartner found in its research that 64% of enterprises surveyed say they're deploying or planning to deploy Big Data projects. The problem is that 56% of those surveyed are struggling to determine how to get value out of Big Data, and 23% are struggling to define Big Data at all. Hopefully, none of the latter are being counted in the 64%. Regardless, Gartner believes that the number of companies with Big Data projects is only going to increase. The question is how many of those projects are just a recast of an existing BI project in order to secure funding or approval. No one will ever know.
Managing the hype phase of Big Data
One CIO that we talked to worries about this hype phase of Big Data. He says the opportunity lies in informing analytics and in guiding and finding business value. However, he worries whether past IT mistakes will repeat themselves. This CIO believes that IT has gone through three waves, growing from homegrown systems to ERP to Business Intelligence/Big Data. ERP was supposed to solve all the problems of the homegrown solutions, but it did not provide anything more than information on transactions. You could not understand what was going on out there with ERP; BI and Big Data are trying to go after this. However, this CIO worries that CEOs and CFOs will soon start complaining that the information garnered does not make the business more money, and will start effectively singing the Who song, "Won't Get Fooled Again."
This CIO believes that to make more money, Big Data needs to connect the dots between transactional systems, BI, and planning systems. It needs to convert data into business value. This means Big Data is not just another silo of data; it needs to be connected and correlated to the rest of your data landscape to make it actionable. To do this, he says, it needs to be proactive and cut the time to execution. It needs to enable the enterprise to generate value differently than its competitors. This, he believes, means that it needs to orchestrate activities so they maximize profit or increase customer satisfaction. You need to get to the point of sense and response. Transactional systems, BI, and planning systems need to provide intelligence that allows managers to optimize business process execution. According to Judith Hurwitz, "optimization is about establishing the correlation between streams of information and matching the resulting pattern with defined behaviors such as mitigating a threat or seizing an opportunity."
Don't leave your CEO and CFO with a sense of déjà vu
In sum, Big Data needs to go further in generating enough value to avoid leaving your CEO and CFO with a sense of déjà vu. The question is: do you agree? Do you personally have a good handle on what Big Data is? And lastly, do you fear a day when the value generated needs to be attested to?
I just finished reading a great article by one of my former colleagues, Bill Franks. He makes a strong argument that Big Data is not inherently good or evil any more than money is. What makes Big Data (or any data, as I see it) take on a characteristic of good or evil is how it is used. Same as money, right? Here's the rest of Bill's article.
Bill framed his thoughts within the context of a discussion with a group of government legislators whom I would characterize, based on his commentary, as a bit skittish about government collecting Big Data. Given many recent headlines, I sincerely do not blame them for being concerned. In fact, I applaud them for being cautious.
At the same time, while Big Data seems to be the "type" of data everyone wants to speak about, the scope of the potential problem extends to ALL data. Just because a particular dataset is highly structured in a 20-year-old schema does not exclude it from misuse. I believe structured data has been around for so long that people are comfortable with (or have forgotten about) the associated risks.
Any data can be used for good or ill. Clearly, it does not make sense to take the position that “we” should not collect, store and leverage data based on the notion someone could do something bad.
I suggest the real conversation should revolve around access to data. Bill touches on this as well. Far too often data, whether Big Data or "traditional", is openly accessible to people who have no genuine need for it based on their job function.
Consider this example: a contracted application developer in a government IT shop is working on the latest version of an existing application for agency case managers. To test the application and get it successfully through a rigorous quality assurance process, the developer needs a representative dataset. And where does this data come from? It is usually copied from live systems, with personally identifiable information still intact. Not good.
Another example: creating a 360-degree view of the citizens in a jurisdiction, shared cross-agency, can certainly be advantageous for citizens and government alike. Citizens can be better served, getting more of what they need, while agencies can better protect against fraud, waste and abuse. Practically any agency serving the public could leverage the data to better serve and protect. However, this is a recognized sticky situation. How much data does a case worker from the Department of Human Services need, versus a law enforcement officer or an emergency services worker? The way this has been addressed for years is to create silos of data, carrying with it its own host of challenges. However, as technology evolves, so too should process and approach.
Stepping back and looking at the problem from a different perspective, both examples above, different as they are, can be addressed by incorporating a layer of data security directly into the architecture of the enterprise. Rather than rely on a hodgepodge of data security mechanisms built into point applications and siloed systems, create a layer through which all data, Big or otherwise, is accessed.
Through such a layer, data can be persistently and/or dynamically masked based on the needs and role of the user. In the first example, the developer does not need access to a live system to do their work, but the ability to replicate the working environment of the live system is crucial. So, in this case, live data could be masked or altered in a permanent fashion as it is moved from production to development. Personally identifiable information could be scrambled or replaced with XXXXs. Developers can then do their work, and the enterprise can rest assured that no harm can come from anyone seeing this data.
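Here is a minimal sketch of that production-to-development masking step; the record fields and masking rules are hypothetical.

```python
import hashlib

def mask_record(rec: dict) -> dict:
    """Permanently mask PII as data is copied from production to dev."""
    masked = dict(rec)
    # Replace the name outright; nothing downstream needs the real value.
    masked["name"] = "XXXX XXXX"
    # Hash the SSN so rows stay distinguishable without exposing the number.
    masked["ssn"] = hashlib.sha256(rec["ssn"].encode()).hexdigest()[:10]
    return masked

production_rows = [
    {"case_id": 1, "name": "Jane Doe", "ssn": "123-45-6789", "status": "open"},
]
dev_rows = [mask_record(r) for r in production_rows]
print(dev_rows)  # realistic shape, no real PII
```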
Further, through this data security layer, data can be dynamically masked based on a user's role, leaving the original data unaltered for those who do require it. There are plenty of examples of how this looks in practice; think credit card numbers being displayed as xxxx-xxxx-xxxx-3153. However, this is usually implemented at the application layer and treated as a "best practice" rather than governed from a consistent layer in the enterprise.
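And a sketch of the dynamic variant, where masking is applied at read time based on role and the stored value is never altered; the role names and policy here are hypothetical.

```python
def display_card(pan: str, role: str) -> str:
    """Dynamically mask a card number based on the viewer's role.

    Roles and policy are hypothetical; the stored value is unaltered.
    """
    if role == "fraud_investigator":      # this role needs the original value
        return pan
    return "xxxx-xxxx-xxxx-" + pan[-4:]   # everyone else sees the masked form

pan = "4111-1111-1111-3153"
print(display_card(pan, "call_center"))         # xxxx-xxxx-xxxx-3153
print(display_card(pan, "fraud_investigator"))  # 4111-1111-1111-3153
```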
The time to rethink the enterprise approach to data security is here. Properly implemented and deployed, such a layer addresses many of the arguments against collecting, integrating and analyzing data from anywhere. No doubt, having an active discussion on the merits and risks of data is prudent and useful. Yet perhaps it should not be a conversation about whether to save data; it should be a conversation about access.
Five challenges come up again and again in Big Data initiatives:
- It's difficult to find and retain the resource skills to staff Big Data projects
- It takes too long to deploy Big Data projects from proof-of-concept to production
- Big Data technologies are evolving too quickly to adapt
- Big Data projects fail to deliver the expected value
- It's difficult to make Big Data fit-for-purpose, assess trust, and ensure security
Informatica has extended its leadership in data integration and data quality to Hadoop with our Big Data Edition to address all of these Big Data challenges.
The biggest challenge companies face is finding and retaining Big Data resource skills to staff their Big Data projects. One large global bank started its first Big Data project with 5 Java developers, but as its Big Data initiative gained momentum it needed to hire 25 more Java developers that year. The bank quickly realized that while it had scaled its infrastructure to store and process massive volumes of data, it could not scale the necessary resource skills to implement its Big Data projects. Research indicates that 80% of the work in a Big Data project relates to data integration and data quality. With Informatica you can staff Big Data projects with readily available Informatica developers instead of an army of developers hand-coding in Java and other Hadoop programming languages. In addition, we've proven to our customers that Informatica developers are up to 5 times more productive on Hadoop than hand-coding, and they don't need to know how to program on Hadoop. A large Fortune 100 global manufacturer needed to hire 40 data scientists for its Big Data initiative. Do you really want these hard-to-find and expensive resources spending 80% of their time integrating and preparing data?
Another key challenge is that it takes too long to deploy Big Data projects to production. One of our Big Data media and entertainment customers told me, prior to purchasing the Informatica Big Data Edition, that most of his Big Data projects had failed. Naturally, I asked him why. His response was, "We have these hot-shot Java developers with a good idea which they prove out in our sandbox environment. But then when it comes time to deploy it to production they have to re-work a lot of code to make it perform and scale, make it highly available 24×7, have robust error-handling, and integrate with the rest of our production infrastructure. In addition, it is very difficult to maintain as things change. This results in project delays and cost overruns." With Informatica, you can automate the entire data integration and data quality pipeline; everything you build in the development sandbox environment can be immediately and automatically deployed and scheduled for production as enterprise-ready. Performance, scalability, and reliability are handled through configuration parameters without having to re-build or re-work any development, which is typical with hand-coding. And Informatica makes it easier to reuse existing work and maintain Big Data projects as things change. The Big Data Edition is built on Vibe, our virtual data machine, and provides near-universal connectivity so that you can quickly onboard new types of data of any volume and at any speed.
Big Data technologies are emerging and evolving extremely fast. This in turn becomes a barrier to innovation, since these technologies evolve much too quickly for most organizations to adopt before the next big thing comes along. What if you place the wrong technology bet and find that it is obsolete before you barely get started? Hadoop is gaining tremendous adoption, but it has evolved alongside other Big Data technologies to the point where there are literally hundreds of open source projects and commercial vendors in the Big Data landscape. Informatica is built on the Vibe virtual data machine, which means that everything you built yesterday and build today can be deployed on the major Big Data technologies of tomorrow. Today it is five flavors of Hadoop, but tomorrow it could be Hadoop and other technology platforms. One of our Big Data Edition customers stated after purchasing the product that Informatica Big Data Edition with Vibe is their insurance policy to insulate their Big Data projects from changing technologies. In fact, existing Informatica customers can take PowerCenter mappings they built years ago, import them into the Big Data Edition, and in many cases run them on Hadoop with minimal changes and effort.
Another complaint of the business is that Big Data projects fail to deliver the expected value. In a recent survey (1), 86% of marketers said they could generate more revenue if they had a more complete picture of customers. We all know that the cost of selling a product to an existing customer is only about 10 percent of the cost of selling the same product to a new customer. But it's not easy to cross-sell and up-sell to existing customers. Customer Relationship Management (CRM) initiatives help to address these challenges, but they too often fail to deliver the expected business value. The impact is low marketing ROI, poor customer experience, customer churn, and missed sales opportunities. By using Informatica's Big Data Edition with Master Data Management (MDM) to enrich customer master data with Big Data insights, you can create a single, complete view of customers that yields tremendous results. We call this real-time customer analytics, and Informatica's solution improves the total customer experience by turning Big Data into actionable information so you can proactively engage with customers in real time. For example, this solution enables customer service to know which customers are likely to churn in the next two weeks so they can take the next best action, or, in the case of sales and marketing, determine next best offers based on customer online behavior to increase cross-sell and up-sell conversions.
Chief Data Officers and their analytics teams find it difficult to make Big Data fit-for-purpose, assess trust, and ensure security. According to the business consulting firm Booz Allen Hamilton, "At some organizations, analysts may spend as much as 80 percent of their time preparing the data, leaving just 20 percent for conducting actual analysis" (2). This is not an efficient or effective way to use highly skilled and expensive data science and data management resources. They should be spending most of their time analyzing data and discovering valuable insights. The result of all this is project delays, cost overruns, and missed opportunities. The Informatica Intelligent Data Platform supports a managed data lake as a single place to manage the supply and demand of data, and converts raw big data into fit-for-purpose, trusted, and secure information. Think of this as a Big Data supply chain to collect, refine, govern, deliver, and manage your data assets so your analytics team can easily find, access, integrate and trust your data in a secure and automated fashion.
If you are embarking on a Big Data journey I encourage you to contact Informatica for a Big Data readiness assessment to ensure your success and avoid the pitfalls of the top 5 Big Data challenges.
(1) Gleanster survey of 100 senior-level marketers, "Lifecycle Engagement: Imperatives for Midsize and Large Companies," sponsored by YesMail.
(2) "The Data Lake: Take Big Data Beyond the Cloud," Booz Allen Hamilton, 2013.
By now, the business benefits of effectively leveraging big data have become well known. Enhanced analytical capabilities, a greater understanding of customers, and the ability to predict trends before they happen are just some of the advantages. But big data doesn't just appear and present itself. It needs to be made tangible to the business. All too often, executives are intimidated by the concept of big data, thinking the only way to work with it is to have an advanced degree in statistics.
There are ways to make big data more than an abstract concept that can only be loved by data scientists. Four of these ways were recently covered in a report by David Stodder, director of business intelligence research for TDWI, as part of TDWI’s special report on What Works in Big Data.
Experiment with real-time analytics
The time is ripe for experimentation with real-time, interactive analytics technologies, Stodder says. The next major step in the movement toward big data is enabling real-time or near-real-time delivery of information. Real-time delivery of BI data has been pursued for years, with limited success, Stodder says. The good news is that the Hadoop framework, originally built for batch processing, now includes interactive querying and streaming applications, he reports. This opens the way for real-time processing of big data.
Design for self-service
Interest in self-service access to analytical data continues to grow. “Increasing users’ self-reliance and reducing their dependence on IT are broadly shared goals,” Stodder says. “Nontechnical users—those not well versed in writing queries or navigating data schemas—are requesting to do more on their own.” There is an impressive array of self-service tools and platforms now appearing on the market. “Many tools automate steps for underlying data access and integration, enabling users to do more source selection and transformation on their own, including for data from Hadoop files,” he says. “In addition, new tools are hitting the market that put greater emphasis on exploratory analytics over traditional BI reporting; these are aimed at the needs of users who want to access raw big data files, perform ad-hoc requests routinely, and invoke transformations after data extraction and loading (that is, ELT) rather than before.”
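As a minimal illustration of the ELT pattern Stodder describes (load raw data first, transform inside the target afterward), here is a sketch using SQLite; the table and column names are hypothetical.

```python
import sqlite3

# Extract + Load: land the raw records as-is in the target store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (user_id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("u1", " 19.99"), ("u2", "5"), ("u1", "3.50 ")],
)

# Transform: applied after loading, inside the target (the "T" in ELT).
conn.execute("""
    CREATE TABLE spend_by_user AS
    SELECT user_id, SUM(CAST(TRIM(amount) AS REAL)) AS total_spend
    FROM raw_events
    GROUP BY user_id
""")
print(conn.execute("SELECT * FROM spend_by_user").fetchall())
# e.g. [('u1', 23.49), ('u2', 5.0)]
```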
Emphasize data visualization
Nothing gets a point across faster than having data points visually displayed; decision-makers can draw inferences within seconds. "Data visualization has been an important component of BI and analytics for a long time, but it takes on added significance in the era of big data," Stodder says. "As expressions of meaning, visualizations are becoming a critical way for users to collaborate on data; users can share visualizations linked to text annotations as well as other types of content, such as pictures, audio files, and maps to put together comprehensive, shared views."
Unify views of data
Users are working with many different data types these days and are looking to bring this information into a single view, "rather than having to move from one interface to another to view data in disparate silos," says Stodder. Unstructured data, such as graphics and video files, can also provide fuller context for reports, he adds.