Tag Archives: Big Data
Business leaders share with Fortune Magazine their view of Big Data
Fortune Magazine recently asked a number of business leaders what Big Data means to them. These leaders provide three great stories for the meaning of Big Data. Phil McAveety at Starwood Hotels talked about their oldest hotel having a tunnel between the general manager’s office and the front desk. This way the general manager could see and hear new arrivals and greet each like an old friend. Phil sees Big Data as a 21st century version of this tunnel. It enables us to know our guests and send them offers that matter to them. Jamie Miller at GE says Big Data is about transforming how they service their customers while simplifying the way they run their company. Finally, Ellen Richey at VISA says that Big Data holds the promise of making new connections between disparate bits of information, creating value.
Everyone is doing it but nobody really knows why?
I find all of these definitions interesting, but they are all very different and application specific. This isn’t encouraging. The message from Gartner is even less so. They find that “everyone is doing it but nobody really knows why”. According to Matt Asay, “the gravitational pull of Big Data is now so strong that even people who haven’t a clue as to what it’s all about report that they are running Big Data projects”. Gartner found in their research that 64% of enterprises surveyed say they’re deploying or planning to deploy Big Data projects. The problem is that 56% of those surveyed are struggling to determine how to get value out of Big Data, and 23% are struggling to define Big Data at all. Hopefully, none of the latter are being counted in the 64%. Regardless, Gartner believes that the number of companies with Big Data projects is only going to increase. The question is how many of these projects are just a recast of an existing BI project in order to secure funding or approval. No one will ever know.
Managing the hype phase of Big Data
One CIO that we talked to worries about this hype phase of Big Data. He says the opportunity is to inform analytics and to guide the discovery of business value. However, he worries whether past IT mistakes will repeat themselves. This CIO believes that IT has gone through three waves: from homegrown systems to ERP to Business Intelligence/Big Data. ERP was supposed to solve all the problems of the homegrown solutions, but it did not provide anything more than information on transactions. With ERP, you could not understand what was going on out in the business. BI and Big Data are trying to go after this. However, this CIO worries that CEOs and CFOs will soon start complaining that the information garnered does not make the business more money. He worries that they will start effectively singing the Who song, “Won’t Get Fooled Again.”
This CIO believes that to make more money, Big Data needs to connect the dots between transactional systems, BI, and planning systems. It needs to convert data into business value. This means Big Data is not just another silo of data; it needs to be connected and correlated to the rest of your data landscape to make it actionable. To do this, he says, it needs to be proactive and cut the time to execution. It needs to enable the enterprise to generate value differently than competitors. This, he believes, means that it needs to orchestrate activities so they maximize profit or increase customer satisfaction. You need to get to the point of sense and respond. Transactional systems, BI, and planning systems need to provide intelligence to allow managers to optimize business process execution. According to Judith Hurwitz, “optimization is about establishing the correlation between streams of information and matching the resulting pattern with defined behaviors such as mitigating a threat or seizing an opportunity.”
Don’t leave your CEO and CFO with a sense of deja vu
In sum, Big Data needs to go further in generating enough value to not leave your CEO and CFO with a sense of deja vu. The question is do you agree? Do you personally have a good handle on what Big Data is? And lastly, do you fear a day when the value generated needs to be attested to?
I just finished reading a great article from one of my former colleagues, Bill Franks. He makes a strong argument that Big Data is not inherently good or evil any more than money is. What makes Big Data (or any data, as I see it) take on a characteristic of good or evil is how it is used. Same as money, right? Here’s the rest of Bill’s article.
Bill framed his thoughts within the context of a discussion with a group of government legislators who I would characterize based on his commentary as a bit skittish of government collecting Big Data. Given many recent headlines, I sincerely do not blame them for being concerned. In fact, I applaud them for being cautious.
At the same time, while Big Data seems to be the “type” of data everyone wants to speak about, the scope of the potential problem extends to ALL data. Just because a particular dataset is highly structured into a 20-year-old schema does not exclude it from misuse. I believe structured data has been around for so long that people are comfortable with (or have forgotten about) the associated risks.
Any data can be used for good or ill. Clearly, it does not make sense to take the position that “we” should not collect, store and leverage data based on the notion someone could do something bad.
I suggest the real conversation should revolve around access to data. Bill touches on this as well. Far too often, data, whether Big Data or “traditional”, is openly accessible to some people who truly have no need based on job function.
Consider this example – a contracted application developer in a government IT shop is working on the latest version of an existing application for agency case managers. To test the application and get it successfully through a rigorous quality assurance process the IT developer needs a representative dataset. And where does this data come from? It is usually copied from live systems, with personally identifiable information still intact. Not good.
Another example – creating a 360-degree view of the citizens in a jurisdiction to be shared cross-agency can certainly be an advantageous situation for citizens and government alike. For instance, citizens can be better served, getting more of what they need, while agencies can better protect against fraud, waste, and abuse. Practically any agency serving the public could leverage the data to better serve and protect. However, this is a recognized sticky situation. How much data does a case worker from the Department of Human Services need versus a law enforcement officer or an emergency services worker? The way this has been addressed for years is to create silos of data, which carries its own host of challenges. However, as technology evolves, so too should process and approach.
Stepping back and looking at the problem from a different perspective, both examples above, different as they are, can be addressed by incorporating a layer of data security directly into the architecture of the enterprise. Rather than rely on a hodgepodge of data security mechanisms built into point applications and siloed systems, create a layer through which all data, Big or otherwise, is accessed.
Through such a layer, data can be persistently and/or dynamically masked based on the needs and role of the user. In the first example of the developer, this person would not want access to a live system to do their work. However, the ability to replicate the working environment of the live system is crucial. So, in this case, live data could be masked or altered in a permanent fashion as it is moved from production to development. Personally identifiable information could be scrambled or replaced with XXXXs. Now developers can do their work and the enterprise can rest assured that no harm can come from anyone seeing this data.
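As a sketch of the idea, a persistent masking step applied while moving data from production to development might look like this in Python; the field names and hashing scheme are assumptions for illustration, not any vendor's actual implementation:

```python
import hashlib

# Fields treated as personally identifiable information (assumed for this example).
PII_FIELDS = {"name", "ssn", "email"}

def mask_record(record):
    """Return a copy of a production record with PII irreversibly masked.

    Non-PII fields pass through unchanged, so the test dataset keeps the
    shape developers need for realistic QA work.
    """
    masked = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            # Replace the value with a deterministic, irreversible token:
            # the same input always maps to the same token, which preserves
            # join keys across tables without exposing the original value.
            masked[field] = "X" + hashlib.sha256(str(value).encode()).hexdigest()[:8]
        else:
            masked[field] = value
    return masked

prod_row = {"name": "Jane Doe", "ssn": "123-45-6789", "city": "Richmond"}
dev_row = mask_record(prod_row)
print(dev_row["city"])   # non-PII survives intact
```

Because the masking is deterministic, referential integrity across tables is preserved even though the original values can no longer be recovered.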
Further, through this data security layer, data can be dynamically masked based on a user’s role, leaving the original data unaltered for those who do require it. There are plenty of examples of how this looks in practice, think credit card numbers being displayed as xxxx-xxxx-xxxx-3153. However, this is usually implemented at the application layer and considered to be a “best practice” rather than governed from a consistent layer in the enterprise.
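A role-based dynamic masking policy of this sort can be sketched in a few lines; the roles and rules below are invented for illustration and applied at read time, so the stored value never changes:

```python
# Illustrative role-based dynamic masking; the roles and policies here are
# assumptions for the example, not a vendor's actual policy model.
def mask_card(number):
    """Show only the last four digits of a card number, e.g. xxxx-xxxx-xxxx-3153."""
    digits = number.replace("-", "")
    return "xxxx-xxxx-xxxx-" + digits[-4:]

# One policy per role; anyone else gets a full mask by default.
POLICIES = {
    "fraud_analyst": lambda v: v,   # genuinely needs the real number
    "support_rep": mask_card,       # sees only the last four digits
}

def read_card(role, stored_value):
    policy = POLICIES.get(role, lambda v: "xxxx-xxxx-xxxx-xxxx")
    return policy(stored_value)

print(read_card("support_rep", "4111-1111-1111-3153"))    # xxxx-xxxx-xxxx-3153
print(read_card("fraud_analyst", "4111-1111-1111-3153"))  # unaltered
```

The design point is that the policy lives in one shared layer rather than being re-implemented inside each application.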
The time to re-think the enterprise approach to data security is here. Properly implemented and deployed, such a layer addresses many of the arguments against collecting, integrating, and analyzing data from anywhere. No doubt, having an active discussion on the merits and risks of data is prudent and useful. Yet perhaps it should not be a conversation about whether to save data; it should be a conversation about access.
- It’s difficult to find and retain resource skills to staff big data projects
- It takes too long to deploy Big Data projects from ‘proof-of-concept’ to production
- Big data technologies are evolving too quickly to adapt
- Big Data projects fail to deliver the expected value
- It’s difficult to make Big Data fit-for-purpose, assess trust, and ensure security
Informatica has extended its leadership in data integration and data quality to Hadoop with our Big Data Edition to address all of these Big Data challenges.
The biggest challenge companies face is finding and retaining Big Data resource skills to staff their Big Data projects. One large global bank started their first Big Data project with 5 Java developers, but as their Big Data initiative gained momentum they needed to hire 25 more Java developers that year. They quickly realized that while they had scaled their infrastructure to store and process massive volumes of data, they could not scale the necessary resource skills to implement their Big Data projects. The research mentioned earlier indicates that 80% of the work in a Big Data project relates to data integration and data quality. With Informatica you can staff Big Data projects with readily available Informatica developers instead of an army of developers hand-coding in Java and other Hadoop programming languages. In addition, we’ve proven to our customers that Informatica developers are up to 5 times more productive on Hadoop than hand-coding, and they don’t need to know how to program on Hadoop. A large Fortune 100 global manufacturer needed to hire 40 data scientists for their Big Data initiative. Do you really want these hard-to-find and expensive resources spending 80% of their time integrating and preparing data?
Another key challenge is that it takes too long to deploy Big Data projects to production. One of our Big Data media and entertainment customers told me, prior to purchasing the Informatica Big Data Edition, that most of his Big Data projects had failed. Naturally, I asked him why they had failed. His response was, “We have these hot-shot Java developers with a good idea which they prove out in our sandbox environment. But then when it comes time to deploy it to production they have to re-work a lot of code to make it perform and scale, make it highly available 24×7, have robust error-handling, and integrate with the rest of our production infrastructure. In addition, it is very difficult to maintain as things change. This results in project delays and cost overruns.” With Informatica, you can automate the entire data integration and data quality pipeline; everything you build in the development sandbox environment can be immediately and automatically deployed and scheduled for production as enterprise-ready. Performance, scalability, and reliability are handled through configuration parameters without having to re-build or re-work any development, which is typical with hand-coding. And Informatica makes it easier to reuse existing work and maintain Big Data projects as things change. The Big Data Edition is built on Vibe, our virtual data machine, and provides near-universal connectivity so that you can quickly onboard new types of data of any volume and at any speed.
Big Data technologies are emerging and evolving extremely fast. This in turn becomes a barrier to innovation, since these technologies evolve much too quickly for most organizations to adopt before the next big thing comes along. What if you place the wrong technology bet and find that it is obsolete before you barely get started? Hadoop is gaining tremendous adoption, but it has evolved along with other Big Data technologies to the point where there are literally hundreds of open source projects and commercial vendors in the Big Data landscape. Informatica is built on the Vibe virtual data machine, which means that everything you built yesterday and build today can be deployed on the major Big Data technologies of tomorrow. Today it is five flavors of Hadoop, but tomorrow it could be Hadoop and other technology platforms. One of our Big Data Edition customers stated after purchasing the product that Informatica Big Data Edition with Vibe is “our insurance policy to insulate our Big Data projects from changing technologies.” In fact, existing Informatica customers can take PowerCenter mappings they built years ago, import them into the Big Data Edition, and in many cases run them on Hadoop with minimal changes and effort.
Another complaint of business is that Big Data projects fail to deliver the expected value. In a recent survey (1), 86% of marketers say they could generate more revenue if they had a more complete picture of customers. We all know that the cost of selling a product to an existing customer is only about 10 percent of the cost of selling the same product to a new customer. But it’s not easy to cross-sell and up-sell to existing customers. Customer Relationship Management (CRM) initiatives help to address these challenges, but they too often fail to deliver the expected business value. The impact is low marketing ROI, poor customer experience, customer churn, and missed sales opportunities. By using Informatica’s Big Data Edition with Master Data Management (MDM) to enrich customer master data with Big Data insights, you can create a single, complete view of customers that yields tremendous results. We call this real-time customer analytics, and Informatica’s solution improves the total customer experience by turning Big Data into actionable information so you can proactively engage with customers in real time. For example, this solution enables customer service to know which customers are likely to churn in the next two weeks so they can take the next best action, or, in the case of sales and marketing, determine next best offers based on customer online behavior to increase cross-sell and up-sell conversions.
Chief Data Officers and their analytics team find it difficult to make Big Data fit-for-purpose, assess trust, and ensure security. According to the business consulting firm Booz Allen Hamilton, “At some organizations, analysts may spend as much as 80 percent of their time preparing the data, leaving just 20 percent for conducting actual analysis” (2). This is not an efficient or effective way to use highly skilled and expensive data science and data management resource skills. They should be spending most of their time analyzing data and discovering valuable insights. The result of all this is project delays, cost overruns, and missed opportunities. The Informatica Intelligent Data platform supports a managed data lake as a single place to manage the supply and demand of data and converts raw big data into fit-for-purpose, trusted, and secure information. Think of this as a Big Data supply chain to collect, refine, govern, deliver, and manage your data assets so your analytics team can easily find, access, integrate and trust your data in a secure and automated fashion.
If you are embarking on a Big Data journey I encourage you to contact Informatica for a Big Data readiness assessment to ensure your success and avoid the pitfalls of the top 5 Big Data challenges.
- (1) Gleanster survey of 100 senior-level marketers, “Lifecycle Engagement: Imperatives for Midsize and Large Companies,” sponsored by YesMail.
- (2) “The Data Lake: Take Big Data Beyond the Cloud,” Booz Allen Hamilton, 2013.
By now, the business benefits of effectively leveraging big data have become well known. Enhanced analytical capabilities, greater understanding of customers, and ability to predict trends before they happen are just some of the advantages. But big data doesn’t just appear and present itself. It needs to be made tangible to the business. All too often, executives are intimidated by the concept of big data, thinking the only way to work with it is to have an advanced degree in statistics.
There are ways to make big data more than an abstract concept that can only be loved by data scientists. Four of these ways were recently covered in a report by David Stodder, director of business intelligence research for TDWI, as part of TDWI’s special report on What Works in Big Data.
Experiment with real-time analytics
The time is ripe for experimentation with real-time, interactive analytics technologies, Stodder says. The next major step in the movement toward big data is enabling real-time or near-real-time delivery of information. Real-time delivery has been a challenge with BI data for years, with limited success, Stodder says. The good news is that the Hadoop framework, originally built for batch processing, now includes interactive querying and streaming applications, he reports. This opens the way for real-time processing of big data.
Design for self-service
Interest in self-service access to analytical data continues to grow. “Increasing users’ self-reliance and reducing their dependence on IT are broadly shared goals,” Stodder says. “Nontechnical users—those not well versed in writing queries or navigating data schemas—are requesting to do more on their own.” There is an impressive array of self-service tools and platforms now appearing on the market. “Many tools automate steps for underlying data access and integration, enabling users to do more source selection and transformation on their own, including for data from Hadoop files,” he says. “In addition, new tools are hitting the market that put greater emphasis on exploratory analytics over traditional BI reporting; these are aimed at the needs of users who want to access raw big data files, perform ad-hoc requests routinely, and invoke transformations after data extraction and loading (that is, ELT) rather than before.”
Emphasize data visualization
Nothing gets a point across faster than having data points visually displayed – decision-makers can draw inferences within seconds. “Data visualization has been an important component of BI and analytics for a long time, but it takes on added significance in the era of big data,” Stodder says. “As expressions of meaning, visualizations are becoming a critical way for users to collaborate on data; users can share visualizations linked to text annotations as well as other types of content, such as pictures, audio files, and maps to put together comprehensive, shared views.”
Unify views of data
Users are working with many different data types these days, and are looking to bring this information into a single view – “rather than having to move from one interface to another to view data in disparate silos,” says Stodder. Unstructured data – graphics and video files – can also provide a fuller context to reports, he adds.
Last week I had the opportunity to attend the Gartner Security and Risk Management Summit. At this event, Gartner analysts and security industry experts meet to discuss the latest trends, advances, best practices and research in the space. At the event, I had the privilege of connecting with customers, peers and partners. I was also excited to learn about changes that are shaping the data security landscape.
Here are some of the things I learned at the event:
- Security continues to be a top CIO priority in 2014. Security is well-aligned with other trends such as big data, IoT, mobile, cloud, and collaboration. According to Gartner, the top CIO priority area is BI/analytics. Given our growing appetite for all things data and our increasing ability to mine data to increase top-line growth, this top billing makes perfect sense. The challenge is to protect the data assets that drive value for the company and ensure appropriate privacy controls.
- Mobile and data security are the top focus for 2014 spending in North America according to Gartner’s pre-conference survey. Cloud rounds out the list when considering worldwide spending results.
- Rise of the DRO (Digital Risk Officer). Fortunately, those same market trends are leading to an evolution of the CISO role to a Digital Security Officer and, longer term, a Digital Risk Officer. The DRO role will include determination of the risks and security of digital connectivity. Digital/Information Security risk is increasingly being reported as a business impact to the board.
- Information management and information security are blending. Gartner assumes that 40% of global enterprises will have aligned governance of the two programs by 2017. This is not surprising given the overlap of common objectives such as inventories, classification, usage policies, and accountability/protection.
- Security methodology is moving from a reactive approach toward compliance-driven and proactive (risk-based) methodologies. There is simply too much data and too many events for analysts to monitor. Organizations need to understand their assets and their criticality. Big data analytics and context-aware security are then needed to reduce the noise and false positive rates to a manageable level. According to Gartner analyst Avivah Litan, “By 2018, of all breaches that are detected within an enterprise, 70% will be found because they used context-aware security, up from 10% today.”
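The idea behind context-aware, risk-based filtering can be shown with a toy sketch: score each event by combining its severity with the criticality of the asset it touches, and surface only the highest-risk events. The assets, weights, and threshold below are invented for the example:

```python
# Hypothetical asset-criticality ratings an organization might maintain.
ASSET_CRITICALITY = {"hr_database": 0.9, "test_server": 0.2}

def risk_score(event):
    """Combine event severity with asset context; unknown assets get a middle weight."""
    return event["severity"] * ASSET_CRITICALITY.get(event["asset"], 0.5)

def triage(events, threshold=0.5):
    """Return only events whose contextual risk score exceeds the threshold."""
    return [e for e in events if risk_score(e) > threshold]

events = [
    {"asset": "hr_database", "severity": 0.8},  # high severity on a critical asset
    {"asset": "test_server", "severity": 0.8},  # same severity, low-value asset
]
print(len(triage(events)))  # 1 -- the noise on the test server is filtered out
```

Two events with identical severity get very different treatment once asset context is applied, which is exactly how context-aware approaches cut false positives.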
I want to close by sharing the identified Top Digital Security Trends for 2014:
- Software-defined security
- Big data security analytics
- Intelligent/Context-aware security controls
- Application isolation
- Endpoint threat detection and response
- Website protection
- Adaptive access
- Securing the Internet of Things
Getting started with cloud data warehousing using Amazon Redshift is now easier than ever, thanks to Informatica Cloud’s 60-day trial for Amazon Redshift. Now anyone can easily and quickly move data from any on-premise, cloud, Big Data, or relational data source into Amazon Redshift without writing a single line of code and without being a data integration expert. You can use Informatica Cloud’s six-step wizard to quickly replicate your data, or use the productivity-enhancing cloud integration designer to tackle more advanced use cases, such as combining multiple data sources into one Amazon Redshift table. Existing Informatica PowerCenter users can use Informatica Cloud and Amazon Redshift to extend an existing data warehouse through an affordable and scalable approach. If you are currently exploring self-service business intelligence solutions such as Birst, Tableau, or MicroStrategy, the combination of Redshift and Informatica Cloud makes it incredibly easy to prepare the data for analytics by any BI solution.
To get started, execute the following steps:
- Go to http://informaticacloud.com/cloud-trial-for-redshift and click on the ‘Sign Up Now’ link
- You’ll be taken to the Informatica Marketplace listing for the Amazon Redshift trial. Sign up for a Marketplace account if you don’t already have one, and then click on the ‘Start Free Trial Now’ button
- You’ll then be prompted to login with your Informatica Cloud account. If you do not have an Informatica Cloud username and password, register one by clicking the appropriate link and fill in the required details
- Once you finish registration and obtain your login details, download the Vibe™ Secure Agent to your Amazon EC2 virtual machine (or to a local Windows or Linux instance), and ensure that it can access your Amazon S3 bucket and Amazon Redshift cluster.
- Ensure that your S3 bucket and Redshift cluster are both in the same AWS region
- To start using the Informatica Cloud connector for Amazon Redshift, create a connection to your Amazon Redshift nodes by providing your AWS Access Key ID and Secret Access Key, specifying your cluster details, and obtaining your JDBC URL string.
You are now ready to begin moving data to and from Amazon Redshift by creating your first Data Synchronization task (available under Applications). Pick a source, pick your Redshift target, map the fields, and you’re done!
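Under the hood, a bulk load into Redshift typically amounts to staging files in S3 and issuing a COPY command against the cluster. Here is a minimal sketch of that statement; the table, bucket, and IAM role are hypothetical, and in practice a driver such as psycopg2 would execute the SQL against the cluster's JDBC/ODBC endpoint:

```python
def build_copy_statement(table, s3_path, iam_role):
    """Build a Redshift COPY command that loads CSV files from S3.

    COPY reads the staged files in parallel across the cluster's slices,
    which is why S3 staging is the standard bulk-load path for Redshift.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS CSV;"
    )

sql = build_copy_statement(
    "customers",
    "s3://my-bucket/exports/customers/",        # hypothetical staging bucket
    "arn:aws:iam::123456789012:role/RedshiftLoad",  # hypothetical IAM role
)
print(sql)
```

A wizard-driven tool generates and schedules the equivalent of this statement for you; the sketch just shows what the load reduces to.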
The value of using Informatica Cloud to load data into Amazon Redshift is the application’s ability to move massive amounts of data in parallel. The Informatica engine optimizes by moving processing close to where the data is using push-down technology. Other data integration solutions for Redshift perform batch processing using an XML engine, which is inherently slow when processing large data volumes, and lack multitenant architectures that scale well; by contrast, Informatica Cloud processes over 2 billion transactions every day.
Amazon Redshift has brought agility, scalability, and affordability to petabyte-scale data warehousing, and Informatica Cloud has made it easy to transfer all your structured and unstructured data into Redshift so you can focus on getting data insights today, not weeks from now.
The other comparison is that data is like solar power. Like solar power, data is abundant. In addition, it’s getting cheaper and more efficient to harness. The juxtaposition of these images captures the current sentiment around data’s potential to improve our lives in many ways. For this to happen, however, corporations and data custodians must effectively balance the power of data with security and privacy concerns.
Many people have a preconception of security as an obstacle to productivity. Actually, good security practitioners understand that the purpose of security is to support the goals of the company by allowing the business to innovate and operate more quickly and effectively. Think back to the early days of online transactions; many people were not comfortable banking online or making web purchases for fear of fraud and theft. Similar fears slowed early adoption of mobile phone banking and purchasing applications. But security ecosystems evolved, concerns were addressed, and Gartner estimates that worldwide mobile payment transaction values surpassed $235B in 2013. An astute security executive once pointed out why cars have brakes: not to slow us down, but to allow us to drive faster, safely.
The pace of digital change and the current proliferation of data is not a simple linear function – it’s growing exponentially – and it’s not going to slow down. I believe this is generally a good thing. Our ability to harness data is how we will better understand our world. It’s how we will address challenges with critical resources such as energy and water. And it’s how we will innovate in research areas such as medicine and healthcare. And so, as a relatively new Informatica employee coming from a security background, I’m now at a crossroads of sorts. While Informatica’s goal of “Putting potential to work” resonates with my views and helps customers deliver on the promise of this data growth, I know we need to have proper controls in place. I’m proud to be part of a team building a new intelligent, context-aware approach to data security (Secure@Source™).
We recently announced Secure@Source™ during InformaticaWorld 2014. One thing that impressed me was how quickly attendees (many of whom have little security background) understood how they could leverage data context to improve security controls, privacy, and data governance for their organizations. You can find a great introductory summary of Secure@Source™ here.
I will be sharing more on Secure@Source™ and data security in general, and would love to get your feedback. If you are an Informatica customer and would like to help shape the product direction, we are recruiting a select group of charter customers to drive and provide feedback for the first release. Customers who are interested in being a charter customer should register by sending email to SecureCustomers@informatica.com.
In my last blog, I talked about the dreadful experience of cleaning raw data by hand as an analyst a few years back. Well, the truth is, I was not alone. At a recent data mining Meetup event in the San Francisco Bay Area, I asked a few analysts: “How much time do you spend on cleaning your data at work?” “More than 80% of my time” and “most of my days,” said the analysts, “and it is not fun.”
But check this out: there are over a dozen Meetup groups focused on data science and data mining here in the Bay Area where I live. Those groups put on events multiple times a month, with topics often around hot, emerging technologies such as machine learning, graph analysis, real-time analytics, new algorithms for analyzing social media data, and of course, anything Big Data. Cool BI tools, new programming models, and algorithms for better analysis are a big draw for data practitioners these days.
That got me thinking… if what the analysts said to me is true, i.e., they spend 80% of their time on data prepping and only the remaining 20% analyzing the data and visualizing the results, which, BTW, “is actually fun,” quoting a data analyst, then why are they drawn to events focused on discussing the tools that can only help them 20% of the time? Why wouldn’t they want to explore technologies that can help address the dreadful 80%, the data scrubbing task they complain about?
Having been there myself, I thought perhaps a little self-reflection would help answer the question.
As a student of math, I love data and am fascinated by the good stories I can discover in it. My two-year math program in graduate school was primarily focused on learning how to build fabulous math models to simulate real events, and to use those formulas to predict the future or look for meaningful patterns.
I used BI and statistical analysis tools while at school, and continued to use them at work after I graduated. Those tools were great in that they helped me get to results and see what was in my data, so I could develop conclusions and make recommendations based on those insights for my clients. Without BI and visualization tools, I would not have delivered any results.
That was the fun and glamorous part of my job as an analyst. But when I was not creating nice charts and presentations to tell the stories in my data, I was spending time, a great amount of time, sometimes up to the wee hours, cleaning and verifying my data. I was convinced that was part of my job and I just had to suck it up.
It was only a few months ago that I stumbled upon data quality software – it happened when I joined Informatica. At first I thought they were talking to the wrong person when they started pitching me data quality solutions.
Turns out, the concept of data quality automation is a highly relevant and extremely intuitive subject to me, and to anyone who deals with data on a regular basis. Data quality software offers an automated process for data cleansing that is much faster and delivers more accurate results than a manual process. To put that in math context: if a data quality tool can reduce the data cleansing effort from 80% to 40% (BTW, this is hardly a random number; some of our customers have reported much better results), that means analysts can free up 40% of their time from scrubbing data and use that time to do the things they like: playing with data in BI tools, building new models, running more scenarios, producing different views of the data, and discovering things they could not before, all with clean, trusted data. No more bored-to-death experience; what they are left with is improved productivity, more accurate and consistent results, compelling stories about data, and most important, the freedom to focus on doing the things they like! Not too shabby, right?
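To give a concrete flavor of what such automation replaces, here is a small sketch of two hand-rolled cleansing rules: standardizing a phone-number column and deduplicating records. Real data quality tools apply libraries of such rules declaratively; the rules and formats below are simplified examples, not what any particular product does:

```python
import re

def standardize_phone(raw):
    """Normalize assorted phone formats to 555-123-4567 style."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) != 10:
        return None                  # flag for review rather than guess
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

def dedupe(records, key):
    """Keep the first record seen for each key value."""
    seen, out = set(), []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            out.append(rec)
    return out

rows = [
    {"id": 1, "phone": "(555) 123-4567"},
    {"id": 1, "phone": "555.123.4567"},  # duplicate id, messier format
]
clean = [{**r, "phone": standardize_phone(r["phone"])} for r in dedupe(rows, "id")]
print(clean[0]["phone"])  # 555-123-4567
```

Every rule like this that runs automatically is an hour an analyst gets back for actual analysis.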
I am excited about trying out the data quality tools we have here at Informatica. My fellow analysts, you should start looking into them too. I will check back in soon with more stories to share.
I have a little fable to tell you…
This fable has nothing to do with Big Data, but instead deals with an Overabundance of Food and how to better digest it to make it useful.
And it all started when this SEO copywriter from IT Corporation walked into a bar, pub, grill, restaurant, liquor establishment, and noticed 2 large crowded tables. After what seemed like an endless loop, an SQL programmer sauntered in and contemplated the table problem. “Mind if I join you?” he said. Since the tables were partially occupied and there were no virtual tables available, the host looked out on the patio of the restaurant at 2 open tables. “Shall I do an outside join instead?” asked the programmer. The host considered their schema and assigned 2 seats to the space.
The writer told the programmer to look at the menu, bill of fare, blackboard – there were so many choices but not enough real nutrition. “Hmmm, I’m hungry for the right combination of food, grub, chow, to help me train for a triathlon” he said. With that contextual information, they thought about foregoing the menu items and instead getting in the all-you-can-eat buffer line. But there was too much food available and despite its appealing looks in its neat rows and columns, it seemed to be mostly empty calories. They both realized they had no idea what important elements were in the food, but came to the conclusion that this restaurant had a “Big Food” problem.
They scoped it out for a moment, and then the writer did an about face, reversal, change in direction, and the SQL programmer did a commit and a quick pivot toward the buffer line, where they did a batch insert of all of the food, even the BLOBs of spaghetti, mashed potatoes and jello. There was far too much and it was far too rich for their tastes and needs, but they binged and consumed it all. You should have seen all the empty dishes at the end – they even caused a stack overflow. Because it was a batch binge, their digestive tracts didn’t know how to process all of the food, so they got a stomach ache from “big food” ingestion, and it nearly caused a core dump – in which case the restaurant host would have assigned his most dedicated servers to perform a thorough cleansing and scrubbing. There was no way to do a rollback at this point.
It was clear they needed relief. The programmer did an ad hoc query to JSON, their Server who they thought was Active, for a response about why they were having such “big food” indigestion, and did they have packets of relief available. No response. Then they asked again. There was still no response. So the programmer said to the writer, “Gee, the Quality Of Service here is terrible!”
Just then, the programmer remembered a remedy he had heard about previously, and so he spoke up. “Oh, it’s very easy: just SELECT Vibe.Data.Stream FROM INFORMATICA WHERE REAL-TIME IS NOT NULL.”
Informatica’s Vibe Data Stream enables streaming food collection for real-time Big Food analytics, operational intelligence, and traditional enterprise food warehousing from a variety of distributed food sources at high scale and low latency. It delivers the right food, ingested at the right time, when nutrition is needed, without any need for binge or batch ingestion.
And so they all lived happily ever after and all was good in the IT Corporation once again.
Download Now and take your first steps to rapidly developing applications that sense and respond to streaming food (or data) in real-time.
Big data is different than small data
The definitions of big data are diverse. Many authors define big data by its characteristics: the volume, velocity, and variety of data. VC Choudary, Associate Professor of Information Systems at UCI, for example, says that “what differentiates Big Data from traditional data is the sheer volume of information, velocity at which it is created, and the variety of sources from which it is drawn.” Hurwitz and Associates, a BI consultancy, defines Big Data similarly: the capability to manage a huge volume of disparate data, at the right speed, and within the right time frame to allow real-time analysis and reaction.
But what about the business practitioner’s point of view? Recently, I heard a prominent healthcare CIO talk about Big Data. This CIO defined Big Data by first defining “small data”. He said small data is “single-source, often batch-processed, and locally managed”. So what then is Big Data? “It is multi-source, requires connecting between data sources, multi-structured (structured and unstructured), real time, and uses information in aggregate”. This healthcare CIO went on to say that he sees “Big Data aiming to establish a model from the data. Big Data is about finding data relationships in the data rather than creating the data relationships in a data model”. This is a huge difference from traditional business intelligence (BI), which is best implemented when there is a level of determinism for the data in the data model.
Parallel architectures enable Big Data
Truly parallel architectures are an enabler of Big Data. To be fair, parallel architectures are not truly new; they have existed for some time. I remember seeing my first server-based parallel architecture in the work that Intel and other chipset makers were doing back in the mid 90s. And to be really fair, Von Neumann defined parallel processing and serial processing architectures at the same time. What is new is that over the last few years we have lost a degree of parallelism as we have sought to centralize and protect data. What Hadoop does is gang together many cheap machines, spreading the data and the processing across all of them at once. Redundancy is achieved by placing each piece of data, and the processing load that goes with it, on more than one machine.
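The divide-and-recombine idea behind Hadoop can be seen in a toy sketch. The word-count job and the three simulated “machines” below are purely illustrative, not Hadoop’s actual API: the data is split across machines, each machine processes only its own chunk (map), and the partial results are merged (reduce).

```python
from collections import Counter, defaultdict

def split_into_chunks(lines, n_machines):
    """Spread the data: assign each line to a machine round-robin."""
    chunks = defaultdict(list)
    for i, line in enumerate(lines):
        chunks[i % n_machines].append(line)
    return chunks

def map_phase(chunk):
    """Each machine counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def reduce_phase(partial_counts):
    """Merge the per-machine partial counts into one final answer."""
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return total

lines = ["big data is big", "data is everywhere", "big big data"]
chunks = split_into_chunks(lines, n_machines=3)
partials = [map_phase(chunk) for chunk in chunks.values()]
result = reduce_phase(partials)
print(result["big"])   # 4
print(result["data"])  # 3
```

In Hadoop the chunks live on different physical machines (with each block replicated for redundancy), so the map phase runs in true parallel; this sketch only mimics the data flow.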
Big Data moves from descriptive statistics to predictive analytics
According to Choudary, Big Data is not just about the amount of data that can be processed. It is also about what you can do with the data. He claims that “Big Data is about changing the game of business from one of simple descriptive statistics into one where all of the available data is collected and mined together. The Big Data era is about predicting outcomes based on disparate pieces of information and therefore, it is about prescribing opportunities.”
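The distinction Choudary draws can be made concrete with a small, made-up example: a descriptive statistic summarizes what already happened, while a predictive model uses the same data to forecast an outcome that has not happened yet. The spend and sales figures below are invented for illustration.

```python
# Assumed, made-up data: monthly ad spend vs. resulting sales (arbitrary units).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

# Descriptive statistics: summarize the past.
mean_sales = sum(sales) / len(sales)

# Predictive analytics: fit a least-squares line and forecast an unseen outcome.
n = len(spend)
mean_x = sum(spend) / n
mean_y = mean_sales
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, sales))
         / sum((x - mean_x) ** 2 for x in spend))
intercept = mean_y - slope * mean_x

# Forecast sales at a spend level (6.0) we have never observed.
predicted = slope * 6.0 + intercept

print(round(mean_sales, 2))  # what happened, on average
print(round(predicted, 2))   # what we expect to happen next
```

The descriptive number only looks backward; the fitted line turns the same five data points into a statement about the future, which is the shift Choudary is describing.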
Real Life Big Data Case Studies
So what is big data good for?
Let’s start with what has already been learned in healthcare big data analysis. In healthcare, they have found that people with higher pain scores crash more often in the ICU. Scary to me that they are just learning this! Another big issue in healthcare is re-admits, and healthcare reform creates big penalties for them. To help limit them, it is really important to know whether patients are managing their illness after they leave the hospital. What they have learned from studying patient credit scores is that these scores are a good predictor of whether patients will take their medicines and, therefore, of their tendency to be readmitted to the hospital. The higher the credit score, the higher the probability that people will take their medicine after leaving the hospital. I found this particularly interesting, because several years ago I got to work with Intuit. They had identified a persona for those that were meticulous with their finances. They called them anal-retentives. Big Data has determined that anal-retentives take their medicines more often. So hospitals should check in more regularly on those with poor credit scores to make sure that they are taking their medicines and thus limit their re-admits.
The healthcare CIO that we mentioned earlier claims that Big Data will over time move from “differentiating healthcare organizations to table stakes.” When I asked him why, he said the reason is simple: “We are in the business of creating the highest value care. And big data is fundamentally about serving our patients better than we do today. And everyone in healthcare will have to do it.” Another healthcare CIO says that he is looking to Big Data to help him create a greater understanding of the relationship of inputs to outputs concerning patients. We need to have a better understanding of the health status and needs of a specific patient over time. This means assembling data from multiple patient encounters and multiple sources. He goes on to say that “those organizations with strong partnerships up and down the value chain or for that matter, even among competitors, are better positioned to take insights, process improvements, and other advantages to the market. Use and management of data will increasingly become an element of competitive advantage”.
Big Data helps with Credit Risk
It is not just healthcare that sees big changes being enabled by Big Data. The president of a major credit reporting agency sees Big Data as an enabler of risk reduction for his firm. He asserts as well that, on average, firms use less than 5% of the data available to them. This matters in financial markets, where the quality of risk management can determine the earnings returned to shareholders. He says, however, that he sees a challenge for big data in hiring the talented people who can ask the right questions of the data. As in many growth industries, hiring talented practitioners and data scientists is a difficult thing to do.
Big Data makes Fast Food Better
Meanwhile, a major fast food vendor says that big data has enabled them to better understand their market from the outside in and across many disciplines including public relations, customer service, marketing, advertising, research, product innovation, and sales. Clearly, creating a view across all these touch points can lead to better decision making.
What does it mean?
So there you have it. Big Data is big in terms of what it involves and what it is trying to accomplish. It has already delivered interesting outcomes for healthcare, credit reporting, and even fast food. The question is what it is doing, or can do, for your business. Please feel free to share your results here.