Tag Archives: data
I ended my previous blog wondering if awareness of Data Gravity should change our behavior. While Data Gravity adds Value to Big Data, I find that the application of the Value is under explained.
Exponential growth of data has naturally led us to want to categorize it into facts, relationships, entities, etc. This sounds very elementary. While this happens so quickly in our subconscious minds as humans, it takes significant effort to teach this to a machine.
A friend tweeted this to me last week: I paddled out today, now I look like a lobster. Since this tweet, Twitter has inundated my friend and me with promotions from Red Lobster. It is because the machine deconstructed the tweet: paddled <PROPULSION>, today <TIME>, like <PREFERENCE> and lobster <CRUSTACEANS>. While putting these together, the machine decided that the keyword was lobster. You and I both know that my friend was not talking about lobsters.
You may think that this maybe just a funny edge case. You can confuse any computer system if you try hard enough, right? Unfortunately, this isn’t an edge case. 140 characters has not just changed people’s tweets, it has changed how people talk on the web. More and more information is communicated in smaller and smaller amounts of language, and this trend is only going to continue.
When will the machine understand that “I look like a lobster” means I am sunburned?
I believe the reason that there are not hundreds of companies exploiting machine-learning techniques to generate a truly semantic web, is the lack of weighted edges in publicly available ontologies. Keep reading, it will all make sense in about 5 sentences. Lobster and Sunscreen are 7 hops away from each other in dbPedia – way too many to draw any correlation between the two. For that matter, any article in Wikipedia is connected to any other article within about 14 hops, and that’s the extreme. Completed unrelated concepts are often just a few hops from each other.
But by analyzing massive amounts of both written and spoken English text from articles, books, social media, and television, it is possible for a machine to automatically draw a correlation and create a weighted edge between the Lobsters and Sunscreen nodes that effectively short circuits the 7 hops necessary. Many organizations are dumping massive amounts of facts without weights into our repositories of total human knowledge because they are naïvely attempting to categorize everything without realizing that the repositories of human knowledge need to mimic how humans use knowledge.
For example – if you hear the name Babe Ruth, what is the first thing that pops to mind? Roman Catholics from Maryland born in the 1800s or Famous Baseball Player?
If you look in Wikipedia today, he is categorized under 28 categories in Wikipedia, each of them with the same level of attachment. 1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore | Maryland | Vaudeville performers.
Now imagine how confused a machine would get when the distance of unweighted edges between nodes is used as a scoring mechanism for relevancy.
If I were to design an algorithm that uses weighted edges (on a scale of 1-5, with 5 being the highest), the same search would yield a much more obvious result.
1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore | Maryland | Vaudeville performers .
Now the machine starts to think more like a human. The above example forces us to ask ourselves the relevancy a.k.a. Value of the response. This is where I think Data Gravity’s becomes relevant.
You can contact me on twitter @bigdatabeat with your comments.
Citrix: You may not realize you know them, but chances are pretty good that you do. And chances are also good that we marketers can learn something about achieving fortune teller-like marketing from them!
Citrix is the company that brought you GoToMeeting and a whole host of other mobile workspace solutions that provide virtualization, networking and cloud services. Their goal is to give their 100 million users in 260,000 organizations across the globe “new ways to work better with seamless and secure access to the apps, files and services they need on any device, wherever they go.”
Citrix is a company that has been imagining and innovating for over 25 years, and over that time, has seen a complete transformation in their market – virtual solutions and cloud services didn’t even exist when they were founded. Now it’s the backbone of their business. Their corporate video proudly states that the only constant in this world is change, and that they strive to embrace the “yet to be discovered.”
Having worked with them quite a bit over the past few years, we have seen first-hand how Citrix has demonstrated their ability to embrace change.
Back in 2011, it became clear to Citrix that they had a data problem, and that they would have to make some changes to stay ahead in this hyper competitive market. Sales & Marketing had identified data as their #1 concern – their data was incomplete, inaccurate, and duplicated in their CRM system. And with so many different applications in the organization, it was quite difficult to know which application or data source had the most accurate and up-to-date information. They realized they needed a single source of the truth – one system of reference where all of their global data management practices could be centralized and consistent.
The marketing team realized that they needed to take control of the solution to their data concerns, as their success truly depended upon it. They brought together their IT department and their systems integration partner, Cognizant to determine a course of action. Together they forged an overall data governance strategy which would empower the marketing team to manage data centrally – to be responsible for their own success.
As a key element of that data governance / management strategy, they determined that they needed a Master Data Management (MDM) solution to serve as their Single Trusted Source of Customer & Prospect Data. They did a great deal of research into industry best practices and technology solutions, and decided to select Informatica as their MDM partner. As you can see, Citrix’s environment is not unlike most marketing organizations. The difference is that they are now able to capture and distribute better customer and prospect data to and from these systems to achieve even better results. They are leveraging internal data sources and systems like CRM (Salesforce) and marketing automation (Marketo). Their systems live all over the enterprise, both on premises and in the cloud. And they leverage analytical tools to analyze and dashboard their results.
Citrix strategized and implemented their Single Trusted Source of Customer & Prospect solution in a phased approach throughout 2013 and 2014, and we believe that what they’ve been able to accomplish during that short period of time has been nothing short of phenomenal. Here are the higlights:
- Used Informatica MDM to provide clean, consistent and connected channel partner, customer and prospect data and the relationships between them for use in operational applications (SFDC, BI Reporting and Predictive Analytics)
- Recognized 20% increase in lead-to-opportunity conversion rates
- Realized 20% increase in marketing team’s operational efficiency
- Achieved 50% increase in quality of data at the point of entry, and a 50% reduction in the rate of junk and duplicate data for prospects, existing accounts and contact
- Delivered a better channel partner and customer experience by renewing all of a customers’ user licenses across product lines at one time and making it easy to identify whitespace opportunities to up-sell more user licenses
That is huge! Can you imagine the impact on your own marketing organization of a 20% increase in lead-to-opportunity conversion? Can you imagine the impact of spending 20% less time questioning and manually massaging data to get the information you need? That’s game changing!
Because Citrix now has great data and great resulting insight, they have been able to take the next step and embark on new fortune teller-like marketing strategies. As Citrix’s Dagmar Garcia discussed during a recent webinar, “We monitor implicit and explicit behavior of transactional leads and accounts, and then we leverage these insights and previous behaviors to offer net new offers and campaigns to our customers and prospects… And it’s all based on the quality of data we have within our database.”
I encourage you to take a few minutes to listen to Dagmar discuss Citrix’s project on a recent webinar. In the webinar, she dives deeper into their project, the project scope and timeline, and to what she means by “fortune telling abilities”. Also, take a look at the customer story section of the Informatica.com website for the PDF case study. And, if you’re in the mood to learn more, you can download a complimentary copy of the 2014 Gartner Magic Quadrant for MDM of Customer Data Solutions.
Hat’s off to you Citrix, and we look forward to working with you to continue to change the game even more in the coming months and years!
With the increasing importance of enterprise analytics, the question becomes who should own the analytics and data agenda. This question really matters today because, according to Thomas Davenport, “business processes are among the last remaining points of differentiation.” For this reason, Davenport even suggests that businesses that create a sustainable right to win use analytics to “wring every last drop of value from their processes”.
The CFO is the logical choice?
In talking with CIOs about both enterprise analytics and data, they are clear that they do not want to become their company’s data steward. They insist instead that they want to be an enabler of the analytics and data function. So what business function then should own enterprise analytics and data? Last week an interesting answer came from a CFO Magazine Article by Frank Friedman. Frank contends that CFOs are “the logical choice to own analytics and put them to work to serve the organization’s needs”.
To justify his position, Frank made the following claims:
- CFOs own most of the unprecedented quantities of data that businesses create from supply chains, product processes, and customer interactions
- Many CFOs already use analytics to address their organization’s strategic issues
- CFOs uniquely can act as a steward of value and an impartial guardian of truth across the organizations. This fact gives them the credibility and trust needed when analytics produce insights that effectively debunk currently accepted wisdom
Frank contends as well that owning the analytics agenda is a good thing because it allows CFOs to expand their strategic leadership role in doing the following:
- Growing top line revenue
- Strengthening their business ties
- Expanding the CFO’s influence outside the finance function.
Frank suggests as well that analytics empowers the CFO to exercise more centralized control of operational business decision making. The question is what do other CFOs think about Frank’s position?
CFOs clearly have an opinion about enterprise analytics and data
A major Retail CFO says that finance needs to own “the facts for the organization”—the metrics and KPIs. And while he honestly admits that finance organizations in the past have not used data well, he claims finance departments need to make the time to become truly data centric. He said “I do not consider myself a data expert, but finance needs to own enterprise data and the integrity of this data”. This CFO claims as well that “finance needs to use data to make sure that resources are focused on the right things; decisions are based on facts; and metrics are simple and understandable”. A Food and Beverage CFO agrees with the Retail CFO by saying that almost every piece of data is financial in one way or another. CFOs need to manage all of this data since they own operational performance for the enterprise. CFOs should own the key performance indicators of the business.
CIOs should own data, data interconnect, and system selection
A Healthcare CFO said he wants, however, the CIO to own data systems, data interconnect, and system selection. However, he believes that the finance organization is the recipient of data. “CFOs have a major stake in data. CFOs need to dig into operational data to be able to relate operations to internal accounting and to analyze things like costs versus price”. He said that “the CFOs can’t function without good operational data”.
An Accounting Firm CFO agreed with the Healthcare CFO by saying that CIOs are a means to get data. She said that CFOs need to make sense out of data in their performance management role. CFOs, therefore, are big consumers of both business intelligence and analytics. An Insurance CFO concurred by saying CIOs should own how data is delivered.
CFOs should be data validators
The Insurance CFOs said, however, CFOs need to be validators of data and reports. They should, as a result, in his opinion be very knowledgeable on BI and Analytics. In other words, CFOs need to be the Underwriters Laboratory (UL) for corporate data.
Now it is your chance
So the question is what do you believe? Does the CFO own analytics, data, and data quality as a part of their operational performance role? Or is it a group of people within the organization? Please share your opinions below.
Solution Brief: The Intelligent Data Platform
CFOs Move to Chief Profitability Officer
CFOs Discuss Their Technology Priorities
The CFO Viewpoint upon Data
How CFOs can change the conversation with their CIO?
New type of CFO represents a potent CIO ally
Competing on Analytics
The Business Case for Better Data Connectivity
California reported a total of 167 data breaches in 2013, which is up 28 percent from the 2012. Two major data breaches caused most of this uptick, including the Target attack that was reported in December 2013, and the LivingSocial attack that occurred in April 2013. This year, you can add the Home Depot data breach to that list, as well as the recent breach at the US Post Office.
So, what the heck is going on? And how does this new impact data integration? Should we be concerned, as we place more and more data on public clouds, or within big data systems?
Almost all of these breaches were made possible by traditional systems with security technology and security operations that fell far enough behind that outside attackers found a way in. You can count on many more of these attacks, as enterprises and governments don’t look at security as what it is; an ongoing activity that may require massive and systemic changes to make sure the data is properly protected.
As enterprises and government agencies stand up cloud-based systems, and new big data systems, either inside (private) or outside (public) of the enterprise, there are some emerging best practices around security that those who deploy data integration should understand. Here are a few that should be on the top of your list:
First, start with Identity and Access Management (IAM) and work your way backward. These days, most cloud and non-cloud systems are complex distributed systems. That means IAM is is clearly the best security model and best practice to follow with the emerging use of cloud computing.
The concept is simple; provide a security approach and technology that enables the right individuals to access the right resources, at the right times, for the right reasons. The concept follows the principle that everything and everyone gets an identity. This includes humans, servers, APIs, applications, data, etc.. Once that verification occurs, it’s just a matter of defining which identities can access other identities, and creating policies that define the limits of that relationship.
Second, work with your data integration provider to identify solutions that work best with their technology. Most data integration solutions address security in one way, shape, or form. Understanding those solutions is important to secure data at rest and in flight.
Finally, splurge on monitoring and governance. Many of the issues around this growing number of breaches exist with the system managers’ inability to spot and stop attacks. Creative approaches to monitoring system and network utilization, as well as data access, will allow those in IT to spot most of the attacks and correct the issues before the ‘go nuclear.’ Typically, there are an increasing number of breach attempts that lead up to the complete breach.
The issue and burden of security won’t go away. Systems will continue to move to public and private clouds, and data will continue to migrate to distributed big data types of environments. And that means the need data integration and data security will continue to explode.
If you’ve wondered why so many companies are eager to control data storage, the answer can be summed up in a simple term: data gravity. Ultimately, where data is determines where the money is. Services and applications are nothing without it.
Dave McCrory introduced his idea of Data Gravity with a blog post back in 2010. The core idea was – and is – Interesting. More recently, Data Gravity featured in this year’s EMC World keynote. But, beyond the observation that large or valuable agglomerations of data exert a pull that tends to see them grow in size or value, what is a recognition of Data Gravity actually good for?
As a concept, Data Gravity seems closely associated with current enthusiasm for Big Data. In addition, like Big Data, the term’s real-world connotations can be unhelpful almost as often as they are helpful. Big Data exhibits at least three characteristics, which are Volume, Velocity, and Variety. Various other V’s, including Value, is mentioned from time to time, but with less consistency. Yet, Big Data’s name says it’s all about size. The speed with which data must be ingested, processed, or excreted is less important. The complexity and diversity of the data doesn’t matter either.
On its own, the size of a data set is unimportant. Coping with lots of data certainly raises some not-insignificant technical challenges, but the community is actually doing a good job of coming up with technically impressive solutions. The interesting aspect of a huge data set isn’t its size, but the very different modes of working that become possible when you begin to unpick the complex interrelationships between data elements.
Sometimes, Big Data is the vehicle by which enough data is gathered about enough aspects of enough things from enough places for those interrelationships to become observable against the background noise. Other times, Big Data is the background noise, and any hope of insight is drowned beneath the unending stream of petabytes.
To a degree, Data Gravity falls into the same trap. More gravity must be good, right? And more mass leads to more gravity. Mass must be connected to volume, in some vague way that was explained when I was 11, and which involves STP. Therefore, bigger data sets have more gravity. This means that bigger data sets are better data sets. That assertion is clearly nonsense, but luckily, it’s not actually what McCrory is suggesting. His arguments are more nuanced than that, and potentially far more useful.
Instinctively, I like that the equation attempts to move attention away from ‘the application’ toward the pools of data that support many, many applications at once. The data is where the potential lies. Applications are merely the means to unlock that potential in various ways. So maybe notions of Potential Energy from elsewhere in Physics need to figure here.
But I’m wary of the emphasis given to real numbers that are simply the underlying technology’s vital statistics; network latency, bandwidth, request sizes, numbers of requests, and the rest. I realize that these are the measurable things that we have, but feel that more abstract notions of value need to figure just as prominently.
So I’m left reaffirming my original impression that Data Gravity is “interesting”. It’s also intriguing, and I keep feeling that it should be insightful. I’m just not — yet — sure exactly how. Is a resource with a Data Gravity of 6 twice as good as a resource with a Data Gravity of 3? Does a data set with a Data Gravity of 15 require three times as much investment/infrastructure/love as a data set scoring a humble 5? It’s unlikely to be that simple, but I do look forward to seeing what happens as McCrory begins to work with the parts of our industry that can lend empirical credibility to his initial dabbling in mathematics.
If real numbers show the equations to stand up, all we then need to do is work out what the numbers mean. Should an awareness of Data Gravity change our behavior, should it validate what gut feel led us to do already, or is it just another ‘interesting’ and ultimately self-evident number that doesn’t take us anywhere?
I don’t know, but I will continue to explore. You can contact me on twitter @bigdatabeat
The Rising CFO is Increasingly Business Oriented
At the CFO Rising West Conference on October 30th and 31st, there were sessions on managing capital expenditures, completing an IPO, and even managing margin and cash flow. However, the keynote presenters did not spend much of time on these topics. Instead, they focused on how CFOs need to help their firms execute better. Here is a quick summary of the suggestions made from CFOs in broadcasting, consumer goods, retail, healthcare, and medical devices.
The Modern CFO is Strategic
The Broadcasting CFO started his talk by saying he was not at the conference to share why CFOs need to move from being “bean counters to strategic advisors”. He said “let’s face it the modern CFO is a strategic CFO”. Agreeing with this viewpoint, the Consumer Goods CFO said that finance organizations have a major role to play in business transformation. He said that finance after all is the place to drive corporate improvement as well as business productivity and business efficiency.
CFOs Talked About Their Business’ Issues
The Retailer CFO talked like he was a marketing person. He said retail today is all about driving a multichannel customer experience. To do this, finance increasingly needs to provide real business value. He said, therefore, that data is critical to the retailer’s ability to serve customers better. He claimed that customers are changing how they buy, what they want to buy, and when they want to buy. We are being disrupted and it is better to understand and respond to these trends. We are trying, therefore, to build a better model of ecommerce.
Meanwhile, the Medical Devices CFO said that as a supplier to medical device vendors “what we do is compete with our customers engineering staffs”. And the Consumer Goods CFO added the importance of finance driving sustained business transformation.
CFOs Want To Improve Their Business’ Ability To Execute
The Medical Devices CFO said CFOs need to look for “earlier execution points”. They need to look for the drivers of behavior change. As a key element of this, he suggested that CFOs need to develop “early warning indicators”. He said CFOs need to actively look at the ability to achieve objectives. With sales, we need to ask what deals do we have in the pipe? At what size are these deals? And at what success rate will these deals be closed? Only with this information, can the CFO derive an expected company growth rate. He then asked CFOs in the room to identify themselves. With their hands in the air, he asked them are they helping to create a company that executes or not. He laid down the gauntlet for the CFOs in the room by then asserting that if you are not creating a company that executes then are going to be looking at cutting costs sooner rather than later.
The retailer CFO agreed with this CFO. He said today we need to focus on how to win a market. We need to be asking business questions including:
- How should we deploy resources to deliver against our firm’s value proposition?
- How do we know when we win?
CFOs Claim Ownership For Enterprise Performance Measurement
The Retail CFO said that finance needs to own “the facts for the organization”—the metrics and KPIs. This is how he claims CFOs will earn their seat at the CEOs table. He said in the past the CFO have tended to be stoic, but this now needs to change.
The Medical Devices CFO agreed and said enterprises shouldn’t be tracking 150 things—they need to pare it down to 12-15 things. They need to answer with what you measure—who, what, and when. He said in an execution culture people need to know the targets. They need measurable goals. And he asserted that business metrics are needed over financial metrics. The Consumer Goods CFO agreed by saying financial measures alone would find that “a house is on fire after half the house had already burned down”. The Healthcare CFO picked up on this idea and talked about the importance of finance driving value scorecards and monthly benchmarks of performance improvement. The broadcaster CFO went further and suggested the CFO’s role is one of a value optimizer.
CFOs Own The Data and Drive a Fact-based, Strategic Company Culture
The Retail CFOs discussed the need to drive a culture of insight. This means that data absolutely matters to the CFO. Now, he honestly admits that finance organizations have not used data well enough but he claims finance needs to make the time to truly become data centric. He said I do not consider myself a data expert, but finance needs to own “enterprise data and the integrity of this data”. He said as well that finance needs to ensure there are no data silos. He summarized by saying finance needs to use data to make sure that resources are focused on the right things; decisions are based on facts; and metrics are simple and understandable. “In finance, we need use data to increasingly drive business outcomes”.
CFOs Need to Drive a Culture That Executes for Today and the Future
Honestly, I never thought that I would hear this from a group of CFOs. The Retail CFO said we need to ensure that the big ideas do not get lost. We need to speed-up the prosecuting of business activities. We need to drive more exponential things (this means we need to position our assets and resources) and we need, at the same time, to drive the linear things which can drive a 1% improvement in execution or a 1% reduction in cost. Meanwhile, our Medical Device CFO discussed the present value, for example, of a liability for rework, lawsuits, and warranty costs. He said that finance leaders need to ensure things are done right today so the business doesn’t have problems a year from today. “If you give doing it right the first time a priority, you can reduce warranty reserve and this can directly impact corporate operating income”.
CFOs need to lead on ethics and compliance
The Medical Devices CFO said that CFOs, also, need to have high ethics and drive compliance. The Retail CFO discussed how finance needs to make the business transparent. Finance needs to be transparent about what is working and what is not working. The role of the CFO, at the same time, needs to ensure the integrity of the organization. The Broadcaster CFO asserted the same thing by saying that CFOs need to take a stakeholder approach to how they do business.
In whole, CFOs at CFO Rising are showing the way forward for the modern CFOs. This CFO is all about the data to drive present and future performance, ethics and compliance, and business transparency. This is a big change from the historical controller approach and mentality. I once asked a boss about what I needed to be promoted to a Vice President; my boss said that I needed to move from a technical specialist to a business person. Today’s CFOs clearly show that they are a business person first.
Solution Brief: The Intelligent Data Platform
CFOs Move to Chief Profitability Officer
CFOs Discuss Their Technology Priorities
The CFO Viewpoint upon Data
How CFOs can change the conversation with their CIO?
New type of CFO represents a potent CIO ally
Competing on Analytics
The Business Case for Better Data Connectivity
This article was originally published on www.federaltimes.com.
November – that time of the year. This year, November 1 was the start of Election Day weekend and the associated endless barrage of political ads. It also marked the end of Daylight Savings Time. But, perhaps more prominently, it marked the beginning of the holiday shopping season. Winter holiday decorations erupted in stores even before Halloween decorations were taken down. There were commercials and ads, free shipping on this, sales on that, singing, and even the first appearance of Santa Claus.
However, it’s not all joy and jingle bells. The kickoff to this holiday shopping season may also remind many of the countless credit card breaches at retailers that plagued last year’s shopping season and beyond. The breaches at Target, where almost 100 million credit cards were compromised, Neiman Marcus, Home Depot and Michael’s exemplify the urgent need for retailers to aggressively protect customer information.
In addition to the holiday shopping season, November also marks the next round of open enrollment for the ACA healthcare exchanges. Therefore, to avoid falling victim to the next data breach, government organizations as much as retailers, need to have data security top of mind.
According to the New York Times (Sept. 4, 2014), “for months, cyber security professionals have been warning that the healthcare site was a ripe target for hackers eager to gain access to personal data that could be sold on the black market. A week before federal officials discovered the breach at HealthCare.gov, a hospital operator in Tennessee said that Chinese hackers had stolen personal data for 4.5 million patients.”
Acknowledging the inevitability of further attacks, companies and organizations are taking action. For example, the National Retail Federation created the NRF IT Council, which is made up of 130 technology-security experts focused on safeguarding personal and company data.
Is government doing enough to protect personal, financial and health data in light of these increasing and persistent threats? The quick answer: no. The federal government as a whole is not meeting the data privacy and security challenge. Reports of cyber attacks and breaches are becoming commonplace, and warnings of new privacy concerns in many federal agencies and programs are being discussed in Congress, Inspector General reports and the media. According to a recent Government Accountability Office report, 18 out of 24 major federal agencies in the United States reported inadequate information security controls. Further, FISMA and HIPAA are falling short and antiquated security protocols, such as encryption, are also not keeping up with the sophistication of attacks. Government must follow the lead of industry and look for new and advanced data protection technologies, such as dynamic data masking and continuous data monitoring to prevent and thwart potential attacks.
These five principles can be implemented by any agency to curb the likelihood of a breach:
1. Expand the appointment and authority of CSOs and CISOs at the agency level.
3. Protect all environments from development to production, including backups and archives.
4. Data and application security must be prioritized at the same level as network and perimeter security.
5. Data security should follow data through downstream systems and reporting.
So, as the season of voting, rollbacks, on-line shopping events, free shipping, Black Friday, Cyber Monday and healthcare enrollment begins, so does the time for protecting personal identifiable information, financial information, credit cards and health information. Individuals, retailers, industry and government need to think about data first and stay vigilant and focused.
Account Executives update opportunities in Salesforce all the time. As opportunities close, payment information is received in the financial system. Normally, they spend hours trying to combine the data, to prepare it for differential analysis. Often, there is a prolonged, back-and-forth dialogue with IT. This takes time and effort, and can delay the sales process.
What if you could spend less time preparing your Salesforce data and more time analyzing it?
Informatica has a vision to solve this challenge by providing self-service data to non-technical users. Earlier this year, we announced our Intelligent Data Platform. One of the key projects in the IDP, code-named “Springbok“, uses an excel-like search interface to let business users find and shape the data they need.
Informatica’s Project Springbok is a faster, better and, most importantly, easier way to intelligently work with data for any purpose. Springbok guides non-technical users through a data preparation process in a self-service manner. It makes intelligent recommendations and suggestions, based on the specific data they’re using.
To see this in action, we welcome you to join us as we partner with Halak Consulting, LLC for an informative webinar. The webinar will take place on November 18th at 10am PST. You will learn from the Springbok VP of Strategy and from an experienced Springbok user about how Springbok can benefit you.
So REGISTER for the webinar today!
Well, it’s been a little over a week since the Strata conference so I thought I should give some perspective on what I learned. I think it was summed up at my first meeting, on the first morning of the conference. The meeting was with a financial services company who has significance experience with Hadoop. The first words out of their mouths were, “Hadoop is hard.”
Later in the conference, after a Western Union representative spoke about their Hadoop deployment, they were mobbed by end user questions and comments. The audience was thrilled to hear about an actual operational deployment: Not just a sandbox deployment, but an actual operational Hadoop deployment from a company that is over 160 years old.
The market is crossing the chasm from early adopters who love to hand code (and the macho culture of proving they can do the hard stuff) to more mainstream companies that want to use technology to solve real problems. These mainstream companies aren’t afraid to admit that it is still hard. For the early adopters, nothing is ever hard. They love hard. But the mainstream market doesn’t view it that way. They don’t want to mess around in the bowels of enabling technology. They want to use the technology to solve real problems. The comment from the financial services company represents the perspective of the vast majority of organizations. It is a sign Hadoop is hitting the mainstream market.
More proof we have moved to a new phase? Cloudera announced they were going from shipping six versions a year down to just three. I have been saying for awhile that we will know that Hadoop is real when the distribution vendors stop shipping every 2 months and go to a more typical enterprise software release schedule. It isn’t that Hadoop engineering efforts have slowed down. It is still evolving very rapidly. It is just that real customers are telling the Hadoop suppliers that they won’t upgrade as fast because they have real business projects running and they can’t do it. So for those of you who are disappointed by the “slow down,” don’t be. To me, this is news that Hadoop is reaching critical mass.
Technology is closing the gap to allow organizations to use Hadoop as a platform without having to actually have an army of Hadoop experts. That is what Informatica does for data parsing, data integration, data quality and data lineage (recent product announcement). In fact, the number one demo at the Informatica booth at Strata was the demonstration of “end to end” data lineage for data, going from the original source all the way to how it was loaded and then transformed within Hadoop. This is purely an enterprise-class capability that becomes more interesting and important when you actually go into true production.
Informatica’s goal is to hide the complexity of Hadoop so companies can get on with the work of using the platform with the skills they already have in house. And from what I saw from all of the start-up companies that were doing similar things for data exploration and analytics and all the talk around the need for governance, we are finally hitting the early majority of the market. So, for those of you who still drop down to the underlying UNIX OS that powers a Mac, the rest of us will keep using the GUI. To the extent that there are “fit for purpose” GUIs on top of Hadoop, the technology will get used by a much larger market.
So congratulations Hadoop, you have officially crossed the chasm!
P.S. See me on theCUBE talking about a similar topic at: youtu.be/oC0_5u_0h2Q
Recent published research shows that “faster” is better than “slower.” The point, ladies and gentlemen, is that speed, for lack of a better word, is good. But granted, you won’t always have the need for speed. My Lamborghini is handy when I need to elude the Bakersfield fuzz on I-5, but it does nothing for my Costco trips. There, I go with capacity and haul home my 30-gallon tubs of ketchup with my Ford F150. (Note: this is a fictitious example, I don’t actually own an F150.)
But if speed is critical, like in your data streaming application, then Informatica Vibe Data Stream and the MapR Distribution including Apache™ Hadoop® are the technologies to use together. But since Vibe Data Stream works with any Hadoop distribution, my discussion here is more broadly applicable. I first discussed this topic earlier this year during my presentation at Informatica World 2014. In that talk, I also briefly described architectures that include streaming components, like the Lambda Architecture and enterprise data hubs. I recommend that any enterprise architect should become familiar with these high-level architectures.
Data streaming deals with a continuous flow of data, often at a fast rate. As you might’ve suspected by now, Vibe Data Stream, based on the Informatica Ultra Messaging technology, is great for that. With its roots in high speed trading in capital markets, Ultra Messaging quickly and reliably gets high value data from point A to point B. Vibe Data Stream adds management features to make it consumable by the rest of us, beyond stock trading. Not surprisingly, Vibe Data Stream can be used anywhere you need to quickly and reliably deliver data (just don’t use it for sharing your cat photos, please), and that’s what I discussed at Informatica World. Let me discuss two examples I gave.
Large Query Support. Let’s first look at “large queries.” I don’t mean the stuff you type on search engines, which are typically no more than 20 characters. I’m referring to an environment where the query is a huge block of data. For example, what if I have an image of an unidentified face, and I want to send it to a remote facial recognition service and immediately get the identity? The image would be the query, the facial recognition system could be run on Hadoop for fast divide-and-conquer processing, and the result would be the person’s name. There are many similar use cases that could leverage a high speed, reliable data delivery system along with a fast processing platform, to get immediate answers to a data-heavy question.
Data Warehouse Onload. For another example, we turn to our old friend the data warehouse. If you’ve been following all the industry talk about data warehouse optimization, you know pumping high speed data directly into your data warehouse is not an efficient use of your high value system. So instead, pipe your fast data streams into Hadoop, run some complex aggregations, then load that processed data into your warehouse. And you might consider freeing up large processing jobs from your data warehouse onto Hadoop. As you process and aggregate that data, you create a data flow cycle where you return enriched data back to the warehouse. This gives your end users efficient analysis on comprehensive data sets.
Hopefully this stirs up ideas on how you might deploy high speed streaming in your enterprise architecture. Expect to see many new stories of interesting streaming applications in the coming months and years, especially with the anticipated proliferation of internet-of-things and sensor data.
To learn more about Vibe Data Stream you can find it on the Informatica Marketplace .