Category Archives: Big Data
There are organizations truly reaping the rewards of Big Data, and then there are those who are just trying to catch up. What are the Big Data “leaders” doing that the “laggards” are missing?
As reported here, “Every 30 to 40 percent increase in data volume usually forces an organization to re-look at infrastructure,” commented Venkat Lakshminarasimha, Global Big Data Integration Specialist, Informatica. He was addressing a gathering of information management professionals from the public sector in a workshop conducted by Informatica on maximizing return on data, held as part of the activities surrounding the FutureGov Singapore Forum 2013.
800 exabytes of potentially useful data were collected in the US in 2009, and 35 zettabytes are expected by 2020. “From a velocity perspective, some organizations have 50GB of real-time data streaming in per second at peak times — this means that you need to look at scalability of infrastructure, and Big Data solutions,” said Venkat.
The fact of the matter is that the rise of big data will only accelerate the already massive growth of data. At the same time, we need to figure out ways to get the data from point A (or points A) to point B (or points B), and to do so in a manner that’s both scalable and resilient.
The core issue is that most enterprises are not at all ready for this kind of growth in data. While many point to a lack of scalable storage, the reality is that the volume of data that must move between data stores will quickly saturate current approaches to data integration, as well as the enabling technology.
Considering what’s been stated above, what is an enterprise supposed to do to prepare for what’s sure to be called the data avalanche of 2014? It starts with the right approach, and the right technology.
The real challenge is to create the right approach to data integration, looking at the changing requirements around the use of big data. This includes the ability to deal with both structured and unstructured data, the ability to integrate data leveraging distributed processing, and, most importantly, the ability to scale to an ever-increasing load.
The approach is requirements-driven. Those charged with managing data and data integration should have a complete understanding of where the growth in data will occur. Using that understanding as a jumping-off point, align these requirements with a data storage and data integration architecture.
However, that’s only part of the story. You need to select a data integration solution that can provide the core integration services, such as transformation, translation, interface mediation, security, governance, and so on. The toughest part is to select and deploy technology that can provide the required scalability. This means providing data integration at speeds that let all core business processes access all of the information they need, when they need it, even as volumes keep increasing.
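To make the scalability requirement concrete, here is a minimal, purely illustrative sketch in Python (not Informatica’s tooling; all field names and transformation rules are invented): record batches are fanned out across worker processes so the transformation and translation step can keep pace as volume grows.

```python
# Minimal, purely illustrative sketch: a data integration step that applies
# transformation and translation to record batches in parallel, so throughput
# can scale with the number of workers. Field names and rules are hypothetical.
from concurrent.futures import ProcessPoolExecutor
from typing import Iterable

def transform(record: dict) -> dict:
    """Translate source field names and normalize values (invented rules)."""
    return {
        "customer_id": str(record["cust_no"]).strip(),
        "country": record.get("ctry", "").upper(),
        "revenue_usd": float(record.get("rev", 0)),
    }

def integrate_batch(batch: list[dict]) -> list[dict]:
    """Transform one batch of records; batches are the unit of parallelism."""
    return [transform(r) for r in batch]

def run_pipeline(batches: Iterable[list[dict]], workers: int = 4) -> list[dict]:
    """Fan batches out across processes so the load can scale horizontally."""
    results: list[dict] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for out in pool.map(integrate_batch, batches):
            results.extend(out)
    return results

if __name__ == "__main__":
    source = [[{"cust_no": 1, "ctry": "us", "rev": "10.5"}],
              [{"cust_no": 2, "ctry": "sg", "rev": "7.25"}]]
    print(run_pipeline(source))
```

The point of the sketch is only the shape of the solution: as long as the transformation logic is stateless per batch, adding workers is how the pipeline absorbs growing load.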
The truth of the matter is that few out there understand what’s coming. While data is expected to grow, I don’t think we understand how much. Moreover, we don’t understand how critical data integration is to the strategy.
Marge Breya, Informatica’s CMO, in conversation with Rick Smolan, former Time, Life, and National Geographic photographer and CEO of Against All Odds Productions
I’m Marge Breya, the EVP and chief marketing officer at Informatica, and I couldn’t be more delighted to have talked with Rick Smolan about big data and the implications and benefits it can have for the human race. Rick, as many of you know, is the creator of The Human Face of Big Data project, and he is going to be one of the featured speakers at Informatica World 2013. Follow this three-part series to read the entire transcript of our conversation.
MB: Hi Rick, welcome to the podcast.
RS: Pleased to be here, thank you.
MB: We have a few minutes of your time, and we thought we would just focus on a couple of key points that you may be covering and that our audience might be interested in, in terms of big data. Let’s start at the beginning. What brought your interest to big data?
RS: I’ve been working on projects for many years with groups of journalists, and every 18 months I gather the tribe to do a deep dive on emerging topics, such as the Internet in the first year it was touching people’s lives. Another project focused on the global water crisis, another on the effect of the microprocessor in the course of a typical day. Last year I was at the TED conference and I ran into Marissa Mayer, who was still at Google and is now CEO of Yahoo, and she asked me what my next project was. I told her I was struggling with what interesting emerging topics we should focus on, and she said you guys should look at big data.
I asked her to explain that phrase, and her analogy piqued my interest. She said big data was like watching the planet grow a nervous system. She explained that through our smartphones, our Google searches, credit card and ATM transactions, for the first time the human race has the ability to collect that data, analyze it, visualize it and respond to it while things are still happening, almost in real time.
MB: As you began to get started and look into this I’m sure you realized it was more than what you originally thought, for example from everything that’s happening with our citizens to what’s happening in terms of sensors on cars, machines, etc. Any big observations that you came to?
RS: We had almost two hundred people working on The Human Face of Big Data. And as I talked to people, so much of what I’m sharing reminded me of what I recall hearing in the early days of the Internet in 1993. People were talking about cyberspace and the World Wide Web. People like Nicholas Negroponte were waxing eloquent that the Internet was going to change every aspect of life on earth, and other people were saying that it was just a better way to look at pornography. In many respects Nicholas Negroponte was prophetic in envisioning how this technology, connecting people in ways that were never possible before, was going to really alter life on earth.
When I started looking at big data, I was hearing people on the one side saying big data equals Big Brother, just another way to oppress people, to track them and sell them stuff they don’t want. And then on the other hand I spoke with Marissa who saw this as a great benefit to humanity.
Now that I’ve spent a year and a half working with this wonderful team of journalists, writers, photographers, illustrators and researchers, I’m much more on the glass-half-full side. I think there are certainly lots of things that we have to be careful of, like privacy in these oceans of data, but I think the upside offers so much to our species. I deeply believe that big data is going to have 1,000 times more impact than the Internet on our lives. You can see it happening in every aspect of human behavior. Right now we’re in the caveman era of big data, and I don’t think it’s going to be called big data for much longer. I think it’s going to be segmented and fragmented and called something else. I think we’re starting to get glimpses of how it’s starting to change the way we use our resources much more wisely. My kids are 10 and 13 and my mother just turned 90, and I want to know how all this is going to affect their lives.
MB: This is fascinating. As you think about Informatica World, this year we will have nearly 2,000 people at the conference, all of whom are committed to sorting out this world of big data, and frankly we use the phrase ‘how to help people achieve or reach their information potential.’ It’s not just from a people standpoint; it could be from a business decision or machine standpoint. I wonder what your thoughts are in terms of any advice that you have for professionals in this industry who are trying to harness this change and help other people achieve that potential.
RS: Well, you can choose almost any area of human endeavor, but one in which I think we found lots of great stories that people respond to is the healthcare arena. I was at the TEDMED conference last year, and Francis Collins, head of the National Institutes of Health, said something that was quite remarkable. He told the story of how, when Steve Jobs first got sick a few years ago, it cost $100,000 to sequence his DNA to try to understand what he had and what might be the best course of treatment for him. Now, today, four or five years later, the cost of sequencing someone’s DNA has dropped to about $3,000. Francis Collins believes that in another five years it might cost as little as $40 at Walgreens – like getting a flu shot. And before your doctor prescribes anything to you – even an aspirin – your DNA will have to be sequenced; it’s personalized medicine, where the medicine has been tailored specifically for you. And the reason this is so important is that large pharmaceutical companies have for years been spending hundreds of millions of dollars working on addressing major illnesses. But often when they do clinical trials on humans they find that even though 99% of the people taking the drug will be helped, 1% might be adversely affected or killed by it. And therefore they can never release the product.
Collins’ point was that if each of our individual DNAs were sequenced, a doctor would know who would benefit from one of these drugs and who would be harmed. If you’ve ever known anyone who’s gone through cancer or any other illness, it’s often a process of trial and error. The doctors try lots of different things, some of them incredibly unpleasant, in some cases making someone’s life miserable, and then it doesn’t really help. Big data has the potential of ushering in an era of personalized medicine, helping doctors go right to the correct treatment instead of all this trial and error. Again, it’s the idea of using a bullet instead of a shotgun to treat the problem. And you see this in industry after industry; healthcare is obviously one that affects us all very deeply.
Make sure to check back next week to read part two of three of this transcript.
Many software vendors, analysts and journalists are overusing the term “Data Governance” in today’s complex business and IT environments. However, it has become one of the primary goals and drivers for data-related IT projects, whilst at the same time being one of the most difficult to define, measure and quantify. What real meaning can we give to the concept of Data Governance? What are its importance, impact and meaning for the enterprise?
To try to restore some meaning and context to Data Governance, let’s go back to the semantics through an analogy that everyone can understand, and that is insightful down to the smallest detail.
Welcome to Data Land… If data are its citizens, the governance of such a country would aim at ensuring that these data co-existed peacefully, stayed healthy, enriched themselves, did not live on top of each other, did not destroy each other in case of conflict, and, most importantly, worked together every year to improve the GDP of Data Land. This means creating value through the use and action of everyone. Of course, bioethics laws would prevent the cloning or duplication of its inhabitants… Data governance would then define itself as a framework intended to ensure the efficient management of data in the enterprise. Putting data under governance prevents its chaotic generation and use.
In Data Land, governance implies:
A territory to govern
The scope of influence of governance must be clearly defined, the border of its country clearly delimited.
What type of data are we talking about? The question of the perimeter is not a trivial one, and its impact on the projects and tools to be implemented is significant. Master data, so critical that it commands a particular investment for its management, forms a first consistent set, and its governance leads to MDM projects.
What about transaction data or social interaction data? Increasingly popular and full of intelligence for the enterprise, they do not fit into the usual reference-data bucket, but they can benefit from data quality initiatives, with their own specific concerns (volume and volatility, for instance). Also, Data Land is not free from globalization. Though it is important to establish borders for security reasons, “common market” initiatives with neighboring countries (partners, data vendors, data pools) are increasing and aim at surpassing the scope of the traditional enterprise, in favor of the “exterprise”.
A governor
Any Data Land (for instance, the master data one) must have a leader, a sponsor who conveys a vision and ensures the alignment of all the members of his government who, as in real life, may be tempted to handle the data they govern in an autonomous or selfish way. This executive sponsorship is an important success factor for projects related to data governance. Its absence, the source of the familiar hitches of government, leads to systematic failure. The governor is often an executive (CIO, COO, CEO), with enough power and respect to impose a choice in case of deadlock.
A government and its supervisory body
Every country needs a team to define the details of the strategy and the laws to put in place to ensure it functions correctly. Data Land requires nothing less. The organizational changes and the setup of dedicated enterprise-wide teams are among the most widely publicized aspects of data governance projects. The Data Governance Council is tasked with defining the rules governing the data, i.e. the law. The Data Stewards ensure compliance with the law and, when it is not respected, take action to restore compliance. For the initiative to be a success, and just like national governments, they should in theory be independent of particular interests and business lobbies. They do, however, need to have an intimate knowledge of the data and its use in the enterprise processes. This is why they often come from “civil society”, meaning they were previously members of the business teams, with a mission to rise above their previous assignment for the greater good.
Laws and institutional processes
The first objective of the abovementioned government is to establish the governance scheme, the set of rules that govern best practices around creating, using, modifying and removing data. These laws are of multiple types: those that establish property titles (data owners), easement rights (data consumers) and security rules (data custodians), and those that define the boundaries, the restrictions or, more positively, the data standards. These rules will be enforced, and the data controlled, by the data stewards. As in civil society, efficient management of the data involves the orderly empowerment of the actors (prevention) as well as systematic control (repression). The enforcement of the law and its corrective aspect may be supported by processes orchestrating multiple users, according to the scheme defined by the Governance Council.
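To ground the analogy, here is a minimal, hypothetical sketch of how such “laws” might be expressed as machine-checkable data standards that a steward applies to incoming records; the field names, owners and rules are invented for illustration and do not come from any particular governance product.

```python
# Purely illustrative sketch: governance "laws" expressed as machine-checkable
# data standards, applied by a steward-style validation pass. All field names,
# owners and rules are hypothetical.
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class DataStandard:
    field: str
    owner: str                    # property title: who owns the field
    rule: Callable[[str], bool]   # the "law" the value must respect
    description: str

STANDARDS = [
    DataStandard("country_code", "Reference Data Team",
                 lambda v: bool(re.fullmatch(r"[A-Z]{2}", v)),
                 "two-letter ISO-style country code"),
    DataStandard("customer_id", "Customer MDM Team",
                 lambda v: v.isdigit() and len(v) == 8,
                 "eight-digit customer identifier"),
]

def steward_check(record: dict) -> list[str]:
    """Return the list of violations (an empty list means the record complies)."""
    violations = []
    for std in STANDARDS:
        value = record.get(std.field, "")
        if not std.rule(value):
            violations.append(f"{std.field}: expected {std.description} "
                              f"(owner: {std.owner}), got {value!r}")
    return violations

if __name__ == "__main__":
    print(steward_check({"country_code": "france", "customer_id": "123"}))
```

In Data Land terms, the standards are the law, the owners hold the property titles, and the check is the steward’s routine patrol: prevention first, correction when a violation is reported.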
So what about IT tools? They are the infrastructure of Data Land: vehicles, road signs and, even if it is less fun, speed cameras. They are there to facilitate the application of the governance scheme, to give tools to the government, to enforce order and respect for the law. In any case, they can help with the definition of the scheme. Data governance is an initiative taken by the enterprise, for the enterprise, independently of any IT solution, which will have to adapt (if it is sufficiently flexible).
As with the government of any country, data governance aspires to manage the enterprise data landscape with perfect efficiency.
Ambitious? Surely.
Critical? Definitely.
Let’s then ensure that the path to this ideal delivers value in itself. This is what the relevance of IT tools should be judged against.
Special thanks to David Jordan for translating the original article from French to English.
Our Big Clean-up of Our Big Data
Informatica, the company for which I work, deals in big data challenges every day. It’s what we DO: help customers turn their data into actionable business insights. When I took the helm as VP, Global Talent Acquisition, I was surprised to learn that the data within the talent acquisition function was not up to the standards Informatica lives by. Clearly, talent acquisition was not seeing the huge competitive advantage that data could bring – at least not the way sales, marketing and research were seeing it. And that, to me, seemed like a major problem, but also a terrific opportunity! This is the story of how Informatica Talent Acquisition became data-centric and used that centricity to our advantage to fix the problem.
Go to the Source
No matter how big or small your company, the data related to talent comes from varied and diverse roles within the talent acquisition function. The role may be named Researcher, Sourcer, Talent Lead Generator or even Recruiter. Names aside, the data comes from the first person to connect with a potential candidate. Usually that person, or in Informatica’s case that team, is the one who finds the data and captures it. Because talent acquisition in the past was largely about making a single hire, our data was captured haphazardly and stored with… let’s say, less than best practices. In addition, we didn’t know big data was about to hit us square in the face with more social data points than yesteryear’s Talent Sourcer could believe. I went to our sourcing team as well as our research department to begin assessing how we were acquiring, storing and accessing our data.
Data is at the heart of so many recruiting conversations today, but it’s not just about the data; access to the right data at the right time by the right person is paramount to making good business or hiring decisions. This led me to Dave Mendoza, a Talent Acquisition Strategy consultant, who had developed a process called Talent Mapping, which we applied to help us identify, retrieve and categorize our talent data. From that point he was able to create our Talent Knowledge Library. This library allows us to store and access our talent data and, finally, to develop a talent data methodology aptly named Future-casting. This methodology defines a process wherein Informatica can use its talent acquisition data for competitive intelligence, workforce planning and candidate nurturing.
The most valuable part of our transformation process was the implementation of our Talent Knowledge Library. It was apparent that the weakest point with this new solution was not the capturing or categorizing of our data; it was that we had no central repository that would allow unstructured data to be housed, amended and retrieved by multiple Talent Sourcers. To solve this issue we implemented a Candidate Relationship Management (CRM) application named Avature. This tool allowed us to build a talent library – a single-source repository of our global talent pools – which could then be accessed by all the roles within the talent acquisition organization. Having a centralized database has improved our hiring efficiency, decreasing the time and cost to fill requisitions.
Because Informatica is a global company, it doesn’t make sense for us to house all of our data in a proprietary system we don’t control. While the new social sourcing platforms are fast and powerful, the data doesn’t belong to the company once it is entered, and that didn’t work for us, especially given that we had teams all over the world working with different tools. With a practical approach to data capture and retrieval, we now have a central databank of very specific competitive intelligence that can stand the test of time, because the tool captures social and mobile data and is thus built for future-proofing. Because the data is ours, we retain our competitive advantage, even during talent acquisition transition periods.
One truth became very clear as we took on this data-centric approach to talent acquisition: if you don’t set standards for the processes and protocols around your data, you may as well use a bucket, as no repository will be of much use without accurate, usable data that can be accessed consistently by everyone. Being able to search the data according to company-wide standards was both obvious and mind-blowing. These four standards are what we put into place when creating our talent library:
1) Data must be usable and searchable,
2) Extraction and leverage of data must be easy,
3) Data can be migrated from multiple lead generation platforms into a “single source of truth”,
4) Data can be categorized, tagged and mapped to talent for ease of segmentation.
The goal of these standards is to match the data to each of our primary hiring verticals and to multiple social channels so that we can both attract and identify talent in a self-sustaining manner.
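As a rough, hypothetical illustration of those standards in practice (all field names, tags and verticals below are invented), a talent library entry might be tagged and mapped to a hiring vertical so it stays searchable by everyone:

```python
# Purely illustrative sketch (field names, tags and verticals are invented):
# candidate records are tagged and mapped to hiring verticals so the talent
# library stays searchable, per the standards listed above.
from dataclasses import dataclass, field

@dataclass
class CandidateRecord:
    name: str
    twitter_handle: str = ""
    email: str = ""
    tags: set[str] = field(default_factory=set)   # skills, segments, sources
    vertical: str = "unassigned"                   # primary hiring vertical

def search(library: list[CandidateRecord], vertical: str,
           required_tags: set[str]) -> list[CandidateRecord]:
    """Return candidates in a vertical that carry every required tag."""
    return [c for c in library
            if c.vertical == vertical and required_tags <= c.tags]

if __name__ == "__main__":
    library = [
        CandidateRecord("A. Example", "@a_example", "a@example.com",
                        {"data-integration", "java"}, "engineering"),
        CandidateRecord("B. Sample", "@b_sample", "b@example.com",
                        {"field-marketing"}, "marketing"),
    ]
    print(search(library, "engineering", {"java"}))
```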
In today’s globalized world, people frequently change their physical address, their employer and their email addresses, but they rarely change their Twitter handle or Facebook name. This is why ‘people’ data quickly goes out of date and why social data is the new commodity within the enterprise. People who use social networks are leaving a living, always-fresh data shadow, making it easy for us to capture their most relevant contact data. It sounds a bit like we’ve become online stalkers, but marketers and business development professionals have been doing it for years. And just as we move toward predictive modeling on these pieces of personal data, so too are our competitors for talent. By configuring our CRM systems to accurately capture and search these social data points, our sourcing team is more efficient and effective. It has also reduced the duplicate entries that caused candidate fatigue in our recruiting processes.
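A minimal sketch of that design choice, again with invented field names: keying the repository on relatively stable social identifiers, such as a Twitter handle, makes duplicate detection straightforward.

```python
# Purely illustrative sketch: merge incoming candidate records keyed on a
# stable social identifier (e.g. Twitter handle), falling back to email, so
# repeat entries update one record instead of creating duplicates.
def dedupe_key(record: dict) -> str:
    """Prefer the Twitter handle, which changes rarely, over the email address."""
    return (record.get("twitter_handle") or record.get("email") or "").lower()

def merge_into_library(library: dict[str, dict], incoming: list[dict]) -> dict[str, dict]:
    """Update existing records in place; only genuinely new people are added."""
    for record in incoming:
        key = dedupe_key(record)
        if not key:
            continue  # no stable identifier: route to manual review instead
        library.setdefault(key, {}).update(record)
    return library

if __name__ == "__main__":
    library: dict[str, dict] = {}
    merge_into_library(library, [
        {"twitter_handle": "@a_example", "employer": "Acme"},
        {"twitter_handle": "@A_Example", "employer": "Globex"},  # same person, new employer
    ])
    print(library)
```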
I think Dave says it perfectly in his recent white paper “Future-casting: How the rise of Big Social Data API is set to Transform the Business of Recruiting”: “Future-casting has the ability to review the career progression of both internal employees and external candidates. This stems directly from the ability to track candidates more accurately via their social data. Now, more than ever before, corporations and the talent acquisition professionals within them can keep fresh data on every candidate in their system, with a few simple tweaks. This new philosophy of future-casting puts dynamic data into the hands of the organization, reducing dependency on job boards and even social platforms so they can create their own convergent model that combines all three.”
Results Will Come
At Informatica we saw results very quickly because we had an expert dedicated to addressing the challenges, and we were committed to making our data work for us. But if you don’t have a global sourcing team or a full-time consultant, you can still begin at the top of this list. Talk to your CRM or ATS vendors about how you can tweak your tracking systems. Assess and map your current talent process. Begin using products that allow you to own your OWN data. Finally, set standards such as the ones I mentioned previously and make sure everyone adheres to them.
This is original content published to ERE.net on May 8, 2013, and written by Brad Cook, Vice President, Global Talent Acquisition at Informatica.
The hype around big data is certainly top of mind with executives at most companies today, but what I am really seeing is companies finally making the connection between innovation and data. Data as a corporate asset is now getting the respect it deserves as part of a business strategy to introduce innovative new products and services and to improve business operations. The most advanced companies have C-level executives responsible for delivering top- and bottom-line results by managing their data assets to their maximum potential. The Chief Data Officer and Chief Analytics Officer own this responsibility and report directly to the CEO.
In my recent blog posts, we have looked at ways that master data management can become an integral component of the enterprise architecture, and I would be remiss if I did not look at how MDM dovetails with an emerging data management imperative: big data and big data analytics. Fortunately, identity resolution and MDM have the potential both to contribute to performance improvement and to enable efficient entity extraction and recognition.
We’ve been spending a lot of time here at Informatica preparing for Informatica World. That means taking a big step back to take the broader view of all the change happening in the world of information management and data integration today. New data sources and new data technologies are emerging almost daily, and the pace is only accelerating. Our mission is to help our customers and our market not only cope with all this change, but to harness it for competitive advantage.
But even as we’re putting together the latest take on the big picture, we’re also zooming in on the technology “secret sauce” that makes it possible to manage all this change. Informatica has the “secret sauce”: it’s what makes our architecture unique, and it’s what allows us to deliver the most value to our customers.
I’m not going to tell you what the “secret sauce” is now; you have to come to Informatica World to find out. Our executives, including Sohaib Abbasi, Ivan Chong and James Markarian, will be laying out the big picture, as well as revealing the “secret sauce.” And I’ll be diving into more detail in my Informatica Platform overview breakout session.
I hope to see you in Vegas next month. (By the way, the special hotel rate ends this Friday, May 3rd, so register today!)
According to Doug Henschen, Executive Editor at InformationWeek, “Despite the weak economy and zero growth in many IT salary categories, business intelligence (BI), analytics, information-integration and data warehousing professionals are seeing a slow-but-steady rise in income.”
Evolving from Chaos to Competitiveness: The Emerging Architecture of Next-Generation Data Integration
To compete on Big Data and analytics, today’s always-on enterprise needs a well-designed, evolving high-level architecture that continuously provides trusted data originating from a vast and fast-changing range of sources, often in different formats and within different contexts.
To meet this challenge, the art and science of data integration is evolving from duplicative, project-based silos that have consumed organizations’ time and resources to an architectural approach in which data integration is based on sustainable, repeatable practices – delivering data integration automatically anytime the business requires it.