Tag Archives: Data Integration
When it comes to cloud-based data analytics, a recent study by Ventana Research (as found in Loraine Lawson’s recent blog post) provides a few interesting data points. The study reveals that 40 percent of respondents cited lowered costs as a top benefit, improved efficiency was a close second at 39 percent, and better communication and knowledge sharing also ranked highly at 34 percent.
Ventana Research also found that organizations cite a unique and more complex reason to avoid cloud analytics and BI. Legacy integration work can be a major hindrance, particularly when BI tools are already integrated with other applications. In other words, it’s the same old story:
You can’t make sense of data that you can’t see.
The ability to deal with existing legacy systems when moving to concepts such as big data or cloud-based analytics is critical to the success of any enterprise data analytics strategy. However, most enterprises don’t focus on data integration as much as they should, and hope that they can solve the problems using ad-hoc approaches.
These approaches rarely work as well as they should, if at all. Thus, any investment made in data analytics technology is often diminished because the BI tools or applications that leverage analytics can’t see all of the relevant data. As a result, only part of the story is told by the available data, those who leverage data analytics don’t rely on the information, and that means failure.
What’s frustrating to me about this issue is that the problem is easily solved. Those in the enterprise charged with standing up data analytics should put a plan in place to integrate new and legacy systems. As part of that plan, there should be a common understanding around business concepts/entities of a customer, sale, inventory, etc., and all of the data related to these concepts/entities must be visible to the data analytics engines and tools. This requires a data integration strategy, and technology.
As enterprises embark on a new day of more advanced and valuable data analytics technology, largely built upon the cloud and big data, the data integration strategy should be systemic. This means mapping a path for the data from the source legacy systems to the views that the data analytics systems should include. What’s more, this data should be in real operational time because data analytics loses value as the data becomes older and out-of-date. We operate in a real-time world now.
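To make the idea of a common business entity concrete, here is a minimal Python sketch of mapping records from two systems onto one shared "customer" definition. The system names (`legacy_erp`, `cloud_crm`), the field names, and the sample rows are all hypothetical, chosen only for illustration; a real integration would also need type conversion, validation, and identity matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Canonical 'customer' entity that every analytics view agrees on."""
    customer_id: str
    name: str
    country: str


# Hypothetical mappings: canonical field -> field name in each source system.
FIELD_MAPS = {
    "legacy_erp": {"customer_id": "CUST_NO", "name": "CUST_NM", "country": "CTRY_CD"},
    "cloud_crm": {"customer_id": "id", "name": "fullName", "country": "country"},
}


def to_customer(source: str, record: dict) -> Customer:
    """Translate one source record into the canonical Customer entity."""
    mapping = FIELD_MAPS[source]
    values = {field: str(record[src]).strip() for field, src in mapping.items()}
    return Customer(**values)


# Made-up sample rows describing the same customer in both systems.
legacy_row = {"CUST_NO": "00042", "CUST_NM": "Acme Corp ", "CTRY_CD": "US"}
crm_row = {"id": "00042", "fullName": "Acme Corp", "country": "US"}

same_customer = to_customer("legacy_erp", legacy_row) == to_customer("cloud_crm", crm_row)
```

Once both systems resolve to the same entity, the analytics tools can see one customer rather than two unrelated rows.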
So, the work ahead requires planning to occur at both the conceptual and physical levels to define how data analytics will work for your enterprise. This includes what you need to see, when you need to see it, and then mapping a path for the data back to the business-critical and, typically, legacy systems. Data integration should be first and foremost when planning the strategy, technology, and deployments.
Recently, I had the opportunity to talk to a number of CFOs about their technology priorities. These discussions represent an opportunity for CIOs to hear what their most critical stakeholder considers important. The CFOs did not hesitate or need to think much about this question. They said three things make their priority list: better financial system reliability, better application integration, and better data security and governance. The top two match well with a recent KPMG study, which found that the biggest improvement finance executives want to see (cited by 91% of survey respondents) is in the quality of financial and performance insight obtained from the data they produce, followed closely by the finance and accounting organization’s ability to proactively analyze that information before it is stale or out of date.
CFOs want to know that their systems work and are reliable. They want the data collected from their systems to be analyzed in a timely fashion. Importantly, CFOs say they are worried about more than the timeliness of accounting and financial data, because they increasingly need to manage upward with information. For this reason, they want timely, accurate information produced for financial and business decision makers. Their goal is to drive better enterprise decision making.
In manufacturing, for example, CFOs say they want data to span from the manufacturing systems to the distribution system. They want to be able to push a button and get a report. These CFOs complain today about the need to manually massage and integrate data from system after system before they get what they and their business decision makers want and need.
CFOs really feel the pain of systems not talking to each other. They know firsthand that they have “disparate systems” and that too much manual integration is going on. They see the difficulties in connecting data from frontend to backend systems, and they personally feel the large number of manual steps required to pull data. They want their consolidation of account information to be less manual and more timely. One CFO said that he wants “the integration of the right systems to provide the right information to be done so they have the right information to manage and make decisions at the right time.”
Data Security and Governance
CFOs, at the same time, say they have become more worried about data security and governance. Even though CFOs believe that security is the job of the CIO and their CISO, they have an important role to play in data governance. CFOs say they are really worried about getting hacked. One CFO told me that he needs to know that systems are always working properly. Security of data matters today to CFOs for two reasons. First, data has a clear material impact. Just take a look at the out-of-pocket and revenue losses coming from the breach at Target. Second, CFOs, who were already being audited for technology and system compliance, feel that their audit firms will be obligated to extend what they were doing in security and governance as a part of regular compliance audits. One CFO put it this way: “This is a whole new direction for us. Target scared a lot of folks and will be in many respects a watershed event for CFOs.”
So the message here is that CFOs prioritize three technology objectives for their CIOs: better IT reliability, better application integration, and improved data security and governance. Each of these represents an opportunity to make the CFO’s life easier and, more importantly, to enable them to take on a more strategic role. The CFOs we talked to want to become one of the top three decision makers in the enterprise. Fixing these things for CFOs will enable CIOs to build closer CFO and business relationships.
Come and get it. For developers hungry to get their hands on Informatica on Hadoop, a downloadable free trial of Informatica Big Data Edition was launched today on the Informatica Marketplace. See for yourself the power of the killer app on Hadoop from the leader in data integration and quality.
Thanks to the generous help of our partners, the Informatica Big Data team has preinstalled the Big Data Edition inside the sandbox VMs of the two leading Hadoop distributions. This empowers Hadoop and Informatica developers to easily try the codeless, GUI driven Big Data Edition to build and execute ETL and data integration pipelines natively on Hadoop for Big Data analytics.
Informatica Big Data Edition is the most complete and powerful suite for Hadoop data pipelines and can increase productivity up to 5 times. Developers can leverage hundreds of out-of-the-box Informatica pre-built transforms and connectors for structured and unstructured data processing on Hadoop. With the Informatica Vibe Virtual Data Machine running directly on each node of the Hadoop cluster, the Big Data Edition can profile, parse, transform and cleanse data at any scale to prepare data for data science, business intelligence and operational analytics.
The Informatica Big Data Edition Trial Sandbox VMs will have a 60 day trial version of the Big Data Edition preinstalled inside a 1-node Hadoop cluster. The trials include sample data and mappings as well as getting started documentation and videos. It is possible to try your own data with the trials, but processing is limited to the 1-node Hadoop cluster and the machine you have it running on. Any mappings you develop in the trial can be easily moved on to a production Hadoop cluster running the Big Data Edition. The Informatica Big Data Edition also supports MapR and Pivotal Hadoop distributions; however, the trial is currently only available for Cloudera and Hortonworks.
Accelerate your ability to bring Hadoop from the sandbox into production by leveraging Informatica’s Big Data Edition. Informatica’s visual development approach means that more than one hundred thousand existing Informatica developers are now Hadoop developers without having to learn Hadoop or new hand coding techniques and languages. Informatica can help organizations easily integrate Hadoop into their enterprise data infrastructure and bring the PowerCenter data pipeline mappings running on traditional servers onto Hadoop clusters with minimal modification. Informatica Big Data Edition reduces the risk of Hadoop projects and increases agility by enabling more of your organization to interact with the data in your Hadoop cluster.
To get the Informatica Big Data Edition Trial Sandbox VMs and more information, please visit the Informatica Marketplace.
This blog post initially appeared on Exterro and is reblogged here with their consent.
As data volumes increase and become more complex, having an integrated e-discovery environment where systems and data sources are automatically synching information and exchanging data with e-discovery applications has become critical for organizations. This is true of unstructured and semi-structured data sources, such as email servers and content management systems, as well as structured data sources, like databases and data archives. The topic of systems integration will be discussed on Exterro’s E-Discovery Masters series webcast, “Optimizing E-Discovery in a Multi-Vendor Environment.” The webcast is CLE-accredited and will air on Thursday, September 4 at 1pm ET/10am PT. Learn more and register here.
I recently interviewed Jim FitzGerald, Sr. Director for Exterro, and Josh Alpern, VP ILM Domain Experts for Informatica, about the important and often overlooked role structured data plays during the course of e-discovery.
Q: E-Discovery demands are often discussed in the context of unstructured data, like email. What are some of the complications that arise when a matter involves structured data?
Jim: A lot of e-discovery practitioners are comfortable with unstructured data sources like email, file shares, or documents in SharePoint, but freeze up when they have to deal with structured data. They are unfamiliar with the technology and terminology of databases, extracts, report generation, and archives. They’re unsure about the best ways to preserve or collect from these sources. If the application is an old one, this fear often gets translated into a mandate to keep everything just as it is, which translates to mothballed applications that just sit there in case data might be needed down the road. Beyond the costs, there’s also the issue that IT staff turnover means that it’s increasingly hard to generate the reports Legal and Compliance need from these old systems.
Josh: Until now, e-discovery has largely been applied to unstructured data and email for two main reasons: 1) a large portion of relevant data resides in these types of data stores, and 2) these are the data formats that everyone is most familiar with and can relate to most easily. We all use email, and we all use files and documents. So it’s easy for people to look at an email or a document and understand that everything is self-contained in that one “thing.” But structured data is different, although not necessarily any less relevant. For example, someone might understand conceptually what a “purchase order” is, but not realize that in a financial application a purchase order consists of data that is spread across 50 different database tables. Unlike with an email or a PDF document, there might not be an easy way to simply produce a purchase order, in this example, for legal discovery without understanding how those 50 database tables are related to each other. Furthermore, to use email as a comparison, everyone understands what an email “thread” is. It’s easy to ask for all the emails in a single thread, and usually it’s relatively easy to identify all of those emails: they all have the same subject line. But in structured data the situation can be much more complicated. If someone asks to see every financial document related to a single purchase order, you would have to understand all of the connections between the many database tables that comprise all of those related documents and how they relate back to the requested purchase order. Solutions that are focused on email and unstructured data have no means to do this.
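As a rough illustration of Josh’s point, the Python sketch below uses an in-memory SQLite database with just three tables instead of 50. The table names, columns, and rows are hypothetical and invented for this example; the point is only that “producing a purchase order” means knowing which tables hold its pieces and which key ties them together.

```python
import sqlite3

# Hypothetical, heavily simplified schema: a real financial application
# may spread one purchase order across dozens of tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_orders (po_id INTEGER PRIMARY KEY, vendor TEXT);
CREATE TABLE po_lines (line_id INTEGER PRIMARY KEY, po_id INTEGER, item TEXT, qty INTEGER);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, po_id INTEGER, amount REAL);
INSERT INTO purchase_orders VALUES (1001, 'Acme'), (1002, 'Globex');
INSERT INTO po_lines VALUES (1, 1001, 'valve', 10), (2, 1001, 'gasket', 50), (3, 1002, 'pump', 1);
INSERT INTO invoices VALUES (501, 1001, 1250.0), (502, 1002, 900.0);
""")


def collect_purchase_order(conn, po_id):
    """Gather every row related to one purchase order via its po_id key."""
    header = conn.execute(
        "SELECT po_id, vendor FROM purchase_orders WHERE po_id = ?", (po_id,)
    ).fetchone()
    lines = conn.execute(
        "SELECT item, qty FROM po_lines WHERE po_id = ? ORDER BY line_id", (po_id,)
    ).fetchall()
    invoices = conn.execute(
        "SELECT invoice_id, amount FROM invoices WHERE po_id = ? ORDER BY invoice_id",
        (po_id,),
    ).fetchall()
    return {"header": header, "lines": lines, "invoices": invoices}


po = collect_purchase_order(conn, 1001)
```

With three tables the relationships are obvious. With 50, and with the original developers long gone, someone has to document those relationships before a discovery request arrives.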
Q: What types of matters tend to implicate structured data and are they becoming more or less common?
Jim: The ones I hear about most commonly are product liability cases where they need to look back at warranty claims or drug trial data, or employment disputes around pay history and practices, or financial cases where they need to look at pricing or trading patterns.
Josh: The ones that Jim mentioned are certainly prevalent. But in addition, I would add that all kinds of financial data are now governed by retention policies largely because of the same concerns that arise from potential legal situations: at some point, someone may ask for it. Anything related to consumer packaged goods, vehicle parts (planes, boats, cars, trucks, etc.) as well as industrial and durable goods, which tend to have very long lifecycles, are increasingly subject to these types of inquiries.
Q: Simply accessing legacy data to determine its relevance to a matter can present significant challenges. What are some methods by which organizations can streamline the process?
Jim: If you are keeping around mothballed applications and databases purely for reporting purposes, these are prime targets to migrate to a structured data archive. Cost savings from licenses, CPU, and storage can run to 65% per year, with the added benefit that it’s much easier to enforce a retention policy on this data and roll it off when it expires, and compliance reporting is easier to do with modern tools.
Josh: One huge challenge that comes from these legacy applications stems from the fact that there are typically a lot of them. That means that when a discovery request arises, someone – or more likely multiple people – have to go to each one of those applications one by one to search for and retrieve relevant data. Not only is that time consuming and cumbersome, but it also assumes that there are people with the skill sets and application knowledge necessary to interact with all of those different applications. In any given company, that might not be a problem *today*, shortly after the applications have been decommissioned, because all the people that used the applications when they were live are still around. But will that still be the case 5, 7, 10 or 20 years from now? Probably not. Retiring all of these legacy applications into a “platform neutral” format is a much more sustainable, not to mention cost effective, approach.
Q: How can e-discovery preservation and collection technologies be leveraged to help organizations identify and “lock down” structured data?
Jim: Integrating e-discovery, meaning legal holds and collections, with your structured data archive can make it a lot easier to coordinate preservation and collection activities across the two systems. This reduces the chances of stranded holds (data under preservation that could have been released) and reduces the ambiguity about what needs to happen to the data to support the needs of legal and compliance teams.
Josh: Just as there are solutions for “locking down” unstructured and semi-structured (email) data, there are solutions for locking down structured data. The first and perhaps most important step is recognizing that the solutions for unstructured and semi-structured data are simply incapable of handling structured data. Without something that is purpose built for structured data, your discovery preservation and collection process is going to ignore this entire category of data. The good news is that some of the solutions that are purpose built for structured data have built in integrations to the leading e-discovery platforms.
You can hear more from Informatica’s Josh Alpern and Exterro’s Jim FitzGerald by attending Exterro’s CLE-accredited webcast, “Optimizing E-Discovery in a Multi-Vendor Environment,” airing on Thursday, September 4. Learn more and register here.
Get connected. Be connected. Make connections. Find connections. The Internet of Things (IoT) is all about connecting people, processes, data and, as the name suggests, things. The recent social media frenzy surrounding the ALS Ice Bucket Challenge has certainly reminded everyone of the power of social media, the Internet and a willingness to answer a challenge. Fueled by personal and professional connections, the craze has transformed fund raising for at least one charity. Similarly, IoT may potentially be transformational to the business of the public sector, should government step up to the challenge.
Government is struggling with the concept and reality of how IoT really relates to the business of government, and perhaps rightfully so. For commercial enterprises, IoT is far more tangible and simply more fun. Gaming, televisions, watches, Google glasses, smartphones and tablets are all about delivering over-the-top, new and exciting consumer experiences. Industry is delivering transformational innovations, which are connecting people to places, data and other people at a record pace.
It’s time to accept the challenge. Government agencies need to keep pace with their commercial counterparts and harness the power of the Internet of Things. The end game is not to deliver new, faster, smaller, cooler electronics; the end game is to create solutions that let devices connecting to the Internet interact and share data, regardless of their location, manufacturer or format and make or find connections that may have been previously undetectable. For some, this concept is as foreign or scary as pouring ice water over their heads. For others, the new opportunity to transform policy, service delivery, leadership, legislation and regulation is fueling a transformation in government. And it starts with one connection.
One way to start could be linking previously siloed systems together or creating a golden record of all citizen interactions through a Master Data Management (MDM) initiative. It could start with a big data and analytics project to determine and mitigate risk factors in education or linking sensor data across multiple networks to increase intelligence about potential hacking or breaches. Agencies could stop waste, fraud and abuse before it happens by linking critical payment, procurement and geospatial data together in real time.
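For readers wondering what a “golden record” looks like in practice, here is a deliberately small Python sketch. It assumes the siloed systems already share a reliable `citizen_id`, which is often the hardest part of a real MDM initiative; the source names, fields, and sample rows are hypothetical. The merge rule used here (the most recently updated non-empty value wins) is just one possible survivorship rule.

```python
from collections import defaultdict

# Made-up records from two siloed systems; dates are ISO strings so they
# sort chronologically as plain text.
records = [
    {"citizen_id": "C-1", "source": "dmv", "updated": "2014-03-01", "address": "12 Oak St", "phone": ""},
    {"citizen_id": "C-1", "source": "tax", "updated": "2014-06-15", "address": "", "phone": "555-0100"},
    {"citizen_id": "C-2", "source": "dmv", "updated": "2013-11-20", "address": "9 Elm Ave", "phone": "555-0199"},
]


def build_golden_records(records):
    """Merge per-system records into one record per citizen."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["citizen_id"]].append(rec)

    golden = {}
    for citizen_id, recs in grouped.items():
        merged = {"citizen_id": citizen_id}
        # Walk oldest to newest so newer non-empty values overwrite older ones.
        for rec in sorted(recs, key=lambda r: r["updated"]):
            for field in ("address", "phone"):
                if rec[field]:
                    merged[field] = rec[field]
        golden[citizen_id] = merged
    return golden


golden = build_golden_records(records)
```

Here citizen C-1 ends up with the address from one system and the phone number from the other, which is exactly the connection neither silo could make alone.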
This is the Internet of Things for government. This is the challenge. This is transformation.
Malcolm Gladwell wrote an article in The New Yorker magazine in January, 2007 entitled “Open Secrets.” In the article, he pointed out that a national-security expert had famously made a distinction between puzzles and mysteries.
Osama bin Laden’s whereabouts were, for many years, a puzzle. We couldn’t find him because we didn’t have enough information. The key to the puzzle, it was assumed, would eventually come from someone close to bin Laden, and until we could find that source, bin Laden would remain at large. In fact, that’s precisely what happened. Al-Qaida’s No. 3 leader, Khalid Sheikh Mohammed, gave authorities the nickname of one of bin Laden’s couriers, who then became the linchpin to the CIA’s efforts to locate bin Laden.
By contrast, the problem of what would happen in Iraq after the toppling of Saddam Hussein was a mystery. It wasn’t a question that had a simple, factual answer. Mysteries require judgments and the assessment of uncertainty, and the hard part is not that we have too little information but that we have too much.
This was written before “Big Data” was a household word and it raises the very interesting question of whether organizations and corporations that are, by anyone’s standards, totally deluged with data, are facing puzzles or mysteries. Consider the amount of data that a company like Western Union deals with.
Western Union is a 160-year-old company. Having built scale in the money transfer business, the company is in the process of evolving its business model by enabling the expansion of digital products, growth of web and mobile channels, and a more personalized online customer experience. Sounds good – but get this: the company processes more than 29 transactions per second on average. That’s 242 million consumer-to-consumer transactions and 459 million business payments in a year. Nearly a billion transactions – a billion! As my six-year-old might say, that number is big enough “to go to the moon and back.” Layer on top of that the fact that the company operates in 200+ countries and territories, and conducts business in 120+ currencies. Senior Director and Head of Engineering Abhishek Banerjee has said, “The data is speaking to us. We just need to react to it.” That implies a puzzle, not a mystery – but only if data scientists are able to conduct statistical modeling and predictive analysis, systematically noting trends in sending and receiving behaviors. Check out what Banerjee and Western Union CTO Sanjay Saraf have to say about it here.
Or consider General Electric’s aggressive and pioneering move into what’s dubbed as the industrial internet. In a white paper entitled “The Case for an Industrial Big Data Platform: Laying the Groundwork for the New Industrial Age,” GE reveals some of the staggering statistics related to the industrial equipment that it manufactures and supports (services comprise 75% of GE’s bottom line):
- A modern wind turbine contains approximately 50 sensors and control loops which collect data every 40 milliseconds.
- A farm controller then receives more than 30 signals from each turbine at 160-millisecond intervals.
- At every one-second interval, the farm monitoring software processes 200 raw sensor data points with various associated properties with each turbine.
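To put those three statistics on a common footing, here is a quick back-of-the-envelope calculation in Python. It reads the figures as “this many readings per interval, per turbine,” which is an interpretation rather than something the white paper spells out, and the farm size of 100 turbines is a hypothetical number picked only to show the scale.

```python
def per_second(count, interval_ms):
    """Convert 'count readings every interval_ms milliseconds' to readings per second."""
    return count * 1000 / interval_ms


# Figures quoted from the GE white paper, per turbine.
sensor_rate = per_second(50, 40)        # sensors and control loops, every 40 ms
controller_rate = per_second(30, 160)   # signals to the farm controller, every 160 ms
monitor_rate = per_second(200, 1000)    # raw data points in monitoring software, every second

# Hypothetical farm size, assumed for illustration only.
TURBINES = 100
SECONDS_PER_DAY = 24 * 60 * 60

daily_monitor_points = monitor_rate * SECONDS_PER_DAY * TURBINES

print(sensor_rate, controller_rate, monitor_rate, daily_monitor_points)
```

Under these assumptions, each turbine produces 1,250 sensor readings per second at the source, and a 100-turbine farm pushes about 1.7 billion data points a day through the monitoring layer alone.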
Phew! I’m no electricity operations expert, and you probably aren’t either. And most of us will get no further than simply wrapping our heads around the simple fact that GE turbines are collecting a LOT of data. But what the paper goes on to say should grab your attention in a big way: “The key to success for this wind farm lies in the ability to collect and deliver the right data, at the right velocity, and in the right quantities to a wide set of well-orchestrated analytics.” And the paper goes on to recommend that anyone involved in the Industrial Internet revolution strongly consider its talent requirements, with the suggestion that Chief Data Officers and/or Data Scientists may be the next critical hires.
Which brings us back to Malcolm Gladwell. In the aforementioned article, Gladwell goes on to pull apart the Enron debacle, and argues that it was a prime example of the perils of too much information. “If you sat through the trial of (former CEO) Jeffrey Skilling, you’d think that the Enron scandal was a puzzle. The company, the prosecution said, conducted shady side deals that no one quite understood. Senior executives withheld critical information from investors…We were not told enough—the classic puzzle premise—was the central assumption of the Enron prosecution.” But in fact, that was not true. Enron employed complicated – but perfectly legal–accounting techniques used by companies that engage in complicated financial trading. Many journalists and professors have gone back and looked at the firm’s regulatory filings, and have come to the conclusion that, while complex and difficult to identify, all of the company’s shenanigans were right there in plain view. Enron cannot be blamed for covering up the existence of its side deals. It didn’t; it disclosed them. As Gladwell summarizes:
“Puzzles are ‘transmitter-dependent’; they turn on what we are told. Mysteries are ‘receiver dependent’; they turn on the skills of the listener.”
I would argue that this extremely complex, fast moving and seismic shift that we call Big Data will favor those who have developed the ability to attune, to listen and make sense of the data. Winners in this new world will recognize what looks like an overwhelming and intractable mystery, and break that mystery down into small and manageable chunks and demystify the landscape, to uncover the important nuggets of truth and significance.
Informatica Cloud Summer ’14 Release Breaks Down Barriers with Unified Data Integration and Application Integration for Real Time and Bulk Patterns
This past week, Informatica Cloud marked an important milestone with the Summer 2014 release of the Informatica Cloud platform. This was the 20th Cloud release, and I am extremely proud of what our team has accomplished.
“SDL’s vision is to help our customers use data insights to create meaningful experiences, regardless of where or how the engagement occurs. It’s multilingual, multichannel and on a global scale. Being able to deliver the right information at the right time to the right customer with Informatica Cloud Summer 2014 is critical to our business and will continue to set us apart from our competition.”
– Paul Harris, Global Business Applications Director, SDL plc
When I joined Informatica Cloud, I knew that it had the broadest cloud integration portfolio in the marketplace: leading data integration and analytic capabilities for bulk integration, comprehensive cloud master data management and test data management, and over a hundred connectors for cloud apps, enterprise systems and legacy data sources, all delivered in a self-service design with point-and-click wizards for citizen integrators, without the need for complex and costly manual custom coding.
But, I also learned that our broad portfolio belies another structural advantage: because of Informatica Cloud’s unique, unified platform architecture, it has the ability to surface application (or real time) integration capabilities alongside its data integration capabilities with shared metadata across real time and batch workflows.
With the Summer 2014 release, we’ve brought our application integration capabilities to the forefront. We now provide the most-complete cloud app integration capability in the marketplace. With a design environment that’s meant not just for developers but also for line of business IT, app admins can now also build real time process workflows that cut across on-premise and cloud and include built-in human workflows. And with the capability to translate these process workflows instantly into mobile apps for iPhone and Android mobile devices, we’re not just setting ourselves apart but also giving customers the unique capabilities they need for their increasingly mobile employees.
“Schneider’s strategic initiative to improve front-office performance relied on recording and measuring sales person engagement in real time on any mobile device or desktop. The enhanced real time cloud application integration features of Informatica Cloud Summer 2014 makes it all possible and was key to the success of a highly visible and transformative initiative.”
– Mark Nardella, Global Sales Process Director, Schneider Electric SE
With this release, we’re also giving customers the ability to create workflows around data sharing that mix and match batch and real time integration patterns. This is really important because, unlike the past, where you had to choose between batch and real time, in today’s world of on-premise, cloud-based, transactional and social data, you now have to deal with both real time interactions and the processing of large volumes of data. For example, consider a typical scenario these days at high-end retail stores. Using a clienteling iPad app, the sales rep looks up bulk purchase history and inventory availability data in SAP, confirms availability and delivery date, and then processes the customer’s order via real time integration with NetSuite. And if you ask any customer, having a single workflow to unify all of that for instant and actionable insights is a huge advantage.
“Our industry demands absolute efficiency, speed and trust when dealing with financial information, and the new cloud application integration feature in the latest release of Informatica Cloud will help us service our customers more effectively by delivering the data they require in a timely fashion. Keeping call-times to a minimum and improving customer satisfaction in real time.”
– Kimberly Jansen, Director CRM, Misys PLC
We’ve also included some exciting new Vibe Integration packages or VIPs. VIPs deliver pre-built business process mappings between front-office and back-office applications. The Summer 2014 release includes new bidirectional VIPs for Siebel to Salesforce and SAP to Salesforce that make it easier for customers to connect their Salesforce with these mission-critical business applications.
Last but not least, the release includes a critical upgrade to our API Framework that provides the Informatica Cloud iPaaS end-to-end support for connectivity to any company’s internal or external APIs. With the newly available API creation, definition and consumption patterns, developers or citizen integrators can now easily expose integrations as APIs and users can consume them via integration workflows or apps, without the need for any additional custom code.
The features and capabilities released this summer are available to all existing Informatica Cloud customers, and everyone else through our free 30-day trial offer.
Since the survey was published, many enterprises have, indeed, leveraged the cloud to host business data in both IaaS and SaaS incarnations. Overall, there seem to be two types of enterprises: First are the enterprises that get the value of data integration. They leverage the value of cloud-based systems, and do not create additional data silos. Second are the enterprises that build cloud-based data silos without a sound data integration strategy, and thus take a few steps backward, in terms of effectively leveraging enterprise data.
There are facts about data integration that most in enterprise IT don’t yet understand, and the use of cloud-based resources actually makes things worse. The shame of it all is that, with a bit of work and some investment, the value should come back to the enterprises 10 to 20 times over. Let’s consider the facts.
Fact 1: Implement new systems, such as those being stood up on public cloud platforms, and any data integration investment comes back 10 to 20 fold. The focus is typically too much on cost and not enough on the benefit, when building a data integration strategy and investing in data integration technology.
Many in enterprise IT point out that their problem domain is unique, and thus their circumstances need special consideration. While I always perform domain-specific calculations, the patterns of value typically remain the same. You should determine the metrics that are right for your enterprise, but the positive values will be fairly consistent, with some varying degrees.
Fact 2: It’s not just about data moving from place-to-place, it’s also about the proper management of data. This includes a central understanding of data semantics (metadata), and a place to manage a “single version of the truth” when it comes to dealing with the massive amounts of distributed data that enterprises must typically manage, data that is now also distributed within public clouds.
Most of those who manage enterprise data, cloud or no-cloud, have no common mechanism to deal with the meaning of the data, or even the physical location of the data. While data integration is about moving data from place to place to support core business processes, it should come with a way to manage the data as well. This means understanding, protecting, governing, and leveraging the enterprise data, both locally and within public cloud providers.
Fact 3: Some data belongs on clouds, and some data belongs in the enterprise. Some in enterprise IT have pushed back on cloud computing, stating that data outside the firewall is a bad idea due to security, performance, legal issues…you name it. Others try to move all data to the cloud. The point of value is somewhere in between.
The fact of the matter is that the public cloud is not the right fit for all data. Enterprise IT must carefully consider the tradeoffs between cloud-based and in-house platforms, including performance, security, and compliance. Finding the best location for the data is the same problem we’ve dealt with for years. Now we have cloud computing as an option. Work from your requirements to the target platform, and you’ll find what I’ve found: Cloud is a fit some of the time, but not all of the time.
That second question is a killer because most people — no matter if they’re in marketing, sales or manufacturing — rely on incomplete, inaccurate or just plain wrong information. Regardless of industry, we’ve been fixated on historic transactions because that’s what our systems are designed to provide us.
“Moneyball: The Art of Winning an Unfair Game” gives a great example of what I mean. The book (not the movie) describes Billy Beane hiring MBAs to map out the factors that would win a baseball game. They discovered something completely unexpected: That getting more batters on base would tire out pitchers. It didn’t matter if batters had multi-base hits, and it didn’t even matter if they walked. What mattered was forcing pitchers to throw ball after ball as they faced an unrelenting string of batters. Beane stopped looking at RBIs, ERAs and even home runs, and started hiring batters who consistently reached first base. To me, the book illustrates that the most useful knowledge isn’t always what we’ve been programmed to depend on or what is delivered to us via one app or another.
For years, people across industries have turned to ERP, CRM and web analytics systems to forecast sales and acquire new customers. By their nature, such systems are transactional, forcing us to rely on history as the best predictor of the future. Sure, it might be helpful for retailers to identify last year’s biggest customers, but that doesn’t tell them whose blogs, posts or Tweets influenced additional sales. Isn’t it time for all businesses, regardless of industry, to adopt a different point of view, one that we at Informatica call “Data-First”? Instead of relying solely on transactions, a data-first POV shines a light on interactions. It’s like having a high knowledge IQ about relationships and connections that matter.
A data-first POV changes everything. With it, companies can unleash the killer app, the killer sales organization and the killer marketing campaign. Imagine, for example, if a sales person meeting a new customer knew that person’s concerns, interests and business connections ahead of time? Couldn’t that knowledge — gleaned from Tweets, blogs, LinkedIn connections, online posts and transactional data — provide a window into the problems the prospect wants to solve?
That’s the premise of two startups I know about, and it illustrates how a data-first POV can fuel innovation for developers and their customers. Today, we’re awash in data-fueled things that are somehow attached to the Internet. Our cars, phones, thermostats and even our wristbands are generating and gleaning data in new and exciting ways. That’s knowledge begging to be put to good use. The winners will be the ones who figure out that knowledge truly is power, and wield that power to their advantage.