Category Archives: Big Data
Happy Holidays, Happy HoliData
In case you missed our #HappyHoliData series on Twitter and LinkedIn, I decided to provide a short summary of best practices for unleashing information potential. Simply scroll and click on the case study that is relevant to you and your business. The series touches on different industries and use cases, but all have one thing in common: each considers information quality a key value to its business in delivering the right services or products to the right customer.
Thanks a lot to all my great teammates, who made this series happen.
Happy Holidays, Happy HoliData.
What do all marketers have in common? Marketing guru Seth Godin famously said that all marketers are storytellers. Stories, not features and benefits, sell.
Anyone who buys a slightly more expensive brand of laundry detergent because it’s “better” proves this. Godin wrote that if someone buys shoes because he or she wants to be associated with a brand that is “cool,” that brand successfully told its story to the right market.
A story has heroes we identify with. It has a conflict, which the heroes try to overcome. A good story’s DNA is an ordinary person in unusual circumstances. When is the last time you had an unusual result from your marketing campaigns? Perhaps a pay-per-click ad does poorly in your A/B testing. Or, there’s a high bounce rate from your latest email campaign.
Many marketers aren’t data scientists. But savvy marketers know they have to deal with big data, since it has become a hot topic central to many businesses. Marketers simply want to do their jobs better — and big data should be seen as an opportunity, not a hindrance.
When you have big data that could unlock great insight into your business, look beyond complexity and start with your strength as a marketer: Storytelling.
To get you started, I took the needs of marketers and applied them to these “who, what, why and how” principles from a recent article in the Harvard Business Review by the author of Big Data at Work, Tom Davenport:
Who is your hero? He or she is likely your prospective or existing customer.
What problem did the hero have? This is the action of the story. Here’s a real-life example from the Harvard Business Review article: Your hero visits your website, and adds items to the shopping cart. However, when you look at your analytics dashboard, you notice he or she never finishes the transaction.
Why do you care about the hero’s problem? Identifying with the hero is important for a story’s audience. It creates tension, and gives you and other stakeholders the incentive you need to dig into your data for a resolution.
How do you resolve the problem? Now you see what big data can do — it solves marketing problems and gives you better results. In the abandoned shopping cart example, the company found that people in Ireland were not checking out. The resolution came from the discovery that the check-out process asked for a postal code. Some areas of Ireland have no postal codes, so visitors would give up.
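As a minimal sketch of how an analyst might surface an anomaly like the Irish checkout problem, consider computing the checkout completion rate per country from cart-session data. The field names and numbers below are hypothetical, not from the Harvard Business Review example:

```python
from collections import defaultdict

# Hypothetical web-analytics extract: (country, completed_checkout) per cart session.
sessions = [
    ("US", True), ("US", False),
    ("IE", False), ("IE", False), ("IE", False),
    ("DE", True),
]

totals = defaultdict(int)
completed = defaultdict(int)
for country, done in sessions:
    totals[country] += 1
    completed[country] += done  # True counts as 1

# Completion rate per country; a sharp outlier (like Ireland in the
# postal-code story) points at the funnel step worth investigating.
rates = {c: completed[c] / totals[c] for c in totals}
print(sorted(rates.items(), key=lambda kv: kv[1]))
```

A country whose rate sits far below the rest is the cue to walk through that market's checkout flow step by step.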
Remember it’s possible that the data itself is the problem. If you have bad contact data, you can’t reach your customers. Find the source of your bad data, and then you can return to your marketing efforts with confidence.
While big data may sound complicated or messy, if you have a storytelling path like this to take, you can find the motivation you need to uncover the powerful information required to better engage with your audience.
Engaging your audience starts with having accurate, validated information about your audience. Marketers can use data to fuel their campaigns and make better decisions on strategy and planning. Learn more about data quality management in this white paper.
It takes a village to build mainstream big data solutions. We often get so caught up in Hadoop use cases and customer successes that sometimes we don’t talk enough about the innovative partner technologies and integrations that enable our customers to put the enterprise data hub at the core of their data architecture and innovate with confidence. Cloudera and Informatica have been working together to integrate our products to enable new levels of productivity and lower deployment and production risk.
Going from Hadoop to an enterprise data hub means a number of things. It means that you recognize the business value of capturing and leveraging all your data for exploration and analytics. It means you’re ready to make the move from Hadoop pilot project to production. And it means your data is important enough that it’s worth securing and making data pipelines visible. It’s the visibility layer, and in particular, the unique integration between Cloudera Navigator and Informatica that I want to focus on in this post.
The era of big data has ushered in increased regulations in a number of industries – banking, retail, healthcare, energy – most of which deal with how data is managed throughout its lifecycle. Cloudera Navigator is the only native end-to-end solution for governance in Hadoop. It provides visibility for analysts to explore data in Hadoop, and enables administrators and managers to maintain a full audit history for HDFS, HBase, Hive, Impala, Spark, and Sentry, and then run reports on data access for auditing and compliance. The integration of Informatica Metadata Manager in the Big Data Edition with Cloudera Navigator extends this level of visibility and governance beyond the enterprise data hub.
Today, only Informatica and Cloudera provide end-to-end data lineage from source systems through Hadoop, and into BI/analytic and data warehouse systems. And you can view it from a single pane within Informatica.
This is important because Hadoop, and the enterprise data hub in particular, doesn’t function in a silo. It’s an integrated part of a larger enterprise-wide data management architecture. The better the insight into where data originated, where it traveled, who had access to it and what they did with it, the greater our ability to report and audit. No other combination of technologies provides this level of audit granularity.
But more than that, the visibility Cloudera and Informatica provide gives our joint customers the ability to confidently stand up an enterprise data hub as part of their production enterprise infrastructure, because they can verify the integrity of the data that undergirds their analytics. I encourage you to check out a demo of the Informatica-Cloudera Navigator integration at this link: http://infa.media/1uBpPbT
You can also check out a demo and learn a little more about Cloudera Navigator and the Informatica integration in the recorded TechTalk hosted by Informatica at this link:
Data warehousing systems remain the de facto standard for high-performance reporting and business intelligence, and there is no sign that will change soon. But Hadoop now offers an opportunity to lower costs by transferring infrequently used data and data preparation workloads off the data warehouse, and to process entirely new sources of data coming from the explosion of industrial and personal devices. This is motivating interest in new concepts like the “data lake” as adjunct environments to traditional data warehousing systems.
Now, let’s be real. Between the evolutionary opportunity of preparing data more cost effectively and the revolutionary opportunity of analyzing new sources of data, the latter just sounds cooler. This revolutionary opportunity is what has spurred the growth of new roles like data scientists and new tools for self-service visualization. In the revolutionary world of pervasive analytics, data scientists have the ability to use Hadoop as a low cost and transient sandbox for data. Data scientists can perform exploratory data analysis by quickly dumping data from a variety of sources into a schema-on-read platform and by iterating dumps as new data comes in. SQL-on-Hadoop technologies like Cloudera Impala, Hortonworks Stinger, Apache Drill, and Pivotal HAWQ enable agile and iterative SQL-like queries on datasets, while new analysis tools like Tableau enable self-service visualization. We are merely in the early phases of the revolutionary opportunity of big data.
But while the revolutionary opportunity is exciting, there’s an equally compelling opportunity for enterprises to modernize their existing data environment. Enterprises cannot rely on an iterative dump methodology for managing operational data pipelines. Unmanaged “data swamps” are simply impractical for business operations. For an operational data pipeline, the Hadoop environment must be a clean, consistent, and compliant system of record for serving analytical systems. Loading enterprise data into Hadoop instead of a relational data warehouse does not eliminate the need to prepare it.
Now I have a secret to share with you: nearly every enterprise adopting Hadoop today to modernize their data environment has processes, standards, tools, and people dedicated to data profiling, data cleansing, data refinement, data enrichment, and data validation. In the world of enterprise big data, schemas and metadata still matter.
I’ll share some examples with you. I attended a customer panel at Strata + Hadoop World in October. One of the participants was the analytics program lead at a large software company whose team was responsible for data preparation. He described how they ingest data from heterogeneous data sources by mandating a standardized schema for everything that lands in the Hadoop data lake. Once the data lands, his team profiles, cleans, refines, enriches, and validates the data so that business analysts have access to high quality information. Another data executive described how inbound data teams are required to convert data into Avro before storing the data in the data lake. (Avro is an emerging data format alongside other new formats like ORC, Parquet, and JSON). One data engineer from one of the largest consumer internet companies in the world described the schema review committee that had been set up to govern changes to their data schemas. The final participant was an enterprise architect from one of the world’s largest telecom providers who described how their data schema was critical for maintaining compliance with privacy requirements since data had to be masked before it could be made available to analysts.
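The “mandated standardized schema” idea from the panel can be sketched as an ingest gate that rejects any record not conforming to the landing schema. The schema fields and types below are hypothetical, not from any of the companies mentioned:

```python
# Hypothetical standardized landing schema: every inbound record must
# carry exactly these fields with these types before it enters the lake.
LANDING_SCHEMA = {"event_id": str, "source": str, "ts": int, "payload": dict}

def conforms(record: dict) -> bool:
    """True if the record matches the mandated landing schema exactly."""
    return (set(record) == set(LANDING_SCHEMA)
            and all(isinstance(record[k], t) for k, t in LANDING_SCHEMA.items()))

good = {"event_id": "e1", "source": "crm", "ts": 1700000000, "payload": {"k": "v"}}
bad = {"event_id": "e2", "source": "logs"}  # missing fields: reject at ingest
print(conforms(good), conforms(bad))
```

In practice this role is usually played by a serialization format with an enforced schema, such as the Avro conversion requirement one of the panelists described.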
Let me be clear – these companies are not just bringing CRM and ERP data into Hadoop. These organizations are ingesting patient sensor data, log files, event data, and clickstream data, and in every case, data preparation was the first task at hand.
I recently talked to a large financial services customer who proposed a unique architecture for their Hadoop deployment. They wanted to empower line of business users to be creative in discovering revolutionary opportunities while also evolving their existing data environment. They decided to allow line of businesses to set up sandbox data lakes on local Hadoop clusters for use by small teams of data scientists. Then, once a subset of data was profiled, cleansed, refined, enriched, and validated, it would be loaded into a larger Hadoop cluster functioning as an enterprise information lake. Unlike the sandbox data lakes, the enterprise information lake was clean, consistent, and compliant. Data stewards of the enterprise information lake could govern metadata and ensure data lineage tracking from source systems to sandbox to enterprise information lakes to destination systems. Enterprise information lakes balance the quality of a data warehouse with the cost-effective scalability of Hadoop.
Building enterprise information lakes out of data lakes is simple and fast with tools that can port data pipeline mappings from traditional architectures to Hadoop. With visual development interfaces and native execution on Hadoop, enterprises can accelerate their adoption of Hadoop for operational data pipelines.
No one described the opportunity of enterprise information lakes better at Strata + Hadoop World than a data executive from a large healthcare provider who said, “While big data is exciting, equally exciting is complete data…we are data rich and information poor today.” Schemas and metadata still matter more than ever, and with the help of leading data integration and preparation tools like Informatica, enterprises have a path to unleashing information riches. To learn more, check out this Big Data Workbook.
Recently, I got to speak to a CIO at a Global 500 company about the challenges of running his IT organization. He said that one of his biggest challenges is getting business leaders to understand technology better: “I want my business leaders to be asking for digital services that support and build upon their product and service offerings.” I think that his perspective provides real insight into how businesses should be thinking about the so-called Internet of Things (IoT), but let me get you there first.
What is the IoT?
According to Frank Burkitt of @Strategy, by 2029, an estimated 50 billion devices around the globe will be connected to the Internet. Perhaps a third will be computers, smartphones, tablets, and TVs. The remaining two-thirds will be “things”–sensors, actuators, and intelligent devices that monitor, control, analyze, and optimize our world. Frank goes on to say if your company wants to stake a claim in the IoT, you first need to develop a distinctive “way to play”—a clear value proposition that you can offer customers. This should be consistent with your enterprise’s overall capabilities system: the things you do best when you go to market.
While Frank’s suggestions make great sense, they do not, in my opinion, provide the strategic underpinning that business leaders need to link the IoT to their business strategy. Last week, an article in Harvard Business Review by Michael Porter and James E. Heppelmann shared what business leaders need to do to apply the IoT to their businesses. According to Porter and Heppelmann, enterprises have historically defined their businesses by the physical attributes of the products and services they produce. And while products have been mostly composed of mechanical and electrical parts, they are increasingly becoming complex systems that combine hardware, sensors, data storage, microprocessors, software, and data connectivity.
The IoT is really about creating a system of systems
Porter and Heppelmann share in their article how connectivity allows companies to evolve from making point solutions to making more complex, higher-value “systems of systems”. According to Russell Ackoff, a systems orientation views customer problems “as a whole and not on their parts taken separate” (Ackoff’s Best, Russell Ackoff, John Wiley and Sons, page 47). This change means that market winners will tend to view business opportunities from a larger rather than a smaller perspective. It reminds me a lot of what Xerox did when it transformed itself from commoditized copiers to high-priced, software-based document management, where the printer represents an input device to a larger system. Porter and Heppelmann give the example of a company that sells tractors. Once a tractor is smart and connected, it becomes part of a highly interconnected agricultural management solution.
According to Porter and Heppelmann, the key element of “smart, connected products” is that they take advantage of ubiquitous wireless connectivity to unleash an era in which competition is increasingly about the size of the business problem solved. Porter and Heppelmann claim that as smart, connected products take hold, the idea of industries being defined by physical products or services alone will cease to have meaning. What sense does it make to talk about a “tractor industry” when tractors represent just a piece of an integrated system of products, services, software, and data designed to help farmers increase their crop yield?
Porter and Heppelmann claim, therefore, that the phrase “Internet of Things” is not very helpful in understanding the phenomenon or even its implications. After all, they say, what makes smart, connected products fundamentally different is not the Internet; it is the redefinition of what a product is, the capabilities smart, connected products provide, and the data they generate. Companies, therefore, need to look at how the IoT will transform competition within their specific industries.
Like a business slogan, the IoT is about putting IT inside
IT leaders have a role to play in the IoT. They need to move IT beyond just helping business management drive improvements to the company value chain; IT must also be embedded in what are becoming system-oriented products. How perceptive, therefore, was my CIO friend.
Porter and Heppelmann claim connectivity serves two purposes. First, it allows information to be exchanged between a product and its operating environment, its maker, its users, and other products and systems. Second, connectivity enables some functions of the product to exist outside the physical device. Porter and Heppelmann give the example of Schindler’s PORT Technology, which reduces elevator wait times by as much as 50% by predicting elevator demand patterns, calculating the fastest time to destination, and assigning the appropriate elevator to move passengers quickly. Porter and Heppelmann also see intelligence and connectivity enabling an entirely new set of product functions and capabilities, which can be grouped into four categories: monitoring, control, optimization, and autonomy. To be clear, a systems product can potentially incorporate all four.
- Monitored products alert users to changes in circumstances or performance. They can provide a product’s operating characteristics and history. A company must choose the set of monitoring capabilities that delivers customer value and defines its competitive positioning. This has implications for design, marketing, service, and warranty.
- Controlled products can receive remote commands or have algorithms that are built into the device or reside in the product’s cloud. For example, “if pressure gets too high, shut off the valve” or “when traffic in a parking garage reaches a certain level, turn the overhead lighting on or off”.
- Optimized products apply algorithms and analytics to in-use or historical data to improve output, utilization, and efficiency. Real-time monitoring data on product condition and product control capability enables firms to optimize service.
- Autonomous products are able to learn about their environment, self-diagnose their own service needs, and adapt to users’ preferences.
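The "control" capability in the list above ("if pressure gets too high, shut off the valve") can be sketched as a tiny rule evaluated against sensor readings. The threshold value and the valve scenario are purely illustrative, not from any real product:

```python
# Illustrative remote control rule for a smart, connected valve.
# The pressure limit is a made-up number for the sketch.
PRESSURE_LIMIT = 8.5  # bar

def control_step(pressure_bar: float, valve_open: bool) -> bool:
    """Return the new valve state: shut the valve if pressure exceeds the limit."""
    if pressure_bar > PRESSURE_LIMIT:
        return False  # shut off the valve
    return valve_open  # otherwise leave the state unchanged

print(control_step(9.2, True))  # over the limit: valve closes
print(control_step(7.0, True))  # within the limit: stays open
```

In a real deployment, such a rule could run either on the device itself or in the product's cloud, which is exactly the design choice the "control" bullet describes.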
Smart, connected products expand opportunities for product differentiation
In a world where Geoffrey Moore sees differentiated products constantly being commoditized, smart, connected products dramatically expand opportunities for product differentiation and move competition away from price alone. Knowing how customers actually use your products enhances a company’s ability to segment customers, customize products, set prices to better capture value, and extend value-added services. Smart, connected products, at the same time, create opportunities to broaden the value proposition beyond products per se, to include valuable data and enhanced service offerings. Broadening product definitions can raise barriers to entrants even higher. The powerful capabilities of smart, connected products not only reshape competition within an industry, but they can expand the very definition of the industry itself. For example, integrating smart, connected farm equipment—such as tractors, tillers, and planters—can enable better overall equipment performance.
Porter and Heppelmann are talking here about the competitive boundaries of an industry widening to encompass a set of related products that together meet a broader underlying need. The function of one product is optimized with other related products.
Porter and Heppelmann believe that smart, connected products also allow companies to form new kinds of relationships with their customers. In many cases, this may require market participants to develop new marketing practices and skill sets. As companies accumulate and analyze product usage data, they will also gain new insights into how products create value for customers, allowing better positioning of offerings and more effective communication of product value to customers. Using data analytics tools, firms will be able to segment their markets in more sophisticated ways, tailor product and service bundles that deliver greater value to each segment, and price those bundles to capture more of that value.
Some parting thoughts
So, summarizing their position, Porter and Heppelmann believe the IoT is really about taking smart things and building solutions that solve bigger problems, because one can architect the piece parts into a larger system of systems. This will impact marketplace dynamics and create competitive differentiators in a world of increasing product commoditization. For me, this is a roadmap forward, especially for those at the later stages of the product lifecycle curve.
Analytics Stories: A Banking Case Study
Analytics Stories: A Financial Services Case Study
Analytics Stories: A Healthcare Case Study
Who Owns Enterprise Analytics and Data?
Competing on Analytics: A Follow Up to Thomas H. Davenport’s Post in HBR
Thomas Davenport Book “Competing On Analytics”
Solution Brief: The Intelligent Data Platform
Author Twitter: @MylesSuer
Insurance companies serve as a fantastic example of big data technology use since data is such a pervasive asset in the business. From a cost savings and risk mitigation standpoint, insurance companies use data to assess the probable maximum loss of catastrophic events as well as detect the potential for fraudulent claims. From a revenue growth standpoint, insurance companies use data to intelligently price new insurance offerings and deploy cross-sell offers to customers to maximize their lifetime value.
New data sources are enabling insurance companies to mitigate risk and grow revenues even more effectively. Location-based data from mobile devices and sensors inside insured properties is being used to proactively detect exposure to catastrophic events and deploy preventive maintenance. For example, automobile insurance providers are increasingly offering usage-based driving programs, whereby insured individuals install a mobile sensor inside their car to relay the quality of their driving back to their insurance provider in exchange for lower premiums. Even healthcare insurance providers are starting to analyze the data collected by wearable fitness bands and smart watches to monitor insured individuals and inform them of personalized ways to be healthier. Devices can also be deployed in the environments that trigger adverse events, such as sensors that monitor earthquake and weather patterns, to help mitigate the costs of potential events. Claims are increasingly submitted with supporting information in a variety of formats, like text files, spreadsheets, and PDFs, that can be mined for insights as well. And with the growth of online insurance sales, web log and clickstream data is more important than ever to help drive online revenue.
Beyond the benefits of using new data sources to assess risk and grow revenues, big data technologies are enabling insurance companies to fundamentally rethink the basis of their analytical architecture. In the past, probable maximum loss modeling could only be performed on statistically aggregated datasets. But with big data technologies, insurance companies have the opportunity to analyze data at the level of an insured individual or a unique insurance claim. This increased depth of analysis has the potential to radically improve the quality and accuracy of risk models and market predictions.
Informatica is helping insurance companies accelerate the benefits of big data technologies. With multiple styles of ingestion available, Informatica enables insurance companies to leverage nearly any source of data. Informatica Big Data Edition provides comprehensive data transformations for ETL and data quality, so that insurance companies can profile, parse, integrate, cleanse, and refine data using a simple user-friendly visual development environment. With built-in data lineage tracking and support for data masking, Informatica helps insurance companies ensure regulatory compliance across all data.
To try out the Big Data Edition, download a free trial today in the Informatica Marketplace and get started with big data today!
In the last 50-60 years, we have witnessed another revolution, through the invention of computing machines and the Internet: a digital revolution. It has transformed every industry and allowed us to operate at far greater scale, processing more transactions in more locations than ever before. New cities emerged on the map, migrations of knowledge workers throughout the world followed, and the standard of living increased again. And digitally available information transformed how we run businesses, cities, and countries.
Forces Shaping Digital Revolution
Over the last 5-6 years, we’ve witnessed a massive increase in the volume and variety of this information. Leading forces that contributed to this increase are:
- Next generation of software technology connecting data faster from any source
- Little to no hardware cost to process and store huge amounts of data (Moore’s Law)
- A sharp increase in the number of connected machines and devices generating data
- Massive worldwide growth of people connecting online and sharing information
- Speed of Internet connectivity that’s now free in many public places
As a result, our engagement with the digital world is rising, both for personal and business purposes. Increasingly, we play games, shop, sign digital contracts, make product recommendations, respond to customer complaints, share patient data, and make real-time pricing changes to in-store products, all from a mobile device or laptop. We do so increasingly in a collaborative way, in real time, and in a very personalized fashion. Big Data, Social, Cloud, and the Internet of Things are key topics dominating our conversations and thoughts around data these days. They are altering the ways we engage with each other and the expectations we have of one another.
Whether this is the emergence of a new revolution or simply the next phase of our digital revolution, its hallmark is the democratization and ubiquity of information, used to create new ways of interacting with customers and to dramatically speed up market launch. Businesses will build new products and services and create new business models by exploiting this vast new resource of information.
The Quest for Great Data
But there is work to do before one can unleash the true potential captured in data. Data is no longer a by-product or a transaction record, nor does it carry an expiration date. Data now flows like a river, fueling applications, business processes, and human or machine activities. New data gets created along the way and augments our understanding of the meaning behind it. It is no longer good enough to have good data in isolated projects; rather, great data needs to become accessible to everyone and everything at a moment’s notice. This rich set of data needs to connect efficiently to information that is already present and learn from it. Such data needs to automatically rid itself of inaccurate and incomplete information. Clean, safe, and connected, this data is ready to find us even before we discover it. It understands the context in which we are going to use it and the key decisions that will follow. In the process, this data learns about our usage, preferences, and results: what works versus what doesn’t. New data is created that captures this inherent understanding or intelligence, and after fine-tuning it needs to flow back to the appropriate business applications or machines for future use. Such data can then tell a story about human or machine actions and results. Such data can become a coach, a mentor, a friend of sorts to guide us through critical decision points. Such data is what we would like to call great data. To truly capitalize on the next step of the digital revolution, we will pervasively need this great data to power our decisions and thinking.
Impacting Every Industry
By 2020, there’ll be 50 billion connected devices, 7x more than human beings on the planet. This explosion of devices will produce really big data that will increasingly be processed and stored in the cloud. More than size, this complexity will require a new way of addressing business process efficiency, one that delivers agility, simplicity, and capacity. The impact of this transformation will spread across many industries. A McKinsey article, “The Future of Global Payments”, focuses on the digital transformation of payment systems in the banking industry and the resulting ubiquity of data. One of the key challenges for banks will be to shift from their traditional heavy reliance on siloed and proprietary data to a more open approach that encompasses a broader view of customers.
Industry executives, front line managers, and back office workers are all struggling to make the most sense of the data that’s available.
Closing Thoughts on Great Data
The 2014 PwC Global CEO Survey showed that 81% of CEOs ranked technology advances as the #1 factor that will transform their businesses over the next five years. More data, by itself, isn’t enough for this transformation. A robust data management approach, integrating machine and human data from all sources, updated in real time, across on-premise and cloud-based systems, must be put in place to accomplish this mission. Such an approach will nurture great data. This end-to-end data management platform will provide data guidance and curate one of an organization’s most valuable assets: its information. Only by making sense of what we have at our disposal will we unleash the true potential of the information we possess. The next step in the digital revolution will be about organizations of all sizes being fueled by great data to unleash their untapped potential.
Service and support is a critical part of this engagement strategy. Retail and consumer goods companies recognize the importance of support to the overall customer relationship. Consequently, these companies have integrated their before- and after-purchase support into their multi-channel and omni-channel marketing strategies. While retail and consumer products companies have led the way in making support an integral part of ongoing customer engagement, B2B companies have begun to do the same. Enterprise IT companies, which are primarily B2B companies, have been expanding their service and support capabilities to create more engagement between their customers and themselves. Service offerings have expanded to include mobile tools, analytics-driven self-help, and support over social media and other digital channels. The goal of these investments has been to make interactions more productive for the customer, strengthen relationships through positive engagement, and gather data that drives improvements in both the product and the service.
A great example of an enterprise software company that understands the value in customer engagement through support is Informatica. Known primarily for their data integration products, Informatica has been quickly expanding their portfolio of data management and data access products over the past few years. This growth in their product portfolio has introduced many new types of customers to Informatica and created more complex customer relationships. For example, the new Springbok product is aimed at making data accessible to the business user, a new type of interaction for Informatica. Informatica has responded with a collection of new service enhancements that augment and extend existing service channels and capabilities.
What these moves say to me is that Informatica has made a commitment to deeper engagement with customers. For example, Informatica has expanded the avenues through which customers can get support. By adding social media and mobile capabilities, they are creating additional points of presence that address customer issues when and where customers are. Informatica provides support on the customers’ terms instead of requiring customers to do what is convenient for Informatica. Ultimately, Informatica is creating more value by making it easier for customers to interact with them. The best support is that which solves the problem fastest with the least amount of effort. Intuitive knowledge base systems, online support, sourcing answers from peers, and other tools that help find solutions immediately are valued more than traditional phone support. This is the philosophy that drives the new self-help portal, predictive escalation, and product adoption services.
Informatica is also shifting the support focus from products to business outcomes. They manage problems holistically rather than simply applying product band-aids. This reflects a recognition that technical problems with data are actually business problems with broad effects on a customer’s organization. Contrast this with the traditional approach to support, which focuses on fixing a technical issue but doesn’t necessarily address the wider organizational effects of the problem.
More than anything, these changes are preparation for a very different support landscape. With the launch of the Springbok data analytics tool, Informatica’s support organization is clearly positioning itself to help business analysts and similar semi-technical end-users. The expectations of these end-users have been set by consumer applications. They expect more automation and more online resources that help them use and derive value from their software, and they are less enamored of fixing technical problems.
In the past, technical support was mostly charged with solving immediate technical issues. That’s still important, since the products have to work first to be useful. Now, however, support organizations have an expanded mission: to be part of the overall customer experience and to enhance overall engagement. The latest enhancements to the Informatica support portfolio reflect this mission and prepare the organization for the next generation of non-IT Informatica customers.
A couple of comments on the importance of integration platforms like Informatica in an EDW/Hadoop environment:
- Hadoop does mean you can do some quick and inexpensive exploratory analysis with little or no ETL. The issue is that it will not perform at the level you need to take it to production. As the webinar points out, applying some structure to the data with columnar files (not RDBMS) will dramatically speed up query performance.
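The performance effect of a columnar layout can be sketched in a few lines of plain Python. This is a toy illustration of the principle behind columnar formats such as Parquet and ORC, not Hadoop code; the data and field names are invented:

```python
# Toy illustration of why columnar layouts speed up analytics:
# a query touching one field scans only that field's values.

rows = [  # row-oriented: each record stored together
    {"id": 1, "region": "EMEA", "revenue": 120.0},
    {"id": 2, "region": "APAC", "revenue": 75.5},
    {"id": 3, "region": "EMEA", "revenue": 200.0},
]

columns = {  # columnar: one contiguous list per field
    "id": [1, 2, 3],
    "region": ["EMEA", "APAC", "EMEA"],
    "revenue": [120.0, 75.5, 200.0],
}

# The row scan must visit every record even though only "revenue" is needed
total_row = sum(r["revenue"] for r in rows)

# The columnar scan reads just the one list of interest
total_col = sum(columns["revenue"])

assert total_row == total_col == 395.5
```

On disk the difference is far larger: a columnar file lets the query engine skip entire column blocks, which is why adding that structure speeds up production queries so dramatically.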
- The other thing that makes an integration platform more important than ever is the explosion of data complexity. As Dr. Kimball put it:
“Integration is even more important these days because you are looking at all sorts of data sources coming in from all sorts of directions.”
To perform interesting analyses, you are going to have to be able to join data with different formats and different semantic meaning. And that is going to require integration tools.
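As a toy sketch of that joining problem, here is plain Python reconciling two hypothetical sources whose customer keys differ in both name and format. All names and values here are invented for illustration; a real integration tool automates exactly this kind of reconciliation at scale:

```python
# Hypothetical sketch: joining two sources with different schemas
# and different semantics for the same entity key.

crm = [  # source A: CRM export, key field named "cust_id"
    {"cust_id": "C-001", "name": "Acme Corp"},
    {"cust_id": "C-002", "name": "Globex"},
]

web_logs = [  # source B: clickstream, same entity keyed as "customerId"
    {"customerId": "c001", "page_views": 42},
    {"customerId": "c002", "page_views": 7},
]

def canonical_key(raw: str) -> str:
    """Map both key formats ("C-001", "c001") to one canonical form."""
    return raw.upper().replace("-", "")

# Index source A on the canonical key, then join source B against it
by_key = {canonical_key(r["cust_id"]): r for r in crm}
joined = [
    {**by_key[canonical_key(w["customerId"])], "page_views": w["page_views"]}
    for w in web_logs
    if canonical_key(w["customerId"]) in by_key
]
```

Even this tiny example needs a semantic mapping (the `canonical_key` function) before the join is possible; multiply that by dozens of sources and formats and the case for integration tooling makes itself.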
- Thirdly, if you are going to put this data into production, you will want to incorporate data cleansing, metadata management, and possibly formal data governance to ensure that your data is trustworthy, auditable, and has business context. There is no point in serving up bad data quickly and inexpensively. The result will be poor business decisions and flawed analyses.
For Data Warehouse Architects
The challenge is to deliver actionable content from the exploding amount of data available. You will need to be constantly scanning for new sources of data and looking for ways to quickly and efficiently deliver that to the point of analysis.
For Enterprise Architects
The challenge with adding big data to your EDW architecture is to define and drive a coherent enterprise data architecture across your organization that standardizes people, processes, and tools to deliver clean and secure data in the most efficient way possible. It will also be important to automate as much as possible to offload routine tasks from the IT staff. The key to that automation will be the effective use of metadata across the entire environment to understand not only the data itself, but how it is used, by whom, and for what business purpose. Once you have done that, it becomes possible to build intelligence into the environment.
For more on Informatica’s vision for an Intelligent Data Platform and how this fits into your enterprise data architecture see Think “Data First” to Drive Business Value
With the Winter 2015 Release, Informatica Cloud Advances Real Time and Batch Integration for Citizen Integrators Everywhere
The first of these is in the area of connectivity and brings a whole new set of features and capabilities to those who use our platform to connect with Salesforce, Amazon Redshift, NetSuite and SAP.
Starting with Amazon, the Winter 2015 release leverages the new Redshift UNLOAD command, giving any user the ability to securely perform bulk queries and quickly scan and place multiple columns of data in the intended target, without the need for ODBC or JDBC connectors. We also ensure the data is encrypted at rest in the S3 bucket while loading data into Redshift tables; this provides an additional layer of security around your data.
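For readers unfamiliar with it, the Redshift statement this feature builds on looks roughly like the sketch below. The bucket, IAM role, and query are placeholders, and this is a hand-written illustration of the Redshift UNLOAD command itself, not Informatica connector code:

```python
# Hedged sketch: composing the kind of UNLOAD statement Redshift runs
# for a bulk export to S3. All identifiers below are placeholders.

def build_unload(query: str, s3_path: str, iam_role: str) -> str:
    """Compose a Redshift UNLOAD statement for a parallel export to S3."""
    escaped = query.replace("'", "''")  # UNLOAD takes the query as a quoted literal
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "PARALLEL ON;"
    )

stmt = build_unload(
    "SELECT order_id, total FROM orders",
    "s3://example-bucket/exports/orders_",
    "arn:aws:iam::123456789012:role/RedshiftUnloadRole",
)
```

Because UNLOAD writes query results directly to S3 in parallel slices, a connector built on it can move bulk data without funneling every row through an ODBC or JDBC session.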
For SAP, we’ve added the ability to balance the load across all application servers. With the new enhancement, we use a Type B connection to route our integration workflows through an SAP message server, which then connects with any available SAP application server. Now if an application server goes down, your integration workflows won’t go down with it. Instead, you’ll automatically be connected to the next available application server.
Additionally, we’ve expanded the capability of our SAP connector by adding support for ECC5. While our connector came out of the box with ECC6, ECC5 is still used by a number of our enterprise customers. The expanded support now provides them with the full coverage they and many other larger companies need.
Finally, for Salesforce, we’re updating to the newest versions of their APIs (Version 31) to ensure you have access to the latest features and capabilities. The upgrades are part of an aggressive roadmap strategy, which places updates of connectors to the latest APIs on our development schedule the instant they are announced.
The second major platform enhancement for the Winter 2015 release has to do with our Cloud Mapping Designer and is sure to please those familiar with PowerCenter. With the new release, PowerCenter users can perform secure hybrid data transformations – and sharpen their cloud data warehousing and data analytic skills – through a familiar mapping and design environment and interface.
Specifically, the new enhancement enables you to take a mapplet you’ve built in PowerCenter and bring it directly into the Cloud Mapping Designer, without any additional steps or manipulations. With the PowerCenter mapplets, you can perform multi-group transformations on objects, such as BAPIs. When you access the Mapplet via the Cloud Mapping Designer, the groupings are retained, enabling you to quickly visualize what you need, and navigate and map the fields.
Additional productivity enhancements to the Cloud Mapping Designer extend the lookup and sorting capabilities and give you the ability to upload or delete data automatically based on specific conditions you establish for each target. And with the new feature supporting fully parameterized, unconnected lookups, you’ll have increased flexibility in runtime to do your configurations.
The third and final major Winter release enhancement is to our Real Time capability. Most notable is the addition of three new features that improve the usability and functionality of the Process Designer.
The first of these is a new “Wait” step type. This new feature applies to both processes and guides and enables the user to add a time-based condition to an action within a service or process call step, and indicate how long to wait for a response before performing an action.
When used in combination with the Boundary timer event variation, the Wait step can be added to a service call step or sub-process step to interrupt the process or enable it to continue.
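As a rough analogy in plain Python, the Wait step combined with a boundary timer amounts to a bounded wait on an event: block until the service responds or the time limit passes, then decide whether to continue or interrupt. The names and timings below are invented for illustration:

```python
import threading

# Rough analogy for a "Wait" step with a boundary timer: wait up to
# a fixed interval for a service response, then branch on the outcome.

response_ready = threading.Event()

def service_call() -> None:
    # Simulated service that responds promptly
    response_ready.set()

worker = threading.Thread(target=service_call)
worker.start()

# The bounded wait: True if the response arrived within 2 seconds
if response_ready.wait(timeout=2.0):
    outcome = "continue-process"    # response in time: process proceeds
else:
    outcome = "interrupt-process"   # timer fired first: process is interrupted

worker.join()
```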
The second is a new select feature in the Process Designer which lets users create their own service connectors. Now when a user is presented with multiple process objects created when the XML or JSON is returned from a service, he or she can select the exact ones to include in the connector.
An additional Generate Process Objects feature automates the creation of objects, eliminating the tedious task of replicating whole service responses containing hierarchical XML and JSON data for large structures. These can now be conveniently auto-generated when testing a Service Connector, saving integration developers a lot of time.
The final enhancement for the Process Designer makes it simpler to work with XML-based services. The new “Simplified XML” feature for the “Get From” field treats attributes as children, removing the namespaces and making sibling elements into an object list. Now if a user only needs part of the returned XML, they just have to indicate the starting point for the simplified XML.
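The transformation described above can be approximated in a short Python sketch: attributes become child fields, namespaces are stripped, and repeated sibling elements collapse into a list. This is an illustrative stand-in, not Informatica’s implementation:

```python
import xml.etree.ElementTree as ET

def local(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.split("}")[-1]

def simplify(elem: ET.Element):
    """Toy 'simplified XML' transform: attributes are treated as children,
    namespaces are removed, and sibling elements become an object list."""
    if len(elem) == 0 and not elem.attrib:
        return (elem.text or "").strip()
    out = dict(elem.attrib)                 # attributes as child fields
    for child in elem:
        key, val = local(child.tag), simplify(child)
        if key in out:                      # repeated siblings -> list
            if not isinstance(out[key], list):
                out[key] = [out[key]]
            out[key].append(val)
        else:
            out[key] = val
    return out

doc = ET.fromstring('<order xmlns="urn:example" id="7">'
                    '<item sku="A1"/><item sku="B2"/></order>')
result = simplify(doc)
# result -> {'id': '7', 'item': [{'sku': 'A1'}, {'sku': 'B2'}]}
```

A user who needs only part of the returned XML would then point at a subtree of `result` as the starting point, which is the convenience the new “Get From” option provides.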
While those conclude the major enhancements, additional improvements include:
- A JMS Enqueue step is now available to submit an XML or JSON message to a JMS queue or topic accessible via the secure agent.
- Dequeuing (queue and topics) of XML or JSON request payloads is now fully supported.