Category Archives: Data Transformation
This creative thinking to solve a problem came from a request to build a soldier knife from the Swiss Army. In the end, the solution was all about getting the right tool for the right job in the right place. In many cases soldiers didn’t need industrial strength tools; all they really needed was a compact and lightweight tool to get the job at hand done quickly.
Putting this into perspective with today’s world of Data Integration, using enterprise-class data integration tools for the smaller data integration project is overkill and typically out of reach for the smaller organization. However, these smaller data integration projects are just as important as those larger enterprise projects, and they are often the innovation behind a new way of business thinking. The traditional hand-coding approach to addressing the smaller data integration project is not scalable, not repeatable and prone to human error. What’s needed is a compact, flexible and powerful off-the-shelf tool.
Thankfully, over a century after the world embraced the Swiss Army Knife, someone at Informatica was paying attention to revolutionary ideas. If you’ve not yet heard the news about the Informatica platform, a version called PowerCenter Express has been released, and it is free of charge. You can use it to handle an assortment of what I’d characterize as high complexity / low volume data integration challenges and experience a subset of the Informatica platform for yourself. I’d emphasize that PowerCenter Express doesn’t replace the need for Informatica’s enterprise grade products, but it is ideal for rapid prototyping, profiling data, and developing quick proofs of concept.
PowerCenter Express provides a glimpse of the evolving Informatica platform by integrating four Informatica products into a single, compact tool. There are no database dependencies and the product installs in just under 10 minutes. Much to my own surprise, I use PowerCenter Express quite often going about the various aspects of my job with Informatica. I have it installed on my laptop so it travels with me wherever I go. It starts up quickly so it’s ideal for getting a little work done on an airplane.
For example, recently I wanted to explore building some rules for an upcoming proof of concept on a plane ride home so I could claw back some personal time for my weekend. I used PowerCenter Express to profile some data and create a mapping. And this mapping wasn’t something I needed to throw away and recreate in an enterprise version after my flight landed. Vibe, Informatica’s build once / run anywhere metadata-driven architecture, allows me to export a mapping I create in PowerCenter Express to one of the enterprise versions of Informatica’s products such as PowerCenter, DataQuality or Informatica Cloud.
As I alluded to earlier in this article, because PowerCenter Express is a free offering, I honestly didn’t expect too much from it when I first started exploring it. However, due to my own positive experiences, I now like to think of PowerCenter Express as the Swiss Army Knife of Data Integration.
To start claiming back some of your personal time, get started with the free version of PowerCenter Express, found on the Informatica Marketplace at: https://community.informatica.com/solutions/pcexpress
By now, the business benefits of effectively leveraging big data have become well known. Enhanced analytical capabilities, greater understanding of customers, and ability to predict trends before they happen are just some of the advantages. But big data doesn’t just appear and present itself. It needs to be made tangible to the business. All too often, executives are intimidated by the concept of big data, thinking the only way to work with it is to have an advanced degree in statistics.
There are ways to make big data more than an abstract concept that can only be loved by data scientists. Four of these ways were recently covered in a report by David Stodder, director of business intelligence research for TDWI, as part of TDWI’s special report on What Works in Big Data.
The time is ripe for experimentation with real-time, interactive analytics technologies, Stodder says. The next major step in the movement toward big data is enabling real-time or near-real-time delivery of information. Real-time data has been a challenge with BI data for years, with limited success, Stodder says. The good news is that the Hadoop framework, originally built for batch processing, now includes interactive querying and streaming applications, he reports. This opens the way for real-time processing of big data.
Design for self-service
Interest in self-service access to analytical data continues to grow. “Increasing users’ self-reliance and reducing their dependence on IT are broadly shared goals,” Stodder says. “Nontechnical users—those not well versed in writing queries or navigating data schemas—are requesting to do more on their own.” There is an impressive array of self-service tools and platforms now appearing on the market. “Many tools automate steps for underlying data access and integration, enabling users to do more source selection and transformation on their own, including for data from Hadoop files,” he says. “In addition, new tools are hitting the market that put greater emphasis on exploratory analytics over traditional BI reporting; these are aimed at the needs of users who want to access raw big data files, perform ad-hoc requests routinely, and invoke transformations after data extraction and loading (that is, ELT) rather than before.”
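To make the ELT pattern Stodder mentions a little more concrete, here is a minimal sketch in Python using the standard library's `sqlite3` module. The table name, column names and sample rows are all hypothetical, chosen only for illustration. The raw data is loaded untouched into a staging table first, and the cleanup and aggregation happen afterwards, inside the target engine, at query time.

```python
import sqlite3

# Hypothetical raw records, exactly as they might arrive from a source extract.
raw_rows = [
    ("2014-01-03", " Acme ", "120.50"),
    ("2014-01-03", "acme", "79.50"),
    ("2014-01-04", "Globex", "300.00"),
]

conn = sqlite3.connect(":memory:")

# Extract + Load: land the data as-is in a staging table (everything is text).
conn.execute(
    "CREATE TABLE staging_sales (sale_date TEXT, customer TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)

# Transform: applied after loading, on demand, by the target engine itself.
query = """
    SELECT LOWER(TRIM(customer)) AS customer,
           SUM(CAST(amount AS REAL)) AS total
    FROM staging_sales
    GROUP BY LOWER(TRIM(customer))
    ORDER BY customer
"""
totals = conn.execute(query).fetchall()
print(totals)
```

In a traditional ETL flow the trimming, case folding and type casting would happen before the insert. Deferring them, as above, keeps the raw data available for other ad-hoc questions later.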
Nothing gets a point across faster than having data points visually displayed – decision-makers can draw inferences within seconds. “Data visualization has been an important component of BI and analytics for a long time, but it takes on added significance in the era of big data,” Stodder says. “As expressions of meaning, visualizations are becoming a critical way for users to collaborate on data; users can share visualizations linked to text annotations as well as other types of content, such as pictures, audio files, and maps to put together comprehensive, shared views.”
Unify views of data
Users are working with many different data types these days, and are looking to bring this information into a single view – “rather than having to move from one interface to another to view data in disparate silos,” says Stodder. Unstructured data – graphics and video files – can also provide a fuller context to reports, he adds.
The interesting thing is that many of the upstarts do not even intend to take on the market leader in the segment. Christensen cites the classic example of Digital Equipment Corporation in the 1980s, which was unable to make the transition from large, expensive enterprise systems to smaller, PC-based equipment. The PC upstarts in this case did not take on Digital directly – rather they addressed unmet needs in another part of the market.
Christensen wrote and published The Innovator’s Dilemma more than 17 years ago, but his message keeps reverberating across the business world. Jill Lepore questioned some of the thinking that has evolved around disruptive innovation in a recent New Yorker article. “Disruptive innovation is a theory about why businesses fail. It’s not more than that. It doesn’t explain change. It’s not a law of nature,” she writes. Christensen responded with a rebuttal to Lepore’s thesis, noting that “disruption doesn’t happen overnight,” and that “[Disruptive innovation] is not a theory about survivability.”
There is something Lepore points out that both she and Christensen can agree on: “disruption” is being oversold and misinterpreted on a wide scale these days. Every new product that rolls out is now branded as “disruptive.” As stated above, the true essence of disruption is creating new markets where the leaders would not tread.
Data itself can potentially be a source of disruption, as data analytics and information emerge as strategic business assets. While the ability to provide data analysis at real-time speeds, or make new insights possible isn’t disruption in the Christensen sense, we are seeing the rise of new business models built around data and information that could bring new leaders to the forefront. Data analytics can either play a role in supporting this movement, or data itself may be the new product or service disrupting existing markets.
We’ve already been seeing this disruption taking place within the publishing industry, for example – companies or sites providing real-time or near real-time services such as financial updates, weather forecasts and classified advertising have displaced traditional newspapers and other media as information sources.
Employing data analytics as a tool for insights never before available within an industry sector also may be part of disruptive innovation. Tesla Motors, for example, is disruptive to the automotive industry because it manufactures entirely electric cars. But the formula for its success is its use of massive amounts of data from its array of in-vehicle devices to assure quality and efficiency.
Likewise, data-driven disruption may be occurring in places that may have been difficult to innovate. For example, it’s long been speculated that some of the digital giants, particularly Google, are poised to enter the long-staid insurance industry. If this were to happen, Google would not enter as a typical insurance company with a new web-based spin. Rather, the company would be employing new techniques of data gathering, insight and analysis to offer an entirely new model to consumers – one based on data. As Christopher Hernaes recently related in TechCrunch, Google’s ability to collect and mine data on homes, businesses and autos gives it a unique value proposition in the industry’s value chain.
We’re in an era in which Christensen’s mode of disruptive innovation has become a way of life. Increasingly, it appears that enterprises that are adept at recognizing and acting upon the strategic potential of data may be joining the ranks of the disruptors.
Even in “good” data there is a lot of garbage. Take a person’s name, for example. John could also be spelled as Jon or Von (I have a high school sports trophy to prove it). Schmidt could become Schmitt or Smith. In Hungarian my name is Janos Kovacs. Human beings entering data make errors in spelling, phonetics, and keypunching. We also have to deal with variations associated with compound and account names, abbreviations, nicknames, prefix and suffix variations, foreign names, and missing elements. As long as humans are involved in entering data there will be a significant amount of garbage in any database. So how do we turn this gibberish into gems of information?
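As a rough illustration of what tolerating such variations can look like, here is a minimal sketch that scores name similarity with Python's standard `difflib.SequenceMatcher`. This is a generic character-based comparison, not a phonetic or nickname-aware algorithm and not a description of any particular product's matching engine. The helper names and the 0.7 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name, candidates, threshold=0.7):
    """Return the closest candidate at or above the threshold, else None.

    The threshold is a hypothetical tuning value, not a recommended setting.
    """
    score, candidate = max((similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None


# Hypothetical "clean" reference names and some keyed-in variants.
known = ["John Schmidt", "Janos Kovacs"]
for entered in ["Jon Schmitt", "Jhon Schmidt", "Von Smith"]:
    print(entered, "->", best_match(entered, known))
```

A purely character-based score like this copes with typos and transpositions, but it knows nothing about nicknames, translations such as John and Janos, or missing name elements, which is why real matching tends to combine several techniques.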
Data is everywhere. It’s in databases and applications spread across your enterprise. It’s in the hands of your customers and partners. It’s in cloud applications and cloud servers. It’s on spreadsheets and documents on your employees’ laptops and tablets. It’s in smartphones, sensors and GPS devices. It’s in the blogosphere, the twittersphere and your friends’ Facebook timelines.
In a recent Aberdeen Group Analyst Insight paper, 50% of survey respondents reported that they were currently integrating hierarchical data sources, with a further 13% planning to implement this capability in the next 12 months. The trend is changing, however: among organisations currently integrating XML data, nearly a third are using or planning to integrate other hierarchical sources. JSON leads the way, with COBOL records and Google Protocol Buffers close behind. Apache Avro is not yet widely integrated but shows the biggest growth in planned integration and in number of projects.
For once hype could be a good thing. Well it is if you’re reading the latest Gartner – Hype Cycle for Application Infrastructure published last month, in July; because in it you will see how two important technology trends – the areas of BGS (sorry, another TLA for you to learn – B2B Gateway Software (and it’s even a nested TLA!)), and Managed File Transfer (MFT) have now made it out of the Trough of Disillusionment and up onto the Slope of Enlightenment. Why do I suddenly feel like John Bunyan’s Pilgrim?
Anyway, the key points that Gartner identifies are that centrally managing B2B interactions provides:
- Economies of scale and deeper insight into the technical aspects of data integration, transaction delivery, process integration and SOA interoperability, such as consolidating, tracking, storing and auditing files, messages, process events, acknowledgments, receipts, and errors and exceptions.
The B2B communications process with your external business partners, suppliers, etc. is not a static process and you need to be able to have visibility of these communications for not only regulatory compliance and auditability issues but also to manage the dynamic process.
- A single point through which to troubleshoot B2B integration issues.
B2B Gateways are now a mature technology and, like standards, most organisations use a number of them. Integrating them provides significant benefit: it gives the organisation visibility of its business relationships and transactions, and a place to go when things go wrong and need managing, as they definitely will.
- A central, reusable repository for external business partner profiles and Web services APIs. This is particularly valuable when dealing with a large number of external business partners and cloud APIs, and when multiple business units interact with the same partners or cloud services.
The number of business partners we all have to deal with is increasing rapidly as we outsource, subcontract, farm-out and generally rely more on external specialist organisations. Having visibility of these relationships and making the most from new integration methodologies and processes can generate great savings and also give visibility of our business exposure to these suppliers.
- Support for the myriad data formats, transport and communication protocols, and security standards.
As the old saying goes, “I love standards, there are so many to choose from.” Well, our business processes are not getting any simpler; data standards are under constant change and revision, with data formats becoming increasingly complex. So being able to handle not just a few but all key formats, to reuse previous transformation experience, to utilise already developed libraries and to leverage new complex hierarchical data structures makes the difference between a stove-piped and soon to be redundant system and one that is flexible and supports new and ever changing business requirements.
As one of the vendors identified that can actively compete with offerings positioned to address the broader set of usage scenarios, Informatica’s B2B Data Exchange solution not only supports the B2B functional requirements an organisation will have but also integrates this process and data into the wider internal data integration platform and management process. (Look at this presentation for the key new features in version 9.5.)
So now the reality, not the hype, of B2B solutions can be delivered.
Sources: Gartner Hype Cycle for Application Infrastructure, 2012. Published: 24 July 2012
Analyst: Jess Thompson
Gartner Disclaimer re the Hype Cycle
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
One up-and-coming use case in the Capital Markets that we are excited about is front office real-time risk analytics on streaming market data, to decrease risk by informing traders in real time about potential changes to trading strategies, based on the most up-to-date data possible.
Remote Data Collection and Transformation – with Ultra Messaging Cache Option and B2B Data Transformation
Sometimes when I drive past an electronic tollway collection sensor, I wonder about the amount of data it must generate. I’m no expert on such technology, but at a minimum, the RFID sensor has to read the chip in your car, and log the date and time plus your RFID info, and then a camera takes a picture to catch any potential violators. Now multiply that data times the hundreds of thousands of cars that drive such roads every day, times the number of sensors they pass, and I’m quite sure this number exceeds several million messages per day.
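The back-of-the-envelope multiplication above can be written out as a short Python sketch. Every figure in it is an illustrative assumption rather than a measured value: the car count, the number of sensors passed per trip and the number of messages per sensor read are all placeholders to be replaced with real numbers.

```python
# All figures below are illustrative assumptions, not measured values.
cars_per_day = 300_000      # "hundreds of thousands of cars"
sensors_per_trip = 5        # hypothetical number of sensors each car passes
messages_per_read = 2       # one RFID log record plus one camera record

messages_per_day = cars_per_day * sensors_per_trip * messages_per_read
messages_per_second = messages_per_day / 86_400  # 86,400 seconds in a day

print(f"{messages_per_day:,} messages per day")
print(f"{messages_per_second:.1f} messages per second on average")
```

The average rate hides the rush-hour peaks, so a collection system sized on this kind of estimate would need headroom well above the daily mean.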
Today, agility and timely visibility are critical to the business. No wonder CIO.com states that business intelligence (BI) will be the top technology priority for CIOs in 2012. However, is your data architecture agile enough to handle these exacting demands?
In his blog Top 10 Business Intelligence Predictions For 2012, Boris Evelson of Forrester Research, Inc., states that traditional BI approaches often fall short for the following two reasons (among many others):
- BI hasn’t fully empowered information workers, who still largely depend on IT
- BI platforms, tools and applications aren’t agile enough