Category Archives: Data Transformation
Even in “good” data there is a lot of garbage. Take a person’s name, for example. John could also be spelled Jon or Von (I have a high school sports trophy to prove it). Schmidt could become Schmitt or Smith. In Hungarian my name is Janos Kovacs. Human beings entering data make errors in spelling, phonetics and keypunching. We also have to deal with variations in compound and account names, abbreviations, nicknames, prefix and suffix variations, foreign names and missing elements. As long as humans are involved in entering data, there will be a significant amount of garbage in any database. So how do we turn this gibberish into gems of information?
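One classic way to catch the John/Jon and Schmidt/Smith variations described above is phonetic matching. As a minimal sketch (this is the standard American Soundex algorithm, not any particular vendor’s matching engine):

```python
def soundex(name: str) -> str:
    """Encode a name as its American Soundex code, e.g. 'John' -> 'J500'."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    first = name[0].upper()
    encoded = []
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "hw":
            continue  # h/w are ignored and do not break a run of equal codes
        code = codes.get(ch)
        if code is None:
            prev = ""  # vowels separate duplicate codes
            continue
        if code != prev:
            encoded.append(code)
        prev = code
    return (first + "".join(encoded) + "000")[:4]
```

With this, `soundex("John")` and `soundex("Jon")` both yield `J500`, and Schmidt, Schmitt and Smith all collapse to `S530`, so the variants can be grouped as candidate duplicates for review.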
Data is everywhere. It’s in databases and applications spread across your enterprise. It’s in the hands of your customers and partners. It’s in cloud applications and cloud servers. It’s in spreadsheets and documents on your employees’ laptops and tablets. It’s in smartphones, sensors and GPS devices. It’s in the blogosphere, the twittersphere and your friends’ Facebook timelines.
A recent Aberdeen Group Analyst Insight paper found that 50% of survey respondents were already integrating hierarchical data sources, with a further 13% planning to implement this capability in the next 12 months. The trend is shifting, though: of the organisations currently integrating XML data, nearly a third are integrating, or planning to integrate, other hierarchical sources, with JSON leading the demand and COBOL records and Google Protocol Buffers close behind. Apache Avro has seen little integration so far but shows the biggest growth in both planned integration and number of projects.
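The integration challenge with hierarchical sources such as XML and JSON is that nested structures must usually be mapped into the flat rows that downstream systems expect. A minimal sketch of that mapping, using an invented order record for illustration:

```python
import json

def flatten(record: dict, prefix: str = "") -> dict:
    """Recursively flatten a nested dict into dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical hierarchical source record
order = json.loads('{"id": 42, "customer": {"name": "Jon", "country": "HU"}}')
flat = flatten(order)
# -> {'id': 42, 'customer.name': 'Jon', 'customer.country': 'HU'}
```

Real hierarchical formats (repeating groups in COBOL copybooks, arrays in Avro and Protocol Buffers) add further cases such as lists and schema evolution, which is where dedicated transformation tooling earns its keep.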
For once, hype could be a good thing. Well, it is if you’re reading the latest Gartner Hype Cycle for Application Infrastructure, published last month in July, because in it you will see how two important technology trends, B2B Gateway Software (BGS; sorry, another TLA for you to learn, and a nested one at that) and Managed File Transfer (MFT), have now made it out of the Trough of Disillusionment and up onto the Slope of Enlightenment. Why do I suddenly feel like John Bunyan’s Pilgrim?
Anyway, the key points that Gartner identifies are that centrally managing B2B interactions provides:
- Economies of scale and deeper insight into the technical aspects of data integration, transaction delivery, process integration and SOA interoperability, such as consolidating, tracking, storing and auditing files, messages, process events, acknowledgments, receipts, and errors and exceptions.
The B2B communications process with your external business partners, suppliers and others is not static, and you need visibility of these communications not only for regulatory compliance and auditability but also to manage the process as it changes.
- A single point through which to troubleshoot B2B integration issues.
B2B Gateways are now a mature technology and, as with standards, most organisations use several of them. Consolidating them provides significant benefit: the organisation gains visibility of its business relationships and transactions, and knows where to go when things go wrong and need managing, as they definitely will.
- A central, reusable repository for external business partner profiles and Web services APIs. This is particularly valuable when dealing with a large number of external business partners and cloud APIs, and when multiple business units interact with the same partners or cloud services.
The number of business partners we all have to deal with is increasing rapidly as we outsource, subcontract, farm-out and generally rely more on external specialist organisations. Having visibility of these relationships and making the most from new integration methodologies and processes can generate great savings and also give visibility of our business exposure to these suppliers.
- Support for the myriad data formats, transport and communication protocols, and security standards.
As the old saying goes, “I love standards: there are so many to choose from.” Our business processes are not getting any simpler; data standards are under constant change and revision, and data formats are becoming increasingly complex. Being able to handle not just a few but all key formats, to reuse previous transformation experience, to utilise already-developed libraries and to leverage new complex hierarchical data structures makes the difference between a stove-piped, soon-to-be-redundant system and one that is flexible and supports new and ever-changing business requirements.
As one of the vendors identified as able to compete actively with offerings positioned to address the broader set of usage scenarios, Informatica, with its B2B Data Exchange solution, not only supports the B2B functional requirements an organisation will have but also integrates this process and its data into the wider internal data integration platform and management process. (Look at this presentation for the key new features of the 9.5 release.)
So now the reality, not the hype, of B2B solutions can be delivered.
Sources: Gartner Hype Cycle for Application Infrastructure, 2012. Published: 24 July 2012
Analyst: Jess Thompson
Gartner Disclaimer re the Hype Cycle
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
One up-and-coming use case in the Capital Markets that we are excited about is front office real-time risk analytics on streaming market data, to decrease risk by informing traders in real time about potential changes to trading strategies, based on the most up-to-date data possible.
Remote Data Collection and Transformation – with Ultra Messaging Cache Option and B2B Data Transformation
Sometimes when I drive past an electronic tollway collection sensor, I wonder about the amount of data it must generate. I’m no expert on such technology, but at a minimum, the RFID sensor has to read the chip in your car and log the date and time plus your RFID info, and then a camera takes a picture to catch any potential violators. Now multiply that data by the hundreds of thousands of cars that drive such roads every day, times the number of sensors they pass, and I’m quite sure this number exceeds several million messages per day.
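The back-of-the-envelope arithmetic behind that claim is easy to sketch. All figures here are hypothetical, just to show how quickly the volumes compound:

```python
# Hypothetical tollway sizing; every constant below is an assumption.
cars_per_day = 300_000    # vehicles using the tollway daily
sensors_passed = 10       # gantries each vehicle crosses on its trip
events_per_read = 2       # one RFID read plus one camera capture

messages_per_day = cars_per_day * sensors_passed * events_per_read
print(f"{messages_per_day:,} messages/day")   # 6,000,000 messages/day

# Average sustained rate; real traffic would peak far above this at rush hour.
avg_per_second = messages_per_day // (24 * 3600)
```

Even these modest assumptions land in the millions of messages per day, which is exactly the regime where high-throughput messaging and downstream transformation start to matter.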
Today, agility and timely visibility are critical to the business. No wonder CIO.com states that business intelligence (BI) will be the top technology priority for CIOs in 2012. However, is your data architecture agile enough to handle these exacting demands?
In his blog Top 10 Business Intelligence Predictions For 2012, Boris Evelson of Forrester Research, Inc., states that traditional BI approaches often fall short for the two following reasons (among many others):
- BI hasn’t fully empowered information workers, who still largely depend on IT
- BI platforms, tools and applications aren’t agile enough
If you haven’t already, I think you should read The Forrester Wave™: Data Virtualization, Q1 2012, for two reasons: one, to truly understand the space, and two, to understand the critical capabilities required of a solution that solves real data integration problems.
At the very outset, let’s clearly define Data Virtualization. Simply put, Data Virtualization is foundational to Data Integration: it enables fast, direct access to the critical data and reports that the business needs and trusts. It is not to be confused with simple, traditional Data Federation; instead, think of it as a superset that must complement existing data architectures to support BI agility, MDM and SOA.
Today, Informatica is announcing the immediate availability of Informatica HParser, the first enterprise-class data parsing transformation solution for Hadoop environments. Available in a free community edition and commercial editions, Informatica HParser empowers organizations to maximize their Return on Data by extracting the value of complex, unstructured data traditionally under-exploited in the enterprise. Please view how Ronen Schwartz, Vice President of Products, B2B Data Exchange and Data Transformation, explains what drove Informatica to build and release HParser in Why We Built HParser.
To understand why this is important to the Hadoop community, let’s look at how organizations are using Hadoop today. In 2011, Ventana Research completed a benchmark research survey among 163 large-scale data users.
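The core job a parsing step performs in a Hadoop pipeline is turning raw, semi-structured lines into structured records that downstream analysis can consume. A minimal sketch of that idea (the log-line layout and field names are invented for illustration, not HParser’s actual format or API):

```python
import re

# Hypothetical raw feed line; the field layout is an assumption.
LINE = "2011-10-02 14:31:07 trade symbol=INFA qty=500 px=18.42"

PATTERN = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<type>\w+) "
    r"symbol=(?P<symbol>\w+) qty=(?P<qty>\d+) px=(?P<px>[\d.]+)")

def parse(line: str):
    """Map one raw line to a structured record, the way a parsing
    step would before handing tuples to a MapReduce job."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None  # None drops unparseable lines

record = parse(LINE)
```

A production parser differs mainly in scale and breadth: declarative specifications instead of hand-written regexes, support for binary and industry-standard formats, and execution distributed across the cluster, but the line-in, record-out shape is the same.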
The phrase ‘Data Tsunami’ has been used by numerous authors in the last few months, and it’s difficult to find a more suitable analogy: what’s approaching is of such a greater order of magnitude that the IT industry’s expectations of continued data growth will be swamped in the next few years.
However impressive a spectacle a tsunami is, it still wreaks havoc on those who are unprepared, or who believe they can tread water and simply float to the surface once the trouble has passed.