Introducing HParser

Today, Informatica is announcing the immediate availability of Informatica HParser, the first enterprise-class data parsing transformation solution for Hadoop environments. Available in a free community edition and in commercial editions, Informatica HParser empowers organizations to maximize their Return on Data by extracting the value of complex, unstructured data that has traditionally been under-exploited in the enterprise. In "Why We Built HParser," Ronen Schwartz, Vice President of Products, B2B Data Exchange and Data Transformation, explains what drove Informatica to build and release HParser.

To understand why this is important to the Hadoop community, let’s look at how organizations are using Hadoop today. In 2011, Ventana Research completed a benchmark research survey among 163 large-scale data users. There is a wealth of insights that can be drawn from this research. One thing that stood out immediately for me was that Hadoop was present in more than half of the big data audience. In particular, the research revealed that 22% of the respondents were using Hadoop in production. It also identifies the data types most commonly used in Hadoop, such as application logs, network logs, Web logs, event data and log files. What was also illuminating was that many users run Hadoop in conjunction with third-party tools, with Informatica named as one of the top 10 vendors. In a nutshell, this benchmark validates that organizations are getting smarter about adopting big data processing and infrastructure technologies by selecting the right technologies, open source and proprietary alike. Furthermore, the frequent use of Hadoop to analyze log data points to a growing need for organizations to deploy a parsing alternative that can gracefully co-exist with the rest of the IT infrastructure, transforming complex, unstructured data like logs and event data into a structure that is easier to analyze and process.

As organizations evolve in their Hadoop deployment from R&D to production or from departmental to enterprise-wide, they often turn to a data specialist like Informatica to ensure data is trustworthy, actionable, and authoritative to improve analytical insights and business operations. Informatica continues to execute on its vision of turning big data into big opportunities and unleashing the power of Hadoop. Earlier in 2011, Informatica delivered its native connector to Hadoop, PowerExchange for Hadoop. This enables customers to leverage Informatica’s universal connectivity to deliver virtually any type of data into and out of Hadoop at various latencies (e.g. batch, near real-time, real-time). With the HParser release, we are enabling customers to use pre-built parsers or design their own parsers (via a user-friendly visual integrated design environment) for processing complex unstructured or semi-structured data formats (e.g. web logs, XML, JSON, FIX, SWIFT, HL7, CDR (Call Detail Records), Word, PDF, XLS, etc.) on Hadoop.

Informatica’s Hadoop solution is geared toward helping organizations get more from their Hadoop investments and leverage their existing data integration skill sets and enterprise IT investments. We are excited about HParser, the newest addition to our platform, and stand ready to help organizations get more from big data. To learn more, please visit www.informatica.com/HParser. You can also check out the chalktalk and product demonstration.
