Tag Archives: open-source
As we head into Strata + Hadoop World San Jose, Pivotal has made some interesting announcements that are sure to be the talk of the show. Pivotal’s moves to open-source some of its advanced products, and to form a new organization to foster Hadoop community cooperation, are signs of the dynamism and momentum of the Big Data market.
Informatica applauds these initiatives by Pivotal and we hope that they will contribute to the accelerating maturity of Hadoop and its expansion beyond early adopters into mainstream industry adoption. By contributing HAWQ, GemFire and the Greenplum Database to the open source community, Pivotal creates further open options in the evolving Hadoop data infrastructure technology. We expect this to be well received by the open source community.
As Informatica has long served as the industry’s neutral data connector for more than 5,500 customers and has developed a rich set of capabilities for Hadoop, we are also excited to see efforts to reduce fragmentation in the Hadoop community.
Even before the new company Pivotal was formed, Informatica had a long history of working with the Greenplum team to ensure that joint customers could confidently use Informatica tools to include the Greenplum Database in their enterprise data pipelines. Informatica has mature, high-performance native connectivity to load data into and out of Greenplum reliably using Informatica’s codeless, visual data pipelining tools. In 2014, Informatica expanded our Hadoop support to include Pivotal HD Hadoop, and we have joint customers using Informatica to do data profiling, transformation, parsing and cleansing using Informatica Big Data Edition running on Pivotal HD Hadoop.
We expect these innovative developments driven by Pivotal in the Big Data technology landscape to help to move the industry forward and contribute to Pivotal’s market progress. We look forward to continuing to support Pivotal technology and to an ever increasing number of successful joint customers. Please reach out to us if you have any questions about how Informatica and Pivotal can help your organization to put Big Data into production. We want to ensure that we can help you answer the question … Are you Big Data Ready?
Today is an exciting day for technology in high performance electronic trading. By the time you read this, the CME Group, Real Logic Ltd., and Informatica will have announced a new open source initiative. I’ve been collaborating on this work for a few months and I feel it is some great technology. I hope you will agree.
Simple Binary Encoding (SBE) is an encoding for FIX that is being developed by the FIX protocol community as part of their High Performance Working Group. The goal is to produce a binary encoding representation suitable for low-latency financial trading. The CME Group, Real Logic, and Informatica have sponsored the development of an open source implementation of an early version of the SBE specification, undertaken by Martin Thompson (of Real Logic, formerly of LMAX) and myself, Todd Montgomery (of Informatica). The implementation is a very high performance encoding/decoding mechanism for data layout that is tailored not just to the demands of high performance applications in low-latency trading. It also has implications for all manner of serialization and marshaling, in use cases from Big Data analytics to device data capture.
Financial institutions, and other businesses, need to serialize data structures for transmission over networks as well as for storage. SBE is a developing standard for how to encode/decode FIX data structures over a binary medium at high speed and with low latency. The SBE project is most similar to Google Protocol Buffers. However, looks are quite deceiving. SBE is an order of magnitude faster and immensely more efficient for encoding and decoding. This focus on performance means application developers can turn their attention to the application logic instead of the details of serialization. There are a number of advantages to SBE beyond speed, although speed is of primary concern.
- SBE provides a strong typing mechanism in the form of schemas for data objects
- SBE only generates the overhead of versioning if the schema needs to handle versioning and if so, only on decode
- SBE uses an Intermediate Representation (IR) for decoupling schema specification, optimization, and code generation
- SBE’s use of IR will allow it to provide various data layout optimizations in the near future
- SBE initially provides Java, C++98, and C# code generators with more on the way
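To make the idea of a schema-driven, fixed-layout binary encoding concrete, here is a minimal Python sketch. It is not SBE’s wire format and not the output of its code generators (Python is not one of the generated languages listed above). The message fields, the `ORDER_LAYOUT` name, and the helper functions are all hypothetical, chosen only to show how a fixed layout lets an encoder write straight into a buffer and lets a decoder read a single field by offset.

```python
import struct

# Hypothetical "schema" for a toy order message (an assumption for illustration,
# not the SBE wire format). Little-endian, no padding:
#   order_id : uint64  at offset 0
#   quantity : uint32  at offset 8
#   price    : int64   at offset 12 (fixed-point mantissa)
#   side     : uint8   at offset 20
ORDER_LAYOUT = struct.Struct("<QIqB")
QUANTITY_OFFSET = 8


def encode_order(buffer, offset, order_id, quantity, price_mantissa, side):
    """Write all fields into a caller-supplied buffer; return the next free offset."""
    ORDER_LAYOUT.pack_into(buffer, offset, order_id, quantity, price_mantissa, side)
    return offset + ORDER_LAYOUT.size


def decode_order(buffer, offset):
    """Read all fields of one message back as a tuple."""
    return ORDER_LAYOUT.unpack_from(buffer, offset)


def read_quantity(buffer, offset):
    """Read only the quantity field, using its fixed offset within the message."""
    return struct.unpack_from("<I", buffer, offset + QUANTITY_OFFSET)[0]
```

Because every field sits at a known offset, two messages can be packed back to back in one preallocated `bytearray`, and a reader that only cares about `quantity` never has to touch the other fields. A real implementation would generate accessors like these from the schema rather than writing them by hand.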
What breakthrough has led to SBE being so fast?
It isn’t new or a breakthrough. SBE has been designed and implemented with the concepts and tenets of Mechanical Sympathy. Most software is developed with abstractions to mask away the details of CPU architecture, disk access, OS concepts, etc. Not so for SBE. Martin and I designed it using everything we know about how CPUs, memory, compilers, managed runtimes, etc. work, so that it is very fast and works _with_ the hardware instead of against it.
Martin’s Blog will have a more detail-oriented, technical discussion of SBE sometime later. But I encourage you to look at it and try it out. The work is open to the public under the Apache License.
Todd L. Montgomery is a Vice President of Architecture for Informatica and the chief designer and implementer of the 29West low latency messaging products. The Ultra Messaging product family (formerly known as LBM) has over 190 production deployments within electronic trading across many asset classes and pioneered the broker-less messaging paradigm. In the past, Todd has held architecture positions at TIBCO and Talarian as well as lecture positions at West Virginia University, contributed to the IETF, and performed research for NASA in various software fields. With a deep background in messaging systems, high performance systems, reliable multicast, network security, congestion control, and software assurance, Todd brings a unique perspective tempered by over 20 years of practical development experience.
I recently had the pleasure of participating in a big data panel at the Pacific Crest investor’s conference (the replay is available here). I was joined on the panel by Hortonworks, MapR, Datastax and Microsoft. There is clearly a lot of interest in the world of big data and how the market is evolving. I came away from the panel with four fundamental thoughts: (more…)