Category Archives: Big Data
In recent times, the big Internet companies – the Googles, Yahoos and eBays – have proven that it is possible to build a sustainable business on data analytics, in which corporate decisions and actions are seamlessly guided by an analytics culture based on data, measurement and quantifiable results. Now, two of the top data analytics thinkers say we are reaching a point where non-tech, non-Internet companies are on their way to becoming analytics-driven organizations in a similar vein, as part of an emerging data economy.
In a report written for the International Institute for Analytics, Thomas Davenport and Jill Dyché divulge the results of their interviews with 20 large organizations, in which they find big data analytics to be well integrated into the decision-making cycle. “Large organizations across industries are joining the data economy,” they observe. “They are not keeping traditional analytics and big data separate, but are combining them to form a new synthesis.”
Davenport and Dyché call this new state of management “Analytics 3.0,” in which the concept and practices of competing on analytics are no longer confined to data management and IT departments or quants – analytics is embedded into all key organizational processes. That means major, transformative effects for organizations. “There is little doubt that analytics can transform organizations, and the firms that lead the 3.0 charge will seize the most value,” they write.
Analytics 3.0 is the current phase – the third of three distinct phases – in the way data analytics has been applied to business decision making, Davenport and Dyché say. The first two “eras” looked like this:
- Analytics 1.0, prevalent between 1954 and 2009, was based on relatively small, structured data sets drawn from internal corporate sources.
- Analytics 2.0, which arose between 2005 and 2012, saw the rise of the big Web companies – the Googles and Yahoos and eBays – which were leveraging big data stores and employing prescriptive analytics to target customers and shape offerings. This time span was also shaped by a growing interest in competing on analytics, in which data was applied to strategic business decision-making. “However, large companies often confined their analytical efforts to basic information domains like customer or product, that were highly-structured and rarely integrated with other data,” the authors write.
- In the Analytics 3.0 era, analytical efforts are being integrated with other data types, across enterprises.
This emerging environment “combines the best of 1.0 and 2.0—a blend of big data and traditional analytics that yields insights and offerings with speed and impact,” Davenport and Dyché say. The key trait of Analytics 3.0 “is that not only online firms, but virtually any type of firm in any industry, can participate in the data-driven economy. Banks, industrial manufacturers, health care providers, retailers—any company in any industry that is willing to exploit the possibilities—can all develop data-based offerings for customers, as well as supporting internal decisions with big data.”
Davenport and Dyché describe how one major trucking and transportation company has been able to implement low-cost sensors for its trucks, trailers and intermodal containers, which “monitor location, driving behaviors, fuel levels and whether a trailer/container is loaded or empty. The quality of the optimized decisions [the company] makes with the sensor data – dispatching of trucks and containers, for example – is improving substantially, and the company’s use of prescriptive analytics is changing job roles and relationships.”
New technologies and methods are helping enterprises enter the Analytics 3.0 realm, including “a variety of hardware/software architectures, including clustered parallel servers using Hadoop/MapReduce, in-memory analytics, and in-database processing,” the authors add. “All of these technologies are considerably faster than previous generations of technology for data management and analysis. Analyses that might have taken hours or days in the past can be done in seconds.”
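The MapReduce model the authors mention splits an analysis into a map phase that emits key/value pairs and a reduce phase that aggregates them per key. A toy, single-process sketch of the idea (the function names, record format and sample log are illustrative, not any vendor's API; a real Hadoop cluster runs the same logic across many machines):

```python
from collections import defaultdict

def map_phase(record):
    """Emit (customer, 1) for each event in a 'customer,event' log line."""
    customer, event = record.split(",")
    yield customer, 1

def reduce_phase(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)

def run_job(records):
    shuffled = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            shuffled[key].append(value)  # the "shuffle" step groups by key
    return dict(reduce_phase(k, v) for k, v in shuffled.items())

log = ["c1,click", "c2,click", "c1,purchase", "c1,click"]
print(run_job(log))  # {'c1': 3, 'c2': 1}
```

The speed the authors describe comes from running the map and reduce calls in parallel across a cluster; the shuffle is the only step that requires moving data between nodes.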
Another key characteristic of big data analytics-driven enterprises is the ability to fail fast – to deliver, with great frequency, partial outputs to project stakeholders. With the rise of new ‘agile’ analytical methods and machine learning techniques, organizations are capable of delivering “insights at a much faster rate,” and of providing “an ongoing sense of urgency.”
Perhaps most importantly, big data and analytics are integrated and embedded into corporate processes across the board. “Models in Analytics 3.0 are often being embedded into operational and decision processes, dramatically increasing their speed and impact,” Davenport and Dyché state. “Some are embedded into fully automated systems based on scoring algorithms or analytics-based rules. Some are built into consumer-oriented products and features. In any case, embedding the analytics into systems and processes not only means greater speed, but also makes it more difficult for decision-makers to avoid using analytics—usually a good thing.”
The report is available here.
People are obsessed with data. Data captured from our smartphones. Internet data showing how we shop and search — and what marketers do with that data. Big Data, which I loosely define as people throwing every conceivable data point into a giant Hadoop cluster with the hope of figuring out what it all means.
Too bad all that attention stems from fear, uncertainty and doubt about the data that defines us. I blame the technology industry, which — in the immortal words of “Cool Hand Luke” — has had a “failure to communicate.” For decades we’ve talked the language of IT and left it up to our direct customers to explain the proper care-and-feeding of data to their business users. Small wonder it’s way too hard for regular people to understand what we, as an industry, are doing. After all, how can we expect others to explain the do’s and don’ts of data management when we haven’t clearly explained it ourselves?
I say we need to start talking about the ABC’s of handling data in a way that’s easy for anyone to understand. I’m convinced we can because — if you think about it — everything you learned about data you learned in kindergarten: It has to be clean, safe and connected. Here’s what I mean:
Data cleanliness has always been important, but assumes real urgency with the move toward Big Data. I blame Hadoop, the underlying technology that makes Big Data possible. On the plus side, Hadoop gives companies a cost-effective way to store, process and analyze petabytes of nearly every imaginable data type. And that’s the problem as companies go through the enormous time suck of cataloging and organizing vast stores of data. Put bluntly, big data can be a swamp.
The question is, how to make it potable. This isn’t always easy, but it’s always, always necessary. It begins, naturally, by ensuring the data is accurate, de-duped and complete.
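That first pass – de-duplication plus a completeness check – can be sketched in a few lines of plain Python (the record layout and required fields here are invented for illustration):

```python
def clean(records, required_fields=("id", "email")):
    """Keep the first copy of each id; route incomplete rows aside for repair."""
    seen, clean_rows, rejects = set(), [], []
    for row in records:
        if any(not row.get(f) for f in required_fields):
            rejects.append(row)          # incomplete: send back for enrichment
        elif row["id"] in seen:
            continue                     # duplicate: keep only the first copy
        else:
            seen.add(row["id"])
            clean_rows.append(row)
    return clean_rows, rejects

rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 1, "email": "a@x.com"},   # duplicate
    {"id": 2, "email": ""},          # incomplete
]
good, bad = clean(rows)
print(len(good), len(bad))  # 1 1
```

Real pipelines add fuzzy matching (near-duplicate names and addresses) on top of this exact-key pass, but the shape of the job is the same.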
Now comes the truly difficult part: Knowing where that data originated, where it’s been, how it’s related to other data and its lineage. That data provenance is absolutely vital in our hyper-connected world where one company’s data interacts with data from suppliers, partners, and customers. Someone else’s dirty data, regardless of origin, can ruin reputations and drive down sales faster than you can say “Target breach.” In fact, we now know that hackers entered Target’s point-of-sales terminals through a supplier’s project management and electronic billing system. We won’t know for a while the full extent of the damage. We do know the hack affected one-third of the entire U.S. population. Which brings us to:
Obviously, being safe means keeping data out of the hands of criminals. But it doesn’t stop there. That’s because today’s technologies make it oh so easy to misuse the data we have at our disposal. If we’re really determined to keep data safe, we have to think long and hard about responsibility and governance. We have to constantly question the data we use, and how we use it. Questions like:
- How much of our data should be accessible, and by whom?
- Do we really need to include personal information, like social security numbers or medical data, in our Hadoop clusters?
- When do we go the extra step of making that data anonymous?
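One common way to “go the extra step” is to pseudonymize identifiers before they ever land in the cluster: replace a social security number with a keyed hash, so records still join on the token but the raw value never leaves the source system. A minimal sketch, with deliberately simplified key handling (the salt and field names are assumptions for illustration, not a production key-management scheme):

```python
import hashlib
import hmac

# Assumption: this secret lives in a key store OUTSIDE the Hadoop cluster
# and is rotated on a schedule; hard-coding it is for illustration only.
SECRET_SALT = b"rotate-me-outside-the-cluster"

def pseudonymize(ssn: str) -> str:
    """Replace an SSN with a stable keyed hash: joinable, not reversible."""
    return hmac.new(SECRET_SALT, ssn.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "Jane Doe", "ssn": "123-45-6789"}
record["ssn"] = pseudonymize(record["ssn"])
print(record["ssn"])  # same input always yields the same token
```

A keyed hash (HMAC) rather than a bare hash matters here: SSNs have so little entropy that an unsalted hash can be reversed by brute force.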
And as I think about it, I realize that everything we learned in kindergarten boils down to the ethics of data: How, for example, do we know if we’re using data for good or for evil?
That question is especially relevant for marketers, who have a tendency to use data to scare people, for crass commercialism, or to violate our privacy just because technology makes it possible. Use data ethically, and we can help change that.
In fact, I believe that the ethics of data is such an important topic that I’ve decided to make it the title of my new blog.
Stay tuned for more musings on The Ethics of Data.
With the growing prominence of big data as both a strategic and tactical resource for enterprises, there’s been a growing shift in the scope of business intelligence. Not too long ago, BI’s world consisted of tools that ran on individual workstations or PCs, providing filtered reports on limited sets of data, or stacking the data into analytical cubes.
Now, BI encompasses a range of data and analytics from across the enterprise, and is as likely to run online, supported in the cloud, as on a local PC. However, as it has been for years, BI adoption still tends to be limited, not reaching its full potential. BI analyst Cindi Howson asks: what’s holding companies back from achieving a big impact with BI? In a recent Q&A with TDWI’s Linda Briggs, she discussed the issues raised in her new book, Successful Business Intelligence: Unlock the Value of BI and Big Data.
The success of BI depends, more than anything, on one factor, she says: corporate culture. Some organizations have achieved an analytic culture that reaches across their various business lines, but for many, it’s a challenge. “Leadership means not just the CIO but also the CEO, the lines of business, the COO, and the VP of marketing,” says Howson. “Culture and leadership are closely related, and it’s hard to separate one from the other.”
While corporate culture has always been important to success, it takes on an even more critical role in efforts to compete on analytics. For example, she illustrates, “companies have a lot of data, and certainly they value the data, but there is sometimes a fear of sharing it. Once you start exposing the data, somebody’s job might be on the line, or it can show that someone made some bad decisions. Maybe the data will reveal that you’ve spent millions of dollars and you’re not really getting the returns that you thought you would in pursuing a particular market segment or product.”
It’s important to see an analytics culture as focusing on data as a tool to see problems and make course corrections, or act on opportunities – not to punish or expose individuals or departments.
Another point of corporate resistance is employing BI in the cloud, a challenge recently explored by Brad Peters, CEO of Birst. Here again, corporate culture may hold back efforts to move to the cloud, which offers greater scalability and availability for BI and analytics initiatives. In a recent interview in Diginomica, he says that IT departments, for example, may throw up roadblocks, for fear of being disintermediated. Plus, there is also a recognition that once BI data is in the cloud, it often gets “harder to work with.” Multi-tenant sites, for example, have security systems and protocols that may limit users’ ability to manipulate or parse the data.
The increasing adoption of cloud-based services – such as those from Amazon or Salesforce – is gradually melting resistance to the idea of cloud-based BI, Peters adds. He particularly sees advantages for geographically-dispersed workforces.
For his part, he admits he has “never been under any illusion that the shift of enterprise analytics to the cloud was going to happen overnight.”
Leo Eweani makes the case that the data tsunami is coming. “Businesses are scrambling to respond and spending accordingly. Demand for data analysts is up by 92%; 25% of IT budgets are spent on the data integration projects required to access the value locked up in this data ‘ore’ – it certainly seems that enterprise is doing The Right Thing – but is it?”
Data is exploding within most enterprises. However, most enterprises have no clue how to manage this data effectively. While you would think that an investment in data integration would be an area of focus, many enterprises don’t have a great track record in making data integration work. “Scratch the surface, and it emerges that 83% of IT staff expect there to be no ROI at all on data integration projects and that they are notorious for being late, over-budget and incredibly risky.”
The core message from me is that enterprises need to ‘up their game’ when it comes to data integration. This recommendation is based upon the amount of data growth we’ve already experienced, and will experience in the near future. Indeed, a “data tsunami” is on the horizon, and most enterprises are ill prepared for it.
So, how do you get prepared? While many would say it’s all about buying anything and everything when it comes to big data technology, the best approach is to splurge on planning. This means defining exactly what data assets are in place now and will be in place in the future, and how they should or will be leveraged.
To face the forthcoming wave of data, certain planning aspects and questions about data integration rise to the top:
- Performance, including data latency. Or, how quickly does the data need to flow from point (or points) A to point (or points) B? As the volume of data quickly rises, the data integration engines have to keep up.
- Data security and governance. Or, how will the data be protected both at rest and in flight, and how will the data be managed in terms of controls on use and change?
- Abstraction, and removing data complexity. Or, how will the enterprise remap and re-purpose key enterprise data that may not currently exist in a well-defined and functional structure?
- Integration with cloud-based data. Or, how will the enterprise link existing enterprise data assets with those that exist on remote cloud platforms?
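The latency question is ultimately arithmetic: the integration engine’s sustained throughput has to exceed the data’s arrival rate, or the backlog grows without bound. A back-of-the-envelope check (all numbers here are made up for illustration):

```python
def min_throughput_mb_s(daily_volume_gb: float, window_hours: float) -> float:
    """Sustained MB/s needed to move a day's data within its delivery window."""
    return daily_volume_gb * 1024 / (window_hours * 3600)

# e.g. 2 TB arriving per day that must land within a 4-hour overnight window
needed = min_throughput_mb_s(2048, 4)
print(f"{needed:.0f} MB/s")  # 146 MB/s sustained, before any headroom
```

Doing this sum up front, per pipeline, is exactly the kind of cheap planning that beats buying hardware first and discovering the shortfall later.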
While this may seem like a complex and risky process, think through the problems, leverage the right technology, and you can remove the risk and complexity. The enterprises that seem to fail at data integration do not follow that advice.
I suspect the explosion of data will be the biggest challenge enterprise IT faces in many years. While a few will take advantage of their data, most will struggle, at least initially. Which route will you take?
Maybe the word “death” is a bit strong, so let’s say “demise” instead. Recently I read an article in the Harvard Business Review about how Big Data and Data Scientists will rule the world of the 21st century corporation and how they have to operate for maximum value. The thing I found rather disturbing was that it takes a PhD – probably a few of them – in a variety of math areas to give executives the necessary insight to make better decisions, ranging from what product to develop next to who to sell it to and where.
Don’t get me wrong – this is mixed news for any enterprise software firm helping businesses locate, acquire, contextually link, understand and distribute high-quality data. The existence of such a high-value role validates product development, but it also limits adoption. It is also great news that data has finally garnered the attention it deserves. But I am starting to ask myself why it always takes individuals with a “one-in-a-million” skill set to add value. What happened to the democratization of software? Why is the design starting point for enterprise software not always similar to B2C applications, like an iPhone app, i.e. simpler is better? Why is it always such a gradual “Cold War” evolution instead of a near-instant French Revolution?
Why do development environments for Big Data not accommodate limited or existing skills, but always accommodate the most complex scenarios? Well, the answer could be that the first customers will be very large, very complex organizations with super complex problems they have been unable to solve so far. If analytical apps have become a self-service proposition for business users, data integration should be as well. So why does access to a lot of fast-moving and diverse data require scarce Pig or Cassandra developers to get the data into an analyzable shape, and a PhD to query and interpret patterns?
I realize new technologies start with a foundation, and as they spread, supply will attempt to catch up to create an equilibrium. However, this is about a problem which has existed for decades in many industries, such as the oil & gas, telecommunication, public and retail sectors. Whenever I talk to architects and business leaders in these industries, they chuckle at “Big Data” and tell me “yes, we got that – and by the way, we have been dealing with this reality for a long time”. By now I would have expected that the skill (cost) side of turning data into meaningful insight would have been driven down more significantly.
Informatica has made a tremendous push in this regard with its “Map Once, Deploy Anywhere” paradigm. I cannot wait to see what’s next – and I just saw something recently that got me very excited. Why, you ask? Because at some point I would like to see at least a business super-user pummel terabytes of transaction and interaction data into an environment (Hadoop cluster, in-memory DB…) and massage it so that a self-created dashboard gets him or her where (s)he needs to go. This should include questions like: “Where is the data I need for this insight?”, “What is missing and how do I get to that piece in the best way?”, “How do I want it to look when I share it?” All that should be required is a semi-experienced knowledge of Excel and PowerPoint to get your hands on advanced Big Data analytics. Don’t you think? Do you believe that this role will disappear as quickly as it has surfaced?
In a previous blog post, I wrote about how, when business “history” is reported via Business Intelligence (BI) systems, it’s usually too late to make a real difference. In this post, I’m going to talk about how business history becomes much more useful when combined operationally and in real time.
E. P. Thompson, a historian, pointed out that all history is the history of unintended consequences. His theory was that history is not always recorded in documents; it is ultimately derived from examining cultural meanings and the structures of society through hermeneutics (the interpretation of texts), semiotics, and the many signs of the times. He concluded that history is created by people’s subjectivity, and is therefore ultimately represented as they REALLY live.
The same can be extrapolated to businesses. However, the BI systems of today capture only a minuscule piece of the larger pie of knowledge that may be gained from things like meetings, videos, sales calls, anecdotal win/loss reports, shadow IT projects, 10-Ks and 10-Qs, even company blog posts. The point is: how can you better capture the essence of meaning, and perhaps importance, of the everyday non-database events taking place in your company and its activities – in other words, how it REALLY operates?
One of the keys to figuring out how businesses really operate is identifying and utilizing the undocumented RULES that underlie nearly every business. Select company employees, often veterans, know these rules intuitively – and every company has them. Watch them, and you’ll see they just have a knack for getting projects pushed through the system, or making customers happy, or diagnosing a problem in a short time and with little fanfare. They just know how things work and what needs to be done.
These rules have been, and still are, difficult to quantify and apply – or “data-ify,” if you will. Certain companies (and hopefully Informatica) will end up being major players in the race to datify these non-traditional rules and events, in addition to helping companies make sense out of big data in a whole new way. But in daydreaming about it, it’s not hard to imagine business systems that will eventually be able to understand the optimization rules of a business, accounting for possible unintended scenarios or consequences, and then apply them at the time when they are most needed. Anyhow, that’s the goal of a new generation of Operational Intelligence systems.
In my final post on the subject, I’ll explain how it works and the business problems it solves (in a nutshell). And if I’ve managed to pique your curiosity and you want to hear about Operational Intelligence sooner, tune in to a webinar we’re having TODAY at 10 AM PST. Here’s the link.
- Business and IT: Stop dissing each other. We all do it. Despite any platitudes about business-IT alignment, there is always griping behind closed doors. Let’s all promise to go the entire month of January without saying anything negative about the other team, and on a weekly basis express gratitude or provide positive feedback.
- Don’t let the hype fool you. Big Data. Internet of Things. Cloud/Social/Mobile (which has seemingly morphed into a single word). Hype? Definitely yes. Vaporware? Sometimes. Ignore it until “it’s real”? Definitely not. There are kernels of reality hidden in most of the hype. You have to find those kernels, and then let your mind open up to what the potential is in your own realm.
- Marry right brain with left brain. Most of us are heavy left brain people when we’re on the job. And while being data-driven, analytical and methodical are important, what separates the innovators from the followers is the spark of intuition, wisdom or creativity that is based on facts and knowledge, but not bound by it.
- Use social to discuss issues and gain knowledge rather than while away time. Social media has been extremely powerful for connecting people. But a shockingly high percentage of the social content is trivial—following celebrities; sharing selfies; updating friends on the latest meal eaten; lodging complaints about various first world problems. What if we diverted 25% of the social media time we spend on frivolous trivia to intellectual engagement and intelligent discussions about real issues? What could we change in our society if that power was unleashed?
- Use data for good. There are many uses for all the data flowing around us. Many of them are transformative— changing business models, revolutionizing industries, in some cases changing society. However, most of the ones being discussed today focus on corporate profit as opposed to societal good—think of all the investment in better targeting marketing offers to consumers. There is absolutely nothing wrong with utilizing data to grow business, and healthy businesses provide jobs, foster innovation and drive economic growth. However, business profit should not be our only goal. A few people and organizations (such as DataKind) are thinking about how to use data for good. Meaning societal good. If a few more of us carve out a portion of our time and brain power to focus on potential ways data can be harnessed to benefit our broader community, imagine the impact we could have on education, healthcare, the environment, economic hardship and the other myriad challenges we face around the world. Perhaps this wish is the most Pollyannaish of them all, but I’ll keep doing what I can to forward the cause.
Shhhh… RulePoint Programmer Hard at Work
End of year. Out with the old, in with the new. A time where everyone gets their ducks in order, clears the pipe and gets ready for the New Year. For R&D, one of the gating events driving the New Year is the annual sales kickoff event, where we present to Sales the new features so they can better communicate a product’s road map and value to potential buyers. All well and good. But part of the process is to fill out a Q and A that explains the product “Value Prop” – and they only gave us 4 lines. I think the answer also helps determine speaking slots and priority.
So here’s the question I had to fill out -
FOR SALES TO UNDERSTAND THE PRODUCT BETTER, WE ASK THAT YOU ANSWER THE FOLLOWING QUESTION:
WHAT IS THE PRODUCT VALUE PROPOSITION AND ARE THERE ANY SIGNIFICANT DEPLOYMENTS OR OTHER CUSTOMER EXPERIENCES YOU HAVE HAD THAT HAVE HELPED TO DEFINE THE PRODUCT OFFERING?
Here’s what I wrote:
Informatica RULEPOINT is a real-time integration and event processing software product that is deployed very innovatively by many businesses and vertical industries. Its value proposition is that it helps large enterprises discover important situations from their droves of data and events and then enables users to take timely action on discovered business opportunities as well as stop problems while or before they happen.
Here’s what I wanted to write:
RulePoint is scalable, low latency, flexible and extensible and was born in the pure and exotic wilds of the Amazon from the minds of natives that have never once spoken out loud – only programmed. RulePoint captures the essence of true wisdom of the greatest sages of yesteryear. It is the programming equivalent and captures what Esperanto linguistically tried to do but failed to accomplish.
As to high availability (HA), there has never been anything in the history of software as available as RulePoint. Madonna’s availability only pales in comparison to RulePoint’s availability. We are talking 8 Nines cubed and then squared. Oracle = Unavailable. IBM = Unavailable. Informatica RulePoint = Available.
RulePoint works hard, but plays hard too. When not solving those mission critical business problems, RulePoint creates Arias worthy of Grammy nominations. In the wee hours of the AM, RulePoint single-handedly prevented the outbreak and heartbreak of psoriasis in East Angola.
One of the little known benefits of RulePoint is its ability to train the trainer, coach the coach and play the player. Via chalk talks? No, RulePoint uses mind melds instead. Much more effective. RulePoint knows Chuck Norris. How do you think Chuck Norris became so famous in the first place? Yes, RulePoint. Greenpeace used RulePoint to save dozens of whales, 2 narwhal, a polar bear and a few collateral penguins (the bear was about to eat the penguins). RulePoint has been banned in 16 countries because it was TOO effective. “Veni, Vidi, RulePoint Vici” was Julius Caesar’s actual quote.
The inspiration for Gandalf in the Lord of the Rings? RulePoint. IT heads worldwide shudder with pride when they hear the name RulePoint mentioned and know that they acquired it. RulePoint is stirred but never shaken. RulePoint is used to train the Sherpas that help climbers reach the highest of heights. RulePoint cooks Minute rice in 20 seconds.
The running of the bulls in Pamplona every year - What do you think they are running from? Yes, RulePoint. RulePoint put the Vinyasa back into Yoga. In fact, RulePoint will eventually create a new derivative called Full Contact Vinyasa Yoga and it will eventually supplant gymnastics in the 2028 Summer Olympic games.
The laws of physics were disproved last year by RulePoint. RulePoint was drafted in the 9th round by the LA Lakers in the 90s, but opted instead to teach math to inner city youngsters. 5 years ago, RulePoint came up with an antivenin to the Black Mamba and has yet to ask for any form of recompense. RulePoint’s rules bend but never break. The stand-in for the “Mind” in the movie “A Beautiful Mind” was RulePoint.
RulePoint will define a new category for the Turing award and will name it the 2Turing Award. As a bonus, the 2Turing Award will then be modestly won by RulePoint and the whole category will be retired shortly thereafter. RulePoint is… tada… the most interesting software in the world.
But I didn’t get to write any of these true facts and product differentiators on the form. No room.
Hopefully I can still get a primo slot to talk about RulePoint.
And so from all the RulePoint and Emerging Technologies team, including sales and marketing, here’s hoping you have great holiday season and a Happy New Year!
Big Data means many things to many people – it all depends on their place and perspective in the organization. But there is something for everyone.
I recently explored the advantages being seen across the enterprise in a recent special report in Database Trends & Applications, and have distilled the key points below:
For data managers, it’s all about choice. The rise of the Big Data environment has brought with it a new generation of solutions, including open source, NoSQL and NewSQL databases – not to mention Apache Hadoop and cloud-based data environments. Big Data is extremely accessible now because of low-cost solutions for capturing and analyzing unstructured forms of data that haven’t been available until recently. Consider all the sensor data – from RFID tags, from machines – that’s been floating around for the past decade. Previously, capturing and managing such data was never cheap. Now, with inexpensive databases and tools such as Hadoop, such data is within the reach of the smallest organizations. In addition, the cloud provides almost unlimited capacity, and can support big data analytics in a way that would otherwise be cost-prohibitive for most organizations.
For data scientists, analysts and quants in organizations, it’s all about capabilities. The new Big Data world is all about diving deep into datasets and being able to engage in storytelling as a way to make data come alive for the business. Open source plays a key role, through frameworks such as Hadoop and MapReduce. There is also the highly versatile R language, which is well-suited for building analytics against large and highly diverse data sets. Predictive analytics is another key capability made real by Big Data.
For business users across the enterprise, it’s all about collaboration. There has been a growing movement to open up analytics across the organization – pushing Big Data analysis down to all levels of decision-makers, from front-line customer service representatives to information workers. New capabilities such as cloud services, visualization and self-service enable end users without statistical training to build their own queries and draw their own conclusions. Along with user-friendly interfaces to Big Data, there’s been a rise in pervasive BI and analytics running in the background, embedded within applications or devices, in which the end-user is oblivious to the software and data sources feeding the applications. Cloud opens up business intelligence and analytics to more users. In addition, more organizations are focusing on providing Big Data analytics through apps on mobile devices, accelerating the move toward simplified access.
For the members of the executive suite, it’s all about competitiveness. Most executives grasp the power that big data can bring to their operations, especially with performance analytics, predictive analytics and customer analytics. Employing these analytics against Big Data means better understanding customers and markets, as well as becoming aware of trends that are still bubbling beneath the surface.
Today is an exciting day for technology in high performance electronic trading. By the time you read this, the CME Group, Real Logic Ltd., and Informatica will have announced a new open source initiative. I’ve been collaborating on this work for a few months and I feel it is some great technology. I hope you will agree.
Simple Binary Encoding (SBE) is an encoding for FIX that is being developed by the FIX protocol community as part of their High Performance Working Group. The goal is to produce a binary encoding representation suitable for low-latency financial trading. The CME Group, Real Logic, and Informatica have sponsored the development of an open source implementation of an early version of the SBE specification, undertaken by Martin Thompson (of Real Logic, formerly of LMAX) and myself, Todd Montgomery (of Informatica). The implementation is a very high performance encoding/decoding mechanism for data layout, tailored not just to the demands of high-performance applications in low-latency trading; it has implications for all manner of serialization and marshaling, in use cases from Big Data analytics to device data capture.
Financial institutions, and other businesses, need to serialize data structures for purposes of transmission over networks as well as for storage. SBE is a developing standard for how to encode/decode FIX data structures over binary media at high speeds with low latency. The SBE project is most similar to Google Protocol Buffers. However, looks are quite deceiving. SBE is an order of magnitude faster and immensely more efficient for encoding and decoding. This focus on performance means application developers can turn their attention to the application logic instead of the details of serialization. There are a number of advantages to SBE beyond speed, although speed is the primary concern.
- SBE provides a strong typing mechanism in the form of schemas for data objects
- SBE only generates the overhead of versioning if the schema needs to handle versioning and if so, only on decode
- SBE uses an Intermediate Representation (IR) for decoupling schema specification, optimization, and code generation
- SBE’s use of IR will allow it to provide various data layout optimizations in the near future
- SBE initially provides Java, C++98, and C# code generators with more on the way
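Much of SBE’s speed comes from schema-driven, fixed-offset layout: every field’s position and size are known up front, so encoding and decoding are a handful of direct memory reads and writes with no per-field tags or lengths on the wire. The real project generates Java, C++98, and C# codecs from XML schemas; this Python `struct` sketch only illustrates the fixed-layout idea (the three-field order message is invented, not part of the SBE specification):

```python
import struct

# Hypothetical schema: orderId (uint64), price (int64, in ticks), qty (uint32),
# packed little-endian at fixed offsets -- no tags, no lengths, no metadata.
LAYOUT = struct.Struct("<QqI")  # 20 bytes total, offsets known up front

def encode(order_id: int, price_ticks: int, qty: int) -> bytes:
    return LAYOUT.pack(order_id, price_ticks, qty)

def decode(buf: bytes):
    # Decoding is just reading three fields at fixed offsets in one pass.
    return LAYOUT.unpack_from(buf, 0)

wire = encode(42, 101_250, 500)
print(len(wire))      # 20
print(decode(wire))   # (42, 101250, 500)
```

Contrast this with tagged formats like Protocol Buffers, which must parse a field key and varint length per field; skipping that per-field work is one reason a fixed-layout codec can be an order of magnitude faster.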
What breakthrough has led to SBE being so fast?
It isn’t new or a breakthrough. SBE has been designed and implemented with the concepts and tenets of Mechanical Sympathy. Most software is developed with abstractions to mask away the details of CPU architecture, disk access, OS concepts, etc. Not so for SBE. Martin and I designed it utilizing everything we know about how CPUs, memory, compilers, managed runtimes, etc. work, making it very fast by working _with_ the hardware instead of against it.
Martin’s blog will have a more detailed, technical discussion of SBE sometime later. But I encourage you to look at it and try it out. The work is open to the public under an Apache license.
Todd L. Montgomery is a Vice President of Architecture for Informatica and the chief designer and implementer of the 29West low latency messaging products. The Ultra Messaging product family (formerly known as LBM) has over 190 production deployments within electronic trading across many asset classes and pioneered the broker-less messaging paradigm. In the past, Todd has held architecture positions at TIBCO and Talarian as well as lecture positions at West Virginia University, contributed to the IETF, and performed research for NASA in various software fields. With a deep background in messaging systems, high performance systems, reliable multicast, network security, congestion control, and software assurance, Todd brings a unique perspective tempered by over 20 years of practical development experience.