Tag Archives: latency
Data integration is one of the most important concepts that enterprises should be dealing with in 2013. Data integration provides the ability to extract information from source systems and move it to target systems that need to act upon it in some way. As the number of systems has increased and the systems themselves have become more complex, the need to dive deeper into data integration has become more apparent and urgent.
Data integration allows us to approach the real-time enterprise, where all processes and systems have the ability to see into all other processes and systems, and react to optimize the business in (near) real-time. This concept has been mulled over for years, but has yet to become a reality for most enterprises. (more…)
You can think of big data as the approaches and mechanisms that can manage and process petabytes of structured and unstructured data that may be centralized or distributed. Or you can think of it as a single approach and technology for getting at the most relevant data, no matter the size or the structure.
Indeed, big data gives us the power to leverage information that would normally not be accessible, or that would take far more latency than is practical to search through. (more…)
Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 3
OK, so in our last two discussions, we looked at the memory bottleneck and how even in high performance environments, there are still going to be situations in which streaming the data to where it needs to be will introduce latencies that throttle processing speed. And I also noted that we needed some additional strategies to address that potential bottleneck, so here goes: (more…)
Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 2
In my last posting, I suggested that the primary bottleneck for performance computing of any type, including big data applications, is the latency associated with getting data from where it is to where it needs to be. If the presumptive big data analytics platform/programming model is Hadoop (which is also often presumed to provide in-memory analytics), though, there are three key issues: (more…)
Darren Cunningham, in his recent blog post How to Migrate To The Cloud, made some great points around the use of staging for data integration for cloud computing. The reasons he would leverage a staging area for cloud computing include:
- It enables better business control before the data is pushed from one system to the other.
- It enables tracking and reconciliation of a business process.
- It enables the addition of new sources or targets with reuse, instead of building a spaghetti plate of point-to-point direct interfaces. It responds to the SOA paradigm.
- It breaks the dependencies between the two systems, enabling asynchronous synchronization, or synchronous synchronization with data sets of different sizes (single message or bulk). (more…)
None of our code runs in the OS kernel, but many kernel and device driver settings impact the performance of our code in user space. One such setting is the size of the receive ring buffer on a NIC. This buffer holds Ethernet frames that have arrived until the OS can read them. Interrupt coalescing settings can change the demand for ring buffer space (or “descriptors”). We’ve recently refreshed one of our documentation pages on interrupt coalescing, but we’ll stick to the specific question of sizing the receive ring buffer in this post.
The big benefit of increasing the size of the receive ring buffer is loss avoidance when the kernel is slow at servicing interrupts. Some of the Intel NICs we’ve used max out at 2048 “descriptors” in the ring buffer. A ring buffer that size would take about 24 ms to fill at 1 Gbps with 1500-byte frames, but only about 1 ms to fill with 64-byte frames (the smallest-possible frames). Those times are computed as ring buffer size in frames * frame size in bits / wire speed in bits per second.
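The arithmetic behind those fill times can be sketched in a few lines of Python. This is only an illustration of the formula above: the function name and constants are ours, and like the formula it ignores Ethernet preamble and inter-frame gap, so real fill times would be slightly longer.

```python
def ring_fill_time_ms(descriptors: int, frame_bytes: int, wire_bps: float) -> float:
    """Milliseconds for back-to-back frames to occupy every descriptor.

    Assumes one frame per descriptor and no per-frame wire overhead
    (preamble and inter-frame gap are ignored, as in the text).
    """
    bits_to_fill = descriptors * frame_bytes * 8
    return bits_to_fill / wire_bps * 1000.0


GIGABIT = 1_000_000_000  # 1 Gbps wire speed, in bits per second

for frame_bytes in (1500, 64):
    ms = ring_fill_time_ms(2048, frame_bytes, GIGABIT)
    print(f"{frame_bytes:>4}-byte frames: {ms:.1f} ms to fill 2048 descriptors")
```

Run as written, this gives roughly 24.6 ms for 1500-byte frames and 1.0 ms for 64-byte frames, which is where the rounded 24 ms and 1 ms figures come from. Changing `wire_bps` shows how much less time the kernel has at higher link speeds.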
You can see that the kernel has between 1 ms and 24 ms (depending on frame size) to service a receive interrupt while still avoiding loss due to ring buffer overflow. It may not make sense to add more latency in the name of loss avoidance than you’d suffer repairing the loss. Going for efficient loss repair can add more latency than the largest-possible ring buffer latency, so going for the max often makes sense.
This also means that your application may be reading data from the NIC that arrived 24 ms or more ago and not know it. Don’t go for a large ring buffer if your application would rather have loss than buffering latency. It’s best not to think of latency as a consequence of increasing the ring buffer size. Instead, think of it as a consequence of having a kernel that takes time to service an interrupt. The buffer is just there for loss avoidance. Any latency apparently due to the ring buffer is really due to kernel interrupt service latency.
One other thing to keep in mind is that the ring buffer probably occupies physical memory on the machine. In the case of 2048 descriptors of 2048 bytes each, that’s 4 MB of physical memory. Probably not a biggie on a box with 4 GB of RAM.
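As a quick sanity check on that memory estimate, here is a small Python sketch. The helper name is ours, and it assumes each descriptor is backed by one fixed-size buffer, which may not match how a particular driver allocates memory.

```python
def ring_memory_bytes(descriptors: int, buffer_bytes: int) -> int:
    """Physical memory pinned by the ring, assuming one buffer per descriptor."""
    return descriptors * buffer_bytes


MIB = 1024 * 1024
GIB = 1024 * MIB

ring = ring_memory_bytes(2048, 2048)
share = ring / (4 * GIB)  # share of a box with 4 GB of RAM

print(f"ring buffer: {ring // MIB} MiB ({share:.3%} of 4 GiB)")
```

With 2048 descriptors of 2048 bytes each, the ring comes to 4 MiB, or about a tenth of a percent of the machine's memory.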
Finally, you might want to check and see if there are independent send and receive ring buffers on your NIC or if they’re shared. A box that did a lot of sending and a lot of receiving on a NIC with shared buffers might not want to give too many to the receive side.
STAC Research has published a new STAC Report that shows LBM average latency under 20 microseconds using 10-gigabit Ethernet and kernel bypass technology from Solarflare and a Cisco 4900M switch. This is 30 microseconds faster than what we typically measure for 1-gigabit Ethernet using a standard Linux kernel stack. Our customers are always looking to save microseconds and this report demonstrates a way our customers on gigE today can slash their latency by over half as they upgrade to 10 gigE tomorrow.
In addition to low average latency, the report showed consistently low variance in latency from message to message (latency jitter) for rates up to 125,000 messages per second. The 99.9th percentile did not exceed 63 microseconds and the standard deviation of the latency was always 6 microseconds or less. This is impressively low jitter for our customers who care most about consistent operation from message to message.
Some applications require higher single-stream rates than this system could deliver without taking efficiency into account. The report goes on to perform tests with the system configured for best efficiency, thus allowing higher throughput at the expense of some additional latency as rates rise. Single-stream rates of 860,000 messages per second were reached. The latency varied with test conditions, but it was rare to see average latencies in excess of 60 microseconds.
The report tested throughput as well as latency. Using small (64-byte) messages, 3 publishing applications on a single server could generate almost 2.5 million messages per second. With larger (1024-byte) messages, 3 publishing applications on a single server could generate 550,000 messages per second, driving the network to over 4.5 Gbps.
The full report contains a wealth of detail on the equipment and test methodologies used. Of particular interest to the latency-obsessed will be section 2.2. It details the system tuning, including MTU, System Management Interrupt (SMI) settings, core shielding and interrupt affinity, and other settings used to achieve these results.
Many Cisco, Solarflare, and 29West customers use UDP multicast for market data distribution, so these tests were all performed with UDP multicast even though there was only one receiver for each source. Other benchmarks, as well as many customer deployments, have shown scalable and stable results with our reliable multicast protocols.
These tests were run using Novell’s SUSE Linux Enterprise Real Time (SLERT) 10 SP2 64-bit on IBM x3650 servers with dual quad-core Xeon (Clovertown) processors at 2.66 GHz. Many of our customers are showing interest in using real-time operating systems as a way of reducing latency jitter, so we were eager to have this opportunity to test with SLERT.
Please see the full report for more details.