Everlast Software Technical

Monday, July 24, 2006

Remote Communication


Almost every piece of software written today requires some form of remote communication over a network. Communication across hardware media is often the source of major bottlenecks. While network speeds have improved drastically over the past decade, they are nevertheless still the bane of many systems. There is no way to get around the laws of physics.

One of the best ways to improve the performance of an application that requires remote communication is to reduce the number of remote calls it makes. This sounds obvious, but its impact is often underestimated. Put more of the load on the computer itself and less on the network.

So, how does one go about reducing the dependency on remote calls? By batching multiple calls into a single call. Unfortunately, this often leads to a less generic communication mechanism, because the batched call is tailored to one particular use. However, if we've learned anything, it's that optimization often comes with a price. Sometimes the price is more than worth it, as in the case of reducing network traffic.

One may ask how batching calls can improve performance when the same total amount of information is sent back. Well, there are many factors that come into play:

  • Fewer round trips, so less total latency
  • Less processing on each computer, because every call carries its own overhead
  • Compression algorithms work better when they have more data to work with
  • Every call must carry source/destination information, which takes up a little bandwidth

Latency is the number one reason to batch rather than make multiple calls. The absolute maximum speed any network signal can reach is the speed of light. In practice, signals travel more slowly than that through copper and fiber, and routers and other equipment along the way add delays of their own. But even at the speed of light, when making thousands or millions of calls, the latency adds up. It can end up being a few seconds or more depending on the situation. Batching collapses nearly all of that into the latency of a single call. The single call does not by itself solve bandwidth issues (although compression usually works better when there is more data, so it can help in that regard too), but it does address the latency issue.
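
As a rough illustration of how per-call latency accumulates, the following sketch multiplies an assumed round-trip time by the number of calls. The 0.5 ms round trip, the call counts, and the class and method names are all made up for illustration; they are not measurements from any real network.

```java
// Back-of-the-envelope latency model (illustrative assumptions only).
final class LatencyEstimate {

    // Total time spent waiting on round trips, in milliseconds.
    static double totalLatencyMillis(long calls, double roundTripMillis) {
        return calls * roundTripMillis;
    }

    public static void main(String[] args) {
        double roundTripMillis = 0.5; // assumed round-trip time per call
        long items = 1_000_000;       // assumed number of items to fetch

        System.out.println("One call per item:   "
                + totalLatencyMillis(items, roundTripMillis) + " ms of latency");
        System.out.println("Single batched call: "
                + totalLatencyMillis(1, roundTripMillis) + " ms of latency");
    }
}
```

The model deliberately ignores bandwidth and processing time. It only shows that the latency term grows linearly with the number of calls, which is the part batching removes.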

The other reasons mentioned for batching, as well as others not mentioned, should not be taken lightly either. Together, they can improve performance drastically.

A sample Java program has been created to demonstrate how much of a difference batching a call can make. The example is a very simple Client/Server application that calls a remote computer to obtain a name. The first timing is for retrieving the name in 3 separate calls (first, middle, and last name). The second timing is for the single batch call that retrieves the name all at once (full name). Each set of calls is performed 100 times to emulate heavy network traffic.
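
For readers who want to try the idea without two machines, here is a minimal sketch of the same comparison. It is not the downloadable example source. The command strings (FIRST, MIDDLE, LAST, FULL), the class name BatchingSketch, and the choice to open a new connection for every call are assumptions made for illustration. It runs the client and the server in one process over the loopback interface, so the timings it prints will not match the figures measured on a real network.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative sketch only; the protocol and names are hypothetical.
final class BatchingSketch {

    // Hypothetical one-line protocol: one command in, one line out.
    static String answer(String command) {
        switch (command) {
            case "FIRST":  return "John";
            case "MIDDLE": return "Wayne";
            case "LAST":   return "Doe";
            case "FULL":   return "John Wayne Doe";
            default:       return "";
        }
    }

    // Starts a background server on any free port and returns its socket.
    static ServerSocket startServer() throws IOException {
        ServerSocket server = new ServerSocket(0);
        Thread worker = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String command = in.readLine();
                    out.println(command == null ? "" : answer(command));
                } catch (IOException e) {
                    // accept() throws once the server socket is closed; the loop then exits.
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    // One remote call: connect, send a command, read the one-line reply.
    static String call(int port, String command) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println(command);
            return in.readLine();
        }
    }

    // Three remote calls, one per name part.
    static String fetchInParts(int port) throws IOException {
        return call(port, "FIRST") + " " + call(port, "MIDDLE") + " " + call(port, "LAST");
    }

    // One batched remote call for the whole name.
    static String fetchBatched(int port) throws IOException {
        return call(port, "FULL");
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startServer()) {
            int port = server.getLocalPort();
            int repetitions = 100;

            long start = System.nanoTime();
            for (int i = 0; i < repetitions; i++) {
                fetchInParts(port);
            }
            long partsMillis = (System.nanoTime() - start) / 1_000_000;

            start = System.nanoTime();
            for (int i = 0; i < repetitions; i++) {
                fetchBatched(port);
            }
            long batchedMillis = (System.nanoTime() - start) / 1_000_000;

            System.out.println("Multiple calls: " + partsMillis + " milliseconds.");
            System.out.println("Single batch call: " + batchedMillis + " milliseconds.");
        }
    }
}
```

Both fetch methods return the same string. The only difference is the number of round trips, which is the variable the timings are meant to isolate.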

The network utilized was a 100 megabit Ethernet setup utilizing a hub. The full name obtained is 'John Wayne Doe'.

The following is a comparison of the multiple calls (300 total) vs the single call (100 total):

Multiple calls:
2434 milliseconds.

Single batch call:
631 milliseconds.

As you can see, the 100 single batch calls were roughly 4 times faster than the 100 sets of three separate calls (2434 / 631 ≈ 3.9).

This was a very simple test just to demonstrate how much of a difference batching remote calls can make. Increasing the amount of data being transferred generally decreases the potential performance improvement, because the time spent moving the data starts to outweigh the fixed latency of each call. However, as mentioned earlier, compression can then be used to gain further improvements when a larger amount of data is being transferred.
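
The compression point can be checked with a small sketch. It uses the standard java.util.zip.GZIPOutputStream to compare the compressed size of one small record against the compressed size of 100 records sent as a single batch. The class name, the newline-separated record format, and the use of 100 identical names are assumptions for illustration. Identical records are a best case for compression, so real data will compress less dramatically.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch only; names and payload format are hypothetical.
final class CompressionSketch {

    // Returns the number of bytes the text occupies after gzip compression.
    static int gzippedSize(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.size();
    }

    // Builds a batch payload containing the same record several times.
    static String batchOf(String record, int copies) {
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < copies; i++) {
            batch.append(record);
        }
        return batch.toString();
    }

    public static void main(String[] args) throws IOException {
        String record = "John Wayne Doe\n";
        int copies = 100;

        int perCall = gzippedSize(record);
        int batched = gzippedSize(batchOf(record, copies));

        System.out.println("Raw bytes, " + copies + " records: " + record.length() * copies);
        System.out.println("Compressed separately:  " + perCall * copies + " bytes");
        System.out.println("Compressed as a batch:  " + batched + " bytes");
    }
}
```

Compressing each tiny record on its own actually makes it larger, because the gzip header and trailer outweigh any savings. The compressor can only exploit repetition once the records are batched together.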

The example source code can be obtained here:

http://www.everlastsoftware.com/examples/source/java/ClientServerExample.zip

To execute, run 'java Server' on one machine and 'java Client SERVER' on another machine, where 'SERVER' is the host name of the machine on which the Server class is running. Port 31111 is used by default.

A few sample batch files were supplied for execution on Windows.