Multi-Threading
Most computers today are coming with either two processors, dual core processors (which is very similar to two processors), or hyperthreading (again, similar to two processors although not as good). If an application isn't designed to take advantage of this, it is not taking advantage where it should be. Having the ability to process in parallel is where one of the hugest performance gains can be made.
Some applications, such as a word editor, do not require much usage of the processor. Having multiple processors will most likely not help. Many other applications, such as compression programs, development tools, and most, games, should really take advantage of parallel processing whenever possible.
How does one take advantage of parallel processing? As mentioned in several previous articles, by making the application multi-threaded.
For those who don't know what a thread is, it is basically a lightweight process that has it's own local stack memory. A thread shares global memory (heap memory) with all the other threads in the same process. A thread has a unique identifier just like a process. The operating system uses these identifiers to allow certain ones to run at various times. With 2 or more processors, the operating system has the ability to run these at the same time instead of shuffling through them and giving each a little bit of time on the single processor. Sounds easy enough right? Wrong! Multi-threaded programming is very difficult for many people to get used to. The fact that two or more things can be going on at exactly the same time often leads to confusion. Any more information about threading (except performance) is out of scope for this article.
The practical example that shall be demonstrated in this blog is about reading multiple files into memory from disk at the same time and compressing them with Java's ZIP libraries. This could be useful for perhaps multiple uploading files to a remote location. The performance comparisons will be between a single thread doing all the work vs. two threads doing the work. The processor being used for testing is a 32-bit Intel Pentium Dual-Core running at 2.13 GHz on Windows Vista.
The files that will be used as test data (Canterbury Corpus) can be downloaded from the Archive Comparison Test site: http://compression.ca/act/act-files.html
The first Java program created processed using a single thread. The program processed each file as a batch 5 times. This largely removes the disk access from the equation since the operating system caches the files after their first access.
Here are the results:
Single thread timings:
Read 148481 bytes and compressed to 53635 bytes.
Read 125179 bytes and compressed to 48899 bytes.
Read 24603 bytes and compressed to 7961 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Read 513216 bytes and compressed to 56465 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Total time: 345 ms.
Read 148481 bytes and compressed to 53635 bytes.
Read 125179 bytes and compressed to 48899 bytes.
Read 24603 bytes and compressed to 7961 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Read 513216 bytes and compressed to 56465 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Total time: 337 ms.
Read 148481 bytes and compressed to 53635 bytes.
Read 125179 bytes and compressed to 48899 bytes.
Read 24603 bytes and compressed to 7961 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Read 513216 bytes and compressed to 56465 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Total time: 325 ms.
Read 148481 bytes and compressed to 53635 bytes.
Read 125179 bytes and compressed to 48899 bytes.
Read 24603 bytes and compressed to 7961 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Read 513216 bytes and compressed to 56465 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Total time: 336 ms.
Read 148481 bytes and compressed to 53635 bytes.
Read 125179 bytes and compressed to 48899 bytes.
Read 24603 bytes and compressed to 7961 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Read 513216 bytes and compressed to 56465 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Total time: 329 ms.
The second Java program utilizes two threads per batch of files. Two new threads are started after one batch has been processed. Still 5 iterations.
Multiple thread timings:
Read 513216 bytes and compressed to 56465 bytes.
Read 24603 bytes and compressed to 7961 bytes.
Read 125179 bytes and compressed to 48899 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Read 148481 bytes and compressed to 53635 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Total time: 289 ms.
Read 513216 bytes and compressed to 56465 bytes.
Read 148481 bytes and compressed to 53635 bytes.
Read 24603 bytes and compressed to 7961 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 125179 bytes and compressed to 48899 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Total time: 290 ms.
Read 24603 bytes and compressed to 7961 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 148481 bytes and compressed to 53635 bytes.
Read 513216 bytes and compressed to 56465 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 125179 bytes and compressed to 48899 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Total time: 239 ms.
Read 125179 bytes and compressed to 48899 bytes.
Read 513216 bytes and compressed to 56465 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 148481 bytes and compressed to 53635 bytes.
Read 24603 bytes and compressed to 7961 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Total time: 263 ms.
Read 125179 bytes and compressed to 48899 bytes.
Read 3721 bytes and compressed to 1222 bytes.
Read 419235 bytes and compressed to 143107 bytes.
Read 513216 bytes and compressed to 56465 bytes.
Read 4227 bytes and compressed to 1737 bytes.
Read 148481 bytes and compressed to 53635 bytes.
Read 1029744 bytes and compressed to 203992 bytes.
Read 11150 bytes and compressed to 3123 bytes.
Read 38240 bytes and compressed to 12990 bytes.
Read 471162 bytes and compressed to 193734 bytes.
Read 24603 bytes and compressed to 7961 bytes.
Total time: 231 ms.
The results speak for themselves. The total average time for the single thread was 334 milliseconds while the total average time for the two threads was 264 milliseconds. One would expect the time to be cut in half, but that is usually not the case. That would be the best possible scenario. There are many other things going on that decrease the chance of reaching this ideal. For example, the operating system must shuffle through the threads, there are input/output operations occurs to read the files, etc. Never-the-less, this is about a 20% increase in performance which is very substantial. Of course all optimizations and design decisions have a cost (complexity in this case).
The example source code to these two examples can be found here:
http://www.everlastsoftware.com/examples/source/java/MultiThreadExample.zip
In order to execute single threaded, run 'java SingleThreadExample %FILE1% %FILE2% ..."
- %FILEx% is a file name to read and compress.
In order to execute multi-threaded, run 'java MultiThreadExample %FILE1% %FILE2% ..."
- %FILEx% is a file name to read and compress.
