Speed and size often have some type of impact on each other. Usually something smaller is faster. Why is this? Anytime less data needs to be compared, loaded, etc., there will be less execution statements for the CPU to handle. But, size can often be misleading as we will see in this article. There were two sample Java programs created for speed vs size comparisons. One example will demonstrate how smaller means faster while another will show the opposite. Sometimes increasing the size of something can cleverly speed up execution time based on hidden factors.
Example #1
The first example demonstrates how larger member variable names in Java can have a signficiant impact on speed. Many may assume that member variable names would have no impact, but they would be dead wrong. Unless an obfuscator is used, Java will embed the names of the variables inside the byte code (.class files). This is so reflection may be used to discover the names of variables and utilize them in fancy ways. While this is very powerful and allows for very flexible programming, it does so at a price. The price is an increase in size and a decrease in speed.
There are three phases to the example. All phases loop a total of 10,000,000 times for the class with the smaller variable name and 10,000,000 for the class with the larger variable name. Phase one simply instantiates both classes. Phase two instantiates and calls 'get' and 'set' methods to retrieve and set the value of the inner member variables. Phase three instantiates outside the 10,000,000 count loop but does 'get' and 'set' the variable value inside.
The variable names used specifically are 'var' and 'this_is_a_very_long_variable_name'. Both are data types of 'int'.
The following is the comparison of times for the smaller variable name vs. the larger variable name:
Instantiation looping:
- Loop count 10000000 for small variable: 551 ms.
- Loop count 10000000 for large variable: 8142 ms.
- Loop count 10000000 for small variable: 490 ms.
- Loop count 10000000 for large variable: 8553 ms.
- Loop count 10000000 for small variable: 490 ms.
- Loop count 10000000 for large variable: 8663 ms.
- Loop count 10000000 for small variable: 601 ms.
- Loop count 10000000 for large variable: 8492 ms.
- Loop count 10000000 for small variable: 501 ms.
- Loop count 10000000 for large variable: 9123 ms.
Instantiation and get/set looping:
- Loop count 10000000 for small variable: 521 ms.
- Loop count 10000000 for large variable: 8552 ms.
- Loop count 10000000 for small variable: 541 ms.
- Loop count 10000000 for large variable: 8472 ms.
- Loop count 10000000 for small variable: 531 ms.
- Loop count 10000000 for large variable: 8502 ms.
- Loop count 10000000 for small variable: 531 ms.
- Loop count 10000000 for large variable: 9053 ms.
- Loop count 10000000 for small variable: 540 ms.
- Loop count 10000000 for large variable: 8513 ms.
Get/set looping:
- Loop count 10000000 for small variable: 30 ms.
- Loop count 10000000 for large variable: 330 ms.
- Loop count 10000000 for small variable: 40 ms.
- Loop count 10000000 for large variable: 321 ms.
- Loop count 10000000 for small variable: 30 ms.
- Loop count 10000000 for large variable: 320 ms.
- Loop count 10000000 for small variable: 30 ms.
- Loop count 10000000 for large variable: 311 ms.
- Loop count 10000000 for small variable: 40 ms.
- Loop count 10000000 for large variable: 320 ms.
The results clearly indicate that the shorter variable leads to major speed improvements. Of course, it must be mentioned that this is when looping 10,000,000 times. However, any optimization is an improvement, and little changes here and there can lead to large overall improvements for an application.
The example source code can be obtained here:
http://www.everlastsoftware.com/examples/source/java/SizeSpeedDecreaseExample.zip
In order to execute, run 'java VariableSizeTest'
A batch file was supplied for execution on Windows.
Example #2
The second example also demonstrates that larger can have an impact on speed. This time, it's the other way around however. Increasing the size of the compiled Java code leads to an improvement in speed. How is this possible? A simple concept called loop unrolling. Loops are used in almost every single program. It saves the developer a lot of time by preventing having to type statements over and over. It also allows for dealing with dynamic sizes of data very easily. This ultimately means the size of the program will be much smaller when loops are utilized. The condition that causes this decrease in size to increase execution time is a simple variable check and a jump (goto). All loops must check some condition and jump to a different execution point in the program in order to repeat or exit. If the check wasn't performed, the loop would be an infinite loop (which all of us have experienced as beginners). Also, if the jump wasn't performed, the code would only loop one time (useless loop).
Loop unrolling is the process of manually coding each statement that would be performed in the loop multiple times. This is generally only possible if the total number of loop executions is known and will always be the same. The other possible usage is when one knows the absolute max number of loop executions. If the latter is the case, nested if statements can be placed before each set of statements, having the same effect as the condition checks of a loop. How does this actually improve speed? You eliminate the manditory jump (goto) at the bottom of every loop since the program counter of an application automatically goes to the next instruction if not told otherwise.
There are four comparisons done in example #2. There is one example of traditional looping, followed by three examples of doing loop unrolling. The first loop unrolling (slowest of the three) involves dynamic sizes of up to 15 loops. Of course, our specific example always does 10 because of achieving fair time comparisons. Nevertheless, the concept of dynamic loop unrolling will be made obvious. The second loop unrolling example utilizes nested 'if' statements. This leads to a significant performance enchancement over the previous unrolling method. It does lead to messier code however (sometimes the price to pay for optimizing). Lastly, a static unrolled loop shows its raw speed benefits. It is much more limited in it's application however, since the number of loop executions must always be the same and it must be known ahead of time.
The following is the comparison of times for the four looping methods:
Traditional looping:
- Loop count 20000000 for traditional loop: 691 ms.
- Loop count 20000000 for traditional loop: 671 ms.
- Loop count 20000000 for traditional loop: 682 ms.
- Loop count 20000000 for traditional loop: 691 ms.
- Loop count 20000000 for traditional loop: 821 ms.
- Loop count 20000000 for traditional loop: 771 ms.
- Loop count 20000000 for traditional loop: 701 ms.
- Loop count 20000000 for traditional loop: 671 ms.
- Loop count 20000000 for traditional loop: 671 ms.
- Loop count 20000000 for traditional loop: 691 ms.
Dynamic unrolled looping:
- Loop count 20000000 for unrolled loop: 321 ms.
- Loop count 20000000 for unrolled loop: 440 ms.
- Loop count 20000000 for unrolled loop: 411 ms.
- Loop count 20000000 for unrolled loop: 471 ms.
- Loop count 20000000 for unrolled loop: 440 ms.
- Loop count 20000000 for unrolled loop: 431 ms.
- Loop count 20000000 for unrolled loop: 421 ms.
- Loop count 20000000 for unrolled loop: 440 ms.
- Loop count 20000000 for unrolled loop: 451 ms.
- Loop count 20000000 for unrolled loop: 441 ms.
Dynamic nested unrolled looping:
- Loop count 20000000 for unrolled loop: 200 ms.
- Loop count 20000000 for unrolled loop: 180 ms.
- Loop count 20000000 for unrolled loop: 170 ms.
- Loop count 20000000 for unrolled loop: 171 ms.
- Loop count 20000000 for unrolled loop: 190 ms.
- Loop count 20000000 for unrolled loop: 180 ms.
- Loop count 20000000 for unrolled loop: 181 ms.
- Loop count 20000000 for unrolled loop: 170 ms.
- Loop count 20000000 for unrolled loop: 170 ms.
- Loop count 20000000 for unrolled loop: 190 ms.
Static unrolled looping:
- Loop count 20000000 for unrolled loop: 51 ms.
- Loop count 20000000 for unrolled loop: 60 ms.
- Loop count 20000000 for unrolled loop: 40 ms.
- Loop count 20000000 for unrolled loop: 40 ms.
- Loop count 20000000 for unrolled loop: 40 ms.
- Loop count 20000000 for unrolled loop: 40 ms.
- Loop count 20000000 for unrolled loop: 40 ms.
- Loop count 20000000 for unrolled loop: 40 ms.
- Loop count 20000000 for unrolled loop: 40 ms.
- Loop count 20000000 for unrolled loop: 40 ms.
The tests show that traditional looping is by far the slowest. The friendly dynamic unrolled loop is roughly 60% faster. The dynamic nested unrolled loop is more than twice as fast as the friendly dynamic version. Finally, the static unrolled loop is approximately four times faster than the best dynamic loop and more than 16 times faster than the traditional loop!
If we compare the size of the Java class files (bytecode), we see the following:
- Traditional loop: 980
- Dynamic Unrolled: 1298
- Dynamic Nested Unrolled: 1324
- Static Unrolled: 1065
The smallest performed the worst while the largest performed second best (best if you consider flexibility important). Even though the static unrolled example is not much larger than the traditional loop example, it provided the largest increase in speed (at the cost of flexibility and much more limited application).
The example source code can be obtained here:
http://www.everlastsoftware.com/examples/source/java/SizeSpeedIncreaseExample.zip
In order to execute, run 'java LoopSizeTest'
A batch file was supplied for execution on Windows.
This article was created to help one understand that while size and speed are very closely related, increases and decreases in size can both have positive impacts on speed depending on the situation.
