Everlast Software Technical

Saturday, January 06, 2007

Data Structures


Data structures are one of the first classes taught at a university. They are universal in nature, and span all languages and development environments. The concepts apply to all applications developed that rely on storing/retrieving data and doing some type of processing. These are the most numerous applications in existence. Therefore, data structures are a huge part in application performance.

Data structures should be utilized based on the type of task that will be performed. For example, if one is wanting to quickly search a list of items many times, some type of hashing data structure is the best choice. If one is only going to be storing information, and processing each item one at a time in a sequential fashion, an array may be the best choice.

It must be said that choosing any particular data structure will always have trade offs. If a data structure is useful in a particular case, that means it won't be as useful in a different case. Hashing mechanisms aren't great at processing each item one at a time in a sequential manner. Likewise, arrays aren't great for searching quickly many times.

Not only are there different types of data structures, but there are different flavors of types.

The comparison that will be performed in this particular article is between a Java Hashtable and HashMap. They are both hashing data structures. They are both great for quickly finding items. The only difference between the two is one is synchronized (Hashtable) and the other isn't (HashMap).

The synchronized aspect of the Hashtable means it is safe to use in multi-threaded environments. The HashMap is not safe unless the developer provides his/her own synchronization. We will look at a couple different small programs that test the performance using multiple threads and single threads.

The first Java program compares the execution times for looping 5,000,000 times (with 5 iterations) while reading and writing to a HashMap and Hashtable on a single thread. Remember, the Hashtable is only beneficial if using multiple threads. Here are the results:

HashMap timings:

8432 ms.
8492 ms.
8582 ms.
8442 ms.
8453 ms.

Hashtable timings:

10455 ms.
10415 ms.
10465 ms.
10515 ms.
10485 ms.

It is obvious that even though the program is only using a single thread, Java does not make a distinction when doing synchronization locks with the Hashtable.

Now we will compare the times for 5 threads trying to read and write to the data structures at the same time. Since the HashMap has no synchronizing mechanism, one was provided around an inner loop (read and increment) iteration in order to provide safety.

The results are as follows:

Hashtable time: 8251 ms.
Hashtable time: 8792 ms.
Hashtable time: 9543 ms.
Hashtable time: 9854 ms.
Hashtable time: 11817 ms.
Hashtable time: 8603 ms.
Hashtable time: 8863 ms.
Hashtable time: 9934 ms.
Hashtable time: 8422 ms.
Hashtable time: 12428 ms.
Hashtable time: 9634 ms.
Hashtable time: 10135 ms.
Hashtable time: 9033 ms.
Hashtable time: 9664 ms.
Hashtable time: 9754 ms.
Hashtable time: 9343 ms.
Hashtable time: 8102 ms.
Hashtable time: 9984 ms.
Hashtable time: 9523 ms.
Hashtable time: 11677 ms.
HashMap time: 45255 ms.
Hashtable time: 10265 ms.
Hashtable time: 8792 ms.
Hashtable time: 9383 ms.
Hashtable time: 9043 ms.
Hashtable time: 5187 ms.
HashMap time: 48890 ms.
HashMap time: 48910 ms.
HashMap time: 49140 ms.
HashMap time: 49220 ms.
HashMap time: 14230 ms.
HashMap time: 11466 ms.
HashMap time: 11186 ms.
HashMap time: 11557 ms.
HashMap time: 11757 ms.
HashMap time: 11847 ms.
HashMap time: 11246 ms.
HashMap time: 11647 ms.
HashMap time: 12058 ms.
HashMap time: 12067 ms.
HashMap time: 11917 ms.
HashMap time: 11837 ms.
HashMap time: 11397 ms.
HashMap time: 12398 ms.
HashMap time: 12748 ms.
HashMap time: 11968 ms.
HashMap time: 11276 ms.
HashMap time: 11096 ms.
HashMap time: 11476 ms.
HashMap time: 11316 ms.

If we add up all the times for the HashMap execution vs Hashtable, it is clear Hashtable is the clear winner in a multi-threaded environment (some of the HashMap threads were idle for quite a while).

The example source code can be obtained here:

http://www.everlastsoftware.com/examples/source/java/HashExample.zip

In order to execute single threaded, run 'java -Xmx512M HashExample %TIMES_TO_LOOP%'

  • %TIMES_TO_LOOP% = Number of times to loop.

In order to execute multi threaded, run 'java -Xmx512M HashMultiThreadExample %TIMES_TO_LOOP%'

  • %TIMES_TO_LOOP% = Number of times to loop.

A batch file was supplied for execution on Windows. Simply modify the filename and time to loop in the batch file before execution.