<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-23234885</id><updated>2011-06-13T07:23:44.724-05:00</updated><title type='text'>Everlast Software Technical</title><subtitle type='html'>Computer concepts and technical articles for software engineers, developers, architects, etc. Examples provided in Java.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://everlastsoftware.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://everlastsoftware.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Darrell Fortae</name><uri>http://www.blogger.com/profile/16681954900281767566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>6</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-23234885.post-1756069315945497410</id><published>2007-12-15T05:57:00.000-06:00</published><updated>2007-12-15T09:35:27.764-06:00</updated><title type='text'></title><content type='html'>&lt;div align="center"&gt;&lt;span style="color:#ff0000;"&gt;Multi-Threading&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;Most computers today are coming with either two processors, dual core processors (which is very similar to two processors), or hyperthreading (again, similar to two processors although not as good).  If an application isn't designed to take advantage of this, it is not taking advantage where it should be.  Having the ability to process in parallel is where one of the hugest performance gains can be made.&lt;br /&gt;&lt;br /&gt;Some applications, such as a word editor, do not require much usage of the processor.  Having multiple processors will most likely not help.  Many other applications, such as compression programs, development tools, and most, games, should really take advantage of parallel processing whenever possible.&lt;br /&gt;&lt;br /&gt;How does one take advantage of parallel processing?  As mentioned in several previous articles, by making the application multi-threaded.&lt;br /&gt;&lt;br /&gt;For those who don't know what a thread is, it is basically a lightweight process that has it's own local stack memory.  A thread shares global memory (heap memory) with all the other threads in the same process.  A thread has a unique identifier just like a process.  The operating system uses these identifiers to allow certain ones to run at various times.  With 2 or more processors, the operating system has the ability to run these at the same time instead of shuffling through them and giving each a little bit of time on the single processor.  Sounds easy enough right?  Wrong!  Multi-threaded programming is very difficult for many people to get used to.  The fact that two or more things can be going on at exactly the same time often leads to confusion.  Any more information about threading (except performance) is out of scope for this article.&lt;br /&gt;&lt;br /&gt;The practical example that shall be demonstrated in this blog is about reading multiple files into memory from disk at the same time and compressing them with Java's ZIP libraries.  This could be useful for perhaps multiple uploading files to a remote location.  The performance comparisons will be between a single thread doing all the work vs. two threads doing the work.  The processor being used for testing is a 32-bit Intel Pentium Dual-Core running at 2.13 GHz on Windows Vista.&lt;br /&gt;&lt;br /&gt;The files that will be used as test data (Canterbury Corpus) can be downloaded from the Archive Comparison Test site: &lt;a href="http://compression.ca/act/act-files.html"&gt;http://compression.ca/act/act-files.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The first Java program created processed using a single thread.  The program processed each file as a batch 5 times.  This largely removes the disk access from the equation since the operating system caches the files after their first access.&lt;br /&gt;&lt;br /&gt;Here are the results:&lt;br /&gt;&lt;br /&gt;&lt;span style="color:#3333ff;"&gt;Single thread timings:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:78%;"&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;&lt;span style="font-size:100%;color:#ff0000;"&gt;Total time: 345 ms.&lt;br /&gt;&lt;/span&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;&lt;span style="font-size:100%;color:#ff0000;"&gt;Total time: 337 ms.&lt;br /&gt;&lt;/span&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;&lt;span style="font-size:100%;color:#ff0000;"&gt;Total time: 325 ms.&lt;br /&gt;&lt;/span&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;&lt;span style="font-size:100%;color:#ff0000;"&gt;Total time: 336 ms.&lt;br /&gt;&lt;/span&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:100%;color:#ff0000;"&gt;Total time: 329 ms.&lt;/span&gt;&lt;br /&gt;&lt;span style="color:#ff0000;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color:#000000;"&gt;The second Java program utilizes two threads per batch of files.  Two new threads are started after one batch has been processed.  Still 5 iterations.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color:#3333ff;"&gt;Multiple thread timings:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:78%;"&gt;&lt;br /&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;&lt;span style="font-size:100%;color:#ff0000;"&gt;Total time: 289 ms.&lt;br /&gt;&lt;/span&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;&lt;span style="font-size:100%;color:#ff0000;"&gt;Total time: 290 ms.&lt;br /&gt;&lt;/span&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;&lt;span style="font-size:100%;color:#ff0000;"&gt;Total time: 239 ms.&lt;br /&gt;&lt;/span&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;&lt;span style="font-size:100%;color:#ff0000;"&gt;Total time: 263 ms.&lt;br /&gt;&lt;/span&gt;Read 125179 bytes and compressed to 48899 bytes.&lt;br /&gt;Read 3721 bytes and compressed to 1222 bytes.&lt;br /&gt;Read 419235 bytes and compressed to 143107 bytes.&lt;br /&gt;Read 513216 bytes and compressed to 56465 bytes.&lt;br /&gt;Read 4227 bytes and compressed to 1737 bytes.&lt;br /&gt;Read 148481 bytes and compressed to 53635 bytes.&lt;br /&gt;Read 1029744 bytes and compressed to 203992 bytes.&lt;br /&gt;Read 11150 bytes and compressed to 3123 bytes.&lt;br /&gt;Read 38240 bytes and compressed to 12990 bytes.&lt;br /&gt;Read 471162 bytes and compressed to 193734 bytes.&lt;br /&gt;Read 24603 bytes and compressed to 7961 bytes.&lt;br /&gt;&lt;/span&gt;&lt;span style="color:#ff0000;"&gt;Total time: 231 ms.&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="color:#ff0000;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color:#000000;"&gt;The results speak for themselves.  The total average time for the single thread was 334 milliseconds while the total average time for the two threads was 264 milliseconds.  One would expect the time to be cut in half, but that is usually not the case.  That would be the best possible scenario.  There are many other things going on that decrease the chance of reaching this ideal.  For example, the operating system must shuffle through the threads, there are input/output operations occurs to read the files, etc.  Never-the-less, this is about a 20% increase in performance which is very substantial.  Of course all optimizations and design decisions have a cost (complexity in this case).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The example source code to these two examples can be found here:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.everlastsoftware.com/examples/source/java/MultiThreadExample.zip"&gt;http://www.everlastsoftware.com/examples/source/java/MultiThreadExample.zip&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In order to execute single threaded, run 'java SingleThreadExample %FILE1% %FILE2% ..."&lt;br /&gt;&lt;ul&gt;&lt;li&gt;%FILEx% is a file name to read and compress.&lt;/li&gt;&lt;/ul&gt;The program can take an unlimited number of files (within memory limitations of course).&lt;br /&gt;&lt;br /&gt;In order to execute multi-threaded, run 'java MultiThreadExample %FILE1% %FILE2% ..."&lt;br /&gt;&lt;ul&gt;&lt;li&gt;%FILEx% is a file name to read and compress.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt; &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23234885-1756069315945497410?l=everlastsoftware.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/1756069315945497410'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/1756069315945497410'/><link rel='alternate' type='text/html' href='http://everlastsoftware.blogspot.com/2007/12/multi-threading-most-computers-today.html' title=''/><author><name>Darrell Fortae</name><uri>http://www.blogger.com/profile/16681954900281767566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-23234885.post-116809031823965847</id><published>2007-01-06T07:22:00.000-06:00</published><updated>2007-01-27T09:20:13.306-06:00</updated><title type='text'></title><content type='html'>&lt;div align="center"&gt;&lt;span style="color:#ff0000;"&gt;Data Structures&lt;/span&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;Data structures are one of the first classes taught at a university. They are universal in nature, and span all languages and development environments. The concepts apply to all applications developed that rely on storing/retrieving data and doing some type of processing. These are the most numerous applications in existence. Therefore, data structures are a huge part in application performance.&lt;br /&gt;&lt;br /&gt;Data structures should be utilized based on the type of task that will be performed. For example, if one is wanting to quickly search a list of items many times, some type of hashing data structure is the best choice. If one is only going to be storing information, and processing each item one at a time in a sequential fashion, an array may be the best choice.&lt;br /&gt;&lt;br /&gt;It must be said that choosing any particular data structure will always have trade offs. If a data structure is useful in a particular case, that means it won't be as useful in a different case. Hashing mechanisms aren't great at processing each item one at a time in a sequential manner. Likewise, arrays aren't great for searching quickly many times.&lt;br /&gt;&lt;br /&gt;Not only are there different types of data structures, but there are different flavors of types.&lt;br /&gt;&lt;br /&gt;The comparison that will be performed in this particular article is between a Java Hashtable and HashMap. They are both hashing data structures. They are both great for quickly finding items. The only difference between the two is one is synchronized (Hashtable) and the other isn't (HashMap).&lt;br /&gt;&lt;br /&gt;The synchronized aspect of the Hashtable means it is safe to use in multi-threaded environments. The HashMap is not safe unless the developer provides his/her own synchronization. We will look at a couple different small programs that test the performance using multiple threads and single threads. &lt;/p&gt;&lt;p&gt;The first Java program compares the execution times for looping 5,000,000 times (with 5 iterations) while reading and writing to a HashMap and Hashtable on a single thread. Remember, the Hashtable is only beneficial if using multiple threads. Here are the results:&lt;/p&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;HashMap timings:&lt;/span&gt;&lt;/p&gt;8432 ms.&lt;br /&gt;8492 ms.&lt;br /&gt;8582 ms.&lt;br /&gt;8442 ms.&lt;br /&gt;8453 ms.&lt;br /&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;Hashtable timings:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;10455 ms.&lt;br /&gt;10415 ms.&lt;br /&gt;10465 ms.&lt;br /&gt;10515 ms.&lt;br /&gt;10485 ms. &lt;/p&gt;&lt;p&gt;It is obvious that even though the program is only using a single thread, Java does not make a distinction when doing synchronization locks with the Hashtable.&lt;/p&gt;&lt;p&gt;Now we will compare the times for 5 threads trying to read and write to the data structures at the same time. Since the HashMap has no synchronizing mechanism, one was provided around an inner loop (read and increment) iteration in order to provide safety.&lt;/p&gt;&lt;p&gt;The results are as follows:&lt;/p&gt;&lt;p&gt;Hashtable time: 8251 ms.&lt;br /&gt;Hashtable time: 8792 ms.&lt;br /&gt;Hashtable time: 9543 ms.&lt;br /&gt;Hashtable time: 9854 ms.&lt;br /&gt;Hashtable time: 11817 ms.&lt;br /&gt;Hashtable time: 8603 ms.&lt;br /&gt;Hashtable time: 8863 ms.&lt;br /&gt;Hashtable time: 9934 ms.&lt;br /&gt;Hashtable time: 8422 ms.&lt;br /&gt;Hashtable time: 12428 ms.&lt;br /&gt;Hashtable time: 9634 ms.&lt;br /&gt;Hashtable time: 10135 ms.&lt;br /&gt;Hashtable time: 9033 ms.&lt;br /&gt;Hashtable time: 9664 ms.&lt;br /&gt;Hashtable time: 9754 ms.&lt;br /&gt;Hashtable time: 9343 ms.&lt;br /&gt;Hashtable time: 8102 ms.&lt;br /&gt;Hashtable time: 9984 ms.&lt;br /&gt;Hashtable time: 9523 ms.&lt;br /&gt;Hashtable time: 11677 ms.&lt;br /&gt;HashMap time: 45255 ms.&lt;br /&gt;Hashtable time: 10265 ms.&lt;br /&gt;Hashtable time: 8792 ms.&lt;br /&gt;Hashtable time: 9383 ms.&lt;br /&gt;Hashtable time: 9043 ms.&lt;br /&gt;Hashtable time: 5187 ms.&lt;br /&gt;HashMap time: 48890 ms.&lt;br /&gt;HashMap time: 48910 ms.&lt;br /&gt;HashMap time: 49140 ms.&lt;br /&gt;HashMap time: 49220 ms.&lt;br /&gt;HashMap time: 14230 ms.&lt;br /&gt;HashMap time: 11466 ms.&lt;br /&gt;HashMap time: 11186 ms.&lt;br /&gt;HashMap time: 11557 ms.&lt;br /&gt;HashMap time: 11757 ms.&lt;br /&gt;HashMap time: 11847 ms.&lt;br /&gt;HashMap time: 11246 ms.&lt;br /&gt;HashMap time: 11647 ms.&lt;br /&gt;HashMap time: 12058 ms.&lt;br /&gt;HashMap time: 12067 ms.&lt;br /&gt;HashMap time: 11917 ms.&lt;br /&gt;HashMap time: 11837 ms.&lt;br /&gt;HashMap time: 11397 ms.&lt;br /&gt;HashMap time: 12398 ms.&lt;br /&gt;HashMap time: 12748 ms.&lt;br /&gt;HashMap time: 11968 ms.&lt;br /&gt;HashMap time: 11276 ms.&lt;br /&gt;HashMap time: 11096 ms.&lt;br /&gt;HashMap time: 11476 ms.&lt;br /&gt;HashMap time: 11316 ms.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;If we add up all the times for the HashMap execution vs Hashtable, it is clear Hashtable is the clear winner in a multi-threaded environment (some of the HashMap threads were idle for quite a while).&lt;/p&gt;&lt;p&gt;The example source code can be obtained here:&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.everlastsoftware.com/examples/source/java/HashExample.zip"&gt;http://www.everlastsoftware.com/examples/source/java/HashExample.zip&lt;/a&gt;&lt;/p&gt;&lt;p&gt;In order to execute single threaded, run 'java -Xmx512M HashExample %TIMES_TO_LOOP%'&lt;/p&gt;&lt;ul&gt;&lt;li&gt;%TIMES_TO_LOOP% = Number of times to loop.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;In order to execute multi threaded, run 'java -Xmx512M HashMultiThreadExample %TIMES_TO_LOOP%'&lt;/p&gt;&lt;ul&gt;&lt;li&gt;%TIMES_TO_LOOP% = Number of times to loop.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A batch file was supplied for execution on Windows. Simply modify the filename and time to loop in the batch file before execution.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23234885-116809031823965847?l=everlastsoftware.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/116809031823965847'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/116809031823965847'/><link rel='alternate' type='text/html' href='http://everlastsoftware.blogspot.com/2007/01/data-structures-data-structures-are.html' title=''/><author><name>Darrell Fortae</name><uri>http://www.blogger.com/profile/16681954900281767566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-23234885.post-115375507586041061</id><published>2006-07-24T09:59:00.000-05:00</published><updated>2006-07-27T12:30:04.146-05:00</updated><title type='text'></title><content type='html'>&lt;div align="center"&gt;&lt;span style="color:#ff0000;"&gt;Remote Communication&lt;/span&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;Almost every piece of software written today requires some form of remote communication over a network. Communication across various hardware mediums can often be the source of major bottlenecks. While distributed communication speeds have improved drastically the past decade, they never-the-less are still the bane of many systems. There is no way to get around the laws of physics.&lt;br /&gt;&lt;br /&gt;One of the best ways to improve the performance of application requiring remote communication is to decrease the need for remote calls. This is an obvious statement that is often underestimated. Put more of the load on the computer itself and less on the network.&lt;br /&gt;&lt;br /&gt;So, how does one go about reducing the dependency on remote calls? By batching up multiple calls into a single call. Unfortunately, this can often lead to a less generic mechanism for communication. However, if we've learned anything, it's that optimization can sometimes come with a price. Sometimes the price is more than worth it though, like in the case of reducing network traffic.&lt;br /&gt;&lt;br /&gt;One may ask how batching up calls can improve performance when the same total size of information is send back? Well, there are many factors that come into play:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Minimizes latency&lt;/li&gt;&lt;li&gt;Reduces overall load on computer because of less processing&lt;/li&gt;&lt;li&gt;Compression algorithms work better when they have more data to work with&lt;/li&gt;&lt;li&gt;All calls must have source/destination information which does take up a little bandwidth&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Latency is the number one factor as to why batching should be used instead of multiple calls. The absolute maximum speed one can obtain on any kind of network would be the speed of light (at least with today's technology). Most of the time, this is reduced quite a bit by electric resistance, internal router latencies, etc. But even at the speed of light, when making thousands or millions of calls, the latency adds up. It can end up being a few seconds or more depending on the situation. All of this latency could be eliminated by a single call. The single call does not solve bandwidth issues (although compression does usually work better when there is more data to deal with, thus a single call can help with bandwidth issues in that regard), but it does address the latency issue.&lt;br /&gt;&lt;br /&gt;The other reasons mentioned for batching, as well as others not mentioned, should also not be taken lightly. They all add up to drastically improve the performance.&lt;br /&gt;&lt;br /&gt;A sample Java program has been created to demonstrate how much of a difference batching a call can make. The example is a very simple Client/Server application that calls a remote computer to obtain a name. The first timing is for the retrieval of the name in 3 parts (first, middle, last name). The second timing is the batch single call retrieving the name all at once (full name). Each set of calls are done 100 times to emulate heavy network traffic.&lt;br /&gt;&lt;br /&gt;The network utilized was a 100 megabit Ethernet setup utilizing a hub. The full name obtained is 'John Wayne Doe'.&lt;br /&gt;&lt;br /&gt;The following is a comparison of the multiple calls (300 total) vs the single call (100 total):&lt;br /&gt;&lt;br /&gt;&lt;span style="color:#3333ff;"&gt;Multiple calls:&lt;br /&gt;&lt;/span&gt;2434 milliseconds.&lt;/p&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;Single batch call:&lt;br /&gt;&lt;/span&gt;631 milliseconds.&lt;br /&gt;&lt;br /&gt;As you can see, the 100 single batch calls was roughly 4 times faster than the 100 multiple calls.&lt;br /&gt;&lt;br /&gt;This was a very simple test just to demonstrate how much of a difference batching remote calls can make. Increasing the amount of data being transferred generally decreases the potential performance improvements. However, as mentioned earlier, compression can then be utilized to gain even more improvements when a larger amount of data is being transfered.&lt;br /&gt;&lt;br /&gt;The example source code can be obtained here:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.everlastsoftware.com/examples/source/java/ClientServerExample.zip"&gt;http://www.everlastsoftware.com/examples/source/java/ClientServerExample.zip&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In order to execute, run 'java Server' on one machine and 'java Client SERVER' on another machine. The 'SERVER' is the host name on which the Server class is executing. Port 31111 is utilized by default.&lt;br /&gt;&lt;br /&gt;A few sample batch files were supplied for execution on Windows.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23234885-115375507586041061?l=everlastsoftware.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/115375507586041061'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/115375507586041061'/><link rel='alternate' type='text/html' href='http://everlastsoftware.blogspot.com/2006/07/remote-communication-almost-every.html' title=''/><author><name>Darrell Fortae</name><uri>http://www.blogger.com/profile/16681954900281767566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-23234885.post-114392351289336251</id><published>2006-04-01T13:49:00.000-06:00</published><updated>2006-04-01T21:59:57.416-06:00</updated><title type='text'></title><content type='html'>&lt;div align="center"&gt;&lt;span style="color:#ff0000;"&gt;&lt;strong&gt;Speed vs Size&lt;/strong&gt;&lt;/span&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;Speed and size often have some type of impact on each other. Usually something smaller is faster. Why is this? Anytime less data needs to be compared, loaded, etc., there will be less execution statements for the CPU to handle. But, size can often be misleading as we will see in this article. There were two sample Java programs created for speed vs size comparisons. One example will demonstrate how smaller means faster while another will show the opposite. Sometimes increasing the size of something can cleverly speed up execution time based on hidden factors. &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Example #1&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;The first example demonstrates how larger member variable names in Java can have a signficiant impact on speed. Many may assume that member variable names would have no impact, but they would be dead wrong. Unless an obfuscator is used, Java will embed the names of the variables inside the byte code (.class files). This is so reflection may be used to discover the names of variables and utilize them in fancy ways. While this is very powerful and allows for very flexible programming, it does so at a price. The price is an increase in size and a decrease in speed. &lt;/p&gt;&lt;p&gt;There are three phases to the example. All phases loop a total of 10,000,000 times for the class with the smaller variable name and 10,000,000 for the class with the larger variable name. Phase one simply instantiates both classes. Phase two instantiates and calls '&lt;span style="color:#330099;"&gt;get&lt;/span&gt;' and '&lt;span style="color:#330099;"&gt;set&lt;/span&gt;' methods to retrieve and set the value of the inner member variables. Phase three instantiates outside the 10,000,000 count loop but does '&lt;span style="color:#330099;"&gt;get&lt;/span&gt;' and '&lt;span style="color:#330099;"&gt;set&lt;/span&gt;' the variable value inside.&lt;/p&gt;&lt;p&gt;The variable names used specifically are '&lt;span style="color:#330099;"&gt;var'&lt;/span&gt; and '&lt;span style="color:#330099;"&gt;this_is_a_very_long_variable_name'&lt;/span&gt;. Both are data types of '&lt;span style="color:#330099;"&gt;int'&lt;/span&gt;.&lt;/p&gt;&lt;p&gt;The following is the comparison of times for the smaller variable name vs. the larger variable name:&lt;/p&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;Instantiation looping:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Loop count 10000000 for small variable: 551 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 8142 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 490 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 8553 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 490 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 8663 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 601 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 8492 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 501 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 9123 ms.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;Instantiation and get/set looping:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Loop count 10000000 for small variable: 521 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 8552 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 541 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 8472 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 531 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 8502 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 531 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 9053 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 540 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 8513 ms.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;Get/set looping:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Loop count 10000000 for small variable: 30 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 330 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 40 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 321 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 30 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 320 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 30 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 311 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for small variable: 40 ms.&lt;/li&gt;&lt;li&gt;Loop count 10000000 for large variable: 320 ms.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The results clearly indicate that the shorter variable leads to major speed improvements. Of course, it must be mentioned that this is when looping 10,000,000 times. However, any optimization is an improvement, and little changes here and there can lead to large overall improvements for an application.&lt;/p&gt;&lt;p&gt;The example source code can be obtained here: &lt;/p&gt;&lt;p&gt;&lt;a href="http://www.everlastsoftware.com/examples/source/java/SizeSpeedDecreaseExample.zip"&gt;http://www.everlastsoftware.com/examples/source/java/SizeSpeedDecreaseExample.zip&lt;/a&gt;&lt;/p&gt;&lt;p&gt;In order to execute, run 'java VariableSizeTest'&lt;/p&gt;&lt;p&gt;A batch file was supplied for execution on Windows.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Example #2&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;The second example also demonstrates that larger can have an impact on speed. This time, it's the other way around however. Increasing the size of the compiled Java code leads to an improvement in speed. How is this possible? A simple concept called loop unrolling. Loops are used in almost every single program. It saves the developer a lot of time by preventing having to type statements over and over. It also allows for dealing with dynamic sizes of data very easily. This ultimately means the size of the program will be much smaller when loops are utilized. The condition that causes this decrease in size to increase execution time is a simple variable check and a jump (goto). All loops must check some condition and jump to a different execution point in the program in order to repeat or exit. If the check wasn't performed, the loop would be an infinite loop (which all of us have experienced as beginners). Also, if the jump wasn't performed, the code would only loop one time (useless loop).&lt;br /&gt;&lt;br /&gt;Loop unrolling is the process of manually coding each statement that would be performed in the loop multiple times. This is generally only possible if the total number of loop executions is known and will always be the same. The other possible usage is when one knows the absolute max number of loop executions. If the latter is the case, nested if statements can be placed before each set of statements, having the same effect as the condition checks of a loop. How does this actually improve speed? You eliminate the manditory jump (goto) at the bottom of every loop since the program counter of an application automatically goes to the next instruction if not told otherwise. &lt;/p&gt;&lt;p&gt;There are four comparisons done in example #2. There is one example of traditional looping, followed by three examples of doing loop unrolling. The first loop unrolling (slowest of the three) involves dynamic sizes of up to 15 loops. Of course, our specific example always does 10 because of achieving fair time comparisons. Nevertheless, the concept of dynamic loop unrolling will be made obvious. The second loop unrolling example utilizes nested '&lt;span style="color:#330099;"&gt;if&lt;/span&gt;' statements. This leads to a significant performance enchancement over the previous unrolling method. It does lead to messier code however (sometimes the price to pay for optimizing). Lastly, a static unrolled loop shows its raw speed benefits. It is much more limited in it's application however, since the number of loop executions must always be the same and it must be known ahead of time.&lt;/p&gt;&lt;p&gt;The following is the comparison of times for the four looping methods:&lt;/p&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;Traditional looping:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 691 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 671 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 682 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 691 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 821 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 771 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 701 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 671 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 671 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for traditional loop: 691 ms.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;Dynamic unrolled looping:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 321 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 440 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 411 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 471 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 440 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 431 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 421 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 440 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 451 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 441 ms.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;Dynamic nested unrolled looping:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 200 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 180 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 170 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 171 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 190 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 180 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 181 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 170 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 170 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 190 ms.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span style="color:#3333ff;"&gt;Static unrolled looping:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 51 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 60 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 40 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 40 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 40 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 40 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 40 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 40 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 40 ms.&lt;/li&gt;&lt;li&gt;Loop count 20000000 for unrolled loop: 40 ms.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The tests show that traditional looping is by far the slowest. The friendly dynamic unrolled loop is roughly 60% faster. The dynamic nested unrolled loop is more than twice as fast as the friendly dynamic version. Finally, the static unrolled loop is approximately four times faster than the best dynamic loop and more than 16 times faster than the traditional loop!&lt;/p&gt;&lt;p&gt;If we compare the size of the Java class files (bytecode), we see the following:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Traditional loop: 980&lt;/li&gt;&lt;li&gt;Dynamic Unrolled: 1298&lt;/li&gt;&lt;li&gt;Dynamic Nested Unrolled: 1324&lt;/li&gt;&lt;li&gt;Static Unrolled: 1065&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The smallest performed the worst while the largest performed second best (best if you consider flexibility important). Even though the static unrolled example is not much larger than the traditional loop example, it provided the largest increase in speed (at the cost of flexibility and much more limited application).&lt;/p&gt;&lt;p&gt;The example source code can be obtained here: &lt;/p&gt;&lt;p&gt;&lt;a href="http://www.everlastsoftware.com/examples/source/java/SizeSpeedIncreaseExample.zip"&gt;http://www.everlastsoftware.com/examples/source/java/SizeSpeedIncreaseExample.zip&lt;/a&gt;&lt;/p&gt;&lt;p&gt;In order to execute, run 'java LoopSizeTest'&lt;/p&gt;&lt;p&gt;A batch file was supplied for execution on Windows.&lt;/p&gt;&lt;p&gt;This article was created to help one understand that while size and speed are very closely related, increases and decreases in size can both have positive impacts on speed depending on the situation.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23234885-114392351289336251?l=everlastsoftware.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/114392351289336251'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/114392351289336251'/><link rel='alternate' type='text/html' href='http://everlastsoftware.blogspot.com/2006/04/speed-vs-size-speed-and-size-often.html' title=''/><author><name>Darrell Fortae</name><uri>http://www.blogger.com/profile/16681954900281767566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-23234885.post-114269106282977476</id><published>2006-03-18T07:59:00.000-06:00</published><updated>2006-03-18T10:24:02.673-06:00</updated><title type='text'></title><content type='html'>&lt;p align="center"&gt;&lt;span style="color:#ff0000;"&gt;&lt;strong&gt;Caching&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&lt;p align="left"&gt;As mentioned in a previous article, caching is an excellent way to drastically improve performance with minimal changes/development. Caching accomplishes this goal by shifting resource loads from one type to another. A CD drive is much slower than a mechanical hard drive in general. A mechanical hard drive is much slower than system memory in general. Accessing a remote machine's system memory is much slower than the local machine's system memory. The amount of comparisons between different types of storage mechanisms would be an endless list. Luckily, most developers only deal with a few storage mechanisms when designing/developing a given system.&lt;/p&gt;&lt;p align="left"&gt;Just for general information, the following is a list of storage mechanisms and their general quickness. A higher number indicates a faster mechanism, while a lower number indicates a slower mechanism. Again, this is in general. It isn't always the case. The numbers are anywhere from 1 to 100. Also, the general number used is an overall rating for latency and bandwidth. Some devices are better with latency while poor with bandwidth. Some are better with bandwidth and worse with latency. Some are worse with both. Keep in mind that the mechanisms can be used in many different ways. This is simply an indicator on things to look at in order to improve performance. The first section compares local access. The second section compares remote access. Most systems must utilize both types in order to be useful. This means caching becomes more complicated, but nevertheless, the concepts remain the same. Usually preventing remote calls (or keeping them to a minimum) can render the largest performance enhancements.&lt;/p&gt;&lt;p align="left"&gt;&lt;span style="color:#3333ff;"&gt;Local Access&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;CPU Register 100&lt;/li&gt;&lt;li&gt;CPU Cache Memory 90&lt;/li&gt;&lt;li&gt;System Memory 70&lt;/li&gt;&lt;li&gt;Database 60&lt;/li&gt;&lt;li&gt;Hard Drive 40&lt;/li&gt;&lt;li&gt;DVD Drive 30&lt;/li&gt;&lt;li&gt;CD Drive 20&lt;/li&gt;&lt;/ul&gt;&lt;p align="left"&gt;&lt;span style="color:#3333ff;"&gt;Remote Access&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Direct Ethernet Connection 100&lt;/li&gt;&lt;li&gt;Ethernet LAN 90&lt;/li&gt;&lt;li&gt;Wireless LAN 70&lt;/li&gt;&lt;li&gt;T1 Line 50&lt;/li&gt;&lt;li&gt;Broadband 30&lt;/li&gt;&lt;li&gt;Dial-up 5&lt;/li&gt;&lt;/ul&gt;&lt;p align="left"&gt;Now that a few storage mechanisms and access mechanisms have been listed, it is now time to look at a specific caching concept and see a real working example. The following Java class file compares the performance of reading a file from the file system (with the aid of the operating system of course) vs reading a file from a cache in system memory. The type of caching mechanism used to utilize the system memory is a HashMap. A HashMap was chosen over a Hashtable because of the removal of synchronization. It would be up to the developer to determine if the code needed to be synchronized. For the example code, it is not required, so a minor performance improvement can be achieved with the HashMap class.&lt;/p&gt;&lt;p align="left"&gt;The code was tested against a 73 MB file. The application was executed a total of 2 times, with 5 reads within each execution. A reboot occurred before each execution. The reason for this is because the operating system will utilize its own cache in order to prevent actually fetching the file from the hard drive when it had already been recently read. &lt;/p&gt;&lt;p align="left"&gt;The following is a list of the benchmark timings for the executions: &lt;/p&gt;&lt;p align="left"&gt;&lt;span style="color:#3333ff;"&gt;Execution #1&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the file system: 18036 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the file system: 4507 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the file system: 4727 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the file system: 5287 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the file system: 5128 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p align="left"&gt;&lt;span style="color:#3333ff;"&gt;Execution #2&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Read 76699621 bytes from the file system: 15973 ms.&lt;/li&gt;&lt;li&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/li&gt;&lt;li&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/li&gt;&lt;li&gt;Read 76699621 bytes from the file system: 6029 ms.&lt;/li&gt;&lt;li&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/li&gt;&lt;li&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/li&gt;&lt;li&gt;Read 76699621 bytes from the file system: 4837 ms.&lt;/li&gt;&lt;li&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/li&gt;&lt;li&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/li&gt;&lt;li&gt;Read 76699621 bytes from the file system: 4717 ms.&lt;/li&gt;&lt;li&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/li&gt;&lt;li&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/li&gt;&lt;li&gt;Read 76699621 bytes from the file system: 4726 ms.&lt;/li&gt;&lt;li&gt;Write 76699621 bytes into the cache: 0 ms.&lt;/li&gt;&lt;li&gt;Read 76699621 bytes from the cache: 0 ms.&lt;/li&gt;&lt;/ul&gt;&lt;p align="left"&gt;You will notice the time to read from the cache always says 0. The reason for this is because Java's default system clock is inaccurate on Windows for anything less than approximately 10 milliseconds. In other words, the read from the cache was so fast, it couldn't be registered. Compare those times to the approximate average of 5000 millisecond timings for each fetch from the operating system's internal cache. If the application had many users attempting to read the same files frequently, it is plain to see how much of an improvement that memory cache would make. Notice that the first read from each fresh reboot resulted in a much higher read time. Again, this is because the operating system uses its own internal cache on subsequent reads to avoid accessing the hard disk as much as possible.&lt;/p&gt;&lt;p align="left"&gt;The example source code can be obtained here: &lt;/p&gt;&lt;p align="left"&gt;&lt;a href="http://www.everlastsoftware.com/examples/source/java/CacheExample.zip"&gt;http://www.everlastsoftware.com/examples/source/java/CacheExample.zip&lt;/a&gt;&lt;/p&gt;&lt;p align="left"&gt;In order to execute, run 'java -Xmx512M CacheExample %FILE% %TIMES_TO_LOOP%'&lt;/p&gt;&lt;ul&gt;&lt;li&gt;%FILE% = The file to do the read test again.&lt;/li&gt;&lt;li&gt;%TIMES_TO_LOOP% = Number of times to read the file.&lt;/li&gt;&lt;/ul&gt;&lt;p align="left"&gt;A batch file was supplied for execution on Windows. Simply modify the filename and time to loop in the batch file before execution.&lt;/p&gt;&lt;p align="left"&gt;Now it must be mentioned, of course, that system memory is a limited resource. Some machines may already be starved for free memory, so creating a memory cache would not help performance much. In fact, it may even hurt performance. So it is always important to consider the environment that the application will be in as well as how it will be used. The concepts discussed in this article simply demonstrate potential speed improvements through the use of caching.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23234885-114269106282977476?l=everlastsoftware.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/114269106282977476'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/114269106282977476'/><link rel='alternate' type='text/html' href='http://everlastsoftware.blogspot.com/2006/03/cachingas-mentioned-in-previous.html' title=''/><author><name>Darrell Fortae</name><uri>http://www.blogger.com/profile/16681954900281767566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-23234885.post-114178656423923087</id><published>2006-03-07T20:45:00.000-06:00</published><updated>2007-02-21T04:45:07.386-06:00</updated><title type='text'></title><content type='html'>&lt;div align="center"&gt;&lt;span style="color:#ff0000;"&gt;&lt;strong&gt;Optimization Concepts &lt;/strong&gt;&lt;/span&gt;&lt;/div&gt;&lt;p&gt;&lt;br /&gt;Most of us have experienced spending months or years, developing applications that end up being almost useless from a practical sense. Usually it is either a lack in the design process, or an incomplete design. Sometimes it is because of inefficient coding. Usually it’s a combination of both.&lt;br /&gt;Unfortunately, most of the cases occur because of lack of experience. This deficit of experience may be related to the individual learning a new programming language. It could be because of not completely understanding core concepts. Perhaps it isn’t the developer’s fault but the actual runtime environment that is the problem. The list is endless.&lt;br /&gt;I would like to pass on a few key concepts, as well as specific tips (future article), to help increase one’s chance of developing a high performance (or at least acceptable performance) application. The concepts cross all boundaries of languages, environments, etc. The tips will be specific examples for Java, but can be translated to other languages as well.&lt;br /&gt;The following concepts shall be discussed at a high level. The purpose is simply to introduce the main concepts, or things to be aware of, as more details are disclosed at a later time. It’s very important to know about the possibilities at a high level before digging too deep. Otherwise, one may be lost in a sea of confusion. Please keep in mind that there are more optimization concepts than discussed in this article.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Speed vs Size &lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Speed and size are often closely related when it comes to performance. Obviously we want applications to be as fast as possible as well as small as possible. Historically, decreasing the size of something meant it would typically execute slower. Conversely, making something faster typically meant an increase in size. While this is sometimes still the case, it is not exclusively true.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Compiled vs Just-In-Time (JIT) Compiled vs Interpreted &lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;There are three typical ways applications execute on a given machine. Compiled into native code ahead of time (before execution), compiled on the fly (during execution) into native code, and interpreted (each instruction translated on the fly). While there are advantages and disadvantages to each of these methods, just make a mental note that the types of optimizations performed can have a larger or smaller impact on the application depending on how it will be executed. Sometimes doing an optimization for compiled code will make things worse for JIT code. Sometimes an optimization will be beneficial regardless of the execution method. Just being aware of these facts can provide tremendous benefits. Just to give a few examples, C/C++ is compiled, Java can execute as interpreted or JIT compiled (default), and VB Script is interpreted. A general rule of thumb is any language/environment that has a Virtual Machine is either interpreted or JIT compiled. Any language that is a scripting language is almost always interpreted (but this is not always the case). Any language that has a runtime library is almost always compiled to native code.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Caching &lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Caching is simply the process of making something more frequently used quicker to retrieve in the future. The actual media, storage mechanism, etc., doesn’t matter as long as it is faster than the default. Typically, a cache is thought of in terms of memory. This does not mean a hard disk cannot be a cache as well however. If the main storage area for an image is on a CD, a hard disk is indeed a cache because of its large performance improvement over the CD. Caching is always a design optimization of some kind. It’s always best to plan what to cache ahead of time, but if something is missed (which is often the case), caching is usually an easy way to drastically improve the performance with a relatively small degree of effort compared to other optimizations. It’s also important to note that a cache can often optimize speed as well as size (reusing the same data instead of having copies of the same data).&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Remote Communication&lt;br /&gt;&lt;/strong&gt;&lt;br /&gt;Networks have become the backbone for almost all major computer systems in existence. Networks must rely on remote communication for computers to work together. Remote communication can often be an area where major optimizations can occur. Often times, too much information is being transmitted than what is really required. This is often referred to as a bandwidth issue. There is also an issue of how long it takes for data to arrive based on the laws of physics. This is referred to as latency. There is little one can do to optimize for latency, but there is often a lot one can do to optimize bandwidth. Most of latency issues are because of a design that didn’t take into account distances, usage, etc., ahead of time. These can be extremely hard, if not impossible, to improve. Bandwidth is also largely an issue of a design issue, but sometimes there are fairly simple ways to still utilize the same design while improving bandwidth usage.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Parallel Execution &lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Parallel execution is the ability for an application to utilize more than one CPU, machine, hardware device, etc., in order to split the work up and reduce the overall time it takes to complete a task. There are several ways to develop for parallel execution. Some are much more automatic and handled by the operating system, others are much more difficult and require complex logic and sophisticated code. Again, most of these issues need to be addressed at design time. However, if a system has already been developed without taking into account parallel execution, there is still hope. Sometimes, the largest problem areas can be specifically re-factored to allow for parallel execution.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Inefficient Compilers and Interpreters&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Unfortunately, compilers and interpreters aren’t perfect. Experience can teach someone what to expect when utilizing a specific one. If there are known performance issues, developing in a particular way can help get around the issue. Basically, the developer must work around the flaws in the compilers/interpreters. Another unfortunate issue is the developer may spend time working around the flaw only to find out their workaround doesn’t work in the future, or the makers of the compiler/interpreter fix the flaw. More likely than not, however, one may know with high certainty that the flaw is because of design. If that is the case, he/she may feel much more comfortable investing the time to do the workaround, because design issues are much harder to fix in general. If it appears to be a bug, time may be better spent looking at other optimization avenues (unless of course all others have already been considered).&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Hardware vs Software&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Hardware will always be able to outperform software. Therefore, sometimes it may be more cost effective to simply throw more powerful hardware at the problem. In fact, this ends up being the case fairly often (unfortunately). It’s always beneficial to analyze the application/system however. There are usually fairly cheap/quick optimizations that can be done with software, sometimes avoiding the need for expensive hardware purchases.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Maximizing Hardware Potential&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Software instructions are eventually translated directly to hardware instructions. The hardware instructions are the ones that actually accomplish real tasks. Without manipulating hardware, or physical devices, software is useless. Some environments/languages allow the usage of special hardware instructions in order to speed up code execution. Sometimes it is necessary to tap into the deepest hardware potential, other times it is not. This largely depends on the type of application. Games, for example, must almost always attempt to maximize hardware usage because of the complexity and massive number of instructions they produce. A business application rarely needs to be optimized for the latest and greatest hardware.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;External Libraries and Calls&lt;br /&gt;&lt;/strong&gt;&lt;br /&gt;Calling external libraries can be very time consuming. The operating system usually needs to be involved in order to allow applications to link to other libraries, execute other programs, etc. The flexibility to call external libraries comes with a cost: speed loss. Often times, a library may be called more times than it really needs to be if the number of calls can be batched up into a single call. Sometimes this is not a possibility, other times it is. Knowing that an external call has a speed price tag can perhaps make one design their applications to take advantage of minimal calls to begin with.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Multi-threading&lt;br /&gt;&lt;/strong&gt;&lt;br /&gt;Multi-threading is almost a necessity in today’s complex environments. Many systems (especially servers) come with at least 2 processors. Even desktops are now being built with hyper-threading on a single processor, essentially making the processor a dual (not exactly, but similar performance gains). Developing an application to utilize multiple threads taps into the power of multiple processors and/or hyper-threading. Multi-threading is a subset of parallel processing, but fortunately, is much easier to develop in general. Almost all modern operating systems have multi-threading support built in. This means applications can usually perform parallel processing fairly easily. Multi-threading is often determined at design time, but occasionally adding extra threads may be a possibility to an already existing application. Using multiple threads can drastically improve the performance (scalability) of an application when there is more than one processor available.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Single User Perception vs Multi-User Perception &lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Perception is usually the key to optimization choices. After all, if the users of the application are happy with the performance, it’s usually not worth further optimizing. This is where user perception comes in to play. Some systems/applications will only have one simultaneous user (desktop application for example), whereas others may frequently have multiple users accessing simultaneously. The latter is often related to a concept called scalability. Scalability is the ability for an application to take on more and more users and not become overloaded (within reason of course). This is usually achieved by multithreading or load balancing across multiple machines. This means scalability is almost always a design issue. It’s very difficult to load balance an application that has not been designed for parallel processing from the beginning. Getting back to the main point, optimizing an application/system for maximum throughput for a single user usually involves a completely different process than optimizing for multiple simultaneous users. One thing that is always true is, optimizing for a single user will improve performance for multiple users, if CPU time is the bottleneck. The converse of that is not true however. In fact, sometimes optimizing for multiple users can hurt performance for a single user. Keep those thoughts in the back of your mind when we further discuss optimizing for user perception. Another thing to keep in mind is that sometimes a real improvement in performance is not needed, if the user can be “tricked” or made to believe it’s faster. This is why many applications display splash screens on startup, or display progress bars while doing a time consuming task. If the user can be reassured productive work is being done, they will often be more patient without even realizing it.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Data Structures&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Choosing the right data structures to use for particular situations can make a huge difference in not only speed, but also size. There are many different types of data structures: Arrays, Hash Tables, Stacks, Queues, etc. All of them have a time and place. Knowing that how you store and process information can affect performance is critical when designing a system. It's often extremely difficult (but not impossible) to change to other data structures later on if the system wasn't designed correctly to begin with.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Algorithms&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Algorithms often go hand and hand with data structures. The reason for this is because algorithms have to have data on which to perform. A bad algorithm has the capability to bring a system to its knees. Algorithms can usually be swapped in and out fairly easily as long as the design was modularized and made to be generic. This is just one more reason to spend a little extra time on the design to ensure the application/system is modularized.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Summary&lt;br /&gt;&lt;/strong&gt;&lt;br /&gt;There are many different approaches one may take in order to optimize an application. This can make the process of optimization very difficult to understand and complete successfully. Knowing various concepts is critical in order to choose the correct methods. The next article will dig into depth about specific concepts and provide concrete examples to further solidify the high level concepts discussed.&lt;/p&gt;&lt;p&gt;This article was created by Everlast Software, LLC: &lt;a href="http://www.everlastsoftware.com"&gt;http://www.everlastsoftware.com&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23234885-114178656423923087?l=everlastsoftware.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://everlastsoftware.blogspot.com/feeds/114178656423923087/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=23234885&amp;postID=114178656423923087' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/114178656423923087'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/23234885/posts/default/114178656423923087'/><link rel='alternate' type='text/html' href='http://everlastsoftware.blogspot.com/2006/03/optimization-concepts-most-of-us-have.html' title=''/><author><name>Darrell Fortae</name><uri>http://www.blogger.com/profile/16681954900281767566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
