Everlast Software Technical

Tuesday, March 07, 2006

Optimization Concepts


Most of us have spent months or years developing applications that end up being almost useless in practice. Usually the cause is a flawed design process or an incomplete design. Sometimes it is inefficient coding. Often it is a combination of both.
Unfortunately, most of these cases stem from a lack of experience. The developer may be learning a new programming language, or may not completely understand core concepts. Perhaps the fault lies not with the developer but with the runtime environment. The list is endless.
I would like to pass on a few key concepts, as well as specific tips (in a future article), to help increase one’s chances of developing a high performance (or at least acceptable performance) application. The concepts apply across languages and environments. The tips will be specific examples for Java, but can be translated to other languages as well.
The following concepts are discussed at a high level. The purpose is simply to introduce the main concepts, or things to be aware of; more details will follow at a later time. It’s very important to know about the possibilities at a high level before digging too deep. Otherwise, it is easy to get lost in a sea of confusion. Please keep in mind that there are more optimization concepts than those discussed in this article.

Speed vs Size

Speed and size are often closely related when it comes to performance. Obviously we want applications to be as fast and as small as possible. Historically, making something smaller typically meant it would execute more slowly. Conversely, making something faster typically meant an increase in size. While this trade-off still applies in some cases, it is not universal: smaller code and data can sometimes run faster, for example when they fit better in a processor’s cache.
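As a rough illustration of the classic trade-off, the sketch below counts the set bits in an integer two ways: a compact loop, and a version that spends 256 extra bytes on a precomputed table to do less work per call. The class and method names are made up for this example, and whether the table version is actually faster depends on the JIT compiler and the hardware, so measure before relying on it.

```java
final class BitCountTable {
    // 256-entry lookup table: extra size paid once, in exchange for less work per call.
    private static final byte[] TABLE = new byte[256];

    static {
        // Bits in i = bits in (i / 2) plus the lowest bit of i.
        for (int i = 1; i < 256; i++) {
            TABLE[i] = (byte) (TABLE[i >> 1] + (i & 1));
        }
    }

    // Small version: no table, up to 32 loop iterations.
    static int countLoop(int x) {
        int count = 0;
        while (x != 0) {
            count += x & 1;
            x >>>= 1;
        }
        return count;
    }

    // Larger version: four table lookups, one per byte of the int.
    static int countTable(int x) {
        return TABLE[x & 0xFF]
             + TABLE[(x >>> 8) & 0xFF]
             + TABLE[(x >>> 16) & 0xFF]
             + TABLE[x >>> 24];
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        System.out.println(countLoop(value) + " " + countTable(value));
    }
}
```

Both methods return the same answer; only the balance between memory used and work done per call differs.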

Compiled vs Just-In-Time (JIT) Compiled vs Interpreted

There are three typical ways applications execute on a given machine: compiled into native code ahead of time (before execution), compiled into native code on the fly (during execution), and interpreted (each instruction translated on the fly). While there are advantages and disadvantages to each of these methods, just make a mental note that a given optimization can have a larger or smaller impact depending on how the application will be executed. Sometimes an optimization that helps compiled code will make things worse for JIT compiled code. Sometimes an optimization will be beneficial regardless of the execution method. Just being aware of these facts can provide tremendous benefits. To give a few examples, C/C++ is compiled, Java can execute as interpreted or JIT compiled (the default), and VBScript is interpreted. As a general rule of thumb, any language/environment that has a virtual machine is either interpreted or JIT compiled. A scripting language is almost always interpreted (but this is not always the case). A language that relies only on a runtime library, without a virtual machine, is almost always compiled to native code.

Caching

Caching is simply the process of making frequently used data quicker to retrieve in the future. The actual media or storage mechanism doesn’t matter, as long as it is faster than the original source. Typically, a cache is thought of in terms of memory. This does not mean a hard disk cannot be a cache as well, however. If the main storage area for an image is on a CD, a hard disk is indeed a cache because it is so much faster than the CD. Caching is always a design optimization of some kind. It’s always best to plan what to cache ahead of time, but if something is missed (which is often the case), adding a cache is usually an easy way to drastically improve performance with relatively little effort compared to other optimizations. It’s also important to note that a cache can often optimize size as well as speed (by reusing the same data instead of keeping copies of it).
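To make the idea concrete, here is a minimal sketch of an in-memory cache that keeps only the most recently used entries. It is one possible approach, not the only one: the LruCache class, its capacity parameter and the loader function are names invented for this example, and the loader stands in for whatever slow source (disk, CD, network) the real data comes from. The sketch is not thread-safe and assumes the loader never returns null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class LruCache<K, V> {
    private final Function<K, V> loader; // hypothetical slow source
    private final Map<K, V> entries;
    private int loads;                   // how often the slow source was used

    LruCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // drop the least recently used entry
            }
        };
    }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {             // cache miss: go to the slow source
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2, String::length);
        cache.get("image.png");
        cache.get("image.png");          // served from the cache
        System.out.println("loads: " + cache.loads());
    }
}
```

The capacity limit is what keeps the cache from trading too much size for speed; choosing it is part of the design decision described above.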

Remote Communication

Networks have become the backbone of almost all major computer systems in existence, and computers on a network must rely on remote communication to work together. Remote communication is often an area where major optimizations are possible. Often, more information is transmitted than is really required. This is usually referred to as a bandwidth issue. There is also the question of how long it takes for data to arrive, which is ultimately bounded by the laws of physics. This is referred to as latency. There is little one can do to reduce latency itself, but there is often a lot one can do to reduce bandwidth usage. Most latency issues come from a design that didn’t take distances, usage patterns, etc., into account ahead of time. These can be extremely hard, if not impossible, to improve. Bandwidth is also largely a design issue, but sometimes there are fairly simple ways to keep the same design while improving bandwidth usage.
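A back-of-the-envelope model helps show how latency and bandwidth each contribute. The sketch below assumes a deliberately simplified formula, total time = round trips × latency + payload size ÷ bandwidth, and ignores protocol overhead, congestion and everything else a real network adds. The class name, parameters and numbers are illustrative only.

```java
final class TransferEstimate {
    // Simplified model: each round trip pays the latency once,
    // and the payload is limited by the available bandwidth.
    static double seconds(int roundTrips, double latencySeconds,
                          long payloadBytes, double bytesPerSecond) {
        return roundTrips * latencySeconds + payloadBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        // Same 100,000 bytes over a link with 50 ms latency and 1,000,000 bytes/s:
        double chatty = seconds(100, 0.05, 100_000L, 1_000_000.0); // 100 small requests
        double single = seconds(1, 0.05, 100_000L, 1_000_000.0);   // one combined request
        System.out.printf("chatty: %.2f s, single: %.2f s%n", chatty, single);
    }
}
```

Under these assumed numbers, the latency term dominates the chatty version even though the amount of data is identical, which is why a design that needs many round trips is so hard to fix afterwards.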

Parallel Execution

Parallel execution is the ability of an application to utilize more than one CPU, machine, hardware device, etc., in order to split the work up and reduce the overall time it takes to complete a task. There are several ways to develop for parallel execution. Some are largely automatic and handled by the operating system or runtime; others are much more difficult and require complex logic and sophisticated code. Again, most of these issues need to be addressed at design time. However, if a system has already been developed without taking parallel execution into account, there is still hope. Sometimes the largest problem areas can be refactored specifically to allow for parallel execution.
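As a small example of the "more automatic" end of the spectrum, the sketch below uses Java’s parallel streams, which let the runtime split independent work across the available cores. The class name and the workload (summing squares) are invented for illustration, and for work this small the parallel version may well be slower because of coordination overhead.

```java
import java.util.stream.LongStream;

final class ParallelSquares {
    // Baseline: one thread processes every element in order.
    static long sequential(long n) {
        return LongStream.rangeClosed(1, n).map(i -> i * i).sum();
    }

    // Same computation, but the runtime may split the range across cores.
    static long parallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(i -> i * i).sum();
    }

    public static void main(String[] args) {
        System.out.println(sequential(1000) + " " + parallel(1000));
    }
}
```

This only works so easily because each element can be processed independently, which is exactly the kind of property that has to be considered at design time.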

Inefficient Compilers and Interpreters

Unfortunately, compilers and interpreters aren’t perfect. Experience teaches what to expect from a specific one. If there are known performance issues, developing in a particular way can help get around them. Basically, the developer must work around the flaws in the compiler/interpreter. The risk is that the developer spends time on a workaround only to find that it stops working in a later version, or that the makers of the compiler/interpreter fix the flaw and make the workaround unnecessary. Often, however, one can tell with high certainty that the flaw stems from the design of the compiler/interpreter. In that case, the developer may feel much more comfortable investing the time in a workaround, because design issues are much harder to fix in general and the flaw is likely to persist. If it appears to be a bug, time may be better spent looking at other optimization avenues (unless of course all others have already been considered).

Hardware vs Software

More powerful hardware can almost always make an application run faster without any code changes. Therefore, sometimes it may be more cost effective to simply throw more powerful hardware at the problem. In fact, this ends up being the case fairly often (unfortunately). It is always worth analyzing the application/system first, however. There are usually fairly cheap and quick optimizations that can be done in software, sometimes avoiding the need for expensive hardware purchases.

Maximizing Hardware Potential

Software instructions are eventually translated into hardware instructions. The hardware instructions are the ones that actually accomplish real tasks. Without manipulating hardware, or physical devices, software is useless. Some environments/languages allow the use of special hardware instructions in order to speed up code execution. Sometimes it is necessary to tap into the deepest hardware potential; other times it is not. This largely depends on the type of application. Games, for example, must almost always attempt to maximize hardware usage because of their complexity and the massive number of instructions they execute. A business application rarely needs to be optimized for the latest and greatest hardware.

External Libraries and Calls

Calling external libraries can be very time consuming. The operating system usually needs to be involved in order to allow applications to link to other libraries, execute other programs, etc. The flexibility to call external libraries comes with a cost: speed loss. Often, a library is called more times than necessary, when the individual calls could have been batched into a single call. Sometimes batching is not possible; other times it is. Knowing that an external call carries a speed price tag can lead one to design applications that make as few calls as possible from the start.
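The effect of batching can be sketched with standard Java I/O streams. In the example below, CountingSink is a made-up stand-in for an expensive external target (a file, a socket, a native library); it does nothing except count how many times it is called. Writing through a BufferedOutputStream collects many small writes into a few large ones. The 1024-byte buffer size is an arbitrary choice for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class CallBatching {
    // Hypothetical expensive target: only counts the calls it receives.
    static final class CountingSink extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] buffer, int offset, int length) {
            calls++;
        }
    }

    // Writes byteCount single bytes and reports how many calls reached the sink.
    static int callsNeeded(int byteCount, boolean batched) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = batched ? new BufferedOutputStream(sink, 1024) : sink) {
            for (int i = 0; i < byteCount; i++) {
                out.write(i); // one small write per byte
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls; // closing the stream has flushed any buffered bytes
    }

    public static void main(String[] args) {
        System.out.println("unbatched: " + callsNeeded(10_000, false));
        System.out.println("batched:   " + callsNeeded(10_000, true));
    }
}
```

The calling code is identical in both cases; only the layer between the application and the expensive target changes, which is often the cheapest place to introduce batching.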

Multi-threading

Multi-threading is almost a necessity in today’s complex environments. Many systems (especially servers) come with at least two processors. Even desktops are now being built with hyper-threading on a single processor, which makes one processor appear as two (not exactly equivalent, and with smaller gains than a true second processor). Developing an application to utilize multiple threads taps into the power of multiple processors and/or hyper-threading. Multi-threading is a subset of parallel processing, but fortunately it is generally much easier to develop for. Almost all modern operating systems have multi-threading support built in. This means applications can usually perform parallel processing fairly easily. The use of multiple threads is often decided at design time, but occasionally it is possible to add extra threads to an existing application. Using multiple threads can drastically improve the performance (scalability) of an application when there is more than one processor available.
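One common way to use multiple threads in Java is a fixed-size thread pool from java.util.concurrent. The sketch below splits an array into chunks, sums each chunk on its own worker thread, and combines the partial results. The ThreadedSum class and its parameters are invented for this example, and it assumes a thread count of at least one; real workloads also need to be large enough for the threading overhead to pay off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadedSum {
    static long sum(int[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Split the array into roughly equal chunks, one task per chunk.
            int chunk = (data.length + threads - 1) / threads;
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.length);
                tasks.add(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += data[i];
                    }
                    return partial;
                });
            }
            // invokeAll blocks until every task has finished.
            long total = 0;
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data, 4));
    }
}
```

Each task works on its own slice and shares no mutable state with the others, which is what keeps this example free of locking.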

Single User Perception vs Multi-User Perception

Perception is usually the key to optimization choices. After all, if the users of the application are happy with the performance, it’s usually not worth optimizing further. This is where user perception comes into play. Some systems/applications will only have one user at a time (a desktop application, for example), whereas others may frequently have multiple users accessing them simultaneously. The latter is related to a concept called scalability. Scalability is the ability of an application to take on more and more users without becoming overloaded (within reason, of course). This is usually achieved by multi-threading or by load balancing across multiple machines, which means scalability is almost always a design issue. It’s very difficult to load balance an application that has not been designed for parallel processing from the beginning.
Getting back to the main point, optimizing an application/system for maximum throughput for a single user usually involves a completely different process than optimizing for multiple simultaneous users. One thing that generally holds is that optimizing for a single user will also improve performance for multiple users if CPU time is the bottleneck. The converse is not true, however. In fact, sometimes optimizing for multiple users can hurt performance for a single user. Keep those thoughts in the back of your mind when we further discuss optimizing for user perception.
Another thing to keep in mind is that sometimes a real improvement in performance is not needed if the user can be “tricked” or made to believe the application is faster. This is why many applications display splash screens on startup, or display progress bars while doing a time consuming task. If users can be reassured that productive work is being done, they will often be more patient without even realizing it.

Data Structures

Choosing the right data structure for a particular situation can make a huge difference not only in speed, but also in size. There are many different types of data structures: arrays, hash tables, stacks, queues, etc. Each has its time and place. Knowing that how you store and process information can affect performance is critical when designing a system. It's often extremely difficult (but not impossible) to change to other data structures later on if the system wasn't designed correctly to begin with.
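As a small illustration, the sketch below answers the same question ("how many of these values do we already know?") against two different structures. An ArrayList has to scan its elements for each lookup, while a HashSet can usually go straight to the right bucket. The MembershipDemo class and its method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MembershipDemo {
    // Counts how many queries are present in the given collection.
    // The code is identical for a list or a set; only the lookup cost differs.
    static int countKnown(Collection<Integer> known, int[] queries) {
        int hits = 0;
        for (int query : queries) {
            if (known.contains(query)) {
                hits++;
            }
        }
        return hits;
    }

    // Builds a list holding 0, 1, ..., size - 1.
    static List<Integer> range(int size) {
        List<Integer> values = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        List<Integer> asList = range(1000);       // contains() scans the elements
        Set<Integer> asSet = new HashSet<>(asList); // contains() uses hashing
        int[] queries = {5, 999, 1000, -1};
        System.out.println(countKnown(asList, queries) + " " + countKnown(asSet, queries));
    }
}
```

Because countKnown only depends on the Collection interface, swapping the structure later is easy here; in a system where the concrete type leaks everywhere, the same change is much more painful.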

Algorithms

Algorithms often go hand in hand with data structures, because algorithms need data on which to operate. A bad algorithm can bring a system to its knees. Algorithms can usually be swapped in and out fairly easily as long as the design was modularized and made to be generic. This is just one more reason to spend a little extra time on the design to ensure the application/system is modularized.
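To give a feel for how much the choice of algorithm matters, the sketch below searches the same sorted array with a linear scan and with a binary search, and counts how many elements each one has to examine. The SearchSteps class is invented for this example; in real code, java.util.Arrays.binarySearch already provides the second approach.

```java
final class SearchSteps {
    // Examines elements one by one until the target is found.
    static int linearSteps(int[] sorted, int target) {
        int steps = 0;
        for (int value : sorted) {
            steps++;
            if (value == target) {
                break;
            }
        }
        return steps;
    }

    // Halves the remaining range on every step; requires sorted input.
    static int binarySteps(int[] sorted, int target) {
        int steps = 0;
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            steps++;
            int mid = (low + high) >>> 1;
            if (sorted[mid] == target) {
                break;
            } else if (sorted[mid] < target) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] sorted = new int[1_000_000];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = i;
        }
        int target = sorted.length - 1; // worst case for the linear scan
        System.out.println("linear: " + linearSteps(sorted, target));
        System.out.println("binary: " + binarySteps(sorted, target));
    }
}
```

For a million elements the linear scan may examine every one of them, while the binary search never needs more than 20 steps. Both methods take the same inputs, which is the kind of modular boundary that makes swapping algorithms straightforward.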

Summary

There are many different approaches one may take in order to optimize an application. This can make the process of optimization difficult to understand and to complete successfully. Knowing the various concepts is critical in order to choose the correct methods. The next article will go into depth on specific concepts and provide concrete examples to further solidify the high level concepts discussed here.

This article was created by Everlast Software, LLC: http://www.everlastsoftware.com
