Daily Power Ups

Sunday, May 31, 2015

Performance Tuning with VisualVM

What is VisualVM

VisualVM is a visual tool integrating several command line JDK tools and lightweight profiling capabilities. Designed for both production and development-time use, it further enhances the monitoring and performance analysis capabilities of the Java SE platform.

Where can I get VisualVM

You can get VisualVM as a standalone application, which is what I prefer, or use it straight out of the JDK, since it has been part of the JDK release as of JDK 1.6.0_07.  The reason I prefer the standalone release is that VisualVM releases are not in sync with the JDK, so to stay on the latest version without having to wait for the JDK to include it, I run VisualVM as a standalone application.  You can download VisualVM from: https://visualvm.dev.java.net/download.html

More Information

I recommend that you read through the documentation section on the VisualVM site, as it has some good details on the basics of VisualVM: https://visualvm.dev.java.net/

Setting up VisualVM

  1. Download VisualVM, get the latest version:  https://visualvm.dev.java.net/download.html
  2. Alter VisualVM's out-of-the-box heap size parameters so that we can profile without frequently running out of heap.
    1. Go to $VISUALVM_HOME\etc
    2. Edit visualvm.conf in notepad
    3. Alter the default_options so that the Xmx is set to 768: -J-Xmx768m (see the example after this setup list)
  3. Shut down all extraneous applications, as VisualVM will build a performance profile the first time you run it.
  4. Start up VisualVM
    1. Run $VISUALVM_HOME\bin\visualvm.exe
  5. Install the following recommended plug-ins (I personally have all of them installed, but it's not necessary)
    1. On the menu bar choose: Tools -> Plugins
    2. Under the available plugins tab select:
      1. VisualVM-Glassfish
      2. VisualVM-MBeans
      3. VisualVM-TDA-Module
      4. VisualVM-Extensions
      5. VisualVM-Logfile-Module
      6. VisualVM-JvmCapabilites
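
For reference, the edited line in visualvm.conf ends up looking something like the snippet below. Only the -J-Xmx768m value is the change described in step 2; any other flags already present in your default_options should be left as they are, since they vary by VisualVM release.

    # $VISUALVM_HOME\etc\visualvm.conf
    # Keep whatever other flags were already in default_options; only -J-Xmx768m changes.
    default_options="-J-Xmx768m"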

Monitoring an Application

VisualVM offers all of the familiar JConsole tools for monitoring a VM.  To monitor or view an application, just double-click it in the left pane, which lists all of the JVMs running locally.
  • Monitor Tab - Offers the JConsole overview look where you can see the Memory, CPU, Thread count and Classes count
  • Threads Tab - A drilled down view of the Threads just like in JConsole
  • MBeans Tab - With the VisualVM-MBeans plugin installed you get this tab which exposes the MBeans, just like in JConsole

If you are having an issue which requires a review of the thread stack, you can execute a thread dump from the Threads tab, which will create/open a new thread dump snapshot that can be viewed and saved.  If you have the TDA plug-in installed a Thread Dump Analyzer tab will open as well, which gives you some nice tools for walking the thread stack and analyzing the thread dump results.
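
If you need the same thread information without clicking through the UI, the standard ThreadMXBean exposes it programmatically. Here is a minimal sketch, assuming you just want to print each thread's state and a partial stack to standard out (the class name is made up for the example):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadDumper {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            // true, true = include locked monitors and locked synchronizers
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                System.out.println(info);  // ThreadInfo.toString() includes a partial stack trace
            }
        }
    }
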
If you are having an issue which requires a review of the heap, you can execute a heap dump from the Monitor tab, which will create/open a new heap dump snapshot that can be viewed and saved.  Typically I will save these and open them in the Eclipse Memory Analyzer Toolkit, as its toolset is much more comprehensive - I'll have another blog post on this coming soon.
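
A heap dump can also be triggered programmatically through the HotSpot diagnostic MBean, which is handy when you want to script dumps for later analysis in MAT. This is a minimal sketch, assuming a local JVM and an example output path:

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    public class HeapDumper {
        public static void main(String[] args) throws Exception {
            // Proxy to the HotSpot diagnostic MBean of the JVM this code runs in
            HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            // true = dump only live objects (forces a GC first); the path is just an example
            diag.dumpHeap("/tmp/app-heap.hprof", true);
        }
    }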

Profiling an Application

NOTE: The profiling tool in general will only work with a local JVM.  You can profile a remote JVM, but it involves installing and running a remote daemon that VisualVM can connect through.  Usually for remote VMs I will execute an SSH command to run VisualVM on the remote box and display it back to an X Window display running locally.

VisualVM offers a limited profiler, but it can be useful for quickly viewing the CPU utilization and/or memory utilization of an application to help pinpoint possible performance bottlenecks.  One thing I've noticed is that when profiling for the CPU the instrumentation process can sometimes be a little finicky and not properly instrument all of the methods, so I will need to stop/restart it a few times.

NOTE: If you are profiling a remote JVM, you can also utilize the sampler and sample the remote VM. This will help track CPU utilization; however, memory sampling is not available for remote JVMs.
For more information regarding the profiler you can read the documentation from VisualVM here: https://visualvm.dev.java.net/profiler.html

Here is an example of profiling a VM on a remote server - note that this assumes all libraries that support the tool are installed:
  1. Log in with X forwarding:  I use the Cygwin X server on my Windows machine, but any SSH tool that allows X11 forwarding will work.
    1. ssh -X <user>@10.100.2.34
  2. Make sure your home account has permissions for msuser to create the required directories for VisualVM - I give it full permissions.
    1. cd /export/home/
    2. chmod 777 <user>
  3. Find the process you want to profile - note that VisualVM will show you all Java processes running on the box, so you might want to find the PID first so you know which one to look at in VisualVM.
    1. /usr/jdk/jdk1.6.0_07/bin/jps -v | grep <process_name>
    2. 28177 PELaunch.....
  4. Launch visualVM:
    1. cd  /usr/jdk/jdk1.6.0_07/bin
    2. ./jvisualvm
  5. When VisualVM starts, click on the process you want to profile, e.g. GlassFish (pid 28177). Note that the Profiler is only good for local processes; for remote processes you can use the lightweight Sampler instead, which can attach to a JVM via JMX (see the example after this list).
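
For the Sampler to attach to a remote JVM over JMX, the remote process has to be started with remote JMX enabled. A minimal sketch of the startup flags, assuming a trusted internal network where authentication and SSL are turned off, and using an example port:

    java -Dcom.sun.management.jmxremote.port=9010 \
         -Dcom.sun.management.jmxremote.authenticate=false \
         -Dcom.sun.management.jmxremote.ssl=false \
         -jar yourapp.jar

In VisualVM you would then add the connection via File -> Add JMX Connection, using <host>:9010.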

CPU profiling

Starting a Session with Profiler

  1. Click the CPU button to start profiling the application
  2. After starting the profiler, wait a few minutes before pushing load.
  3. This is where it can sometimes go wrong. As mentioned above, the profiler doesn't always instrument correctly; if you see that happen, do the following, which usually clears it up so the proper data starts getting collected:
    1. Stop the profiler
    2. Stop the load
    3. Restart the profiler
    4. Restart the load

Setting Up and Starting a Session with the Sampler

  1. Click on the sampler tab for the application you are connected to via JMX
  2. Click on the CPU button
  3. You should now see the profiling occur, but to drill down you'll need to take a snapshot. To do this, click on the "Snapshot" button.
  4. To save the snapshot, right click on it in the left pane and click save.

Analyzing the Data

The data collected details CPU execution time for given methods.  It is captured based on method entry and exit for threads.  You can also see the number of times that a method has been invoked.  These are excellent details for trying to find slow-running code and/or code that perhaps is being invoked more times than expected.
You can also take a snapshot and save it for future comparisons.  Having a historical view of the profiled versions is extremely helpful in finding points of degradation over time, as opposed to just the areas that the profiler is currently reporting as bottlenecks.  The other nice thing about snapshots is that they allow you to better filter your view and drill into the call stack for problem areas, which is very useful for examining exactly what is going on.

What can be learned from the data statistics?

  • Invocations -  The number of times a method was invoked
    • How to use it:
    • Take a look at the method and the call stack.  Is this expected, or are we executing code more than we are expecting/wanting?  Review the code: is there a way to reduce the number of calls to the executed function, and is it worth investing in?  The answer here depends on the data.  If a function takes 4ms to return and it's executed 1000 times per thread, that's 4 seconds spent just in that function.  Can we rewrite that function to take only 2ms?  If so, we've saved 50% of the execution time for that thread.
  • Self time - The total amount of processing time spent in a method; note this is not per invocation, but total time.  To get the per-invocation time, divide by the invocation count.
    • How to use it:
    • Similar to invocation count, if we are spending a lot of time within a single function, we need to examine that function and determine why.  It could be that there are sub-methods or objects executing methods which take up the majority of the time and add to self time.  Viewing the method in the call stack should help you determine whether the time spent is purely self time or something else.  Also, it is possible that the method we are spending the most time in is not instrumented and is being recorded as self time.  If you examine a piece of code and find that the self time makes no sense, try extracting pieces out into sub-methods and then re-run the profiler to see what was truly taking all the time (see the sketch after this list).
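
As a hedged illustration of that last point, here is a made-up example (the class and method names are not from any real code base). Extracting a suspicious chunk into its own method lets the profiler attribute invocations and self time to that piece instead of lumping everything into the parent's self time:

    import java.util.List;

    public class ReportBuilder {

        // Before: everything inline, so the profiler lumps all time into build() self time.
        public String build(List<String> rows) {
            StringBuilder sb = new StringBuilder();
            for (String row : rows) {
                sb.append(row.trim().toUpperCase()).append('\n');
            }
            return sb.toString();
        }

        // After: the per-row work is extracted, so invocations and self time
        // are reported separately for formatRow() and buildRefactored().
        public String buildRefactored(List<String> rows) {
            StringBuilder sb = new StringBuilder();
            for (String row : rows) {
                sb.append(formatRow(row)).append('\n');
            }
            return sb.toString();
        }

        private String formatRow(String row) {
            return row.trim().toUpperCase();
        }
    }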

ProductImageTag - A Case Study

Recently we tried to debug why the profiler was reporting the ProductImageTag as a bottleneck.  There was a high number of invocations for the doStartTag() method, but we expected this since we call this tag for every product image displayed, which during a BPT is quite a few.  Next we looked at the self time.  We were spending more time in self time than we thought was required to process the image tag, as illustrated in the following screenshots:



The question then is: how much time does it take for a tag to render output to the screen?  So we altered our function to just return a string of text to the out writer, i.e. the fastest it could possibly run, and then we re-ran our profiler (a rough sketch of that baseline follows below).  From this we learned the absolute fastest our method could run.  Obviously we could never expect to reach this given the logic we need to execute, but it was a good learning experiment and told us we had room to improve.
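
The baseline version looked roughly like the sketch below. This is an illustrative reconstruction, not the actual ProductImageTag source; the class name and placeholder markup are made up:

    // A rough sketch of the "fastest possible" baseline described above.
    // The real ProductImageTag does much more; this just writes a constant string.
    public class ProductImageTagBaseline extends javax.servlet.jsp.tagext.TagSupport {
        @Override
        public int doStartTag() throws javax.servlet.jsp.JspException {
            try {
                pageContext.getOut().print("<img src=\"placeholder.gif\"/>");
            } catch (java.io.IOException e) {
                throw new javax.servlet.jsp.JspException(e);
            }
            return SKIP_BODY;
        }
    }
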
Going back to the original code we started to examine what was happening and learned that there were 3 main things:
  1. Call to resolve the passed-in categoryItemId - this method was being tracked separately so we could easily see its processing time.
  2. A call to get the thumb image from the image page for the categoryItem
  3. Building the StringBuffer for the HTML output of the tag
Armed with the knowledge of what the function does, a second review of the data told us:
  1. That we could rule out number 1 as a bottleneck, because we could see that time and it barely took 1/3 of the total time that the doStartTag was reporting.
  2. That we were not instrumenting/tracking the call for grabbing the image or image page
  3. That we couldn't tell how long the StringBuffer was taking.
With this new knowledge we segregated the StringBuffer build logic into a sub-method, which allowed us to instrument it separately.  Also, while doing this we realized that we were not fully taking advantage of the StringBuffer, because we were doing String concatenations of variables inside the append calls, which creates intermediate String objects and negates the StringBuffer optimizations (sketched below).  This was fixed.
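
The append fix follows this general pattern; the variable names and values here are made up for illustration:

    String imageUrl = "/images/thumb.gif";   // made-up values for illustration
    String altText  = "thumbnail";

    // Before: concatenating inside append() builds throwaway intermediate Strings,
    // which defeats the point of using a StringBuffer at all.
    StringBuffer before = new StringBuffer();
    before.append("<img src=\"" + imageUrl + "\" alt=\"" + altText + "\"/>");

    // After: appending each piece directly avoids the intermediate String objects.
    StringBuffer after = new StringBuffer();
    after.append("<img src=\"").append(imageUrl)
         .append("\" alt=\"").append(altText).append("\"/>");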

We re-ran our profiler and gained new data, with a new understanding as seen in the following screen shot: 


  • For every invocation of the doStartTag method we were invoking categoryItem.getImage 2 times, and within the Family or Product entity's getImagePage we were creating/calling entity.getPage 2 times and then returning a new ProxyImagePage.
  • There were a few issues with this logic:
  1. One of the calls to categoryItem.getImage was just to see if the ImagePage was null, while the second was to use the data if it wasn't null.  Normally this is OK to do, but given the overhead of this call we needed to capture the return in a local variable, check that for null, and use it later (see the sketch after this list).
  2. The fallback logic in Family/Product.getImagePage was to build the ProxyImagePage using joint data for the requested locale and the default site's locale.  However, for 95% of our merchants the default site locale is the same as the locale we are requesting, because 95% of merchants are not multi-language.  Some of this overhead could be avoided by changing our logic so that we only enter the fallback when the specified criteria are met and the requested locale is not the site's default locale.
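
The fix for the duplicate getImage calls followed the usual "call once, cache in a local" pattern. The sketch below uses assumed type and method names (Image, render) based on the description above, not the real code:

    // Before: two expensive categoryItem.getImage() calls per tag invocation
    if (categoryItem.getImage() != null) {
        render(categoryItem.getImage());
    }

    // After: a single call, captured in a local variable and reused
    Image image = categoryItem.getImage();
    if (image != null) {
        render(image);
    }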

After making the changes above, we re-ran our test and found the following, as seen in the screenshot below:


  1. The doStartTag method no longer showed up as a bottleneck
  2. A 10X improvement on the execution time of the doStartTag method
  3. The ratio of entity.getPage calls for Image Pages to doStartTag calls was reduced from 4:1 to 1:1

Memory Profiling

The memory profiler will do a real-time heap analysis and can be process intensive.  It's good to run to see what is happening within the heap, but I've found the heap analysis tools to be limiting and not as intuitive as one would like.  As a result I will normally not do memory profiling with VisualVM, but instead execute a heap dump from VisualVM and then open the dump inside of the Eclipse Memory Analyzer Toolkit, which has some great functionality for doing heap analysis.  Coming soon - a look into the Eclipse Memory Analyzer Toolkit (MAT).


If you do insist on using the memory profiler, say because you want real-time statistics or want to know how many generations an object survives on the heap, then I'd suggest giving the VisualVM profiler documentation a read: https://visualvm.dev.java.net/profiler.html
