Wednesday, April 23, 2008

Diagnosing a Java Heap Memory Leak with SAP Memory Analyzer

Hi dear reader, you might have been wondering why I haven't posted anything new in quite a long time. Well, you probably guessed the reason, and it's a very common one: my work has been really time-consuming. Indeed, I joined a performance testing team and had much to learn. We are currently testing and tuning the performance of a CRM application. As it is a J2EE application, I learned a lot about the memory behavior and problems of the Java HotSpot Virtual Machine (especially the HP-UX flavor).

And here comes this post: I thought I really needed to say a word about a very nice tool I found this week, SAP Memory Analyzer. This tool analyzes Java heap dumps (snapshots of the memory of a JVM) to help you fix memory usage problems. Many programs offer this feature (Sun jhat and HP JMeter among others), yet I found Memory Analyzer very easy to use and full of functionality, and, something one really appreciates after trying Sun jhat, its own memory usage is quite low (and automatically managed) and it opens heap dumps very quickly as opposed to some other tools. This TheServerSide article presents some striking features of this tool.

If I were to stop here, this would just be an ad for the tool and nothing more, so instead I will give you a few hints that might help you diagnose a Java application memory leak with it.

Making sure it is a Java Heap Leak

If your application crashes with an OutOfMemoryError after running under a constant load for a few hours, then you probably have a Java heap leak.
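To make the symptom concrete, here is a minimal sketch of the kind of code that produces such a leak (the class and method names are mine, not taken from any real application): a static collection that keeps growing and is never cleared, so the retained objects survive every garbage collection.

// LeakyCache.java - a deliberately leaky class (1.4-compatible, hence no generics).
// The static list grows on every call and nothing ever removes entries,
// so the Java heap fills up until the JVM throws an OutOfMemoryError.
import java.util.ArrayList;
import java.util.List;

public class LeakyCache {
    private static final List CACHE = new ArrayList();

    public static void handleRequest() {
        CACHE.add(new byte[1024]); // 1 KB retained per request, forever
    }

    public static void main(String[] args) {
        while (true) {
            handleRequest();
        }
    }
}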

Whereas if the memory consumption of your Java processes keeps increasing over time (once again under a constant load), it does not necessarily mean that the Java heap is leaking: it might just be some JNI objects you are using that are leaking. To put it in a nutshell, the native heap is the heap used by your JVM process. It contains the Java heap, where the Java objects reside, while non-Java objects such as JNI objects reside in the remaining part of the native heap. The leak might therefore be in either part of the native heap.
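As an illustration (a sketch of mine, not taken from the application discussed here), direct NIO buffers behave much like JNI allocations: their storage is carved out of the native heap, so a program like the one below makes the process size grow while GC logs show an almost flat Java heap.

// NativeGrowth.java - grows the native heap, not the Java heap.
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class NativeGrowth {
    public static void main(String[] args) {
        List buffers = new ArrayList();
        while (true) {
            // Each direct buffer allocates 1 MB outside the Java heap;
            // only a tiny wrapper object lives inside it.
            buffers.add(ByteBuffer.allocateDirect(1024 * 1024));
        }
    }
}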

Because you might want to make sure that it is the Java heap that is leaking, and because it is never a bad idea anyway, you will probably need to log the garbage collection activity. I might post about how to do that later; for now let's just assume you added these options to the command line of your JVMs (tested on an HP-UX 1.4.2.x JVM):

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:output.log
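For instance, the full launch command might look like this (myapp.jar is just a placeholder for your own application):

java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:output.log -jar myapp.jar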

After running your application, you will get an output.log file that can be analyzed with the GCViewer application (http://www.tagtraum.com/gcviewer.html). If you want more detail, replace the -Xloggc switch with the -Xverbosegc:file=output.log switch and use the HP JMeter tool to analyze the output (http://www.hp.com/products1/unix/java/hpjmeter/).

If the memory usage graph you get with either of these tools looks like a nice sawtooth shape (check http://en.wikipedia.org/wiki/Sawtooth_wave if I am not being clear enough), then there is no Java heap leak; you had better check the other part of the native heap. Are your JNI objects leaking, or is it the way your Java code calls these objects?

If you see something else that tends to increase over time (probably something like a sawtooth shape that never goes back down as low as it did on the previous iteration), with more and more full garbage collections occurring until you eventually reach an OutOfMemory condition, then you probably have a leak (or a memory retention, as I think is more appropriate for this phenomenon when the memory is automatically managed).

Taking snapshots of the Java Heap

Now that we know we have a memory retention in the Java heap, we want to find out what is responsible for it and, hopefully, solve it (I know, it's so easy to blame external libraries :)).

Here is a process I used recently. It is only one among many ways to analyze memory retentions; it might not be suited to your case, nor be the easiest way, yet you can use it as a (good?) starting point. The general principle is always the same: take a snapshot of your Java heap at two or more different times and diff those snapshots.

First, add the following option to your JVM command line: -XX:+HeapDumpOnCtrlBreak (it became available in the latest 1.4.2.x JVMs).

With this option, you will be able to generate a heap dump (a snapshot of the memory) anytime you want. Whenever you send a SIGQUIT signal to the JVM process (try kill -3 PID on Unix systems), instead of stopping, it will generate a binary heap dump in a file named java_<pid>.hprof.<millitime>. By the way, mind the size of these dumps: if your heap is 1 GB large, the file will be 1 GB as well.
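A minimal session might look like this (myapp.jar is a placeholder, and the exact ps invocation depends on your system):

java -XX:+HeapDumpOnCtrlBreak -jar myapp.jar &

ps -ef | grep java       # find the JVM process id

kill -3 PID              # the JVM writes java_<pid>.hprof.<millitime> and keeps running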

A good way to find where the memory is retained is to take a snapshot when your application is in a nominal state (up and running with some users connected), then wait a few hours (it might be minutes if the heap is growing quickly, or many hours if it is gaining size very slowly) and take a second one. Just be careful not to take the first snapshot while the system is still warming up.
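In shell terms, the whole capture can be as simple as this sketch (the two-hour wait is an arbitrary figure; adjust it to how fast your heap grows):

kill -3 PID     # first snapshot, nominal state (after warm-up)
sleep 7200      # let the retention build up
kill -3 PID     # second snapshot, ready for diffing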

Finding what is retained in the Java Heap

You will then need to install SAP Memory Analyzer (available here). Once that is done, you can open the hprof file generated beforehand; SAP Memory Analyzer will then parse and analyze it.

This step might take quite some time (from a few minutes to half an hour for a typical heap on a typical computer). Yet don't be afraid: you will be doing this step only once, as the tool creates files containing the result of the parsing alongside the HPROF file. If you open this hprof file again later on, it will just read these parsing files.

TIP: You might want to run your test and take your snapshots on a specific JVM configuration with a smaller heap in order to speed up this step.
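For instance (the values here are arbitrary), capping the heap for such a test run is just a matter of JVM options:

java -Xms256m -Xmx256m -XX:+HeapDumpOnCtrlBreak -jar myapp.jar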

Once the dump has been parsed, SAP Memory Analyzer presents you with a histogram of the heap. While this can be useful, you might want to jump to the Dominator view. This view will help you analyze what is retained in memory: if a handful of objects are causing your memory retention, you will find them at the top of this view. Linking these objects to your application code, and thus finding the cause of the problem, might prove much easier than you think. I guess this is a good starting point for your analysis; it helped me quite a lot in a few cases, even though it once proved quite useless (when we had tons of small objects causing the retention). Another major feature that will help you is the delta-diffing feature available from the histogram view, which compares two different heap dumps. You might also want to try "Calculate retained size" from the contextual menu.

You will see that this tool offers tons of functionality beyond the histogram and Dominator views. You should really try as many of them as you can; they are really useful and at least educational regarding your application's memory usage.

One last thing: you might want to take a different approach for your analysis. Instead of taking two snapshots before and after hours of nominal load, taking a snapshot before and after a very simple action, on a platform that you alone are using, might be the key to fixing your problem. It all depends on the memory retention you are observing.
