Tuesday, November 25, 2008

SharePoint - Sorting people search results by name

Hello dear reader, and welcome to this new post. First, some personal news: after working for quite some time on load testing a Java CRM application, I wanted to move on and finally work with the newest Microsoft technologies. And that’s what I did, as I managed to land a job at a local Microsoft Gold Partner in Nantes, France. The team I’m part of mainly integrates the SharePoint solution for local partners, and I have to say it’s a nice opportunity so far.


For my first SharePoint post, I’ll talk about a limitation of the built-in People Search functionality. MOSS 2007 comes with the Enterprise Search Center, which gives your users easy access to the search engine. This SharePoint site (because that’s exactly what it is) comes with two search modes out of the box (OOB):

  • a standard search for searching through documents, lists, pages, ... basically whatever you let the search engine crawl, and
  • a people search for, you guessed it, searching people based on any of the profile properties defined in the SSP (Shared Services Provider)

We recently had a problem where a client wanted to:

  • Display the list of employees for each office, sorted by last name/first name
  • Be able to search for an employee based on their last name/first name

The problem

We first planned to fulfill both needs with a single item, a people search page, where the results per office would be obtained by adding the office as a searchable property. And that’s when we realized a huge shortcoming of the Search Center:

People search results cannot be sorted on arbitrary fields that you define, such as last name/first name. Only two standard sort modes are available, and that’s it. Results can be sorted by the relevance ranking computed by the search server, or by social distance. The principle of the latter is sound enough: if you are looking for a John, it will first display the John who is your direct colleague rather than the John working in another department of another office.

The almost working workaround

Starting from this realization, we figured we could work around the problem by doing a quick sort in the XSL that displays the results. This way, the sorting could be done on any field we need. While this solution works perfectly when there are fewer than 50 results, it is not a complete solution.

The thing is, the search engine returns the results first. These results are then split (by the Search Core Results web part) into x pages of y elements (people), where y is a web part parameter capped at 50. Therefore, if there are more than 50 matches, the XSL sorts each page separately, and the overall order is not continuous from one page to the next.

That is, a hypothetical “Albert Alden” might end up on page 2 while “Brittany Bern” is on page 1, if his ranking is lower than Brittany’s. Yet we know for sure that “Cindy Costner”, also on page 2, will be displayed after Albert whatever her ranking, because our XSL sorts each page lexicographically.
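To make the page-by-page effect concrete, here is a small Java sketch (hypothetical names and ranks, with a page size of 2 standing in for the 50-item cap) that compares what the web part plus XSL combination does with what we actually wanted. It is only an illustration of the ordering problem, not SharePoint code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PagedSortDemo {
    // A search hit: last name, first name and the (hypothetical) relevance rank.
    static final class Person {
        final String lastName;
        final String firstName;
        final double rank;

        Person(String lastName, String firstName, double rank) {
            this.lastName = lastName;
            this.firstName = firstName;
            this.rank = rank;
        }

        @Override
        public String toString() {
            return firstName + " " + lastName;
        }
    }

    // Lexicographic order on last name, then first name (what the XSL does per page).
    static final Comparator<Person> BY_NAME =
            Comparator.comparing((Person p) -> p.lastName).thenComparing(p -> p.firstName);

    // Current behavior: order by rank, cut into pages, then sort each page by name.
    static List<List<Person>> pageThenSort(List<Person> hits, int pageSize) {
        List<Person> byRank = new ArrayList<>(hits);
        byRank.sort(Comparator.comparingDouble((Person p) -> p.rank).reversed());
        List<List<Person>> pages = new ArrayList<>();
        for (int i = 0; i < byRank.size(); i += pageSize) {
            List<Person> page = new ArrayList<>(byRank.subList(i, Math.min(i + pageSize, byRank.size())));
            page.sort(BY_NAME);
            pages.add(page);
        }
        return pages;
    }

    // Desired behavior: sort the whole result set by name first, then cut into pages.
    static List<List<Person>> sortThenPage(List<Person> hits, int pageSize) {
        List<Person> byName = new ArrayList<>(hits);
        byName.sort(BY_NAME);
        List<List<Person>> pages = new ArrayList<>();
        for (int i = 0; i < byName.size(); i += pageSize) {
            pages.add(new ArrayList<>(byName.subList(i, Math.min(i + pageSize, byName.size()))));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Person> hits = Arrays.asList(
                new Person("Bern", "Brittany", 0.9),
                new Person("Dunn", "David", 0.8),
                new Person("Alden", "Albert", 0.5),
                new Person("Costner", "Cindy", 0.4));
        System.out.println(pageThenSort(hits, 2)); // [[Brittany Bern, David Dunn], [Albert Alden, Cindy Costner]]
        System.out.println(sortThenPage(hits, 2)); // [[Albert Alden, Brittany Bern], [Cindy Costner, David Dunn]]
    }
}
```

Albert lands on page 2 after David in the first output, exactly the discontinuity described above; the only fix is to sort before paging, which is what the search engine must do for us.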

The working workaround

Keeping these limitations in mind, we had to devise a quick solution. We chose to keep the standard people search for ordinary people searches, considering the size of the company and the way it is going to be used. Yet we still needed a way to present users with a list of all the employees of any given office. At this point, it’s probably needless to remind you that we REALLY want this list sorted by last name/first name :-) That’s when we stumbled upon this superb article by Dietmar Kurok (http://www.codeproject.com/KB/sharepoint/DepartmentPeopleViewer1.aspx).

The solution is to use the web service exposed by the search engine to get the information we need, sorted the way we need it, and then present it with an XSL of our own. In our case we used the same principle as in Dietmar Kurok’s article; however, we chose to generate the input parameter of QueryEx (the QueryPacket) with a very nice tool available on CodePlex, the SharePoint Search Service Tool, which can be found at http://www.codeplex.com/SharePointSearchServ.
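For reference, here is a minimal Java sketch of the kind of QueryPacket involved. It only builds the XML string passed as the queryXml argument of QueryEx (the Search web service is typically exposed at /_vti_bin/search.asmx); the SOAP call itself is left out. The People scope name, the managed property names (PreferredName, FirstName, LastName, WorkEmail, OfficeNumber) and the page size are assumptions to adapt to your own SSP. The point is the ORDER BY clause: the search engine sorts the whole result set before any paging happens.

```java
class PeopleQueryPacket {
    // Builds a QueryPacket requesting the people of one office, sorted by name.
    // ASSUMPTIONS: the 'People' scope and the managed property names below
    // must exist (and be sortable) in your SSP configuration.
    static String build(String office, int startAt, int count) {
        String sql = "SELECT PreferredName, FirstName, LastName, WorkEmail, OfficeNumber"
                + " FROM SCOPE() WHERE (\"scope\" = 'People')"
                + " AND OfficeNumber = '" + office.replace("'", "''") + "'" // double single quotes
                + " ORDER BY LastName ASC, FirstName ASC";
        return "<QueryPacket xmlns=\"urn:Microsoft.Search.Query\" Revision=\"1000\">"
                + "<Query domain=\"QDomain\">"
                + "<SupportedFormats><Format>urn:Microsoft.Search.Response.Document.Document</Format></SupportedFormats>"
                + "<Context><QueryText language=\"en-US\" type=\"MSSQLFT\">"
                + escapeXml(sql)
                + "</QueryText></Context>"
                + "<Range><StartAt>" + startAt + "</StartAt><Count>" + count + "</Count></Range>"
                + "</Query></QueryPacket>";
    }

    // The SQL text sits inside an XML element, so &, < and > must be escaped.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // "Nantes" is a hypothetical office value.
        System.out.println(build("Nantes", 1, 500));
    }
}
```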

The general idea of the process is to:


I hope you enjoyed this first SharePoint post, and I hope to post more often about this nice product in the future.

Wednesday, April 23, 2008

Diagnosing Java Memory Heap Leak with SAP Memory Analyzer

Hi dear reader, you might have been wondering why I haven’t posted anything in quite a long time. Well, you have probably guessed the reason, and it’s a very common one: my work has been really time-consuming. I joined a performance testing team and had a lot to learn. We are currently testing and tuning the performance of a CRM application. As it is a J2EE application, I learned a lot about the memory behavior and problems of the Java HotSpot Virtual Machine (especially the HP-UX flavor).

Which brings me to this post: I really needed to drop a note about a very nice tool I found this week, SAP Memory Analyzer. This tool analyzes Java heap dumps (snapshots of a JVM's memory) to help you fix memory usage problems. Many programs offer this feature (Sun jhat and HP JMeter, among others), yet I found Memory Analyzer very easy to use and full of functionality. And something one really appreciates after trying Sun jhat: its own memory usage is quite low (and automatically managed), and it opens heap dumps very quickly compared with some other tools. An article on TheServerSide presents some striking features of this tool.

If I stopped here, this would just be an ad for the tool and nothing more, so here are a few hints that might help you diagnose Java application memory leaks with it.

Making sure it is a Java Heap Leak

If your application crashes with an OutOfMemoryError after running under the same load for a few hours, then you probably have a Java heap leak.

However, if the memory consumption of your Java processes keeps increasing over time (again, under a constant load), it doesn’t necessarily mean that the Java heap is leaking: it might just be some JNI objects you are using that leak. In a nutshell, the native heap is the memory used by your JVM process. It contains the Java heap, where Java objects reside, while non-Java objects such as JNI objects reside in the remaining part of the native heap. The leak might therefore be in either part.

Because you want to make sure it is the Java heap that is leaking, and because it is never a bad idea anyway, you will probably need to log garbage collection activity. I might post about how to do that later; for now, let’s just assume you added these options to the command line of your JVMs (tested on an HP-UX 1.4.2.x JVM):

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:output.log

After running your application, you will get an output.log file that can be analyzed with the GCViewer application (http://www.tagtraum.com/gcviewer.html). If you want more detail, replace the -Xloggc switch with -Xverbosegc:file=output.log and analyze the result with the HP JMeter tool (http://www.hp.com/products1/unix/java/hpjmeter/).

If the memory usage graph you get with either of these tools has a nice sawtooth shape (see http://en.wikipedia.org/wiki/Sawtooth_wave if I am not clear enough), then there is no Java heap leak, and you had better check the other part of the native heap. Are your JNI objects leaking, or is it the way your Java code calls them?

If instead you see something that tends to increase over time (probably something like a sawtooth, except that each drop does not go as low as the previous one), with more and more full garbage collections occurring until you eventually reach an OutOfMemoryError, then you probably have a leak (or rather memory retention, a term I find more appropriate for this phenomenon when memory is automatically managed).
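The distinction above can also be checked numerically. The Java sketch below assumes you have already extracted from the GC log the heap occupancy right after each full collection (the values are hypothetical, in KB). It fits a least-squares line through those post-GC values and flags retention when the baseline grows faster than a threshold you choose. This is only a rough heuristic, not what GCViewer or HP JMeter compute.

```java
class GcTrend {
    // Least-squares slope of the heap occupancy measured right after each
    // full GC (one value per collection, in KB), in KB per collection.
    static double slope(long[] afterFullGcKb) {
        int n = afterFullGcKb.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two full GCs");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long y : afterFullGcKb) {
            meanY += y;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (afterFullGcKb[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    // A healthy sawtooth returns to roughly the same floor after each full GC;
    // a rising floor suggests retention. The threshold is an assumption to tune.
    static boolean looksLikeRetention(long[] afterFullGcKb, double thresholdKb) {
        return slope(afterFullGcKb) > thresholdKb;
    }

    public static void main(String[] args) {
        long[] sawTooth = {200_000, 205_000, 198_000, 202_000, 201_000}; // hypothetical
        long[] creeping = {200_000, 240_000, 285_000, 330_000, 372_000}; // hypothetical
        System.out.println(looksLikeRetention(sawTooth, 1_000)); // false
        System.out.println(looksLikeRetention(creeping, 1_000)); // true
    }
}
```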

Taking snapshots of the Java Heap

Now that we know there is memory retention in the Java heap, we want to find what is responsible for it and hopefully fix it. I know, it’s so easy to blame external libraries :-)
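As an illustration of what such retention often looks like in code, here is a deliberately buggy Java sketch (class and method names are hypothetical): each request leaves a reference in a static list that is never cleared, so the garbage collector can never reclaim the payloads, and heap usage keeps climbing under a constant load.

```java
import java.util.ArrayList;
import java.util.List;

class RetentionDemo {
    // Hypothetical bug: a static collection that only ever grows.
    private static final List<byte[]> REQUEST_HISTORY = new ArrayList<>();

    static void handleRequest(int payloadBytes) {
        byte[] payload = new byte[payloadBytes];
        // ... real work would use the payload here ...
        REQUEST_HISTORY.add(payload); // the retention: a reference outlives the request
    }

    static int retainedCount() {
        return REQUEST_HISTORY.size();
    }

    public static void main(String[] args) {
        // Under constant load this never plateaus; with enough iterations
        // (or a small -Xmx) it ends in java.lang.OutOfMemoryError.
        for (int i = 0; i < 1_000; i++) {
            handleRequest(1024);
        }
        System.out.println(retainedCount() + " payloads still reachable");
    }
}
```

In a heap dump of such a process, the static list would show up as the single object keeping all those byte arrays alive.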

Here is a process I used recently. It is only one among many ways to analyze memory retention, and it might not suit your case or be the easiest way, yet you can use it as a (good?) starting point. The general principle is always the same: take snapshots of your Java heap at two or more different times and diff them.
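The diffing principle itself is simple. Here is a minimal Java sketch, assuming each snapshot has already been reduced to a class histogram (class name to total shallow bytes; the values and the com.example class names are hypothetical). It keeps the classes that grew between the two snapshots and lists the largest growth first. Memory Analyzer does the real work on actual dumps; this only shows the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HistogramDiff {
    // Classes whose total bytes grew between two snapshots, largest growth first.
    // Classes absent from 'before' count as starting from zero.
    static List<Map.Entry<String, Long>> growth(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> delta = new HashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long old = before.getOrDefault(e.getKey(), 0L);
            long d = e.getValue() - old;
            if (d > 0) {
                delta.put(e.getKey(), d);
            }
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(delta.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Long> before = new HashMap<>();
        before.put("char[]", 10_000_000L);
        before.put("java.lang.String", 4_000_000L);
        before.put("com.example.CustomerSession", 1_000_000L);
        Map<String, Long> after = new HashMap<>();
        after.put("char[]", 12_000_000L);
        after.put("java.lang.String", 4_000_000L);
        after.put("com.example.CustomerSession", 9_000_000L);
        for (Map.Entry<String, Long> e : growth(before, after)) {
            System.out.println(e.getKey() + " +" + e.getValue() + " bytes");
        }
    }
}
```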

First, add the following option to your JVM: -XX:+HeapDumpOnCtrlBreak (it became available in the latest 1.4.2.x JVMs).

With this option, you can generate a dump (a snapshot of the memory) whenever you want. Whenever you send a SIGQUIT signal to the JVM process (try kill -3 PID on Unix systems), the JVM keeps running and writes a binary heap dump to a file named java_<pid>.hprof.<millitime>. By the way, mind the size of these dumps: if your heap is 1 GB, the file will be about 1 GB as well.
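On the 1.4.2 JVM the signal is the way to go. For reference only, later HotSpot-based JVMs (Java 7 and above, an assumption about your environment and not something the HP-UX 1.4.2 JVM offers) can also write a dump from code through com.sun.management.HotSpotDiagnosticMXBean, as in this minimal sketch:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

class HeapDumper {
    // Writes a binary HPROF heap dump of the running JVM to 'path'.
    // live = true dumps only live (reachable) objects.
    // Assumes a HotSpot-based JVM, Java 7 or later.
    static void dump(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        // The target file must not exist yet; the name keeps the .hprof suffix on purpose.
        String path = args.length > 0 ? args[0] : "snapshot.hprof";
        dump(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

dumpHeap throws an IOException if the file already exists, and some JDK versions reject names that do not end in .hprof, hence the default name. The resulting file opens in Memory Analyzer like any other HPROF dump.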

A good way to find where memory is retained is to take a first snapshot when your application is in a nominal state (up and running with some users connected), then wait (minutes if the heap grows quickly, hours if it grows very slowly) and take a second one. Just be careful not to take the first snapshot while the system is still warming up.

Finding what is retained in the Java Heap

You will then need to install SAP Memory Analyzer (available here). Once it is installed, open the hprof file generated earlier; SAP Memory Analyzer will then parse and analyze it.

This step might take quite some time (from a few minutes to half an hour for a typical heap on a typical computer). Don’t be afraid, though: you only do it once, as the tool saves the results of the parsing in files alongside the HPROF file. If you open the same hprof again later, it just reads these files.

TIP: You might want to run your tests and take your snapshots on a specific JVM configuration with a smaller heap in order to speed up this step.

Once the dump has been parsed, SAP Memory Analyzer presents you with a histogram of the heap. While this can be useful, you might want to jump straight to the Dominator view, which helps you analyze what is retained in memory. If a handful of objects are causing your memory retention, you will find them at the top of this view. Linking these objects to your application code, and thus finding the cause of the problem, might prove much easier than you think. I think this is a good starting point for your analysis. It helped me quite a lot in a few cases, even though it once proved quite useless (when the retention was spread across a huge number of objects). Another major feature that will help you is Delta-Diffing, available from the histogram view, which compares two different heap dumps. You might also want to try "Calculate retained size" from the contextual menu.
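To see what "retained size" means, here is a small Java sketch on a hypothetical object graph (names and sizes are made up). The retained size of an object is the total shallow size of everything that would become unreachable, and therefore collectable, if that object were collected. This is the quantity behind the Dominator view; Memory Analyzer computes it efficiently from a dominator tree, while this sketch simply recomputes reachability with and without the object, which is only practical for toy graphs.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RetainedSize {
    // Objects reachable from the GC roots, pretending 'excluded' (may be null) is gone.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots, String excluded) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        for (String r : roots) {
            if (!r.equals(excluded) && seen.add(r)) {
                todo.push(r);
            }
        }
        while (!todo.isEmpty()) {
            String cur = todo.pop();
            for (String next : refs.getOrDefault(cur, Collections.<String>emptyList())) {
                if (!next.equals(excluded) && seen.add(next)) {
                    todo.push(next);
                }
            }
        }
        return seen;
    }

    // Sum of shallow sizes of the objects that only stay alive through x (x included).
    static long retainedSize(Map<String, List<String>> refs, List<String> roots,
                             Map<String, Long> shallow, String x) {
        Set<String> all = reachable(refs, roots, null);
        Set<String> without = reachable(refs, roots, x);
        long total = 0;
        for (String o : all) {
            if (!without.contains(o)) {
                total += shallow.get(o);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("cache", "session"));
        refs.put("cache", Arrays.asList("a", "b"));
        refs.put("session", Arrays.asList("b"));
        Map<String, Long> shallow = new HashMap<>();
        shallow.put("root", 16L);
        shallow.put("cache", 40L);
        shallow.put("a", 100L);
        shallow.put("b", 200L);
        shallow.put("session", 24L);
        List<String> roots = Arrays.asList("root");
        System.out.println(retainedSize(refs, roots, shallow, "cache"));   // 140: cache + a (b is shared)
        System.out.println(retainedSize(refs, roots, shallow, "session")); // 24: session alone
    }
}
```

Note how b is retained by neither cache nor session on its own, since both reference it: this is why shallow sizes alone can be misleading.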

You will see that this tool offers tons of functionality beyond the histogram and Dominator views. You should really try as much of it as you can: it’s really useful, and at the very least educational regarding your application’s memory usage.

One last thing: you might want to take a different approach to your analysis. Instead of taking two snapshots before and after hours of nominal load, taking a snapshot before and after a very simple action, on a platform where you are the only user, might be the key to fixing your problem. It depends on the memory retention you are observing.