MonadicT

I see dead objects!

Java References

Perhaps the first trick of performance improvement is caching. Every programmer knows that performance of programs can be improved if somethings are cached in memory rather than recalculating or reading from disk everytime. However, balancing caching with memory needs of rest of the program is tricky. You can afford to cache a lot when demand for memory from rest of the program is small. But the cache should trim some of its entries when memory is needed by other parts of her program.

Consider another scenario. You have a class, say F, you would like to extend and associate some data with it. But F is final and you don't have the ability to change it. Ideally, you would like to have an instance of some other class G associated with each instance of F. You could use a hash map with aF as key and associate aG with it. While this works, when you don't need aF anymore, aF and aG hang around in memory because they are in the hash map.

For our last scenario, consider an object which uses some resource. When the object is reclaimed by the garbage collector, the resource should also be freed. Ideally, we would like this to happen without tracking the object's lifetime and explicitly initiating resource cleanup. Note that finalize method intended for object cleanup action is considered a nonstarter.

In all these cases, we would like the garbage collector to understand a bit more about the references our code keeps. The Reference class and its subclasses in java.lang.ref package provide us with the tools to interact with the garbage collector. Each of the above scenarios described above can be handled with a special type of Reference object. The rest of the post shows some Java code to illustrate the general ideas.

Building a GC-friendly cache

Say, your software creates something that is expensive and we would like to improve the overall performance by caching instances of it once they are created. We might want to go about building a cache that allows us to do something like this.

Insert items into cache without worrying about the number of entries in it.
When the program runs short of memory, discard some or all cached entries.
When a cache lookup finds a discarded entry, treat it the same as item not in cache.

The trick to accomplishing all this without a lot of house keeping is to wrap the cached values in instances of java.lang.Ref.SoftReference. The garbage collector has a notion of variable strength references. A normal reference to an object, also called strong reference, is one where the object is reachable from the roots of the program through a list of normal references. Instances which have strong references to them are not eligible for garbage collection. An object with a SoftReference doesn't have any strong references to it and the garbage collector is free to reclaim it at its discretion.

The following gist illustrates how the garbage collector treats soft references to objects. Notice that a garbage collection cycle which reclaims enough memory will leave the soft references unclaimed. When memory is scarce, however, soft references will be reclaimed by garbage collection.

Here is the output produced by this code. You can see that soft references get cleared when memory is scarce.

$ java SoftReferenceTest
Populating cache with 1000 objects
Count of objects in cache after population: 1000
Count of objects in cache after System.gc: 1000
Triggering OutOfMemoryError
Count of objects in cache after OOME: 10
$

What would happen if we used java.lang.ref.WeakReference instead of SoftReference? The difference between soft and weak references is in how aggressively garbage collector reclaims them. JDK documentation for WeakReference states that when garbage collector determines an object to be only weakly reachable, it will clear all weak references to that object. In contrast, when garbage collector determines an object to be softly reachable, it may clear soft references to it.

Here is another gist illustrating the difference between SoftReference and WeakReference. This gist replaces all occurrences of SoftReference with WeakReference. From the run output, you can see that the weak references to objects don't seem to survive the process of building the hash map and System.gc() call clears them immediately.

Output from running the above gist.

$ java WeakReferenceTest
Populating cache with 1000 objects
Count of objects in cache after population: 278
Count of objects in cache after System.gc: 10
Triggering OutOfMemoryError
Count of objects in cache after OOME: 10
$

While SoftReference looks great for building caches which automatically get smaller when memory becomes scarce, what are WeakReferences good for? That is the subject of next section.

Weak Hash Map

The JDK documentation for WeakReference says, weak references are used to build canonicalized mappings. A canonicalized mapping to an object always resolves to the same object. Consider the use case described at the beginning where we would like to associate additional information with an instance of a final class. Obviously, we should be able to retrieve the same information that we associated with the object at all times. While we could use a hash map with the object as key, all the housekeeping will have to be done by the developer.

This where a WeakReference proves handy. Instead of storing the key of the object directly, the key is wrapped in WeakReference. Say we have an object instance and with it we associate some information. We can query the weak hash map with the object as the key. If the object instance gets reclaimed, the only reference to it would be its use as a key in the weak hash map. As we saw earlier, garbage collector reclaims weak references right away and the key is put on a queue. When any operations are invoked on the weak hash map, all weak references waiting in the queue are removed from the map.

Note that keys used in WeakHashMap should not have any embedded strong references to them in the code and nor in the additional information we associated with it as the value. If the key is an interned string, the key will never become weakly referenced.

Phantom Reference

Now we come to the last scenario discussed in the beginning. Java objects can have a finalize method in which you can perform cleanup actions. Java doesn't specify how soon the finalize method will be invoked and which thread will execute the method. Joshua Bloch, author of Effective Java, also points out the performance penalty of defining finalize and strongly recommends avoiding it. finalize seems to be one of the features of Java better left untouched. Thankfully, PhantomReference is designed to address the problem of executing cleanup actions in a safe manner.

Using PhantomReference requires a little more effort than other reference types. For starters, phantom references don't return their referent ever. That prevents you from creating strong references to an object that has been finalized. It also requires you to subclass PhantomReference and maintain data needed to cleanup the referent when it is reclaimed. Phantom references must also be associated with a reference queue and a background thread is required to monitor the queue for references enqueued by garbage collector and do the cleanup actions as necessary.

Here is a gist which shows how phantom references should be used. Real code should probably be structured so that each object needing cleanup action should return an instance of PhantomReference subclass encapsulating the cleanup data.