Wednesday 23 July 2008

Love, Love Dispose

  At the time of writing, Wikipedia lists three benefits of garbage collection, namely eliminating or substantially reducing the probability or impact of:

  1. Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is used.
  2. Double free bugs, which occur when the program attempts to free a region of memory that is already free.
  3. Certain kinds of memory leaks, in which a program fails to free memory that is no longer referenced by any variable, leading over time to memory exhaustion.

  Indeed, these are usually the only benefits, as the garbage collector cannot be relied upon to collect a resource wrapper at the right time. I use the word "resource" to refer not only to unmanaged resources, but also to things like a global data structure that may be owned by one object at a time.

  In the above case, if an object becomes unreferenced while retaining ownership of the global data structure, no other object can take ownership of it until the garbage collector finalises the previous owner. Any operation that requires another object to take ownership therefore cannot complete until the garbage collector runs. Knowledge of that state must often be fed back to the user of the program or to any controlling component.
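  A minimal sketch of that situation, written in Python rather than .NET. Everything here is hypothetical: `SharedTable` stands in for the global data structure and `Owner` for the object that holds it. CPython normally reclaims objects immediately by reference counting, so the sketch puts the owner in a reference cycle and switches off automatic collection to mimic a collector that runs at some unknown later time.

```python
import gc


class SharedTable:
    """Hypothetical global structure that only one object may own at a time."""
    owned = False

    @classmethod
    def acquire(cls):
        if cls.owned:
            raise RuntimeError("shared table is still owned")
        cls.owned = True

    @classmethod
    def release(cls):
        cls.owned = False


class Owner:
    """Releases ownership only in its finaliser, i.e. when the collector gets to it."""

    def __init__(self):
        self._holds = False
        SharedTable.acquire()
        self._holds = True
        # Assumption for the demo: a self-reference, so that only the cycle
        # collector (not reference counting) can reclaim this object.
        self._cycle = self

    def __del__(self):
        if self._holds:
            SharedTable.release()


gc.disable()      # mimic a collector that has not run yet
first = Owner()
del first         # unreferenced, but not yet finalised

try:
    Owner()       # a second owner cannot take the table
    blocked = False
except RuntimeError:
    blocked = True

gc.collect()      # the collector finally finalises the first owner
second = Owner()  # only now can ownership be taken again
gc.enable()
```

  Between `del first` and `gc.collect()` nothing in the program holds the table in any useful sense, yet nothing else can take it either.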

  In a typical desktop application, the user shouldn't be offered the operation that requires the data structure until the structure is available, because neither a stall of many seconds nor a transient failure is normally acceptable in that kind of application.

  The problem is that the previously completed operation, which appeared to have relinquished ownership of the global data structure, reports its completion to the user agent/GUI, and that report is the only notification that lets the user agent offer a new operation. But the user shouldn't be offered an operation that uses the global data structure until the structure has actually been released.

  The object that owned the global data structure needed to implement IDisposable. That's nice and easy, right? If a resource (such as ownership of shared data) is held by an object, its class must be disposable; otherwise it doesn't need to be, right?
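  As a rough illustration of the disposable version, here is a sketch in Python, where the context-manager protocol (`with`) plays the role that IDisposable and `using` play in .NET. The names `SharedTable`, `DisposableOwner` and `dispose` are invented for the example.

```python
class SharedTable:
    """Hypothetical global structure that only one object may own at a time."""
    owned = False

    @classmethod
    def acquire(cls):
        if cls.owned:
            raise RuntimeError("shared table is still owned")
        cls.owned = True

    @classmethod
    def release(cls):
        cls.owned = False


class DisposableOwner:
    """Owns the table from construction until dispose() is called."""

    def __init__(self):
        SharedTable.acquire()
        self._disposed = False

    def dispose(self):
        # Safe to call more than once; only the first call releases.
        if not self._disposed:
            SharedTable.release()
            self._disposed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.dispose()
        return False  # do not swallow exceptions


with DisposableOwner():
    held_inside = SharedTable.owned

# Ownership is given up at a known point, not whenever the collector runs.
released_after = not SharedTable.owned

with DisposableOwner():
    pass  # a second owner can take the table straight away
```

  The release now happens at the end of the `with` block, so the code that reports completion to the GUI can do so knowing the table is free.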

  That's fine if it's a small application developed end-to-end by one or two developers. But in a larger system, algorithms involve factories or third-party assemblies.

  If an algorithm is going to ask a factory to create the object that holds the global data structure, then it doesn't get to know whether the object will own non-memory resources. The routine must be written to handle resources correctly - including knowing when the resources have definitely been released and when they definitely haven't.

  If the factory reference and the object reference that will be created are not final, are abstract, or are interfaces - a common pattern and often preferred in medium-large programs and in third-party assemblies - then the routine that uses them cannot know whether it is safe to abandon the release of any resources to the garbage collector. That means the interface of the object must be explicitly disposable and must be explicitly disposed.
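  One way to picture this, again as a Python sketch with invented names: the abstract `Worker` interface is itself disposable, and the routine `perform` disposes whatever the factory hands back, without knowing whether that particular implementation owns anything.

```python
from abc import abstractmethod
from contextlib import AbstractContextManager
from typing import Callable

events = []  # records what each hypothetical implementation did


class Worker(AbstractContextManager):
    """Abstract interface; disposal is part of the contract."""

    @abstractmethod
    def run(self) -> str: ...

    @abstractmethod
    def close(self) -> None: ...

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return None


class InMemoryWorker(Worker):
    """Owns nothing but memory; close() has nothing to release."""

    def run(self) -> str:
        return "in-memory"

    def close(self) -> None:
        events.append("in-memory closed")


class TableWorker(Worker):
    """Stands in for an implementation that owns the global structure."""

    def __init__(self):
        events.append("table acquired")

    def run(self) -> str:
        return "table"

    def close(self) -> None:
        events.append("table released")


def perform(factory: Callable[[], Worker]) -> str:
    # The routine cannot tell which kind of worker it gets, so it always disposes.
    with factory() as worker:
        return worker.run()


results = [perform(InMemoryWorker), perform(TableWorker)]
```

  The cost is that every implementation, including the ones that hold nothing, has to carry a `close` method and every caller has to use it.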

  But the problem is worse than this. If the problem involves synchronisation primitives, deadlocks can easily ensue and these problems are neither easy to debug nor easy to spot.
