Monday, 8 September 2008

U+2060: Word Joiner

Some scripts have places that are naturally a word boundary but should not be one in some particular instance. For that purpose Unicode provides the special character WJ (Word Joiner) at code point U+2060, which you insert wherever you don't want a word boundary:

Encoding form  Endianness  S0    S1    S2    S3
UTF-32         be          0x00  0x00  0x20  0x60
UTF-32         le          0x60  0x20  0x00  0x00
UTF-16         be          0x20  0x60
UTF-16         le          0x60  0x20
UTF-8                      0xE2  0x81  0xA0
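
The UTF-8 bytes for U+2060 can be checked with a few lines of C++. This is only a rough sketch: encode_utf8 and join_without_break are names I made up for illustration, and the encoder does no validation (surrogates and out-of-range values are not rejected).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Sketch only; it does not reject surrogates or values above U+10FFFF.
inline std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: glue two UTF-8 strings together with a Word Joiner
// so that no word boundary is introduced between them.
inline std::string join_without_break(const std::string& left,
                                      const std::string& right)
{
    return left + encode_utf8(0x2060) + right;
}
```

Whether the joined text then actually refuses to wrap depends entirely on the program rendering it.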


But you'd be surprised how few programs support this vital Unicode feature. Try getting Japanese to wrap between words in a .NET WinForms control.

Friday, 22 August 2008

Padding

A couple of C++ snippets for padding (rounding a size up to a multiple of n). One is simple; the other might be easier for a compiler to optimise, though you'll have to do your own experiments to see which is faster in your program. Both are well defined for any value of (in, n) where n > 0, and give the sensible answer as long as in + n - 1 does not overflow, i.e. in <= max_of_size_t + 1 - n. This will not work if either of the two types used is signed or non-integral.
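#include <stddef.h>  // size_t

// Variant 1: divide, then multiply back up.
inline size_t roundup_to(size_t n, size_t in)
{
    in += n - 1;
    in /= n;
    in *= n;
    return in;
}

// Variant 2 (an alternative definition, use one or the other):
// step past the boundary, then subtract the remainder.
inline size_t roundup_to(size_t n, size_t in)
{
    in += n - 1;
    in -= in % n;
    return in;
}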
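
A quick way to convince yourself the two variants agree is to compile them side by side. In this sketch they get different names (roundup_div and roundup_mod, both made up here) purely so that they can live in the same file.

```cpp
#include <cstddef>

// Hypothetical name for the divide-and-multiply variant.
inline std::size_t roundup_div(std::size_t n, std::size_t in)
{
    in += n - 1;
    in /= n;
    in *= n;
    return in;
}

// Hypothetical name for the subtract-the-remainder variant.
inline std::size_t roundup_mod(std::size_t n, std::size_t in)
{
    in += n - 1;
    in -= in % n;
    return in;
}

// Compare both variants over a small range of inputs (n must be > 0).
inline bool variants_agree(std::size_t max_n, std::size_t max_in)
{
    for (std::size_t n = 1; n <= max_n; ++n)
        for (std::size_t in = 0; in <= max_in; ++in)
            if (roundup_div(n, in) != roundup_mod(n, in))
                return false;
    return true;
}
```

For example, with n = 8 an input of 9 pads to 16, while 8 stays at 8 and 0 stays at 0.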

Thursday, 24 July 2008

boost::mpl vs. variadic templates

Boost provides an interesting library, mpl, for metaprogramming at compile time to describe the construction of complex types from basic components. But C++0x, the next version of C++, due sometime in the next two years, has a scheduled feature called variadic templates.

This feature allows you to define class templates like so:

template<typename... T>
class my_template {};

my_template<int, double, my_template<float, char>> var;
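
As a small illustration of what you can do with the parameter pack, here is a sketch that counts the arguments of a my_template instantiation. The arity helper is my own invention for this example, and it assumes a compiler with sizeof... and constexpr support.

```cpp
#include <cstddef>

template<typename... T>
class my_template {};

// Hypothetical helper: number of type arguments of a my_template.
template<typename X>
struct arity;

template<typename... T>
struct arity<my_template<T...>> {
    static constexpr std::size_t value = sizeof...(T);
};

// A nested my_template counts as a single argument of the outer one.
static_assert(arity<my_template<>>::value == 0, "empty pack");
static_assert(
    arity<my_template<int, double, my_template<float, char>>>::value == 3,
    "three top-level arguments");
```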

But Boost does not support defining an mpl type in terms of one of these, so I wrote a quick class to translate. It may not be the fastest way to implement it, and I'm sure there are simple improvements to be made; if you've got some suggestions, please post them in the comments:


#include <boost/preprocessor/iterate.hpp>
/* The above line is because there is a bug in boost mpl
** which is triggered by gcc trunk where previous compilers
** (both gcc and otherwise) seem to accept some invalid
** code
*/

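#include <boost/mpl/vector.hpp>
#include <boost/mpl/list.hpp>
#include <boost/mpl/map.hpp>
/* Headers for the mpl facilities named directly below */
#include <boost/mpl/apply.hpp>
#include <boost/mpl/insert.hpp>
#include <boost/mpl/placeholders.hpp>
#include <boost/mpl/push_back.hpp>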

using boost::mpl::placeholders::_1;
using boost::mpl::placeholders::_2;
using namespace boost;

template<typename S, class A, typename... T>
struct fold;

template<typename S, class A, typename H, typename... T>
struct fold<S, A, H, T...> :
  fold<mpl::apply<A,S,H>, A, T...>
{};

template<typename S, class A>
struct fold<S, A> :
  S
{};


template<template<typename...> class C, typename S, typename V>
struct append;


template<template<typename...> class C, typename... T>
struct as_mpl :
  fold<C<>, append<C, _1, _2>, T...>
{};



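/* Caveat: mpl::list is documented as front-extensible only, so
** push_back may be rejected for it; push_front compiles but
** reverses the order of the elements.
*/
template<typename S, typename V>
struct append<mpl::list, S, V> :
  mpl::push_back<S, V>
{};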

template<typename S, typename V>
struct append<mpl::map, S, V> :
  mpl::insert<S, V>
{};

template<typename S, typename V>
struct append<mpl::vector, S, V> :
  mpl::push_back<S, V>
{};

You can add mappings to other mpl containers by specialising append for the container template, as is done above for mpl::list, mpl::map and mpl::vector.

This is used via as_mpl<mpl::vector, type_pack...> and the result is equivalent to mpl::vector<type1, type2, type3>.
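
If you want to see the recursion without pulling in Boost, here is a cut-down sketch of the same left fold over a parameter pack. type_list, push_back and fold_pack are stand-ins I made up for this example; they are not the mpl types, and the operation is passed as a plain template rather than an mpl lambda expression.

```cpp
#include <type_traits>

// Hypothetical stand-in for an mpl sequence.
template<typename... T>
struct type_list {};

// Hypothetical stand-in for mpl::push_back.
template<typename S, typename V>
struct push_back;

template<typename... T, typename V>
struct push_back<type_list<T...>, V> {
    using type = type_list<T..., V>;
};

// Left fold: apply Op to the running state and each pack element in turn.
template<typename S, template<typename, typename> class Op, typename... T>
struct fold_pack;

template<typename S, template<typename, typename> class Op,
         typename H, typename... T>
struct fold_pack<S, Op, H, T...> :
    fold_pack<typename Op<S, H>::type, Op, T...>
{};

// Pack exhausted: the accumulated state is the result.
template<typename S, template<typename, typename> class Op>
struct fold_pack<S, Op> {
    using type = S;
};

static_assert(std::is_same<
    fold_pack<type_list<>, push_back, int, double, char>::type,
    type_list<int, double, char>>::value, "order is preserved");
```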

Managed-wrapped unmanaged wrapper

A quick post to show a C++/CLI template for containing an unmanaged resource RAII instance variable in a managed object that uses the destruction pattern in my earlier post.

template<typename T>
ref class NWrap sealed {
 int disposed;
 T* const obj;
 !NWrap() { delete(obj); }
public:
 explicit NWrap(T const& src) : obj(new T(src)) {}
 ~NWrap() {
  if (System::Threading::Interlocked::
           CompareExchange(disposed, 1, 0))
   return;

  this->!NWrap();
 }
};

Your RAII class must be copyable (or support auto_ptr-style move). This can then be used as

NWrap<raii_type> membername;
and initialised from a constructor initialiser as:

: membername(raii_type(args...)) {}

Wednesday, 23 July 2008

IDisposable in C++/CLI

A formula for a disposable class in C++/CLI

ref class MyClass : Base {

  int disposed;
  ManagedResource^ managed_resource;
  ManagedResource auto_managed_resource;
  void* unmanaged_resource;

  !MyClass() { free(unmanaged_resource); }

public:
  ~MyClass() {
    if (System::Threading::Interlocked::
            CompareExchange(disposed, 1, 0))
      return;

    // In C++/CLI, delete on a handle is how Dispose() is invoked
    delete managed_resource;
    this->!MyClass();
  }
};

This class implements IDisposable and disposes both its managed resources and its unmanaged resource. As a derived class, it also has its base class correctly disposed and finalised automatically (C++/CLI does that for you).
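
The "run the release code at most once" guard is not specific to .NET. Here is a rough analogue in standard C++ using std::atomic in place of Interlocked::CompareExchange. The class name once_released and its release counter are made up for this sketch, and it says nothing about finalisers, which standard C++ does not have.

```cpp
#include <atomic>
#include <cstdlib>

class once_released {
    std::atomic<int> disposed{0};
    void* unmanaged_resource;
    int release_count = 0;  // demonstration only: how often the body ran

public:
    once_released() : unmanaged_resource(std::malloc(16)) {}
    once_released(const once_released&) = delete;
    once_released& operator=(const once_released&) = delete;

    // Analogue of Dispose(): safe to call any number of times.
    void release()
    {
        int expected = 0;
        if (!disposed.compare_exchange_strong(expected, 1))
            return;  // somebody already released the resource

        std::free(unmanaged_resource);
        unmanaged_resource = nullptr;
        ++release_count;
    }

    int releases() const { return release_count; }

    ~once_released() { release(); }
};
```

Calling release() twice, or calling it and then letting the destructor run, frees the memory exactly once.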

Note that members of reference class type can be declared without the caret "^" (as auto_managed_resource is above), and they will be disposed when they go out of scope, like in real C++.
Rules of thumb:
  1. Only release unmanaged resources in !MyClass
  2. Never perform any operation in !MyClass that is slow or that might depend on ordering with respect to any other part of the program. It would be a good idea not to call any dispose methods.
  3. Do not allow a reference to this to be saved somewhere from !MyClass. It would be a good idea not to use the keyword "this" in !MyClass at all, to avoid it being saved by a function whose implementation you don't know.
  4. Do not call a base class !Base or ~Base - let the compiler add them for you.
  5. Treat all breaches of these rules of thumb with suspicion.

Love, Love Dispose

  At the time of writing, Wikipedia lists three benefits of garbage collection, namely eliminating or substantially reducing the probability or impact of:

  1. Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is used.
  2. Double free bugs, which occur when the program attempts to free a region of memory that is already free.
  3. Certain kinds of memory leaks, in which a program fails to free memory that is no longer referenced by any variable, leading over time to memory exhaustion.

  Indeed, these are usually the only benefits, as the garbage collector cannot be relied upon to collect a resource wrapper at the right time. I use the word "resource" to refer not only to unmanaged resources, but also to things like a global data structure that may be owned by one object at a time.

  In the above case, if an object becomes unreferenced while retaining ownership of the global data structure no other object can take ownership of it until the garbage collector finalises the previous owner. Any operation that requires another object to take ownership of the global data structure cannot complete until the garbage collector runs. Knowledge of that state must often be fed back to the user of the program or to any controlling component.

  In the case of a typical desktop application, the user shouldn't be given the option to request an operation that requires the data structure until it is available for use. Stalling for many seconds is not normally acceptable, and neither is a transient failure, for that kind of application.

  The problem is that the previously completed operation, which looked like it had relinquished ownership of the global data structure, indicates its completion to the user agent/GUI, and that is the only notification that allows the user agent to offer a new operation. But the user shouldn't be offered an operation that uses the global data structure until it has actually been released.

  The object that owned the global data structure needed to implement IDisposable. That's nice and easy, right? If a resource (such as ownership of shared data) is held by an object, its class must be disposable, otherwise it doesn't need to be, right?

  That's fine if it's a small application developed end-to-end by one or two developers. But in a larger system, algorithms involve factories or third-party assemblies.

  If an algorithm is going to ask a factory to create the object that holds the global data structure then it doesn't get to know whether the object will own non-memory resources. The routine must be written to handle resources correctly - including knowing when the resources have definitely been released and when they definitely haven't.

  If the factory reference and the object reference that will be created are not final, are abstract, or are interfaces - a common pattern and often preferred in medium-large programs and in third-party assemblies - then the routine that uses them cannot know whether it is safe to abandon the release of any resources to the garbage collector. That means the interface of the object must be explicitly disposable and must be explicitly disposed.
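
  The same argument can be sketched in standard C++, where the virtual destructor plays the role of the explicitly disposable interface. Everything here (operation, make_operation, the global_structure_owned flag) is hypothetical and only meant to illustrate the point that the calling routine cannot tell which kind of object the factory handed back.

```cpp
#include <memory>

// Hypothetical flag standing in for ownership of the global data structure.
inline bool& global_structure_owned()
{
    static bool owned = false;
    return owned;
}

// The interface itself promises deterministic release (virtual destructor),
// whether or not a given implementation holds anything.
struct operation {
    virtual ~operation() = default;
    virtual void run() = 0;
};

struct plain_operation : operation {
    void run() override {}
};

struct owning_operation : operation {
    owning_operation() { global_structure_owned() = true; }
    ~owning_operation() override { global_structure_owned() = false; }
    void run() override {}
};

// Hypothetical factory: the caller only ever sees the interface.
inline std::unique_ptr<operation> make_operation(bool needs_global)
{
    if (needs_global)
        return std::unique_ptr<operation>(new owning_operation);
    return std::unique_ptr<operation>(new plain_operation);
}

// The routine releases explicitly in both cases; it reports whether the
// global structure is free again by the time it returns.
inline bool run_and_release(bool needs_global)
{
    {
        std::unique_ptr<operation> op = make_operation(needs_global);
        op->run();
    }  // op is destroyed here, not at some later collection
    return !global_structure_owned();
}
```

  Because the release is tied to the interface rather than to the concrete class, the routine can report completion knowing the structure is available again.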

  But the problem is worse than this. If the problem involves synchronisation primitives, deadlocks can easily ensue and these problems are neither easy to debug nor easy to spot.