Monday, 9 February 2009

Mercurial vs Git

I've had an epiphany and the epiphany has a name. Mercurial, a version control tool, has entered my life and I am, at last, complete. If you're a geek, and particularly if you are a sub-30 Open Source geek, you might have heard of the great version control war, if not fought in it yourself. Well I've taken up arms and I'm here to recruit you.

I will spare you the detail of what a dvcs is and just use my blog to opine, letting you read around the subject yourself, so I'll dive straight into comparing the main two contenders, Mercurial and Git.

I'm going to briefly cover a few of the advantages of Mercurial. I will update this comparison as I discover the inevitable complaints - if you're reading the comments please be aware that they might be responding to my errors and omissions so please do not engage in a flamewar here.

The Battle So Far

There is a commonly referenced list of reasons to use git and I'll begin by countering its author's perception that Mercurial doesn't score as well as git.

Mercurial Matches Up

According to that list mercurial matches up on some items:
  1. Everything is Local
  2. Fast
  3. Small
  4. Distributed
  5. Any Workflow
  6. Easy to Learn
but I'll go a little further and say that Mercurial surpasses Git on:
  1. Small (git's windows build is huge)
  2. Easy to Learn (git is vastly more complex to comprehend and the consequences of some features and actions are very non-obvious)
The list also says Mercurial doesn't do:
  1. Cheap Local Branching
But it does, and always has, because it has always supported multiple head revisions. What it didn't do was make it trivial to name some of those heads and, on each check-in, automatically move the name so that it stays at the head of that branch of history. Mercurial now makes this easy with the bookmarks extension.

Mercurial Doesn't Focus on Pointless Things

There is a criterion in the list on which Mercurial may well fall behind Git, "GitHub", but I argue that GitHub is mostly pointless. People don't socially network much based on which tool they're using to work; they network based on what they are creating. Mercurial has mailing lists, a wiki and a Freenode channel, and these are great tools for networking around Mercurial.

Mercurial Has Commit Support Tools

The list says Mercurial doesn't match up on "The Staging Area" but Git's Staging Area is complex to understand and Mercurial has the attic extension which is much simpler to understand and very flexible. The high flexibility combined with ease of understanding of Mercurial makes Git lose this round too.

Mercurial Has Excellent Trustiness

Much like Git, Mercurial has tools for creating an alternative, more descriptive breakdown of the changes in a line of history and saving it as a second line of history that branches from the earlier point where the two diverge, but the first line must remain. While it is often desirable to remove the old line of history, Mercurial makes you take an extra step to bring home the gravity of the action and the effect it will have on collaborators.

Mercurial makes it clear to the user that this action actually constructs a /different/ repository to the one that collaborators have been working against. It does this by the straightforward and intuitive means of requiring the user to create a new repository from the line of history they want to keep, then requiring them to actually replace the old repository with the new one. Easy, takes a few seconds longer, but clear, effective and safe. The user also gets a chance to review the effect of their actions before committing to them, because they end up with two wholly separate, comparable repositories.

Monday, 8 September 2008

U+2060: Word Joiner

Some scripts have places that are naturally a word boundary, but should not be for some particular instance. For that purpose Unicode provides the WJ (Word Joiner) special character at codepoint U+2060 which you insert wherever you don't want a word boundary:

Encoding form  Endianness  S0    S1    S2    S3
UTF-32         be          0x00  0x00  0x20  0x60
UTF-32         le          0x60  0x20  0x00  0x00
UTF-16         be          0x20  0x60
UTF-16         le          0x60  0x20
UTF-8          n/a         0xE2  0x81  0xA0


But you'd be surprised how few programs support this vital Unicode feature. Try getting Japanese to wrap between words in a .NET WinForms control.
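The byte sequences listed for U+2060 can be checked mechanically. The following is a minimal sketch, not a general encoder: the helper names utf8_3byte, utf16_byte and utf32_byte are made up for this illustration, and they assume a code point in the range U+0800 to U+FFFF that is not a surrogate.

```cpp
#include <cstdint>

// Byte i (0, 1 or 2) of the three-byte UTF-8 form.
// Assumption: cp is in U+0800..U+FFFF and is not a surrogate.
constexpr std::uint8_t utf8_3byte(std::uint32_t cp, int i) {
    return i == 0 ? static_cast<std::uint8_t>(0xE0 | (cp >> 12))
         : i == 1 ? static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F))
                  : static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
}

// Byte i (0 or 1) of the single UTF-16 code unit, in the given byte order.
constexpr std::uint8_t utf16_byte(std::uint32_t cp, int i, bool big_endian) {
    return static_cast<std::uint8_t>(cp >> (8 * (big_endian ? 1 - i : i)));
}

// Byte i (0..3) of the UTF-32 code unit, in the given byte order.
constexpr std::uint8_t utf32_byte(std::uint32_t cp, int i, bool big_endian) {
    return static_cast<std::uint8_t>(cp >> (8 * (big_endian ? 3 - i : i)));
}

// Compile-time check against the UTF-8 row for WORD JOINER.
static_assert(utf8_3byte(0x2060, 0) == 0xE2, "UTF-8 S0");
static_assert(utf8_3byte(0x2060, 1) == 0x81, "UTF-8 S1");
static_assert(utf8_3byte(0x2060, 2) == 0xA0, "UTF-8 S2");
```

The functions are written as single return expressions so that they are usable in static_assert under C++11.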

Friday, 22 August 2008

Padding

A couple of C++ snippets for padding, that is, rounding in up to the next multiple of n. One is simple; the other might be easier for a compiler to optimise, though you'll have to run your own experiments to see which is faster in your program. Both are well defined for any value of (in, n) where n > 0, and give the sensible result where in <= max_of_size_t + 1 - n, so that in + n - 1 does not wrap around. They will not work if either of the two types used is signed or non-integral.
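#include <stddef.h>  // size_t

// Variant 1: divide, then multiply back up.
inline size_t roundup_to(size_t n, size_t in)
{
    in += n - 1;
    in /= n;
    in *= n;
    return in;
}

// Variant 2: subtract the remainder instead.
// This is an alternative definition; keep only one of the two in a program.
inline size_t roundup_to(size_t n, size_t in)
{
    in += n - 1;
    in -= in % n;
    return in;
}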
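As a worked example of the arithmetic, here is a sketch of the division-based variant written as a single expression, so that C++11 can evaluate it at compile time. The guard roundup_is_safe is a hypothetical helper added for this illustration; it tests the no-wraparound condition in + n - 1 <= max_of_size_t.

```cpp
#include <cstddef>
#include <limits>

// Same arithmetic as the divide-then-multiply snippet, as one expression.
constexpr std::size_t roundup_to(std::size_t n, std::size_t in) {
    return (in + n - 1) / n * n;
}

// Hypothetical guard: true when n > 0 and in + n - 1 fits in size_t.
constexpr bool roundup_is_safe(std::size_t n, std::size_t in) {
    return n > 0 &&
           in <= std::numeric_limits<std::size_t>::max() - (n - 1);
}

// Padding to a multiple of 8: 13 rounds up to 16, 8 stays at 8.
static_assert(roundup_to(8, 13) == 16, "13 pads to 16");
static_assert(roundup_to(8, 8) == 8, "exact multiples are unchanged");
```

Calling roundup_is_safe(n, in) before roundup_to(n, in) is one way to reject inputs that would wrap around.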

Thursday, 24 July 2008

boost::mpl vs. variadic templates

Boost provides an interesting library, mpl, for metaprogramming at compile time to describe the construction of complex types from basic components. But C++0x, the next version of C++ due sometime in the next two years, has a scheduled feature called variadic templates.

This feature allows you to define class templates like so:
template<typename... T>
class my_template {};

my_template<int, double, my_template<float, char>> var;

But Boost does not support defining an mpl type in terms of one of these, so I wrote a quick class to translate. It may not be the fastest way to implement it, and I'm sure there are simple improvements to be made; if you've got some suggestions, please post them in the comments:

#include <boost/preprocessor/iterate.hpp>
/* The above line is because there is a bug in boost mpl
** which is triggered by gcc trunk where previous compilers
** (both gcc and otherwise) seem to accept some invalid
** code
*/

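#include <boost/mpl/apply.hpp>
#include <boost/mpl/insert.hpp>
#include <boost/mpl/placeholders.hpp>
#include <boost/mpl/push_back.hpp>
#include <boost/mpl/vector.hpp>
#include <boost/mpl/list.hpp>
#include <boost/mpl/map.hpp>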

using boost::mpl::placeholders::_1;
using boost::mpl::placeholders::_2;
using namespace boost;

template<typename S, class A, typename... T>
struct fold;

template<typename S, class A, typename H, typename... T>
struct fold<S, A, H, T...> :
fold<mpl::apply<A,S,H>, A, T...>
{};

template<typename S, class A>
struct fold<S, A> :
S
{};


template<template<typename...> class C, typename S, typename V>
struct append;


template<template<typename...> class C, typename... T>
struct as_mpl :
fold<C<>, append<C, _1, _2>, T...>
{};



template<typename S, typename V>
struct append<mpl::list, S, V> :
mpl::push_back<S, V>
{};

template<typename S, typename V>
struct append<mpl::map, S, V> :
mpl::insert<S, V>
{};

template<typename S, typename V>
struct append<mpl::vector, S, V> :
mpl::push_back<S, V>
{};

You can add mappings to other mpl containers by specialising append<typename, typename, typename>.

This is used via as_mpl<mpl::vector, type1, type2, type3> and the result is equivalent to mpl::vector<type1, type2, type3>.
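The folding idea itself can be seen without Boost. The sketch below is not the code above: type_list, list_push_back, fold_types and as_type_list are hypothetical stand-ins for an mpl sequence, mpl::push_back, the fold and as_mpl respectively, and it assumes a compiler with variadic template support.

```cpp
#include <type_traits>

// Hypothetical stand-in for an mpl sequence: a bare list of types.
template<typename... Ts>
struct type_list {};

// Hypothetical stand-in for mpl::push_back on that list.
template<typename List, typename V>
struct list_push_back;

template<typename... Ts, typename V>
struct list_push_back<type_list<Ts...>, V> {
    typedef type_list<Ts..., V> type;
};

// Left fold over a parameter pack: S is the state so far, Op the binary
// metafunction, T... the types still to be consumed.
template<typename S, template<typename, typename> class Op, typename... T>
struct fold_types;

// Recursive case: apply Op to the state and the head, recurse on the tail.
template<typename S, template<typename, typename> class Op,
         typename H, typename... T>
struct fold_types<S, Op, H, T...>
    : fold_types<typename Op<S, H>::type, Op, T...> {};

// Base case: nothing left to consume, so the state is the result.
template<typename S, template<typename, typename> class Op>
struct fold_types<S, Op> {
    typedef S type;
};

// Start from the empty list and push each type in turn.
template<typename... T>
struct as_type_list : fold_types<type_list<>, list_push_back, T...> {};

static_assert(std::is_same<as_type_list<int, double, char>::type,
                           type_list<int, double, char> >::value,
              "the fold preserves order");
```

Unlike the Boost version, the operation is passed as a template template parameter rather than a placeholder expression, which keeps the sketch free of any lambda machinery.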

Managed-wrapped unmanaged wrapper

A quick post to show a C++/CLI template for containing an unmanaged resource RAII instance variable in a managed object that uses the destruction pattern in my earlier post.
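template<typename T>
ref class NWrap sealed {
    int disposed;   // 0 until the destructor has run once
    T* const obj;   // heap copy of the unmanaged RAII object
    // Finalizer: releases the unmanaged object.
    !NWrap() { delete obj; }
public:
    explicit NWrap(T const& src) : obj(new T(src)) {}
    ~NWrap() {
        // CompareExchange returns the old value, so only the first caller proceeds.
        if (System::Threading::Interlocked::
                CompareExchange(disposed, 1, 0))
            return;

        this->!NWrap();
    }
};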

Your RAII class must be copyable (or support auto_ptr style move). This can then be used as
NWrap<raii_type> membername;
and initialised from a constructor initialiser as:
: membername(raii_type(args...)) {}

Wednesday, 23 July 2008

IDisposable in C++/CLI

A formula for a disposable class in C++/CLI

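ref class Base {

    int disposed;                           // 0 until ~Base has run once
    ManagedResource^ managed_resource;      // handle: disposed explicitly below
    ManagedResource auto_managed_resource;  // stack semantics: disposed for you
    void* unmanaged_resource;

    // Finalizer: releases unmanaged resources only.
    !Base() { free(unmanaged_resource); }

public:
    // Destructor: this is what the compiler exposes as Dispose.
    ~Base() {
        // CompareExchange returns the old value, so only the first caller proceeds.
        if (System::Threading::Interlocked::
                CompareExchange(disposed, 1, 0))
            return;

        delete managed_resource;  // on a handle, delete calls Dispose
        this->!Base();
    }
};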

This class implements IDisposable and disposes both the managed resources and the unmanaged resource. If it were a derived class it would correctly dispose and finalise its base class automatically (C++/CLI does that for you).

Note that variables of a reference class type can be declared without the caret "^", in which case they are disposed when they go out of scope, like in real C++.

Rules of thumb:

  1. Only release unmanaged resources in !Base

  2. Never perform any operation in !Base that is slow or that might depend on ordering with respect to any other part of the program. It would be a good idea not to call any dispose methods.

  3. Do not allow a reference to this to be saved somewhere from !Base. It would be a good idea not to use the keyword "this" in !Base at all, so that it cannot be saved by a function you call whose implementation you don't know.

  4. Do not call base class !Base or ~Base, let the compiler add them for you.

  5. Treat all breaches of these rules of thumb with suspicion.
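For readers without a C++/CLI compiler, the dispose-once guard used in ~Base can be sketched in standard C++. This is an analogy under stated assumptions, not a translation: buffer_owner and release_twice_example are hypothetical names, std::atomic stands in for Interlocked, and a malloc'd buffer stands in for the unmanaged resource.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical owner of a malloc'd buffer. release() may be called any
// number of times, from any thread, but frees the buffer only once.
class buffer_owner {
    std::atomic<int> disposed_;  // 0 until the first release()
    void* buffer_;
public:
    explicit buffer_owner(std::size_t bytes)
        : disposed_(0), buffer_(std::malloc(bytes)) {}
    buffer_owner(buffer_owner const&) = delete;
    buffer_owner& operator=(buffer_owner const&) = delete;

    // Returns true only for the call that actually released the buffer.
    bool release() {
        int expected = 0;
        // Succeeds only if disposed_ was still 0, and sets it to 1.
        if (!disposed_.compare_exchange_strong(expected, 1))
            return false;
        std::free(buffer_);
        buffer_ = nullptr;
        return true;
    }

    ~buffer_owner() { release(); }
};

// Usage: the second explicit call, and the destructor, are no-ops.
inline void release_twice_example() {
    buffer_owner owner(16);
    bool first = owner.release();
    bool second = owner.release();
    assert(first && !second);
    (void)first;
    (void)second;
}
```

The compare-and-swap plays the same role as CompareExchange(disposed, 1, 0): whichever caller flips the flag from 0 to 1 does the cleanup, and everyone else returns early.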

Love, Love Dispose

At the time of writing, Wikipedia lists three benefits of garbage collection: eliminating, or substantially reducing the probability or impact of:
  1. Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is used.
  2. Double free bugs, which occur when the program attempts to free a region of memory that is already free.
  3. Certain kinds of memory leaks, in which a program fails to free memory that is no longer referenced by any variable, leading over time to memory exhaustion.
Indeed, these are usually the only benefits, as the garbage collector cannot be relied upon to collect a resource wrapper at the right time. I use the word "resource" to refer not only to unmanaged resources, but also to something like a static reference to a data structure that might be owned by one object at a time. If the owning object loses its last reference without relinquishing ownership of the static reference, other objects cannot take ownership of it. Operations that require those objects to take ownership cannot complete, and knowledge of that must be fed back to the user of the program.

But the user shouldn't have been offered that operation at all, since the program's own internal state forbade it at that very moment. The real fault lies with the previous operation: it looked like it had relinquished ownership of the static reference and reported its completion to the user, who was then free to attempt another operation. The user should instead have been told that the operation was not yet finished, right up until the program state had reset.

So that class needed to implement IDisposable. That's nice and easy, right? If a resource (such as ownership of shared data) is held by an object, its class must be disposable, otherwise it doesn't need to be, right?

Wrong.

If the object can be returned from a factory, and a factory can be passed into a routine, then the routine must be written to handle resources correctly. That includes knowing when the resources have definitely been released (so it can feed back correct completion knowledge) and when they definitely haven't (so it can work with the resources). But if the factory is derivable or an interface, and the resource-owning object's class is derivable or an interface (a common and frequently desirable pattern in object-oriented styles, which are idiomatic for .NET), the routine cannot know whether it is safe to leave freeing to the garbage collector. So the interface of the object must be explicitly disposable, and the object must be explicitly disposed.

But the problem is worse than this. If the problem involves mutexes, deadlocks can easily ensue and these problems are neither easy to debug nor easy to spot.

This is all because Garbage Collection is defined in terms of memory, but memory is only one of an infinite variety of resources. That's why I advise making all non-sealed classes and all interfaces implement IDisposable except those that can be brought back to life, which should have a suitable alternative interface instead.
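The argument about factories and interfaces can be sketched in standard C++. This is only an analogy, since C++ has no garbage collector: the names worker, flag_holder, worker_factory and run_and_report are hypothetical, and a bool flag stands in for ownership of the shared data.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical interface. Disposal is part of the interface because callers
// cannot see whether a given implementation holds a resource.
struct worker {
    virtual void run() = 0;
    virtual void dispose() = 0;
    virtual ~worker() {}
};

// Hypothetical implementation that owns a shared flag until disposed.
class flag_holder : public worker {
    bool& held_;
public:
    explicit flag_holder(bool& held) : held_(held) { held_ = true; }
    void run() override {}
    void dispose() override { held_ = false; }  // relinquish ownership
};

typedef std::function<std::unique_ptr<worker>()> worker_factory;

// The routine knows only the interface, so it disposes unconditionally and
// reports completion only after the resource has been released.
inline bool run_and_report(worker_factory const& make) {
    std::unique_ptr<worker> w = make();
    assert(w != nullptr);  // the sketch assumes the factory never returns null
    w->run();
    w->dispose();
    return true;           // completion is reported after disposal
}
```

Because dispose() is declared on the interface, run_and_report works the same way whether or not the concrete class behind the factory actually holds anything.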