Sunday, February 20, 2011

Non-Trivial Union

In my last post I talked about placing some syntactic sugar around C++ unions. In that post I mentioned that non-POD types (those with user-supplied or otherwise non-trivial constructors) are not allowed as members of a union under the C++03 rules. In this post I describe how to overcome that limitation and why that might be useful.

First, to illustrate what I mean about unions not being allowed to contain non-POD types, look at the following example:

struct Test {
    int i;
    Test () : i(0) {}
};

// ... and, later, in main
union NotAllowed {
    Test test;
    unsigned char bytes[sizeof(Test)];
};


Compiling that produces an error similar to:

member ‘Test main()::NotAllowed::test’ with constructor not allowed in union


The usual workaround for this limitation is to store a pointer to the type and manage the memory outside the union. It would be really nice to support RAII with unions, since unions can have constructors and destructors of their own, but the storing-a-pointer-to-type approach defeats this. I wanted to see if there was any way to use RAII with C++ unions.

I started out by considering having the union constructor memcpy the bytes of the complex type into the union's own storage, but the obvious failure there is that the constructor and destructor of the type are still managed outside the scope of the union. In addition, memcpy would give me only a copy of the object's memory, so I could not manipulate the original object itself.

After a little reading I found out about a cool mechanism called placement new. Placement new lets you supply the memory location in which an object is constructed, instead of the memory that operator new would normally allocate for you. The best part is that the constructor of the object is still invoked. From what I saw, placement new is generally used to implement memory pools or other memory management mechanisms outside of the default behavior of new and delete. So, continuing with my example from last time, I derived the following:

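#include <new>        // placement new
#include <stdexcept>

template< typename T >
union NonTrivialUnion {
private:
    T * ptr_;   // overlaps the first sizeof(T*) bytes of bytes_
    unsigned char bytes_[sizeof(T*) + sizeof(T)];   // pointer slot, then storage for T
public:

    // Copy-construct a T into the storage that follows the pointer slot.
    NonTrivialUnion (const T& t) {
        ptr_ = new (bytes_ + sizeof(T*)) T(t);
    }

    // Default-construct a T in the same place.
    NonTrivialUnion () {
        ptr_ = new (bytes_ + sizeof(T*)) T;
    }

    // The storage did not come from new, so destroy the object explicitly.
    ~NonTrivialUnion () { ptr_->T::~T(); }

    T * operator-> () { return ptr_; }

    // ... remainder of implementation
    // ... operator[], operator=, ...
};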


The placement new is in the union constructor:

ptr_ = new (bytes_ + sizeof(T*)) T(t);


You'll also notice that the destructor of the union explicitly calls the destructor of the object I am maintaining. This is necessary. As I mentioned, the storage was not provided by new, so calling delete on it would be undefined behavior. Since delete is normally what calls an object's destructor, the destructor has to be called somewhere else. Hence, in the destructor of the union there is

~NonTrivialUnion () { ptr_->T::~T(); }
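
For readers who have not used placement new before, the construct-then-destroy-by-hand pattern can be sketched in isolation. This is only an illustration: the Tracked type, its live counter, and the placement_roundtrip function are hypothetical names that are not part of the union above, and the long double member exists only to give the buffer conservative alignment for a small type.

```cpp
#include <new>   // placement new

// Hypothetical type that counts live instances so the effect is observable.
struct Tracked {
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Constructs a Tracked inside caller-provided storage, reads it back,
// then destroys it by hand. Returns the value that was stored.
int placement_roundtrip(int v) {
    // Raw storage on the stack. The long double member is an assumption
    // used only to get alignment that is sufficient for Tracked.
    union { unsigned char bytes[sizeof(Tracked)]; long double align_; } storage;

    Tracked* p = new (storage.bytes) Tracked(v);  // constructor runs, nothing is allocated
    int result = p->value;
    p->~Tracked();  // explicit destructor call; 'delete p' would be undefined here
    return result;
}
```

After placement_roundtrip returns, Tracked::live is back to zero, which is the same bookkeeping the union's destructor performs for its stored object.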


I'm also being sneaky by extending the size of the union by the size of a pointer, so that I can reach the stored object the same way you would through a pointer to an object. The pointer occupies the first sizeof(T*) bytes and the object lives immediately after it. This does break the strict rule of what a union is: the size is no longer exactly that of the object. However, with that caveat, we can now do the following:

#include <iostream>
#include "ntu.hh"

struct Test {
    int i;
    Test () : i(0) {}
    void echo(const char * msg) { std::cout << msg << std::endl; }
};

int main () {
    NonTrivialUnion< Test > ntu;
    ntu->echo ("Allowed!");
    return 0;
}


Notice that I am not just storing a copy of the memory of an object, as I would be with memcpy; that memory is the object itself, entirely managed by the union. Constructors and destructors are properly called, and you can call through to the object to invoke its member functions. Pretty cool.

One note about the above example: it relies on the object having a default constructor. Objects built with a non-default constructor are supported using the following approach:

Test t;
NonTrivialUnion< Test > ntu(t);


But in that case the copy constructor is used and two objects are created: one external to the union and one owned by the union.
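
To make the two-object behavior visible, here is a small sketch that counts constructions. The NonTrivialUnion definition is repeated in trimmed form only so the snippet compiles on its own; Counted, its counters, and live_during_copy_construction are hypothetical names introduced for this illustration.

```cpp
#include <new>   // placement new

// Trimmed copy of the NonTrivialUnion shown earlier, repeated for self-containment.
template< typename T >
union NonTrivialUnion {
private:
    T * ptr_;
    unsigned char bytes_[sizeof(T*) + sizeof(T)];
public:
    NonTrivialUnion (const T& t) { ptr_ = new (bytes_ + sizeof(T*)) T(t); }
    NonTrivialUnion () { ptr_ = new (bytes_ + sizeof(T*)) T; }
    ~NonTrivialUnion () { ptr_->T::~T(); }
    T * operator-> () { return ptr_; }
};

// Hypothetical instrumented type: counts how each instance came to exist.
struct Counted {
    static int defaults;  // default constructions so far
    static int copies;    // copy constructions so far
    static int live;      // instances currently alive
    int i;
    Counted () : i(0) { ++defaults; ++live; }
    Counted (const Counted& other) : i(other.i) { ++copies; ++live; }
    ~Counted () { --live; }
};
int Counted::defaults = 0;
int Counted::copies = 0;
int Counted::live = 0;

// Returns how many Counted objects are alive while both the external
// object and the union's own copy exist.
int live_during_copy_construction () {
    Counted t;                          // the object external to the union
    NonTrivialUnion< Counted > ntu(t);  // the union copy-constructs its own Counted
    ntu->i = 7;                         // changes the union's copy only, not t
    return Counted::live;
}
```

Calling live_during_copy_construction once records one default construction and one copy construction, and both objects have been destroyed by the time it returns.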

Much more thought is needed about how to implement the union robustly. For example, how should operator= behave for unions supporting more than a single type, and how can the largest type among multiple members be determined automatically? Regardless, I think this is an interesting and useful tool.
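
As one possible direction for the largest-type question, the member sizes are all known at compile time, so a small template can pick the maximum. This is only a sketch under that assumption: MaxSize, Small, Large, Storage, and storage_size are hypothetical names, and the sketch deliberately ignores alignment, which a real implementation would also have to handle.

```cpp
#include <cstddef>

// Hypothetical helper: the larger of sizeof(A) and sizeof(B), as a compile-time constant.
template< typename A, typename B >
struct MaxSize {
    static const std::size_t value =
        (sizeof(A) > sizeof(B)) ? sizeof(A) : sizeof(B);
};

// Two example member types of different sizes.
struct Small { char c; };
struct Large { double d[4]; };

// Raw storage big enough to hold either type (alignment is not addressed here).
struct Storage {
    unsigned char bytes[MaxSize< Small, Large >::value];
};

// Reports the size of the combined storage.
std::size_t storage_size () { return sizeof(Storage); }
```

More than two types could be handled by nesting the same comparison, at the cost of some template boilerplate.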