Sunday, February 20, 2011

Non-Trivial Union

In my last post I talked about placing some syntactic sugar around C++ unions. In that post I mentioned that having non-POD types - those with user-supplied or non-trivial constructors - were not allowed as a member of a union. In this post I describe how to overcome that limitation and why that might be useful.

First, to illustrate what I mean about unions not being allowed to contain non-POD types look at the following example:

struct Test {
    int i;
    Test () : i(0) {}
};

// ... and, later, in main
union NotAllowed {
    Test test;
    unsigned char bytes[sizeof(Test)];
};

Compiling that will present an error similar to

member ‘Test main()::NotAllowed::test’ with constructor not allowed in union

The workaround to this limitation is to store a pointer to the type and manage the memory external to the union. Well, it would be really nice to support RAII with unions since they actually support the constructor and destructor mechanism but the storing-a-pointer-to-type approach defeats this. I wanted to see if there was any way to use RAII with C++ unions.

I started out by considering using the union constructor to do a memcpy on the size of the complex type to itself but the obvious failure there is that the constructor/destructor of the type is still managed outside the scope of the union. In addition to that, memcpy would provide only a copy of the memory of the object so I could not manipulate the object itself.

After a little reading I found out about a cool mechanism called placement new operator. What placement new will do is allow you to provide a memory location to store an object and use that instead of what operator new would normally provide for you. The best part about that is that the constructor of the object is invoked. I looked around a bit and from what I saw placement new is generally used to provide memory pools or other memory management mechanisms outside of the default behavior of new and delete. So, continuing with my example from last time, I derived the following

#include <stdexcept>

template< typename T >
union NonTrivialUnion {
    private:
        T * ptr_;
        unsigned char bytes_[sizeof(T*) + sizeof(T)];
    public:

        NonTrivialUnion (const T& t) {
            ptr_ = new (bytes_ + sizeof(T*)) T(t);
        }

        NonTrivialUnion () {
            ptr_ = new (bytes_ + sizeof(T*)) T;
        }

        ~NonTrivialUnion () { ptr_->T::~T(); }

        T * operator-> () { return ptr_; }

        // ... remainder of implementation 
        // ... operator[], operator=, ...
};

The placement new is in the union constructor

ptr_ = new (bytes_ + sizeof(T*)) T(t);

You'll also notice in the destructor of the union that I explicitly call the destructor of the object I am maintaining. This is necessary. As I mentioned, the storage location is not provided by new so delete would do undefined things with it and as delete is responsible for calling an object's destructor it follows that the destructor needs to be called elsewhere. Hence, in the destructor of the union there is

~NonTrivialUnion () { ptr_->T::~T(); }

I'm also being sneaky by extending the size of the union by the size of a pointer so I have some way to access the memory as you would with a pointer to an object. This does break the strict rule of what a union is: the size is no longer exactly that of the object. However, with that caveat, we can now do the following:

#include <iostream>
#include "ntu.hh"

struct Test {
    int i;
    Test () : i(0) {}
    void echo(const char * msg) { std::cout << msg << std::endl; }
};

int main () {
    NonTrivialUnion< Test > ntu;
    ntu->echo ("Allowed!");
    return 0;
}

Notice that I am not just storing a copy of the memory of an object as I would be with memcpy; that memory is the object itself entirely managed by the union. Constructors and destructors are properly called and you can call through to the object to invoke it's member functions. Pretty cool.

One note about the above example: it relies on an empty constructor for the object. There is support for a non-empty constructor using the following approach

Test t;
NonTrivialUnion< Test > ntu(t);

But in that case a copy-constructor is used and exactly two objects are created: one external to the union and one owned by the union.

There needs to be much more thought about how to implement the union robustly. For example, operator= for unions supporting more than a single type; and how to dynamically determine the largest type among multiple members; and so on. Regardless, I think this is an interesting and useful tool.

Thursday, February 17, 2011

C++ unions

I was reading Bruce Eckel's Thinking in C++ the other day and I came across a bit of information that I was not aware of: unions can be treated just like classes. That is, they can have constructors and member functions and access control.

I think that this is a neat feature. Combined with the use of templates and operator overloading one can create a shared memory space (just like you normally would with unions) but with a much nicer interface. Consider the following:

#include <stdexcept>
#include <sstream>

template< typename T >
union Bytes {
    private:
        T data_;
        unsigned char bytes_[sizeof(T)];
    public:

        Bytes(const T& t) : data_(t) {}

        T& value () { return data_; }

        size_t size () const { return sizeof(T); }

        void operator= (const T& t) { data_ = t; }

        unsigned char& operator[] (int i) {
            if (i < 0 || i >= size()) {
                std::stringstream ss;
                ss << "Invalid index " << i << " ";
                ss << "(sizeof(T) : " << sizeof(T) << ")";
                throw std::out_of_range (ss.str ());
            }
            return bytes_[i];
        }
};

Now you can have a shared memory space that can be accessed with operator[] for the individual bytes of any POD type (unions do not support objects with constructors) and operator= for setting the value of the type.

The result is mostly syntatic sugar but makes for much more readable code:

Bytes<long> memory(0);  // initial value of 0
memory[1] = 4;          // modify individual bytes. New value 1024
memory = 0;             // reset value to 0

Saturday, February 5, 2011

Goto Hell

There was a recent stackoverflow challenge question where users were asked to present code that prints the numbers 1 through 1000 without using loops or conditionals. A major theme for answers to the thread was recursion - both compile time and run time. There were a mix of other entries including consecutive function calls to iterate through 1000 increment/print operations, shelling-out to a program such as seq, and several other approaches to the problem. My first opinion was that the only correct solution (according to the spec) was the one that used constructors. I say correct because, technically, recursion is looping.

I posted my own solution for fun which aimed to solve the problem in a way not currently presented. This is both an abuse of compiler extensions and language constructs and is nothing more than me having some fun. Really, don't write code like this. Ever.

#include <stdio.h>

int main () {
    void * label[1001] = { [0 ... 999] = &&again };
    label[1000] = &&done;
    int i = 0;
again:
    printf ("%d\n", i + 1);
    goto *label[++i];
done:
    return 0;
}

I knew that if I considered recursion a loop that my code was also a loop construct. The point of my post, however, was to have a solution that was entirely distinct from what had already been posted and not necessarily to solve the puzzle exactly. This got me thinking though; how close is state machine looping, and recursion, and compile-time recursion to an actual coded loop?

I coded up and compiled three implementations with the intent of determining what the code looked like after the compiler was finished with it. The solutions I chose were: compile time recursion, run time recursion, and my goto looping. Each of these was compared to a basic for loop. Because of template stack depth issues I've only printed up to 10 in these examples. All files were compiled by g++ -O0 (except my solution, see below). In all assembly output the comments are my own.

Compile Time Template Recursion

template< int N > void fun () {
    fun< N-1 >();
    std::cout << N << '\n';
}

As asm

...

_Z3funILi1EEvv:                 ; fun< 1 >
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $1, %esi
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    movl    $10, %esi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c
    leave
    ret
...

_Z3funILi2EEvv:                 ; fun< 2 >
    pushq   %rbp
    movq    %rsp, %rbp
    call    _Z3funILi1EEvv      ; fun< 1 >()
...

_Z3funILi3EEvv:                 ; fun< 3 >
    pushq   %rbp
    movq    %rsp, %rbp
    call    _Z3funILi2EEvv      ; fun< 2 >()
...

main:
    pushq   %rbp
    movq    %rsp, %rbp
    call    _Z3funILi10EEvv     ; fun< 10 >()
    movl    $0, %eax
    leave
    ret

This was a big surprise to me. I assumed that since the compiler was dealing with constant values and there was no class involved that this would actually be generated as a recursive expression. Instead, as you can see, the compiler actually generates all 10 function calls and arranges them to call each other in order where main calls _Z3funILi10EEvv which calls _Z3funILi9EEvv which, in turn, calls _Z3funILi8EEvv and so on - each printing a value after invoking the next function. The process bottoms out at _Z3funILi1EEvv where no function is called and only a value is printed. So this is essentially the same as creating a function for each value to be printed and calling them in order with the exception that the calls are actually nested instead of sequential. This is not recursion.

Run Time Recursion

void fun(int n) {
    n && (fun (n-1), std::cout << n << '\n');
}

As asm

_Z3funi:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    cmpl    $0, -4(%rbp)    ; hidden conditional ;)
    je  .L9
    movl    -4(%rbp), %eax
    leal    -1(%rax), %edi
    call    _Z3funi         ; the recursion
    movl    -4(%rbp), %esi
    movl    $_ZSt4cout, %edi

    ; print setup stuff here

.L9:
    leave
    ret
...

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $10, %edi
    call    _Z3funi         ; start recursive call
    movl    $0, %eax
    leave
    ret

It is interesting to note that there is a hidden conditional in there. It applies to the short-circuit operation used to terminate the recursion. Other than that, there is nothing unexpected here. The only thing that might have changed this up a bit is if we could have done tail recursion; the compiler might have turned it into a loop. Requiring the print call to be the final executed statement of the function spoils any chance of that.

Goto State Table
 As a side note, I was not able to compile my submission as C++ code. In order to make the comparisons I've just used gcc instead - I don't imagine the result would be any different had g++ been willing to accept the code.

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $8048, %rsp
    leaq    -8016(%rbp), %rax
    movq    %rax, -8024(%rbp)
    movq    $0, -8032(%rbp)
    movl    $8008, %eax
    cmpl    $8, %eax        ; part of initialization, not my code
    jb  .L2
    movq    $1001, -8040(%rbp)
    movq    -8024(%rbp), %rdi
    movq    -8040(%rbp), %rcx
    movq    -8032(%rbp), %rax
    rep stosq
.L2:
    movq    $.L3, -8016(%rbp) ; again label

    ; lots of lines initializing jump table

    movq    $.L3, -24(%rbp)   ; again label
    movq    $.L4, -16(%rbp)   ; end lable
    movl    $0, -4(%rbp)
.L3:
    movl    -4(%rbp), %eax
    leal    1(%rax), %esi
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf
    addl    $1, -4(%rbp)
    movl    -4(%rbp), %eax
    cltq
    movq    -8016(%rbp,%rax,8), %rax
    jmp *%rax       ; non-conditional jump
.L4:
    movl    $0, %eax
    leave
    ret

And, for completeness, the for loop

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    $1, -4(%rbp)
    jmp .L7
.L8:
    movl    -4(%rbp), %esi
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    movl    $10, %esi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c
    addl    $1, -4(%rbp)
.L7:
    cmpl    $1000, -4(%rbp)     ; conditional
    jle .L8
    movl    $0, %eax
    leave
    ret

The really surprising thing to me was the result of C++ template recursion and that the generated code is not recursive. So, in addition to the constructor submission it seems that compile time recursion satisfies both requirements of the problem: no conditionals and no looping.

 As expected, there were conditionals in the resulting assembly for the run time recursion - they are necessary to ensure an exit condition. This is where I believe that I have a leg up on the recursive solutions: the goto approach does not rely on a condition to terminate. In both the run time recursion and straight for loop implementation the resulting [assembly] code boiled down to basically a check for continuation (the conditional) and a jump to a previous location (the loop). Neither are present in the template recursion or the constructor solution. My code sits somewhere in between. Though, I likely lose major points for style.