Saturday, February 5, 2011

Goto Hell

There was a recent stackoverflow challenge question where users were asked to present code that prints the numbers 1 through 1000 without using loops or conditionals. A major theme for answers to the thread was recursion - both compile time and run time. There were a mix of other entries including consecutive function calls to iterate through 1000 increment/print operations, shelling-out to a program such as seq, and several other approaches to the problem. My first opinion was that the only correct solution (according to the spec) was the one that used constructors. I say correct because, technically, recursion is looping.

I posted my own solution for fun which aimed to solve the problem in a way not currently presented. This is both an abuse of compiler extensions and language constructs and is nothing more than me having some fun. Really, don't write code like this. Ever.

#include <stdio.h>

int main () {
    void * label[1001] = { [0 ... 999] = &&again };
    label[1000] = &&done;
    int i = 0;
again:
    printf ("%d\n", i + 1);
    goto *label[++i];
done:
    return 0;
}

I knew that if I considered recursion a loop that my code was also a loop construct. The point of my post, however, was to have a solution that was entirely distinct from what had already been posted and not necessarily to solve the puzzle exactly. This got me thinking though; how close is state machine looping, and recursion, and compile-time recursion to an actual coded loop?

I coded up and compiled three implementations with the intent of determining what the code looked like after the compiler was finished with it. The solutions I chose were: compile time recursion, run time recursion, and my goto looping. Each of these was compared to a basic for loop. Because of template stack depth issues I've only printed up to 10 in these examples. All files were compiled by g++ -O0 (except my solution, see below). In all assembly output the comments are my own.

Compile Time Template Recursion

template< int N > void fun () {
    fun< N-1 >();
    std::cout << N << '\n';
}

As asm

...

_Z3funILi1EEvv:                 ; fun< 1 >
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $1, %esi
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    movl    $10, %esi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c
    leave
    ret
...

_Z3funILi2EEvv:                 ; fun< 2 >
    pushq   %rbp
    movq    %rsp, %rbp
    call    _Z3funILi1EEvv      ; fun< 1 >()
...

_Z3funILi3EEvv:                 ; fun< 3 >
    pushq   %rbp
    movq    %rsp, %rbp
    call    _Z3funILi2EEvv      ; fun< 2 >()
...

main:
    pushq   %rbp
    movq    %rsp, %rbp
    call    _Z3funILi10EEvv     ; fun< 10 >()
    movl    $0, %eax
    leave
    ret

This was a big surprise to me. I assumed that since the compiler was dealing with constant values and there was no class involved that this would actually be generated as a recursive expression. Instead, as you can see, the compiler actually generates all 10 function calls and arranges them to call each other in order where main calls _Z3funILi10EEvv which calls _Z3funILi9EEvv which, in turn, calls _Z3funILi8EEvv and so on - each printing a value after invoking the next function. The process bottoms out at _Z3funILi1EEvv where no function is called and only a value is printed. So this is essentially the same as creating a function for each value to be printed and calling them in order with the exception that the calls are actually nested instead of sequential. This is not recursion.

Run Time Recursion

void fun(int n) {
    n && (fun (n-1), std::cout << n << '\n');
}

As asm

_Z3funi:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    cmpl    $0, -4(%rbp)    ; hidden conditional ;)
    je  .L9
    movl    -4(%rbp), %eax
    leal    -1(%rax), %edi
    call    _Z3funi         ; the recursion
    movl    -4(%rbp), %esi
    movl    $_ZSt4cout, %edi

    ; print setup stuff here

.L9:
    leave
    ret
...

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $10, %edi
    call    _Z3funi         ; start recursive call
    movl    $0, %eax
    leave
    ret

It is interesting to note that there is a hidden conditional in there. It applies to the short-circuit operation used to terminate the recursion. Other than that, there is nothing unexpected here. The only thing that might have changed this up a bit is if we could have done tail recursion; the compiler might have turned it into a loop. Requiring the print call to be the final executed statement of the function spoils any chance of that.

Goto State Table
 As a side note, I was not able to compile my submission as C++ code. In order to make the comparisons I've just used gcc instead - I don't imagine the result would be any different had g++ been willing to accept the code.

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $8048, %rsp
    leaq    -8016(%rbp), %rax
    movq    %rax, -8024(%rbp)
    movq    $0, -8032(%rbp)
    movl    $8008, %eax
    cmpl    $8, %eax        ; part of initialization, not my code
    jb  .L2
    movq    $1001, -8040(%rbp)
    movq    -8024(%rbp), %rdi
    movq    -8040(%rbp), %rcx
    movq    -8032(%rbp), %rax
    rep stosq
.L2:
    movq    $.L3, -8016(%rbp) ; again label

    ; lots of lines initializing jump table

    movq    $.L3, -24(%rbp)   ; again label
    movq    $.L4, -16(%rbp)   ; end lable
    movl    $0, -4(%rbp)
.L3:
    movl    -4(%rbp), %eax
    leal    1(%rax), %esi
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf
    addl    $1, -4(%rbp)
    movl    -4(%rbp), %eax
    cltq
    movq    -8016(%rbp,%rax,8), %rax
    jmp *%rax       ; non-conditional jump
.L4:
    movl    $0, %eax
    leave
    ret

And, for completeness, the for loop

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    $1, -4(%rbp)
    jmp .L7
.L8:
    movl    -4(%rbp), %esi
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    movl    $10, %esi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c
    addl    $1, -4(%rbp)
.L7:
    cmpl    $1000, -4(%rbp)     ; conditional
    jle .L8
    movl    $0, %eax
    leave
    ret

The really surprising thing to me was the result of C++ template recursion and that the generated code is not recursive. So, in addition to the constructor submission it seems that compile time recursion satisfies both requirements of the problem: no conditionals and no looping.

 As expected, there were conditionals in the resulting assembly for the run time recursion - they are necessary to ensure an exit condition. This is where I believe that I have a leg up on the recursive solutions: the goto approach does not rely on a condition to terminate. In both the run time recursion and straight for loop implementation the resulting [assembly] code boiled down to basically a check for continuation (the conditional) and a jump to a previous location (the loop). Neither are present in the template recursion or the constructor solution. My code sits somewhere in between. Though, I likely lose major points for style.

No comments :

Post a Comment