Thursday, March 31, 2011

The Price of Parking

I live in the Philadelphia area. If you've ever been to a Philadelphia sporting event you understand that there exists a certain kind of crazy that envelops the area just before game time and hangs around to varying degrees after the game depending on the outcome. If you are a true Philly fan you drink the crazy Kool-Aid. I don't, but my father-in-law has season tickets to the Flyers so occasionally I pretend to get a little punch drunk.

I don't live in an area that makes it feasible to take public transportation so I usually end up driving to the stadium from work. Parking is an assumed cost (as is $7 beers) but the last game I went to the price was pretty steep: $15. That is opposed to just three years earlier when I could get a spot for a mere $10. This was painful enough to force me to look at it more when I got home - and so I did.

There wasn't much to be gleaned from the fact that prices had risen over the past few years. Especially since Phildelphia built three new stadiums in the last ten years and, unfortunately, there is a blanket excuse to raise prices without cause lately: the poor economy. In either case, in thinking about all of this I started to consider how information can be used to manipulate people. More specifically, I imagined how the owner of the lot might try to explain away the increase to his (or her) customers.

As a consumer, I might approach the situation with the argument that the cost of parking has risen exponentially over the past five years. Certainly, from the following representation that might make sense:

Of course, that is presented in precisely the right way: the scale is only large enough to include the data points being shown; there is no history previous to 2007 where a consistent price may have been held (the two data points at $10 hint at this, however); and there is no comparison to other lots and/or prices in the area.                           

The counter to that point of view might look something like this:
Plenty going on here. First, and probably most important, is the scale: adjusting the scale results in a trend that appears closer to linear than exponential. Other factors: extending the range (going back to 2001); a carefully chosen cost of living number - modest, but keeps total profit negative; keeping the number of games high (it is a multiplier of the profit). In addition, the loss is not exactly negative income per patron, it is lost profit against the chosen cost of living increase.

In exploring this thought experiment I've only enforced what I already know: we, as presenters of data, have an obligation to be honest and straightforward. Data can be made to tell any convenient story; we would do well to remember that when consuming and producing information.

"The most dangerous untruths are truths moderately distorted." - Georg Lichtenberg

Incidentally, I had to leave that game early due to my daughter being very tired. On the way out I was helping my daughter put on her coat and a boy came over and asked if he could give her a puck. It seems he caught the puck during the game and wanted my daughter to have it. Well her eyes lit up and, of course, it completely made my night. Certainly worth the price of parking if you ask me.

Sunday, March 13, 2011

Disappearing Ink

This is something I've been wanting to post about for a while now. There was thread on a forum I used to frequent (under a different handle). You can follow the link for details, but the main concept was that in a transition to a GUI environment the developers needed to intercept printf and fprintf calls and do something GUI-related when the GUI was running and leave the software as-is when the GUI was not running. The approach to handle this was to define their own version of the functions and handle the context there. Something like:

extern "C" int fprintf (FILE *__restrict __stream,
       __const char *__restrict __format, ...)
{
   va_list args;
   int return_status = 0;

   va_start(args,__format);

   if (is_gui && (__stream == stdout || __stream == stderr)) {
       return_status = showMessageInGui(NULL, __format, args);
   }
   else {
       return_status = vfprintf(__stream, __format, args);
   }

   va_end(args);
   return return_status;
}

Which only worked part of the time. Complicating matters was the behavior was different across multiple compiler versions: gcc4.2.4 exhibited the problem while gcc3.4.2 did not.

I did some digging around and found out that the newer version of gcc was actually implementing a level of optimization that undermined the approach of redefining printf. Basically, any argument to printf that did not contain a format string (%) to be filled in resulted in a translation by later versions of gcc to a call to fwrite (or puts). For example:

#include <stdio.h>
int main () {
    printf ("With args %d\n", 10);
    printf ("Without\n");
    return 0;
}

Translates to

.LC0:
    .string "With args %d\n"
.LC1:
    .string "Without"
    .text
    ...
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf
    movl    $.LC1, %edi
    call    puts
    movl    $0, %eax

Notice the puts call for the second call to printf. So regardless of what definition was provided for printf the code would never be executed. Try it with the following:

#include <stdio.h>

int printf (const char * fmt, ...) { return 1/0; }

int main () {
    printf ("w00t\n");
    return 0;
}

You'll see that a warning is presented (for division by zero) but the code runs without issue.

There were several solutions presented in the thread but the OP eventually chose to use gcc flags to prevent this behavior. Using -fno-builtin-fprintf and -fno-builtin-printf allowed compilation without the translation and the redefinition of printf worked as expected.

Anatomy of an update

I recently put ntrace up on github.

While I was doing testing for that version I saw my package manager blinking at me telling me I needed updates. I decided to take that as an opportunity to test ntrace on a more realistic use case (at the time I was just doing scp or iperf tests). Instead of using the Synaptic Package Manager GUI I opened up a terminal and used aptitude to do the update. Since I know nothing about how aptitude works, this was a good opportunity to see what ntrace could tell me.

Here was the command I ran:
LD_PRELOAD=./libntrace.so aptitude safe-upgrade
The update was small; it consisted of only 5 packages:
The following packages will be upgraded:
      libmozjs1d libsvn1 subversion xulrunner-1.9 xulrunner-1.9-gnome-support
    5 packages upgraded, 0 newly installed, 0 to remove and 7 not upgraded.
    Need to get 10.4MB of archives. After unpacking 8192B will be used.
    Do you want to continue? [Y/n/?]
After typing Y to continue the update completed without issue. I parsed the resulting output and compiled the following graph:

Interesting points:
  • Between seconds 3 and 12 there is no activity. This is the time I was reading the screen before I confirmed and continued with the update.
  • Each package was retrieved by it's own process. The dots are color-coded according to the PID that was associated with the traffic (see the legend).
  • Packages are downloaded sequentially instead of in parallel.
  • The long delay after 17 seconds is the time it took to actually install the updates on my system. The original process then communicates some more (presumably about the status of the children) and the update exits.
  • The traffic patterns for the parent process are nearly identical at the front and back end of the communication. Maybe all details of the update are advertised at both ends instead of a diff? Dunno, but it is interesting.
This is how ntrace is useful to me: quick insight into application activity as it relates to the network.

As part of my testing I am doing performance impact analysis. Perhaps some of those details will appear here as well.

Wednesday, March 9, 2011

github

So I've finally caught up with all the hype and got myself a github account. You can find more randomness there.

Please let me know if you find anything there useful.

ezpz on github :)

Sunday, February 20, 2011

Non-Trivial Union

In my last post I talked about placing some syntactic sugar around C++ unions. In that post I mentioned that having non-POD types - those with user-supplied or non-trivial constructors - were not allowed as a member of a union. In this post I describe how to overcome that limitation and why that might be useful.

First, to illustrate what I mean about unions not being allowed to contain non-POD types look at the following example:

struct Test {
    int i;
    Test () : i(0) {}
};

// ... and, later, in main
union NotAllowed {
    Test test;
    unsigned char bytes[sizeof(Test)];
};

Compiling that will present an error similar to

member ‘Test main()::NotAllowed::test’ with constructor not allowed in union

The workaround to this limitation is to store a pointer to the type and manage the memory external to the union. Well, it would be really nice to support RAII with unions since they actually support the constructor and destructor mechanism but the storing-a-pointer-to-type approach defeats this. I wanted to see if there was any way to use RAII with C++ unions.

I started out by considering using the union constructor to do a memcpy on the size of the complex type to itself but the obvious failure there is that the constructor/destructor of the type is still managed outside the scope of the union. In addition to that, memcpy would provide only a copy of the memory of the object so I could not manipulate the object itself.

After a little reading I found out about a cool mechanism called placement new operator. What placement new will do is allow you to provide a memory location to store an object and use that instead of what operator new would normally provide for you. The best part about that is that the constructor of the object is invoked. I looked around a bit and from what I saw placement new is generally used to provide memory pools or other memory management mechanisms outside of the default behavior of new and delete. So, continuing with my example from last time, I derived the following

#include <stdexcept>

template< typename T >
union NonTrivialUnion {
    private:
        T * ptr_;
        unsigned char bytes_[sizeof(T*) + sizeof(T)];
    public:

        NonTrivialUnion (const T& t) {
            ptr_ = new (bytes_ + sizeof(T*)) T(t);
        }

        NonTrivialUnion () {
            ptr_ = new (bytes_ + sizeof(T*)) T;
        }

        ~NonTrivialUnion () { ptr_->T::~T(); }

        T * operator-> () { return ptr_; }

        // ... remainder of implementation 
        // ... operator[], operator=, ...
};

The placement new is in the union constructor

ptr_ = new (bytes_ + sizeof(T*)) T(t);

You'll also notice in the destructor of the union that I explicitly call the destructor of the object I am maintaining. This is necessary. As I mentioned, the storage location is not provided by new so delete would do undefined things with it and as delete is responsible for calling an object's destructor it follows that the destructor needs to be called elsewhere. Hence, in the destructor of the union there is

~NonTrivialUnion () { ptr_->T::~T(); }

I'm also being sneaky by extending the size of the union by the size of a pointer so I have some way to access the memory as you would with a pointer to an object. This does break the strict rule of what a union is: the size is no longer exactly that of the object. However, with that caveat, we can now do the following:

#include <iostream>
#include "ntu.hh"

struct Test {
    int i;
    Test () : i(0) {}
    void echo(const char * msg) { std::cout << msg << std::endl; }
};

int main () {
    NonTrivialUnion< Test > ntu;
    ntu->echo ("Allowed!");
    return 0;
}

Notice that I am not just storing a copy of the memory of an object as I would be with memcpy; that memory is the object itself entirely managed by the union. Constructors and destructors are properly called and you can call through to the object to invoke it's member functions. Pretty cool.

One note about the above example: it relies on an empty constructor for the object. There is support for a non-empty constructor using the following approach

Test t;
NonTrivialUnion< Test > ntu(t);

But in that case a copy-constructor is used and exactly two objects are created: one external to the union and one owned by the union.

There needs to be much more thought about how to implement the union robustly. For example, operator= for unions supporting more than a single type; and how to dynamically determine the largest type among multiple members; and so on. Regardless, I think this is an interesting and useful tool.

Thursday, February 17, 2011

C++ unions

I was reading Bruce Eckel's Thinking in C++ the other day and I came across a bit of information that I was not aware of: unions can be treated just like classes. That is, they can have constructors and member functions and access control.

I think that this is a neat feature. Combined with the use of templates and operator overloading one can create a shared memory space (just like you normally would with unions) but with a much nicer interface. Consider the following:

#include <stdexcept>
#include <sstream>

template< typename T >
union Bytes {
    private:
        T data_;
        unsigned char bytes_[sizeof(T)];
    public:

        Bytes(const T& t) : data_(t) {}

        T& value () { return data_; }

        size_t size () const { return sizeof(T); }

        void operator= (const T& t) { data_ = t; }

        unsigned char& operator[] (int i) {
            if (i < 0 || i >= size()) {
                std::stringstream ss;
                ss << "Invalid index " << i << " ";
                ss << "(sizeof(T) : " << sizeof(T) << ")";
                throw std::out_of_range (ss.str ());
            }
            return bytes_[i];
        }
};

Now you can have a shared memory space that can be accessed with operator[] for the individual bytes of any POD type (unions do not support objects with constructors) and operator= for setting the value of the type.

The result is mostly syntatic sugar but makes for much more readable code:

Bytes<long> memory(0);  // initial value of 0
memory[1] = 4;          // modify individual bytes. New value 1024
memory = 0;             // reset value to 0

Saturday, February 5, 2011

Goto Hell

There was a recent stackoverflow challenge question where users were asked to present code that prints the numbers 1 through 1000 without using loops or conditionals. A major theme for answers to the thread was recursion - both compile time and run time. There were a mix of other entries including consecutive function calls to iterate through 1000 increment/print operations, shelling-out to a program such as seq, and several other approaches to the problem. My first opinion was that the only correct solution (according to the spec) was the one that used constructors. I say correct because, technically, recursion is looping.

I posted my own solution for fun which aimed to solve the problem in a way not currently presented. This is both an abuse of compiler extensions and language constructs and is nothing more than me having some fun. Really, don't write code like this. Ever.

#include <stdio.h>

int main () {
    void * label[1001] = { [0 ... 999] = &&again };
    label[1000] = &&done;
    int i = 0;
again:
    printf ("%d\n", i + 1);
    goto *label[++i];
done:
    return 0;
}

I knew that if I considered recursion a loop that my code was also a loop construct. The point of my post, however, was to have a solution that was entirely distinct from what had already been posted and not necessarily to solve the puzzle exactly. This got me thinking though; how close is state machine looping, and recursion, and compile-time recursion to an actual coded loop?

I coded up and compiled three implementations with the intent of determining what the code looked like after the compiler was finished with it. The solutions I chose were: compile time recursion, run time recursion, and my goto looping. Each of these was compared to a basic for loop. Because of template stack depth issues I've only printed up to 10 in these examples. All files were compiled by g++ -O0 (except my solution, see below). In all assembly output the comments are my own.

Compile Time Template Recursion

template< int N > void fun () {
    fun< N-1 >();
    std::cout << N << '\n';
}

As asm

...

_Z3funILi1EEvv:                 ; fun< 1 >
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $1, %esi
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    movl    $10, %esi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c
    leave
    ret
...

_Z3funILi2EEvv:                 ; fun< 2 >
    pushq   %rbp
    movq    %rsp, %rbp
    call    _Z3funILi1EEvv      ; fun< 1 >()
...

_Z3funILi3EEvv:                 ; fun< 3 >
    pushq   %rbp
    movq    %rsp, %rbp
    call    _Z3funILi2EEvv      ; fun< 2 >()
...

main:
    pushq   %rbp
    movq    %rsp, %rbp
    call    _Z3funILi10EEvv     ; fun< 10 >()
    movl    $0, %eax
    leave
    ret

This was a big surprise to me. I assumed that since the compiler was dealing with constant values and there was no class involved that this would actually be generated as a recursive expression. Instead, as you can see, the compiler actually generates all 10 function calls and arranges them to call each other in order where main calls _Z3funILi10EEvv which calls _Z3funILi9EEvv which, in turn, calls _Z3funILi8EEvv and so on - each printing a value after invoking the next function. The process bottoms out at _Z3funILi1EEvv where no function is called and only a value is printed. So this is essentially the same as creating a function for each value to be printed and calling them in order with the exception that the calls are actually nested instead of sequential. This is not recursion.

Run Time Recursion

void fun(int n) {
    n && (fun (n-1), std::cout << n << '\n');
}

As asm

_Z3funi:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    cmpl    $0, -4(%rbp)    ; hidden conditional ;)
    je  .L9
    movl    -4(%rbp), %eax
    leal    -1(%rax), %edi
    call    _Z3funi         ; the recursion
    movl    -4(%rbp), %esi
    movl    $_ZSt4cout, %edi

    ; print setup stuff here

.L9:
    leave
    ret
...

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $10, %edi
    call    _Z3funi         ; start recursive call
    movl    $0, %eax
    leave
    ret

It is interesting to note that there is a hidden conditional in there. It applies to the short-circuit operation used to terminate the recursion. Other than that, there is nothing unexpected here. The only thing that might have changed this up a bit is if we could have done tail recursion; the compiler might have turned it into a loop. Requiring the print call to be the final executed statement of the function spoils any chance of that.

Goto State Table
 As a side note, I was not able to compile my submission as C++ code. In order to make the comparisons I've just used gcc instead - I don't imagine the result would be any different had g++ been willing to accept the code.

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $8048, %rsp
    leaq    -8016(%rbp), %rax
    movq    %rax, -8024(%rbp)
    movq    $0, -8032(%rbp)
    movl    $8008, %eax
    cmpl    $8, %eax        ; part of initialization, not my code
    jb  .L2
    movq    $1001, -8040(%rbp)
    movq    -8024(%rbp), %rdi
    movq    -8040(%rbp), %rcx
    movq    -8032(%rbp), %rax
    rep stosq
.L2:
    movq    $.L3, -8016(%rbp) ; again label

    ; lots of lines initializing jump table

    movq    $.L3, -24(%rbp)   ; again label
    movq    $.L4, -16(%rbp)   ; end lable
    movl    $0, -4(%rbp)
.L3:
    movl    -4(%rbp), %eax
    leal    1(%rax), %esi
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf
    addl    $1, -4(%rbp)
    movl    -4(%rbp), %eax
    cltq
    movq    -8016(%rbp,%rax,8), %rax
    jmp *%rax       ; non-conditional jump
.L4:
    movl    $0, %eax
    leave
    ret

And, for completeness, the for loop

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    $1, -4(%rbp)
    jmp .L7
.L8:
    movl    -4(%rbp), %esi
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    movl    $10, %esi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c
    addl    $1, -4(%rbp)
.L7:
    cmpl    $1000, -4(%rbp)     ; conditional
    jle .L8
    movl    $0, %eax
    leave
    ret

The really surprising thing to me was the result of C++ template recursion and that the generated code is not recursive. So, in addition to the constructor submission it seems that compile time recursion satisfies both requirements of the problem: no conditionals and no looping.

 As expected, there were conditionals in the resulting assembly for the run time recursion - they are necessary to ensure an exit condition. This is where I believe that I have a leg up on the recursive solutions: the goto approach does not rely on a condition to terminate. In both the run time recursion and straight for loop implementation the resulting [assembly] code boiled down to basically a check for continuation (the conditional) and a jump to a previous location (the loop). Neither are present in the template recursion or the constructor solution. My code sits somewhere in between. Though, I likely lose major points for style.