Saturday, January 29, 2011

Unbalanced Expression

For the most part, interviews have always gone easy for me which is curious because I have an almost uncontrollable fear of social settings. I'm generally so consumed by the anticipation of the event that when the meeting eventually takes place my technical reflexes take over. This is both a blessing and a curse: technical questions and topics are not a problem when in this state but a vague question about personality and past experiences tends toward meaningless rambling about nonsense. Social prowess notwithstanding I've come out on top more times than not in these adventures... until recently.

I interviewed at a startup company where there were two technical questions that I had trouble with. Incidentally, the 'who we are' and 'who are you' portions actually went much better than usual - I do love irony. In either case, the first question was related to querying billions of bytes of data to provide quasi real-time behavior to users. The question was intentionally vague leaving much open to interpretation and I pared it down as much as my experience supported: what is unique in the data, what are the time constraints, what about false negatives/positives, caching, backend structure, and so on. In the end I just came to the conclusion that I could not provide the design details that they were looking for and said so. This took about fifteen minutes and they did not seem too discouraged with my ultimate conclusion.

The second question was direct and significantly less open-ended - this is the part that really threw me for a loop. The question was this: 'Solve the following expression using any language you want'.

(((1+3)*7)+((((3-1)/(1+3))*8)))

Me: "Use eval in ruby or python"
Them: "Good. You can't use eval"
Me: "I'm not too familiar, but a Lisp interpreter?"
Them: ... confused, unexpected looks ...
Me: "Nah, that is incorrect Lisp format, never mind"
Them: "Yeah, that's not exactly proper Lisp format. What else?"

I asked if the expression would always be balanced and that we only had to deal with binary operators. The answer was yes on both counts so I started drawing stacks on the whiteboard roughing out what would work. Eventually, I presented the following Ruby solution:

$ops = []
$vars = []

OPS = ['(', ')', '+', '-', '/', '*']

def push_op op
    $ops.push op
    if op == ')'
        $ops.pop
        v2 = $vars.pop
        v1 = $vars.pop
        op = $ops.pop
        v = eval("#{v1} #{op} #{v2}")
        $vars.push v
        $ops.pop
    end
end 

def push_var v
    $vars.push v
end

$stdin.gets.chomp.split(//).each do |c|
    if OPS.include? c
        push_op c
    else 
        push_var c
    end
end
 
puts $vars.pop

(the eval part was agreed to be hand-waving for a more complex case statement)

The immediate reply was: "Nope. You've missed a fundamental flaw for this class of equations". So I went back and did the entire equation by hand and realized that the solution would not support the unary operation of (). That is, something like ((1+2)) would end up with my solution popping from an empty stack. So, I injected the fix and claimed that the problem was solved for balanced and valid equations. Here is the fix.

def push_op op
    $ops.push op
    if op == ')'
        $ops.pop
        if $ops[-1] == '('
            $ops.pop
            return
        end
        v2 = $vars.pop
        v1 = $vars.pop

Again, the reply was that I was missing the real fundamental problem with this type of equation. I asked if this particular solution exhibited the flaw they were talking about and they claimed it did. I was stuck at that point and ultimately left the interview with the sense that I failed some basic understanding of mathematical equations. When I got home, I typed up my solution and ran the input they provided and the output was exactly as I had expected (the same as an eval call). I wrote my follow-up email to the company thanking them for the interview but I also attached my script with sample run indicating that I stood by my solution from that afternoon.

What do you think... did I missing something obvious or were they looking for my ability to stand my ground when I had the best solution? They never conceded that I actually had the correct answer; in fact, they indicated that I should take it home and consider what I was missing.

So, yeah. I'm still a bit off-balance from that experience.

Saturday, January 22, 2011

C++ namespace

 
 
using namespace std;

It appears in almost every tutorial for beginners and the vast majority of books dedicated to C++. I suppose it saves more time than the alternative of teaching the proper mechanics to newcomers. For the most part, I suspect that you may never have a problem using the standard namespace in it's entirety. I think it's unfortunate, however, that this approach hides both the intent of namespaces and the pitfalls associated with not understanding their mechanics.

A namespace is a clarification tool: it announces the intended scope of the token it proceeds. If you have two tokens with the same name, placing them in separate namespaces prevents compilation errors due to name clashes. There are other functions of namespaces, such as clarity of code, but these are considered secondary if at all.

I've heard - and argued against - people who claim that using an entire namespace is perfectly fine and any claims to the contrary are bunk. To be clear; understanding namespaces is not as important as understanding pointers, references and proper use of classes. But, there are things to watch out for especially as a beginner. Most notably: masking and ambiguity.

Masking
The problem here is that including an entire namespace can mask errors in your own code. This example is something that is very common in those starting out in C++, particularly if they are coming from a scripted language background.

#include <iostream>
#include <algorithm>

using namespace std;

int main () {
    int x = 5, y = 3;
    cout << "Before swap: " << x << " " << y << endl;
    swap (x, y);
    cout << "After  swap: " << x << " " << y << endl;
    return 0;
}

void swap (int x, int y) {
    int tmp = x;
    x = y;
    y = tmp;
}

The intention is obviously to swap the values of two numbers. As it is written however, this swap will only modify the local variables leaving the original two unchanged. The problem lies in the fact that there is a std::swap defined that does the appropriate thing and we get:

Before swap: 5 3
After  swap: 3 5

Which is correct but masks two things that a beginner now won't notice: 1) the local swap is not recognized by the compiler before the use in main (no prototype) and 2) the std::swap version is being invoked and masking incorrect behavior.

If, instead of the entire namespace, we only used specifically the elements local to our scope we would resolve these issues. Changing 

using namespace std;
to 
using std::cout;
using std::endl;

would result in a compilation error similar to:

mask.cc: In function ‘int main()’:
mask.cc:10: error: ‘swap’ was not declared in this scope

Ambiguity
This is a problem of name clashing: two equally qualified matches to a function are present and the compiler can not discern which is to be used. There is a similar problem when two names clash and scope can be used in determining which function gets used. This is an even more subtle bug since there are no complaints from the compiler your code just uses the function with the innermost scope regardless of your intentions.

Consider:

namespace foo { void fun() { } }
namespace bar { void fun() { } }

using namespace foo;
using namespace bar;

int main () {
    fun();
    return 0;
}

Trying to compile that results in

ambig.cc: In function ‘int main()’:
ambig.cc:8: error: call of overloaded ‘fun()’ is ambiguous
ambig.cc:2: note: candidates are: void bar::fun()
ambig.cc:1: note:                 void foo::fun()

That is a painfully obvious example but working with much larger projects where multiple teams are committing code this becomes a real problem. Again, using only the relevant portions of a namespace or being specific when calling the function (bar::fun(), for example) solves this problem.



Another benefit of understanding how to use namespaces is cleaner code in some settings. Consider some of the code you may have seen for multi-platform projects:

void fun () {
#ifdef PLATFORM1
std::cout << "platform1::fun()" << std::endl;
#else
std::cout << "platform2::fun()" << std::endl;
#endif
}

with the logic of the platform sprinkled within the function bodies themselves. This is hard to read and even harder to maintain. With namespaces this can be cleaned up to separate each function body (one for each platform) into it's own namespace.

#include <iostream>

namespace platform1 {
    void fun () {
        std::cout << "platform1::fun()" << std::endl;
    }
}

namespace platform2 {
    void fun () {
        std::cout << "platform2::fun()" << std::endl;}
}

Now, the only place that needs platform logic is when we decide to use the namespace.

#include "platform.hh"

#ifdef PLATFORM1
namespace platform = platform1;
#else
namespace platform = platform2;
#endif

int main () {
    platform::fun ();
    return 0;
}

I can't remember ever seeing this approach suggested instead of the less robust macro everywhere code.



Again, with all errors you can run across in a C++ program the ones listed above are some of the easier to pinpoint. Beginners will have the most difficulty as they are not only struggling with a new language and syntax but with subtle errors that are masking behavior they don't even realize is incorrect. I don't think that teaching the proper use of namespace adds that much complexity to learning C++; it really is unfortunate that the topic is not done justice in most reference material.

Friday, January 14, 2011

Closures (Ruby v Python)

I'm a Ruby guy. I'm not religious about it, but when it comes to scripting, prototyping, parsing, or automation my instinct (read: comfort zone) is Ruby.

I've been working on a project lately where the dominant language is Python. This has required me to both learn a new language and break habits built in other environments. I like learning new things and I think that having your comfort zone invaded every once in a while keeps you on your toes so this experience hasn't been too painful so far. It's interesting, though, to evaluate what I consider habits against what I think are more along the lines of principles.

One thing that prompted this was the way Ruby and Python differ in handling closures. I could claim that this started back when I was a TA and had to grade dozens of projects written in Python over many disparate editors (tabs vs. spaces, anyone?) but I digress. In this case, I was using what was basically a callback and I really like they way Python handles this syntactically. Consider the following:


def foo():
    print("in foo")

fun = foo   # fun is now a function object
fun()       # this invokes foo()

Which is not possible (at least with a similar syntax) in Ruby. In Ruby what you get is

def foo
    puts "in foo"
end

fun = foo   # invokes foo because foo takes no arguments
            # would be an error if foo expected arguments

And since foo gets the return value of puts (nil) if you try to use it later in your code you get an error. You'd either have to use a lambda or Proc to get around the fact that the method is invoked. In Ruby, you can elide the parentheses when calling a method which makes the above behavior perfectly rational.

Advantage Python. Let's try and use closures to do something...

There is a cool feature in Ruby that I've found useful in the past in that I can create a function generator similar to the following:

def foo_gen
    x = 0
    return lambda { x += 1 }
end

Where the returned function would represent the list of integers starting at 1 and increasing each time the function is invoked. Something like:

foo = foo_gen
3.times { puts foo.call }

Which would print out the numbers 1, 2, 3. In fact, each new call to foo_gen creates a new infinite sequence starting at 1. In Python, this turned out to not be so easy. There is a subtle difference in how the lexical scope is represented when defining a Python closure. Consider what I thought was an equivalent Python construct:

def foo_gen():
    x = 0
    def foo():
        x += 1  # Error: x not in scope here
        return x
    return foo

Unfortunately, the variable local to the scope that the closure was defined in is not visible from the closure itself. This subtlety is actually related to why a Python class method definition requires an explicit this argument when it is defined and Ruby class methods do not. In either case, the workaround to this problem is something similar to:

def foo_gen():
    x = 42
    def foo(x=x):
        while 1:
            x += 1
            yield x
    return foo

Two things are important in the above change:

  • I am required to provide an argument to the function that receives the value of the locally (to the defining scope) bound variable. (The alternative is to define x as an array and increment the first index of that array - I don't fully understand why that is valid, however)
  • I am now using a generator and yielding the values explicitly requiring an infinite loop to enable the generator

I should note that, in Ruby, the yield still happens it's just hidden behind the syntax and without the need for an explicit loop construct.

I'm a fan of list comprehensions and functionality such as enumerate. Other features of Python are certainly growing on me quickly (though, ' '.join(lst) is still counter intuitive compared to lst.join ' '). And, while I can accept the white space requirement, the necessary parenthesis for function calls, explicit self argument in class methods and some of Python's other assembly as uncomfortable, I find this to be more or less broken.

I'm not intending to bash Python or praise Ruby. I've got miles to go before I am a Python master (or Ruby master, for that matter) and I am entirely open to the fact that in my journey I may come to understand this behavior. Until that point, however, I still feel icky writing code like that.