
Saturday, July 21, 2012

gcc constructor attribute

It is sometimes helpful to provide a way to 'initialize the system' when you are writing a library. Many times this manifests itself in the form of a lib_init() call. Similarly, any convenience cleanup would be provided via lib_close(). For a concrete example, look at ncurses, which provides initscr() and endwin() for this purpose.
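
For instance, with ncurses the developer has to bracket the work with those calls explicitly. A minimal sketch of the usual pattern (built with -lncurses):

#include <ncurses.h>

int main () {
    initscr ();     /* the caller must remember the setup call... */
    printw ("hello from ncurses");
    refresh ();
    getch ();
    endwin ();      /* ...and the matching cleanup call before exiting */
    return 0;
}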

When these routines are always required (as they are in ncurses) there is an easier way to provide the proper setup without placing the onus to call these methods on the developer. GCC provides __attribute__((constructor)) and __attribute__((destructor)), which allow functions to be called when a library is loaded and unloaded, respectively. These calls happen outside the scope of main. For example,

#include <stdio.h>

__attribute__((constructor)) void init() {
    fprintf (stderr, "constructor\n");
}

int main () {
    return fprintf(stderr, "main\n");
}

This program will output the following:

constructor
main


In the case of a library such as ncurses this is a perfect place to invoke the necessary initialization and cleanup routines. A simplified example library:

__attribute__((constructor))
static void setup() { do_some ("setup"); }

__attribute__((destructor))
static void breakdown() { do_some ("breakdown"); }

void mylib_method (const char * thing) { do_some (thing); }

Now, if a developer links against your library, setup() is called when your library is loaded and breakdown() is called when it is unloaded. A nicety of this approach is that it behaves the same whether the library is pulled in when the executable is loaded or at some later point (via dlopen/dlsym, for example), thus always ensuring a consistent environment for your code.
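
As a rough sketch of the dynamic case (the name libmylib.so and its path here are hypothetical, assuming the simplified library above was built as a shared object), the same hooks fire around dlopen and dlclose:

#include <dlfcn.h>
#include <stdio.h>

/* link with -ldl */
int main () {
    /* setup() runs as soon as the library is loaded */
    void * handle = dlopen ("./libmylib.so", RTLD_NOW);
    if (!handle) {
        fprintf (stderr, "%s\n", dlerror ());
        return 1;
    }

    void (*fn)(const char *) = (void (*)(const char *)) dlsym (handle, "mylib_method");
    if (fn)
        fn ("something");

    dlclose (handle);   /* breakdown() runs when the library is unloaded */
    return 0;
}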

This is obviously not a fit for all libraries. Only those with setup and cleanup that can be self-contained and are always required would benefit from such an approach. In those cases, however, I prefer this method to the alternative of burdening the developer with requirements of my library.

Saturday, September 17, 2011

New Look

In my last post I received a comment regarding the presentation of this blog - specifically the colors I use. It's not the first time I have had trouble with color schemes - I'm colorblind, so my good is basically everyone else's bad. After asking several people offline about the issue and anything I could do to fix it, I was pointed to this site, which helped me decide on a [hopefully] better look for the blog.

Please let me know about any other hard-to-use features so that I may address them as well. Thanks.

Thursday, September 15, 2011

Raising the Bar


"Let thy speech be better than silence, or be silent" - Dionysius the Elder

I was involved in a dialog recently about this post. It made me consider some things about data presentation that I've been reluctant to admit. First, not all audiences are created equal and, more importantly, there is emotion involved.

I live in a world where precision is expected and any lack of clarity is considered detrimental to a cause. For the most part I present material to an informed technical audience who is prepared to consume data related to the topic at hand. But there are often situations where a presenter doesn't have such luxuries - in fact, an audience may struggle with the topic so much that getting across high level information is all that one can hope for. In a scenario like this, one should use any means necessary (within reason) to get a point across. I'm still not convinced this is requirement enough for a pie chart but it does raise a valid point.

In my mind there is something more driving than the aptitude of an audience, however, and that is the emotional reaction they can have to your graphics. For better or worse, people are emotionally attached to pie charts. Many individuals have a visceral reaction when they see one, knowing they can quickly make sense of the data in front of them. Forget about accuracy - we are talking basic understanding. For me, this is harder to ignore; it opens the door to using something like a pie chart to avoid alienating your audience.

The part about this that is hard for me is that I rant about visual display - probably more than my contributions to alternatives warrant. I'm also critical about style - often to the point of exhaustion. I just can't seem to relinquish my position that pie charts really are a failure, but the points above are nagging at me: how do you captivate an audience that expects the general view without sacrificing the details? I stumbled upon an idea recently that I hope can help bridge the gap.

I was reading Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design the other day which led me to Cleveland and McGill's original study. One test that really stood out to me was the position-angle test where subjects were exposed to a Cartesian bar graph and a pie chart each containing the same data. The subjects were tasked with estimating values of the shapes. In 40 tests, only three times was the pie chart more accurate (on average) than the bar chart.

The original study also mentions that pie charts are "one of the most commonly used graphs for showing the relative sizes of a whole." Certainly, a pie chart intuitively exposes that the whole is the sum of its parts. In fact, I think it does so better than some of the alternatives - stacked bar charts and treemaps. It is unfortunate that we are unable to accurately decipher the actual portions of those parts. What is really needed is the ability to combine the concept of 'sum of the parts' with direct representation of data but, to the best of my knowledge, this does not exist in standalone form.

Well, I've been exploring Processing more and more lately and the idea struck me to combine the two chart types in a way that allows both versions of the data to be presented (without having to share the display real estate). I came up with an interactive pie/bar chart fusion. On the surface it appears as a standard pie chart:
But when the user clicks any of the sections, it transitions into a bar chart with details of the data while keeping a shade of the relevant pie slice in the background.
Now, I alluded to the fact that this is not a complete solution; it only helps to bridge the gap. Unfortunately, this graphic relies on user interaction (mouse clicks) for the transition, which pretty much excludes it from most presentations. However, as PDF now supports JavaScript, online resources become more prevalent, and users can download these open source tools on their own, the possibility of melding these approaches becomes tangible.

I still don't condone the use of pie charts. However, instead of just describing the problems associated with them I'm finally trying to present a solution.

You can find the code for this on github.
The actual interactive visualization can be found here.


Thursday, August 4, 2011

Hotness

We have an internal image that floated around work several years ago that details network utilization of TCP over a wide variety of configurations. It is a heatmap created in matlab that is just sweet, sweet eye candy. We actually hung it on the outside of a cube for a short while and people couldn't help but stop and look at it.

It is entirely dysfunctional, mind you. The designer tried to combine eight parameters - with all variations - into an individual 2D plot (3D if you consider color a dimension). It was definitely an internal tool - there were only two or three of us who could decipher the layout enough to say anything about the data. That was fine by us; we basically made up the entire population of people who cared.

Fast forward a few years. I'm currently working on a technical report that could use the data we used to create that plot and, as luck would have it, I'm also the only one from the original group still at the company. In order to include this data in the report a certain amount of reformatting needs to occur - first my brain, then that plot. I wasn't the original designer and, although I have access to the code, I don't know matlab, so I'm pretty much stuck. I decided to rework the data in R.

The thing about that original plot was that it had a certain je ne sais quoi: it made you look. I wanted to keep that so I immediately investigated heatmap functionality available in R.

Really? Ouch. Not much available there. I came up with two resources that were helpful: a Wikipedia entry about a Mandelbrot set animation in R, and a stackoverflow answer that mentioned rasterImage in a comment. The first site led me to the color set used in our original plot and the second gave me the pointer I needed to get the job done. I'll leave what follows as a reminder for myself and a helpful nudge for those who face a similar problem in the future.

hmap.example <- function () {

    code <- c("colfun <- colorRampPalette(c(...))",
        "my.colors <- colfun(10000)","xs <- 1:100",
        "X  <- outer(xs,rev(xs))",
        "C1 <- matrix(my.colors[X],100,100)", "X  <- outer(xs,xs)",
        "C2 <- matrix(my.colors[X],100,100)", "X  <- outer(rev(xs),xs)",
        "C3 <- matrix(my.colors[X],100,100)",
        "plot(c(-100,100),c(-100,100),type='n')",
        "rasterImage(C1,1,1,100,100)",
        "rasterImage(C2,-100,1,1,100)", "rasterImage(C3,-100,-100,1,1)",
        "abline(v=0,col='black',lwd=5)", "abline(h=0,col='black',lwd=5)")

    colfun <- colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan",
                    "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"))

    my.colors <- colfun(10000)
    xs <- 1:100
    X  <- outer(xs,rev(xs))
    C1 <- matrix(my.colors[X],100,100)
    X  <- outer(xs,xs)
    C2 <- matrix(my.colors[X],100,100)
    X  <- outer(rev(xs),xs)
    C3 <- matrix(my.colors[X],100,100)
    plot(c(-100,100),c(-100,100),type='n',axes=FALSE,xlab='',ylab='')
    rasterImage(C1,1,1,100,100)
    rasterImage(C2,-100,1,1,100)
    rasterImage(C3,-100,-100,1,1)
    abline(v=0,col='black',lwd=5)
    abline(h=0,col='black',lwd=5)
    text(1,1:length(code)*-6,labels=code,cex=0.8,pos=4,family="mono")
}

And the result:

Thursday, May 26, 2011

Optical Disillusion

I dislike chartjunk. Not only is there a trend toward the incomprehensible but the movement comes with a ridiculous amount of flair. For all I can tell there exists a competition between infographic creators where the rules are based solely on who can cram more slop on a page.

Besides my obvious distaste for the style, there are sacrifices being made that compromise, or even forgo, the actual message - often without awareness or malice. Take, for example, the evolution of the pie chart.

For the sake of this example let's ignore the fact that a pie chart is a particularly poor way to display data to begin with. As a measure of two variables it provides a rough estimate of dominance, but beyond that the human eye cannot distinguish relative quantities across the various shapes. In almost all cases a simple bar chart provides a more precise description - even in the two-variable case. A visual display of data should be able to stand on its own without the need of labels describing quantities or values. The pie chart fails in this respect, but I digress.

Consider a simple pie chart of two variables.
  (As a measure of the strength of a pie chart as a communication tool, can you guess the values of the two areas? Go ahead, take a guess.)

The red portion of the chart is 55% and the blue portion is the remaining 45%. Without labels it is hard to distinguish the exact values, but the chart at least serves to show the dominance of red over blue. The problem with trendy infographics is that a simple pie chart is almost never sufficient in the layout. It needs exploding, or gradients, or even a third dimension.

Let's dress it up a bit and make a 3D pie chart with the same values.
So what's my beef about that? Let's consider the new representative areas of the chart. In the first chart, the values were inconspicuous but at least the color representation mapped directly to the underlying data.

Standard Pie Chart (red pixels) : 44295 (55.000%)
Standard Pie Chart (blue pixels): 36188 (44.900%)


In this new 'cooler' version of the chart we have skewed the data representation and thus our understanding of the overall message. In fact, by visible surface area alone we have changed the meaning of the chart entirely!

3D Pie Chart (red pixels) : 44792 (47.300%)
3D Pie Chart (blue pixels): 49740 (52.600%)


What is now required of us in this new chart, along with somehow mapping area to value, is to do accurate mathematical transformations in our heads to convert the 3D surface to an area in 2D. In fact, we now need to be able to deduce that roughly 52% of viewable surface area translates to 45% of the underlying data. The skew depends on the pitch, yaw, and roll, so there is no magical formula here - every view will be a different mapping between surfaces.

I don't think people consider these details when compiling charts. In my estimation they are only trying to provide the most 'eye candy' for the intended consumer. The behavior is facilitated by common built-in chart generators (only 48 of Excel's 288 pie chart variations are simple 2D charts), but there is no warning about the possible loss of meaning.

I'm certainly not among those pushing the envelope with infographics - this definitely makes my opinion biased. I keep things as simple as possible and for most data-hungry crowds my approach is just too boring against current standards. I do believe there is a middle ground, however: a place where rich graphics convey accurate data with minimal annotation markup. I only wish I knew how to bridge the gap.

A huge thanks to Dana Brown for taking the time to review and provide feedback on the first draft of this post.

Thursday, May 19, 2011

Using structs

If you want to group related information in a C program the common way to do so is with a struct. You see this everywhere: various network packets are represented as structs; file system objects (inodes); time values can be stored with separate components in struct members; and so on. In all of these cases it is entirely possible to use basic arrays to maintain this data. The problem with that is that we, as non-machines, find it more natural to think in terms of collections with named members than random offsets into a buffer. In other words, it is easier to design and reason with 'object.member = value' than with 'buffer[offset] = value' (or something more obscure), especially if you deal with members of various sizes.

I feel this is a natural progression - the tendency to want to group related items and operate on them as individual 'things'. I believe this to be a dominating theme among C programmers (and programmers in general). What I don't see too much of, however, is explicit movement in the opposite direction. That is, given some data in the form of a buffer, C programmers are more likely to develop their own mini-parser to get at that data instead of using a struct to ease the implementation.

As an example, I've seen the following many times in a variety of flavors:

uint32_t extract(unsigned char *buffer) {
    uint32_t value = 0;
    int i = 0;
    for (i=sizeof(uint32_t)-1; i>=0; i--) {
        value = value << 8;
        value += buffer[i];
    }
    return value;
}


And, while that is functional, it is also error-prone and cumbersome to write each time you need such a conversion. In contrast, I see very little in the form of

struct extracted {
    uint32_t v[5];
};

struct extracted * map = (struct extracted *) buffer;


where I think the implementation is simplified.

If we remove most of the boilerplate from an example and examine just how each of these is used, we can see what the overall effect is.

uint32_t vals[] = {0x0, 0x0f, 0x0f0f, 0x0f0f0f, 0x0f0f0f0f};
unsigned char * cvals = (unsigned char *) vals;
int i;
for (i = 0; i < 5; ++i)
    printf ("%10u\n", extract (cvals + i * sizeof(uint32_t)));

and with the struct

uint32_t vals[] = {0x0, 0x0f, 0x0f0f, 0x0f0f0f, 0x0f0f0f0f};
unsigned char * cvals = (unsigned char *) vals;
struct extracted * map = (struct extracted *) cvals;
int i;
for (i = 0; i < 5; ++i)
    printf ("%10u\n", map->v[i]);

The main differences between the two are how data is extracted from the raw buffer and how that implementation affects code design. In both respects the struct provides a cleaner and more understandable solution. In a more generic setting - one where you may not know the exact number of elements in the buffer - the struct approach above doesn't fit exactly.

However, it can be modified:

struct value {
    uint32_t data;
};

uint32_t extract(unsigned char *buffer) {
    struct value * v = (struct value *) buffer;
    return v->data;
}

Using the function still requires computing the buffer offset at the call site, but the implementation itself is much cleaner.

This becomes even more useful if you consider cases where an individual member may contain multiple bits of information. For instance, it is common to have a data member of a struct represent a set of flags. The typical implementation involves a set of macros to test or set values in a bit mask.

For example:

#define FOO_FLAG    (1<<0)
#define BAR_FLAG    (1<<1)
#define BAZ_FLAG    (1<<2)

#define ISSETFOO(v) ((v).mask & (FOO_FLAG))
#define SETFOO(v)   ((v).mask |= (FOO_FLAG))
#define UNSETFOO(v) ((v).mask &= ~(FOO_FLAG))

/* similarly for BAR_FLAG and BAZ_FLAG */

struct data {
    unsigned short mask;
    unsigned char stuff[200];
};

int main () {
    struct data data;
    SETFOO(data);
    if (ISSETFOO(data)) {
        printf ("FOO_FLAG is set\n");
    } else {
        printf ("Foo_FLAG is not set\n");
    }
    /* ... */

With well-designed macro names this approach does not produce altogether clumsy code, but the macro design is still cumbersome. I think a more elegant approach can be achieved through the use of structs.

struct flags {
    unsigned char foo:1;
    unsigned char bar:1;
    unsigned char baz:1;
};

struct data {
    struct flags mask;
    unsigned char stuff[200];
};

int main () {
    struct data data;
    data.mask.foo = 1;
    if (data.mask.foo) {
        printf ("FOO_FLAG is set\n");
    } else {
        printf ("Foo_FLAG is not set\n");
    }
    /* ... */

This can even be done without having to change existing code: leaving the macro interface in place while changing the macro bodies to use the new representation allows users of the interface to continue uninterrupted.
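
A minimal sketch of what those redefined bodies might look like on top of the bitfield struct (reusing the macro names from above):

#define ISSETFOO(v) ((v).mask.foo)
#define SETFOO(v)   ((v).mask.foo = 1)
#define UNSETFOO(v) ((v).mask.foo = 0)

/* similarly for BAR and BAZ */

Code that calls SETFOO and ISSETFOO compiles unchanged even though the storage behind the mask is now the bitfield.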

The struct is no panacea. However, I find that in these types of scenarios the struct provides much cleaner and more manageable code - something I favor whenever I can.

Friday, January 14, 2011

Closures (Ruby v Python)

I'm a Ruby guy. I'm not religious about it, but when it comes to scripting, prototyping, parsing, or automation my instinct (read: comfort zone) is Ruby.

I've been working on a project lately where the dominant language is Python. This has required me to both learn a new language and break habits built in other environments. I like learning new things and I think that having your comfort zone invaded every once in a while keeps you on your toes so this experience hasn't been too painful so far. It's interesting, though, to evaluate what I consider habits against what I think are more along the lines of principles.

One thing that prompted this was the way Ruby and Python differ in handling closures. I could claim that this started back when I was a TA and had to grade dozens of projects written in Python across many disparate editors (tabs vs. spaces, anyone?) but I digress. In this case, I was using what was basically a callback and I really like the way Python handles this syntactically. Consider the following:


def foo():
    print("in foo")

fun = foo   # fun is now a function object
fun()       # this invokes foo()

Which is not possible (at least with a similar syntax) in Ruby. In Ruby what you get is

def foo
    puts "in foo"
end

fun = foo   # invokes foo because foo takes no arguments
            # would be an error if foo expected arguments

And since fun gets the return value of puts (nil), if you try to use it later in your code you get an error. You'd either have to use a lambda or a Proc to get around the fact that the method is invoked. In Ruby, you can elide the parentheses when calling a method, which makes the above behavior perfectly rational.

Advantage Python. Let's try and use closures to do something...

There is a cool feature in Ruby that I've found useful in the past in that I can create a function generator similar to the following:

def foo_gen
    x = 0
    return lambda { x += 1 }
end

The returned function represents the sequence of integers starting at 1, increasing each time it is invoked. Something like:

foo = foo_gen
3.times { puts foo.call }

Which would print out the numbers 1, 2, 3. In fact, each new call to foo_gen creates a new infinite sequence starting at 1. In Python, this turned out to not be so easy. There is a subtle difference in how the lexical scope is represented when defining a Python closure. Consider what I thought was an equivalent Python construct:

def foo_gen():
    x = 0
    def foo():
        x += 1  # UnboundLocalError: the assignment makes x local to foo
        return x
    return foo

Unfortunately, the variable local to the scope in which the closure was defined is not assignable from within the closure itself. This subtlety is actually related to why a Python class method definition requires an explicit self argument while Ruby class methods do not. In either case, the workaround to this problem is something similar to:

def foo_gen():
    x = 42
    def foo(x=x):
        while 1:
            x += 1
            yield x
    return foo

Two things are important in the above change:

  • I am required to provide an argument to the function that receives the value of the locally (to the defining scope) bound variable. (The alternative is to define x as an array and increment the first index of that array - I don't fully understand why that is valid, however)
  • I am now using a generator and yielding the values explicitly requiring an infinite loop to enable the generator

I should note that, in Ruby, the yield still happens; it's just hidden behind the syntax and there's no need for an explicit loop construct.

I'm a fan of list comprehensions and functionality such as enumerate. Other features of Python are certainly growing on me quickly (though ' '.join(lst) is still counter-intuitive compared to lst.join ' '). And, while I can accept the whitespace requirement, the necessary parentheses for function calls, the explicit self argument in class methods, and some of Python's other machinery as merely uncomfortable, I find this behavior to be more or less broken.

I'm not intending to bash Python or praise Ruby. I've got miles to go before I am a Python master (or Ruby master, for that matter) and I am entirely open to the fact that in my journey I may come to understand this behavior. Until that point, however, I still feel icky writing code like that.