Thursday, May 19, 2011

Using structs

If you want to group related information in a C program, the common way to do so is with a struct. You see this everywhere: network packets are represented as structs, as are file system objects (inodes); time values can be stored with separate components in struct members; and so on. In all of these cases it is entirely possible to use plain arrays to hold the data. The problem is that we, as non-machines, find it more natural to think in terms of collections with named members than in terms of arbitrary offsets into a buffer. In other words, it is easier to design and reason with 'object.member = value' than with 'buffer[offset] = value' (or something more obscure), especially when the members are of various sizes.
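To make the contrast concrete, here is a minimal sketch using a hypothetical date record (the names struct clock_time, YEAR_OFF, MONTH_OFF, DAY_OFF and RECORD_LEN are illustrative only). With the struct, the compiler tracks each member's offset and size. With a bare buffer, the programmer tracks offsets, sizes and, for the 2-byte year, byte order by hand.

```c
#include <stdint.h>

/* Hypothetical record: the compiler knows where each member lives. */
struct clock_time {
    uint16_t year;
    uint8_t month;
    uint8_t day;
};

/* The same fields in a bare 4-byte buffer: offsets are ours to remember. */
enum { YEAR_OFF = 0, MONTH_OFF = 2, DAY_OFF = 3, RECORD_LEN = 4 };

void set_with_struct(struct clock_time *t) {
    t->year = 2011;
    t->month = 5;
    t->day = 19;
}

void set_with_offsets(unsigned char buf[RECORD_LEN]) {
    buf[YEAR_OFF] = 2011 & 0xff;            /* low byte of the year first */
    buf[YEAR_OFF + 1] = (2011 >> 8) & 0xff; /* then the high byte */
    buf[MONTH_OFF] = 5;
    buf[DAY_OFF] = 19;
}
```

The struct version reads like the data it describes; the offset version needs a comment on almost every line to say what it means.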

I feel this is a natural progression: the tendency to group related items and operate on them as individual 'things'. I believe this is a dominant theme among C programmers (and programmers in general). What I don't see much of, however, is deliberately moving in the opposite direction. That is, given some data in the form of a buffer, C programmers are more likely to write their own mini-parser to get at that data than to use a struct to ease the implementation.

As an example, I've seen the following many times in a variety of flavors:

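#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored least-significant byte first. */
uint32_t extract(unsigned char *buffer) {
    uint32_t value = 0;
    int i;
    for (i = (int)sizeof(uint32_t) - 1; i >= 0; i--) {
        value = value << 8;   /* make room for the next, lower-order byte */
        value += buffer[i];
    }
    return value;
}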


And while that is functional, it is also error-prone and cumbersome to write each time you need such a conversion. In contrast, I rarely see code of the form

struct extracted {
    uint32_t v[5];
};

struct extracted *map = (struct extracted *)buffer;


which I think simplifies the implementation.
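Converting buffer to a struct pointer this way requires an explicit cast and assumes the buffer is suitably aligned for uint32_t. A more defensive variant, sketched below under the assumption that buffer holds at least sizeof(struct extracted) bytes, copies the bytes into a real struct with memcpy instead of reinterpreting the pointer. This sidesteps both alignment and strict-aliasing concerns; the helper name copy_extracted is hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct extracted {
    uint32_t v[5];
};

/* Copy the raw bytes into a properly aligned struct instead of casting
 * the buffer pointer. Values are still read in the host's native byte
 * order, exactly as with the pointer cast. */
struct extracted copy_extracted(const unsigned char *buffer) {
    struct extracted e;
    memcpy(&e, buffer, sizeof e);
    return e;
}
```

Compilers generally turn a small fixed-size memcpy like this into ordinary loads, so the named-member convenience comes at little or no runtime cost.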

If we strip away most of the boilerplate and look only at how each approach is used, the overall effect becomes clear:

uint32_t vals[] = {0x0, 0x0f, 0x0f0f, 0x0f0f0f, 0x0f0f0f0f};
unsigned char *cvals = (unsigned char *)vals;
int i;
for (i = 0; i < 5; ++i)
    printf("%10lu\n", (unsigned long)extract(cvals + i * sizeof(uint32_t)));

and with the struct:

uint32_t vals[] = {0x0, 0x0f, 0x0f0f, 0x0f0f0f, 0x0f0f0f0f};
unsigned char *cvals = (unsigned char *)vals;
struct extracted *map = (struct extracted *)cvals;
int i;
for (i = 0; i < 5; ++i)
    printf("%10lu\n", (unsigned long)map->v[i]);

The main differences between the two are how data is extracted from the raw buffer and how that choice affects code design. In both respects the struct provides a cleaner, more understandable solution. In a more general setting, where the number of elements in the buffer is not known in advance, the fixed-size struct above does not fit.

However, it can be modified:

struct value {
    uint32_t data;
};

uint32_t extract(unsigned char *buffer) {
    /* Reinterpret the bytes at buffer as a struct value (native byte order). */
    struct value *v = (struct value *)buffer;
    return v->data;
}

Callers still have to compute the buffer offset, but the function body itself is much cleaner.
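One thing to keep in mind is that the two extract functions are not strictly interchangeable. The loop version assembles bytes least-significant first on every machine, while the struct version reads them in the host's native byte order, so the two agree only on little-endian hosts such as x86. This also means the loop-based usage example prints the intended values only on a little-endian host. The sketch below shows the difference with hypothetical names extract_loop and extract_native; it uses memcpy rather than a pointer cast so that it is safe for unaligned input.

```c
#include <stdint.h>
#include <string.h>

/* Loop version: always decodes least-significant byte first. */
uint32_t extract_loop(const unsigned char *buffer) {
    uint32_t value = 0;
    int i;
    for (i = (int)sizeof(uint32_t) - 1; i >= 0; i--) {
        value = (value << 8) | buffer[i];
    }
    return value;
}

struct value {
    uint32_t data;
};

/* Struct version: takes the bytes in whatever order the host uses. */
uint32_t extract_native(const unsigned char *buffer) {
    struct value v;
    memcpy(&v, buffer, sizeof v);
    return v.data;
}
```

Given the bytes 01 02 03 04, extract_loop yields 0x04030201 everywhere, while extract_native yields 0x04030201 on a little-endian host and 0x01020304 on a big-endian one. If the buffer comes from a file or network format with a fixed byte order, that order still has to be handled somewhere.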

Structs become even more useful when an individual member carries several pieces of information. For instance, it is common for one member of a struct to represent a set of flags. The typical implementation uses a set of macros to test or set bits in a bit mask.

For example:

#include <stdio.h>

#define FOO_FLAG    (1<<0)
#define BAR_FLAG    (1<<1)
#define BAZ_FLAG    (1<<2)

#define ISSETFOO(v) ((v).mask & (FOO_FLAG))
#define SETFOO(v)   ((v).mask |= (FOO_FLAG))
#define UNSETFOO(v) ((v).mask &= ~(FOO_FLAG))

/* similarly for BAR_FLAG and BAZ_FLAG */

struct data {
    unsigned short mask;
    unsigned char stuff[200];
};

int main(void) {
    struct data data = {0};   /* start with every flag cleared */
    SETFOO(data);
    if (ISSETFOO(data)) {
        printf("FOO_FLAG is set\n");
    } else {
        printf("FOO_FLAG is not set\n");
    }
    /* ... */
    return 0;
}

With well-designed macro names this approach need not produce clumsy code, but writing and maintaining the macros is still cumbersome. I think a more elegant approach can be achieved with structs, using bit-fields.

#include <stdio.h>

struct flags {
    unsigned int foo:1;   /* one bit per flag; unsigned int is the portable bit-field type */
    unsigned int bar:1;
    unsigned int baz:1;
};

struct data {
    struct flags mask;
    unsigned char stuff[200];
};

int main(void) {
    struct data data = {0};   /* start with every flag cleared */
    data.mask.foo = 1;
    if (data.mask.foo) {
        printf("FOO_FLAG is set\n");
    } else {
        printf("FOO_FLAG is not set\n");
    }
    /* ... */
    return 0;
}
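Because each flag is its own named member, setting or clearing one never disturbs the others, and the compiler generates the masking and shifting. A minimal sketch, assuming unsigned int bit-fields (other types such as unsigned char are accepted by many compilers only as an extension) and a hypothetical helper count_set:

```c
/* Each flag is a one-bit named member; no masks appear in user code. */
struct flags {
    unsigned int foo:1;
    unsigned int bar:1;
    unsigned int baz:1;
};

/* Hypothetical helper: how many flags are currently set. */
int count_set(struct flags f) {
    return f.foo + f.bar + f.baz;
}
```

One caveat: the order and packing of bit-fields within their storage unit are implementation-defined, so bit-fields suit flags kept in memory better than bit layouts dictated by an external file or wire format.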

The switch to bit-fields can even be made without touching the code that uses the flags: keep the macro interface, change only the macro bodies to work on the new struct members, and users of the macros continue uninterrupted.
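A sketch of what that might look like, assuming the bit-field version of struct data (written here with unsigned int bit-fields): the macro names stay the same while their bodies now touch named members instead of mask bits.

```c
struct flags {
    unsigned int foo:1;
    unsigned int bar:1;
    unsigned int baz:1;
};

struct data {
    struct flags mask;
    unsigned char stuff[200];
};

/* Old interface, new bodies: callers of SETFOO and friends need no changes. */
#define ISSETFOO(v)  ((v).mask.foo)
#define SETFOO(v)    ((v).mask.foo = 1)
#define UNSETFOO(v)  ((v).mask.foo = 0)

/* similarly for bar and baz */
```

Only code that goes exclusively through the macros is insulated this way; anything that manipulated mask directly as an integer would still need updating.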

The struct is no panacea. However, I find that in scenarios like these it makes for much cleaner and more manageable code, something I favor whenever I can.
