Friday, July 19, 2013

Column-Major Confusion

As a programmer coming from a language like C I am used to understanding multidimensional arrays in the following way:
int matrix[3][3] = {
    { 1, 2, 3 },
    { 4, 5, 6 },
    { 7, 8, 9 }
};

Understandably, this is a little bit of a setback when trying to grok how a language such as R handles matrices. The example above is 'row-major' but R uses 'column-major' by default. Note that I'm not describing memory layout here - which coincidentally is the same as my description - I referring to how the matrices are presented to the programmer. To complete the example, here is how I would create the same matrix as above in R:
> matrix(1:9,ncol=3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

There is no problem with this per se. In fact, I'd imagine that R programmers that come to something like C may feel similar unease in the paradigm shift. I'm finding that this fact is glazed over in some texts offering an introduction to R. This happens in non-obvious ways. In particular, I found this example today:
> matrix(c(1,0,0,0,1,0,0,0,1),ncol=3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

The identity matrix is a particularly bad choice in this regard as it gives no indication of the true layout being used. It is probably good for any tutorial using matrices to cover an obvious simple case first to set the stage before moving directly to something like the identity matrix (or any other symmetric matrix for that matter).

No comments :

Post a Comment