Thursday, February 27, 2014

Vertical Histogram

While munging data for my current project, I needed a visual comparison of two modes within the same dataset. I was using a simple scatterplot with a low alpha (semi-transparent points), hoping the over-plotting would reveal which mode was the major one. Unfortunately, there was simply too much data for this approach to work.
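
The reason the alpha trick breaks down on large data is easy to quantify. Under the usual "over" compositing, each point drawn with transparency a lets (1 - a) of whatever lies beneath show through, so k points stacked on the same spot reach a combined opacity of 1 - (1 - a)^k. The short R sketch below (the helper names are made up for illustration) shows how quickly that approaches 1 for the alpha = 0.5 used in the plot code:

```r
# Combined opacity of k identical points stacked on one spot, each drawn
# with transparency 'alpha', under standard "over" alpha compositing.
stacked.opacity <- function(alpha, k) {
    1 - (1 - alpha)^k
}

# Smallest number of stacked points needed to reach a target opacity.
k.needed <- function(alpha, target) {
    ceiling(log(1 - target) / log(1 - alpha))
}

# Opacity for 1 to 10 overlapping points at alpha = 0.5
stacked.opacity(0.5, 1:10)

# With alpha = 0.5, seven points (1 - 1/128 = 0.9921875) already exceed 99%
k.needed(0.5, 0.99)
```

Once a region holds more than a handful of overlapping points it is drawn at essentially full opacity, so a dense mode and a much denser one look equally dark.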

I wanted a single image, and it was important to keep the scatterplot because it shows other features of the data. So I started looking for a way to combine a histogram (rotated 90 degrees) with the scatterplot to describe the density within the plot. A quick search for how to do this in R turned up nothing, so I decided to implement my own version of such a plot.

Certainly, there are other ways to present the features I am describing here, but in this particular case the following code worked out nicely. Hopefully it proves useful to others as well.


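plot.vertical.hist <- function(data, breaks=500) {

    # 'data' needs columns X and Y. Average Y for each distinct X;
    # aggregate() names the result columns 'xs' (group) and 'x' (mean).
    agg <- aggregate(data$Y, by=list(xs=data$X), FUN=mean)
    # Mean Y per X, scaled down by 10,000 for display
    y <- agg$x / 10000
    # Bin the values without drawing; 'breaks' is only a suggestion to hist()
    hs <- hist(y, breaks=breaks, plot=FALSE)

    # Restore the caller's graphics settings when the function exits
    old.par <- par(no.readonly=TRUE)
    on.exit(par(old.par))

    # Drop the main plot's right margin and the side plot's left margin
    # so the two panels sit flush against each other
    mar.default <- par('mar')
    mar.left <- mar.default
    mar.right <- mar.default
    mar.left[4] <- 0
    mar.right[2] <- 0

    # Main plot: left 80% of the device
    par(fig=c(0, 0.8, 0, 1.0), mar=mar.left)
    plot(agg$xs, y,
         xlab="X", ylab="Y",
         main="Vertical Histogram Side Plot",
         pch=19, col=rgb(0.5, 0.5, 0.5, alpha=0.5))
    grid()
    # Y-axis limits actually used by the main plot (including padding)
    ylim.main <- par('usr')[3:4]

    # Vertical histogram of the same data: right 20% of the device.
    # Reusing the main plot's y limits (yaxs='i' so they are not padded
    # a second time) puts every bin at the height of the Y values it counts.
    par(fig=c(0.8, 1.0, 0.0, 1.0), mar=mar.right, new=TRUE)
    plot(NA, type='n', axes=FALSE,
         xlab='Frequency', ylab=NA, main=NA,
         xlim=c(0, max(hs$counts)),
         ylim=ylim.main, yaxs='i')
    axis(1)
    # One horizontal line per bin, from 0 out to the bin's count,
    # drawn at the bin's midpoint on the Y scale
    segments(0, hs$mids, hs$counts, hs$mids)

    # Return the histogram invisibly in case the caller wants the counts
    invisible(hs)
}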
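
To try the function without the original data, here is a sketch that builds a synthetic data set with two modes. Only the column names X and Y come from the function above; the number of X values, the 70/30 split between modes, the means, and the factor of 10,000 are made-up values chosen so that the rescaled per-X means land near 1 and 3.

```r
# Hypothetical two-mode test data in the layout plot.vertical.hist() expects:
# a data frame with an X column (grouping value) and a Y column (measurement).
set.seed(42)
n.x <- 2000                               # number of distinct X values
xs <- rep(seq_len(n.x), each = 5)         # five Y observations per X
# Each X belongs to one mode: 70% centred at 1, 30% centred at 3 (after /10000)
centre <- rep(sample(c(1, 3), n.x, replace = TRUE, prob = c(0.7, 0.3)), each = 5)
df <- data.frame(X = xs,
                 Y = rnorm(length(xs), mean = centre * 10000, sd = 2000))

# Draw the plot only if plot.vertical.hist() has been defined in this session
if (exists("plot.vertical.hist")) {
    plot.vertical.hist(df, breaks = 100)
}
```

The larger cluster of bars near Y = 1 should then stand out as the major mode, even where the grey points in the scatterplot look equally dark.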

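Results look similar to the following:

[Figure: "Vertical Histogram Side Plot", a scatterplot of Y against X with a horizontal-bar histogram of Y along the right-hand side]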

Initially, I experimented with rug and with barplot(..., horiz=TRUE). Unfortunately, rug isn't available on the left or right side, and it would suffer from the same over-plotting problem that the alpha settings did. With barplot, I was unable to get the bars aligned with the scatterplot.
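
For what it's worth, the barplot alignment can be worked out. With horiz=TRUE and space=0, barplot() places bar i between i - 1 and i on the y-axis, so its scale counts bins rather than Y values. If the scatterplot's y-axis spans exactly the outer histogram breaks and both panels use yaxs = "i" (no 4% padding), the two scales line up bin for bin. The sketch below is one possible approach, not the one used above; it uses made-up two-mode data and assumes equal-width bins, which hist() produces when breaks is a single number.

```r
# Sketch: scatterplot with a horizontal barplot() histogram aligned to it.
# Assumes equal-width bins (hist() gives these when 'breaks' is a number).
set.seed(1)
x <- seq_len(3000)
# Made-up data with two modes, the larger one centred at 1
y <- c(rnorm(2000, mean = 1, sd = 0.2), rnorm(1000, mean = 3, sd = 0.2))

hs <- hist(y, breaks = 40, plot = FALSE)
n.bins <- length(hs$counts)

old.par <- par(no.readonly = TRUE)

# Scatterplot: the y-axis spans exactly the outer histogram breaks
par(fig = c(0, 0.8, 0, 1), mar = c(5, 4, 4, 0))
plot(x, y, ylim = range(hs$breaks), yaxs = "i",
     pch = 19, col = rgb(0.5, 0.5, 0.5, alpha = 0.5))

# Histogram: with space = 0, bar i covers i - 1 to i, so ylim = c(0, n.bins)
# with yaxs = "i" maps each bar onto the Y range of its bin
par(fig = c(0.8, 1, 0, 1), mar = c(5, 0, 4, 1), new = TRUE, yaxs = "i")
bar.mids <- barplot(hs$counts, horiz = TRUE, space = 0,
                    ylim = c(0, n.bins), xlab = "Frequency")

par(old.par)
```

Because both panels share the same top and bottom margins, bin i (from hs$breaks[i] to hs$breaks[i + 1] in data units) occupies exactly the same vertical strip as bar i.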