Friday, May 6, 2016

Derive formula to convert Fahrenheit to Celsius

I was revisiting linear regression the other day and, as part of that review, I challenged myself to use regression to derive a well-known formula without actually looking up the formula itself.

The first example that came to mind was the formula for converting temperature from Fahrenheit to Celsius. I wanted to see if I could derive that formula using two sample data sets (the same temperatures recorded in each scale) and a simple linear regression. If the data were accurate enough, I should be able to recover the exact equation for converting between the two scales. In essence, I wanted to arrive at the following:

C = (F - 32) * 5/9

Since I didn't have a data set with both types of observations available, I was faced with a little 'chicken or the egg' situation. Since this is just a fun little exercise, I generated my own data, introducing some artificial error to stand in for true observations.
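The data-generation step might look something like the following Python sketch. The post's own code is in R and is not reproduced here, so the sample size, temperature range, noise level, and the name make_observations are all assumptions chosen for illustration.

```python
import random

random.seed(42)  # arbitrary fixed seed so repeated runs give the same data


def make_observations(n=200, noise_sd=0.5):
    """Return paired (fahrenheit, celsius) lists of simulated readings.

    n and noise_sd are illustrative guesses, not values from the post.
    """
    # Fahrenheit readings spread over an assumed range of -40 to 120 degrees.
    fahrenheit = [random.uniform(-40.0, 120.0) for _ in range(n)]
    # The known conversion is used only to simulate the Celsius readings;
    # the Gaussian noise stands in for measurement error.
    celsius = [
        (f - 32.0) * 5.0 / 9.0 + random.gauss(0.0, noise_sd)
        for f in fahrenheit
    ]
    return fahrenheit, celsius


fahrenheit, celsius = make_observations()

# Show the first few simulated pairs.
for f, c in list(zip(fahrenheit, celsius))[:3]:
    print(f"{f:8.2f} F  {c:8.2f} C")
```

Note that the true conversion appears here only to manufacture the data. The regression that follows never sees it and has to recover the slope and intercept from the noisy pairs alone.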

Once the 'observations' were available, the regression was as simple as loading the data into R and running lm. I walked through the entire manual procedure behind this in a previous post, so I won't repeat it here. The result of calling lm is a list, and one of the elements of that list holds the coefficients. These represent the intercept (b) and slope (m) of:

y = mx + b

Since the Celsius observations are the response in my formula and the Fahrenheit observations are the predictor, I can create a similar equation where y represents the Celsius values and x represents the Fahrenheit values. Given that, I get the following (after plugging in the slope and intercept):

C = 0.555547 * F - 17.772318
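For readers who want to see the fit itself, here is a minimal Python sketch of ordinary least squares for a single predictor, which is the calculation a simple regression like this performs. It is not the R code behind the numbers above: the function name fit_line, the seed, and the simulated data are assumptions, so the printed coefficients will differ slightly from the ones quoted in this post.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = m*x + b. Returns (m, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Simulated paired readings (assumed range, sample size, and noise level).
random.seed(1)
fahrenheit = [random.uniform(-40.0, 120.0) for _ in range(500)]
celsius = [(f - 32.0) * 5.0 / 9.0 + random.gauss(0.0, 0.5) for f in fahrenheit]

slope, intercept = fit_line(fahrenheit, celsius)
print(f"fitted: C = {slope:.6f} * F + ({intercept:.6f})")
print(f"exact:  C = {5 / 9:.6f} * F + ({-32 * 5 / 9:.6f})")
```

With noise-free input the fit is exact: passing the freezing and boiling points, fit_line([32, 212], [0, 100]), gives a slope of 5/9 and an intercept of -160/9.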

Expanding the original equation for converting between Fahrenheit and Celsius yields:

C = (F * 5/9) - (32 * 5/9)
C = F * 0.555556 - 17.777778

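So, given observations in both Celsius and Fahrenheit (for the same events, of course), it is possible to derive an equation to convert between the two using linear regression. Here the fitted slope is within about 0.00001 of the exact value, and the fitted intercept is within about 0.006.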

My observations are very highly correlated because I added only a small amount of error. As the correlation falls, the accuracy of the resulting equation will suffer. Fortunately, there are tools to measure the correlation, such as the correlation coefficient and R-squared, which help quantify this accuracy.
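To make the link between correlation and accuracy concrete, the sketch below refits the line at three noise levels and reports the Pearson correlation coefficient alongside the slope error. It is written in Python rather than R, and the noise levels, seed, sample size, and helper names are assumptions picked for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


random.seed(7)
fahrenheit = [random.uniform(-40.0, 120.0) for _ in range(500)]

results = []  # (noise_sd, r, absolute slope error)
for noise_sd in (0.5, 5.0, 25.0):
    celsius = [
        (f - 32.0) * 5.0 / 9.0 + random.gauss(0.0, noise_sd)
        for f in fahrenheit
    ]
    r = pearson_r(fahrenheit, celsius)
    slope_error = abs(ols_slope(fahrenheit, celsius) - 5.0 / 9.0)
    results.append((noise_sd, r, slope_error))
    # For a one-predictor regression, r squared equals R-squared.
    print(f"noise sd {noise_sd:5.1f}: r = {r:.4f}, "
          f"R^2 = {r * r:.4f}, slope error = {slope_error:.6f}")
```

The correlation drops as the noise grows. The slope error from any single run is itself random, so it need not increase in step, but its typical size grows with the noise.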

You can find the code for this exercise on GitHub.