‘The Very Little Code’ Illusion

February 26, 2013 2:45 pm

Many programming languages these days come with the same claim: you can achieve so much with so little code. Let us take an example from R. (This example is from “The Art of R Programming” by Norman Matloff.)

Suppose there are two vectors. One contains the predicted values of a given variable (the predictions come from some algorithm), and the other contains the actual values of the same variable. All the values are binary: each is either 0 or 1. One way to measure the accuracy of the prediction algorithm is the ‘proportion of errors’. Compare the two vectors element by element, count the positions where they differ, and divide that count by the total number of elements. Here is how it is done in R:

mean(abs(pred-actual))

Here, ‘pred’ is the vector containing the predicted values, while ‘actual’ is the vector containing the actual values. Since all values are just 0 or 1, subtracting the second from the first gives values of 0, 1, or -1. A 1 or -1 corresponds to a prediction error in one direction or the other: predicting 0 when the true value was 1, or vice versa. Taking absolute values with abs() leaves only 0s and 1s, where the 1s mark errors. It remains to calculate the proportion of errors. We do this by applying mean(), exploiting the mathematical fact that the mean of 0/1 data is the proportion of 1s. This is a common R trick, as the book explains.
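To make the intermediate steps of the one-liner visible, here is a small sketch that runs it on two short vectors. The values in ‘pred’ and ‘actual’ below are made up purely for illustration, and the intermediate names (diffs, errs, err_rate) are hypothetical; the point is only to show what each function call contributes.

```r
# Made-up example vectors (hypothetical data, for illustration only)
pred   <- c(1, 0, 1, 1, 0)
actual <- c(1, 1, 0, 1, 0)

diffs    <- pred - actual  # 0 where they agree, 1 or -1 where they disagree
errs     <- abs(diffs)     # 1 marks an error, 0 a correct prediction
err_rate <- mean(errs)     # mean of 0/1 data = proportion of 1s

print(diffs)     # [1]  0 -1  1  0  0
print(errs)      # [1] 0 1 1 0 0
print(err_rate)  # [1] 0.4

# The one-liner from the post gives the same value
print(mean(abs(pred - actual)))  # [1] 0.4
```

Two of the five predictions are wrong, so the proportion of errors is 2/5 = 0.4.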

Here is our question: yes, the final code looks compact, but did we not spend the same time and effort coming up with it as we would have if we had simply written it out as ‘Step 1: find the errors, Step 2: count them, Step 3: divide by the total number’? Isn't the claim of writing very little code, made for so many programming languages these days, just an illusion, given that we went through the same thinking process we would have for a ‘longer’ programming language? (We would understand if the argument were made from an efficiency standpoint, but in most cases, efficiency is actually sacrificed in the interest of brevity.)
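For comparison, here is a sketch of the ‘longer’ version, following the three steps above literally with an explicit loop. The function name error_proportion and the example data are hypothetical, introduced only for illustration. It assumes both vectors have the same length and hold only 0s and 1s; under that assumption, checking pred[i] != actual[i] is equivalent to checking whether abs(pred[i] - actual[i]) is 1.

```r
# Hypothetical helper: the error proportion, written out step by step
error_proportion <- function(pred, actual) {
  stopifnot(length(pred) == length(actual))
  n_errors <- 0
  # Steps 1 and 2: find the errors and count them
  for (i in seq_along(pred)) {
    if (pred[i] != actual[i]) {
      n_errors <- n_errors + 1
    }
  }
  # Step 3: divide by the total number of elements
  n_errors / length(pred)
}

# Made-up example vectors (hypothetical data)
pred   <- c(1, 0, 1, 1, 0)
actual <- c(1, 1, 0, 1, 0)

print(error_proportion(pred, actual))  # [1] 0.4
print(mean(abs(pred - actual)))        # [1] 0.4
```

Both versions encode the same three steps; the one-liner just folds them into nested function calls.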

For interesting statistics every day, find Statspotting on Facebook and follow Statspotting on Twitter.
