## On confidence in model predictions and why it’s not necessarily a good thing

Earlier this year Phil Birnbaum wrote a great post here about how the spread of points in real life will always be larger than that predicted by a model. I could make the same argument, but Phil does so with an eloquence and clarity that’s beyond me. I’d highly recommend just reading his post, but if not, the argument boils down to two points.

1. The variance in the number of points that teams score in a given season can be attributed to two factors – skill and luck.

Expressed mathematically (and assuming skill and luck are independent of each other), that can be summarised as
Variance(Observed) = Variance(Skill) + Variance(Luck)
Or, to put it another way: STDEV(Observed)^2 = STDEV(Skill)^2 + STDEV(Luck)^2

2. Obviously we can only predict the skill part – predicting which teams will get lucky is a fool’s game.
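To see the first point in action, here is a minimal simulation sketch. It is not taken from Phil’s post: it assumes skill and luck are independent and normally distributed, and the numbers (a mean of 52 points, a skill spread of 15, a luck spread of 8, and a deliberately huge number of "teams" so the estimates are stable) are all made up for illustration.

```python
import random
import statistics


def simulate_league_variances(n_teams=20000, skill_sd=15.0, luck_sd=8.0, seed=1):
    """Simulate points as skill + luck and return the three variances.

    All parameters are illustrative assumptions, not estimates from real data.
    """
    rng = random.Random(seed)
    # Each team's "true" points total, driven by skill alone.
    skill = [rng.gauss(52.0, skill_sd) for _ in range(n_teams)]
    # Independent luck added on top, centred on zero.
    luck = [rng.gauss(0.0, luck_sd) for _ in range(n_teams)]
    observed = [s + l for s, l in zip(skill, luck)]
    return (
        statistics.pvariance(skill),
        statistics.pvariance(luck),
        statistics.pvariance(observed),
    )


var_skill, var_luck, var_observed = simulate_league_variances()
print(f"skill + luck = {var_skill + var_luck:.1f}, observed = {var_observed:.1f}")
```

The two printed numbers should be close but not identical, since in any finite sample skill and luck will be very slightly correlated by chance. The observed spread is always wider than the skill spread, which is all a model can hope to capture.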

Thus, a quick way to check whether a model gives rise to a sensible set of predictions is to compare the variance of its predictions with the variance in the Premiership table that can be attributed to skill alone.

But how do we know how much of the variance is due to skill and how much is due to luck? Well, we have estimates for two parts of the equation and thus can determine the third. The observed standard deviation of points in the Premiership over the past 14 seasons is ~16.6 points. Within a given season that can range from ~13 – 20 points, but on the whole ~16.6 is a pretty solid estimate. The standard deviation due to luck has been estimated a couple of ways. I’ve estimated it to be ~8.2 points (though I’m aware that is an overestimate as home advantage isn’t factored in), whilst Neil Charles has estimated it to range from 7.0 – 7.6 for individual teams. If we take those two extremes (7.0 and 8.2) and plug them into the equation along with an observed standard deviation of 16.6 points, then we get a standard deviation due to skill of 14.4 – 15.1 points.
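The arithmetic in that last step is just the variance equation rearranged for skill. A small sketch (the function name `skill_sd` is mine, purely for illustration):

```python
import math


def skill_sd(observed_sd, luck_sd):
    """Rearrange Var(Observed) = Var(Skill) + Var(Luck) to solve for the skill SD."""
    return math.sqrt(observed_sd ** 2 - luck_sd ** 2)


# Observed SD of 16.6 points, with the two luck estimates quoted above.
low = skill_sd(16.6, 8.2)   # larger luck estimate -> smaller skill SD
high = skill_sd(16.6, 7.0)  # smaller luck estimate -> larger skill SD
print(f"{low:.1f} - {high:.1f}")  # prints "14.4 - 15.1"
```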

That gives us a good benchmark. But, as I’ve shown previously, the standard deviation of points in the Premiership is rising, and in the last three seasons the standard deviation in points has been 17.9 points (that being said, it is 16.7 if we consider the past four seasons). Let’s be somewhat generous and suggest that the standard deviation of the past three seasons is a more accurate reflection of the spread of talent in the league today than the last 14 (or even four) seasons. Following the same method as in the previous paragraph that would give us a standard deviation due to skill of 15.9 – 16.5 points.
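Written out, using the same two luck estimates (8.2 and 7.0) and an observed standard deviation of 17.9 points:

```latex
\begin{align*}
\sigma_{\text{skill}} &= \sqrt{\sigma_{\text{observed}}^2 - \sigma_{\text{luck}}^2} \\
\text{luck} = 8.2: \quad \sigma_{\text{skill}} &= \sqrt{17.9^2 - 8.2^2} = \sqrt{320.41 - 67.24} = \sqrt{253.17} \approx 15.9 \\
\text{luck} = 7.0: \quad \sigma_{\text{skill}} &= \sqrt{17.9^2 - 7.0^2} = \sqrt{320.41 - 49.00} = \sqrt{271.41} \approx 16.5
\end{align*}
```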

So, with that conservative threshold, I think we can fairly say that any set of predictions whose points have a standard deviation of more than 16.5 is confident not only in its ability to predict skill, but is also trying to predict some of the variance due to luck.
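As a rough sketch, the check amounts to the following. The two sets of predictions are invented for illustration (they are not any real model’s output), and I’m assuming the population standard deviation is the right one to use, since the 20 teams are the whole league rather than a sample.

```python
import statistics

# Upper estimate of the spread attributable to skill, from the reasoning above.
SKILL_SD_CEILING = 16.5


def prediction_spread(predicted_points):
    """Population standard deviation of a model's predicted points totals."""
    return statistics.pstdev(predicted_points)


def claims_to_predict_luck(predicted_points, ceiling=SKILL_SD_CEILING):
    """True if the predictions are more spread out than skill alone can justify."""
    return prediction_spread(predicted_points) > ceiling


# Hypothetical predictions for a 20-team league (made-up numbers).
flat = [52] * 20
wide = [90, 85, 80, 75, 70, 65, 60, 58, 55, 52,
        50, 48, 45, 42, 40, 38, 35, 32, 28, 22]

for name, preds in [("flat", flat), ("wide", wide)]:
    print(name, round(prediction_spread(preds), 1), claims_to_predict_luck(preds))
```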

Prior to the season Simon Gleave gathered a whole slew of Premiership predictions which may be found here. In there were a total of 22 predictions made using models. Below is a table showing the standard deviation in the predictions of those models, in order of most conservative to most confident. I’ve also added two benchmarks – that each team scores a league average 52 points (which obviously has a standard deviation of 0 points) and a simple model that makes a prediction for each team by regressing the number of points that the team scored last season.

Most of the models fall into the range below 16.5. I don’t know the details of all of the models so I’ll discuss the ones that I’m most familiar with.

1. It’s no surprise to see the raw TSR predictions close to the top – it does well for what it is but has very little knowledge and so is, by necessity, modest in how accurately it claims to be able to predict what will happen in the future.
2. It’s also no surprise to see that the predictions based on the Team Ratings have a higher standard deviation than TSR, as they incorporate more information.
3. The simple points-based regression model gives a standard deviation of >16.5. Why is that? It’s because the ’13-14 season saw a very wide spread of points throughout the league. In most seasons this model comes in with a standard deviation of <16.5.
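To illustrate the third point, here is one plausible form such a benchmark could take: pull each team’s points from last season part of the way back towards the league mean. This is an assumption for the sake of the example, not the actual formula behind the benchmark; the shrink factor of 0.8 and both sets of points are hypothetical.

```python
import statistics


def regress_to_mean(last_season_points, shrink=0.8):
    """Predict next season by shrinking each total towards the league mean.

    `shrink` is a hypothetical factor; 1.0 means "same as last season",
    0.0 means "everyone gets the league average".
    """
    mean = statistics.fmean(last_season_points)
    return [mean + shrink * (p - mean) for p in last_season_points]


# Two made-up five-team "seasons": one tightly bunched, one widely spread.
tight_season = [60, 56, 52, 48, 44]
wide_season = [80, 66, 52, 38, 24]

for season in (tight_season, wide_season):
    preds = regress_to_mean(season)
    print(round(statistics.pstdev(season), 1), "->", round(statistics.pstdev(preds), 1))
```

Under this form the spread of the predictions is simply the shrink factor times the spread of last season’s table, so an unusually wide season feeds straight through into unusually wide predictions.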

Finally – two models in particular (those by Steve Lawrence and James Yorke), and maybe a couple of others, stand out from the crowd in that they seem confident in their ability to predict a significant proportion of the luck seen in the league. What does that mean? Well Phil says it best (seriously – go and read the piece, it’s excellent):

Without looking at these models I’m not sure what’s going on – but I can’t really think of a reasonable way that the standard deviation should be that high.

One last thing – it’s early in the season and I’ll update this at some point as the season goes on – but here’s a table with the current performance of the models, from most to least accurate. The final column is the standard deviation in the points predicted by each model. I’ve used conditional formatting to highlight it (green = more modest spread, red = wide spread) but it’s not hard to spot the pattern: