Last time I suggested that the best predictive model we could hope for in terms of the final Premiership standings would have a STDEV of ~9.25 points over the course of a season, because random variation can play a pretty big role over a small sample of games. However, I’m not entirely sure that method captures the nuance of a 3-1-0 points system (hence why I debated between using 3 points and 2.74 as the multiplier in the last post), so I’m going to go about it another way and see how well the two methods agree. This time I’m following Phil’s methodology in this post, whereby the variance is determined from actual game results.
So the average team scores 52.075 points a season, or 1.370 points per game, on the back of 14.075 wins, 9.85 draws, and 14.075 defeats. The variance in points per game is thus calculated as follows:
Variance (per game) = ((14.075*(3-1.37)^2)+(9.85*(1-1.37)^2)+(14.075*(0-1.37)^2))/(38-1) = 1.76
The standard deviation over a season is then determined by multiplying this number by the number of games (38) and taking the square root, which gives a random variation of 8.18 points, lower than the 9.25 seen last time. If we take yesterday’s number and multiply by 2.74 instead of 3 (i.e., by the number of points actually scored per game as opposed to the number of points for a win) then the number comes out as 8.45, which is markedly closer. I’m still not entirely sure I’ve convinced myself which is the best to use though, so for now I’ll use the two extremes, and say that the amount of random variation is somewhere between 8.18 and 9.25 points over a 38 game season.
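To make the arithmetic above easy to check, here is a minimal Python sketch of the calculation. The variable names are my own, and it uses the unrounded points-per-game figure rather than 1.37, so the last decimal place may differ slightly from a hand calculation.

```python
import math

GAMES = 38
# Average results per team-season, as quoted above.
wins, draws, losses = 14.075, 9.85, 14.075

points = 3 * wins + 1 * draws      # average points per season
mean_ppg = points / GAMES          # average points per game

# Squared deviations from the mean, weighted by how often each result
# occurs, divided by (games - 1) as in the formula above.
squared_devs = (
    wins * (3 - mean_ppg) ** 2
    + draws * (1 - mean_ppg) ** 2
    + losses * (0 - mean_ppg) ** 2
)
var_per_game = squared_devs / (GAMES - 1)

# Scale the per-game variance up to a season, then take the square root.
season_sd = math.sqrt(var_per_game * GAMES)

# Rescale last time's 9.25 by points actually scored per game (two teams
# per game) instead of by the 3 points on offer for a win.
rescaled_sd = 9.25 * (2 * mean_ppg) / 3

print(f"points per season: {points:.3f}")
print(f"variance per game: {var_per_game:.2f}")
print(f"season SD:         {season_sd:.2f}")
print(f"rescaled 9.25:     {rescaled_sd:.2f}")
```

Swapping in another league's win/draw/loss averages and game count is all that is needed to repeat the check elsewhere.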
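What we actually observe (over 12 seasons – 240 team seasons – of data) is a standard deviation of 16.32 points. Because it is variances (squared standard deviations) that add together, not the standard deviations themselves, the following equation, with each variation expressed as a variance, lets us determine how much of the variation is random and how much of it is due to talent.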
Variation (observed) = Variation (random) + Variation (talent)
So at one end of the scale
Variation (talent) = 16.32^2 – 8.18^2 = 14.12^2
In this scenario 37% of the variation is random, and 63% is talent (taking each standard deviation, 8.18 and 14.12, as a share of their sum)
And at the other end of the scale
Variation (talent) = 16.32^2 – 9.25^2 = 13.45^2
In this scenario 41% of the variation is random, and 59% is talent (taking each standard deviation, 9.25 and 13.45, as a share of their sum)
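The two scenarios can be reproduced with a short Python sketch. The function name `split_variation` is hypothetical. The percentages quoted above match the ratio of standard deviations; the sketch also prints the random share of the total variance, which is an alternative way of expressing the split that I have added for comparison and which gives a smaller random share.

```python
import math

def split_variation(observed_sd, random_sd):
    """Split an observed SD into talent SD and two 'random share' measures."""
    # Variances add, so the talent variance is the observed variance
    # minus the random variance.
    talent_sd = math.sqrt(observed_sd ** 2 - random_sd ** 2)
    # Share used for the percentages in the text: ratio of SDs.
    share_by_sd = random_sd / (random_sd + talent_sd)
    # Alternative (my addition): random variance as a share of total variance.
    share_by_var = random_sd ** 2 / observed_sd ** 2
    return talent_sd, share_by_sd, share_by_var

OBSERVED_SD = 16.32  # Premiership, 240 team seasons

for random_sd in (8.18, 9.25):
    talent_sd, by_sd, by_var = split_variation(OBSERVED_SD, random_sd)
    print(
        f"random SD {random_sd:.2f}: talent SD {talent_sd:.2f}, "
        f"random share {by_sd:.0%} by SD, {by_var:.0%} by variance"
    )
```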
So we have a pretty good agreement – the amount of variation in the points scored by Premiership teams is 37-41% due to random variation, and 59-63% due to the variation in talent in the league. It also suggests the spread of talent is wide enough that, even if there were no random variation at all, we’d expect ~1 team per season to score more than 80 points or fewer than 24 (roughly two standard deviations of talent either side of the 52-point average) based on the variation in talent alone.
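As a rough check on the "~1 team per season" figure, the sketch below assumes talent is normally distributed around the 52.075-point league average, which is my assumption and not something the data above establishes. It uses a 20-team league (240 team seasons over 12 seasons) and both talent estimates.

```python
from statistics import NormalDist

MEAN_POINTS = 52.075   # league-average points per season
TEAMS = 20             # 240 team seasons / 12 seasons
LOW, HIGH = 24, 80     # thresholds quoted in the text

def expected_outliers(talent_sd):
    """Expected teams per season outside [LOW, HIGH] from talent alone."""
    # Assumption: true talent follows a normal distribution.
    talent = NormalDist(mu=MEAN_POINTS, sigma=talent_sd)
    p_outside = talent.cdf(LOW) + (1 - talent.cdf(HIGH))
    return TEAMS * p_outside

for talent_sd in (14.12, 13.45):
    print(f"talent SD {talent_sd:.2f}: "
          f"{expected_outliers(talent_sd):.2f} teams per season")
```

Under that assumption both estimates land a little under one team per season, in line with the rough figure above.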
For La Liga, the observed variation is 14.13 points, which is markedly smaller than in the Premiership. This fits well with the plot I posted when I compared the two leagues, with the middle of the La Liga table historically being significantly more bunched than the Premiership. The lower limit on the amount of random variation in La Liga is 8.22 points, and the upper limit is 9.25 points. Using these numbers we find that the variation in talent is 11.49 and 10.68 points respectively, meaning 42-46% of the observed variation is due to random variation, with the other 54-58% due to the variation in talent.
The existence of home advantage probably means these values slightly underestimate the proportion due to talent, but I’m confident they’re in the right ballpark. There’s still a solid amount of information to mine from variation, so I’ll probably revisit this in the near future. I also want to go back and re-calculate at what point in the season talent overtakes luck as the dominant factor using this method – I figure it’ll be within a few games of the 18 proposed last time, but it’s worth checking.