In my last post I plotted team points in consecutive seasons to illustrate how to calculate regression to the mean. It also serves a purpose in this post so I’ve included it again, with points scored in year 1 (eg 2000-01) plotted on the x-axis and points scored in year 2 (eg 2001-02) plotted on the y-axis.
The R² value is 0.608*, meaning that points regress ~22% to the mean on a yearly basis (the correlation r is the square root of R², roughly 0.78, and 1 - 0.78 = 0.22). I now want to move on from points and see whether anything else shows a better correlation from season to season.
I’m going to compare points to goals, shots on target and total shots and I’m going to evaluate each one in three different ways.
First is the differential, which is simply:
events for minus events against
Second is the ratio:
Events for / (Events for + events against)
And the third way is using Pythagoras:
(Events for)² / (Events for + events against)²
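To make the three definitions concrete, here is a minimal sketch in Python. The function names and the example season are made up for illustration, and the Pythagoras function follows the formula exactly as it is written above rather than any other variant.

```python
def differential(events_for, events_against):
    """Events for minus events against."""
    return events_for - events_against


def ratio(events_for, events_against):
    """Share of all events that went in the team's favour."""
    return events_for / (events_for + events_against)


def pythagoras(events_for, events_against):
    """Squared events for over the squared total, as written in the post."""
    return events_for ** 2 / (events_for + events_against) ** 2


# Hypothetical season: 200 shots on target for, 150 against.
shots_for, shots_against = 200, 150
print(differential(shots_for, shots_against))         # 50
print(round(ratio(shots_for, shots_against), 3))      # 0.571
print(round(pythagoras(shots_for, shots_against), 3)) # 0.327
```

The ratio and Pythagoras functions will raise a ZeroDivisionError if a team has no events at all, which should not happen over a full season.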
For each variable I’ve determined the correlation between performance in year 1 and year 2, as shown above in the plot for points. The R² values are summarised in the table below.
| Event | Differential | Ratio | Pythagoras |
|---|---|---|---|
| Shots on target | 0.750 | 0.761 | 0.750 |
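For anyone wanting to reproduce this kind of number, each R² can be computed as the square of the Pearson correlation between a metric's year 1 and year 2 values for the same teams. The sketch below assumes that approach; the function name and the five-team data are invented purely for illustration and are not taken from the real dataset.

```python
def r_squared(year1, year2):
    """Square of the Pearson correlation between paired season values."""
    if len(year1) != len(year2) or len(year1) < 2:
        raise ValueError("need two equal-length lists with at least two teams")
    n = len(year1)
    mean1 = sum(year1) / n
    mean2 = sum(year2) / n
    # Sums of cross-products and squared deviations from each season's mean.
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(year1, year2))
    var1 = sum((a - mean1) ** 2 for a in year1)
    var2 = sum((b - mean2) ** 2 for b in year2)
    return cov ** 2 / (var1 * var2)


# Made-up shots on target ratios for five teams in consecutive seasons.
year1 = [0.62, 0.55, 0.50, 0.47, 0.41]
year2 = [0.60, 0.51, 0.53, 0.45, 0.44]
print(round(r_squared(year1, year2), 3))
```

Each team must appear at the same position in both lists, so only teams present in both seasons can be included.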
The table tells us two things. First, shots on target are the most reproducible event on a season-to-season basis. Second, option two, the ratio of the two events, gives the most repeatable results. Whilst using ratios is only marginally better than using differentials, the difference is enough to be significant.
The best overall is shots on target ratio (SOTR), with less than 13% regression to the mean from season to season.
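As a quick check on the regression figures, here is a minimal sketch in Python. It assumes regression to the mean is taken as 1 - r, with r the square root of R², which is consistent with both the ~22% quoted for points and the sub-13% quoted for SOTR. The function name is my own invention.

```python
import math


def regression_to_mean(r_squared):
    """Fraction of regression toward the mean, assuming it equals 1 - r.

    r is taken as the positive square root of R^2.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    return 1.0 - math.sqrt(r_squared)


print(f"{regression_to_mean(0.608):.1%}")  # points: 22.0%
print(f"{regression_to_mean(0.761):.1%}")  # shots on target ratio: 12.8%
```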
In the future I’m going to look at how predictive each of these is over smaller samples of games (from 2 up to 38) to get a definitive result, but for the time being I’m going to use SOTR to predict future performance.
*There is some survivor bias here, which is likely to be skewing the correlation. Only the 17 teams that aren’t relegated can be used. Teams are relegated through a combination of being terrible and having bad luck. Regression to the mean (eg of PDO) is likely to cause their points total to increase the following year, thus reducing the observed R² value.