Given the ample evidence that TSR is the best predictor of future performance I can find in the data available to me (link 1, link 2), it makes sense to try to improve upon that metric. This is the first attempt to do so, and I can think of a couple of future tweaks that may help too.
Something that’s mentioned often in sports is the Pythagorean. It was first made popular in baseball, and it is something I’ve already considered for football. Though it performed admirably, it wasn’t quite as good as using a simple ratio of shots (link).
For football, in terms of shots, the formula of the Pythagorean is as follows:
TSR^x = (Total shots for)^x / ((Total shots for)^x + (Total shots against)^x)
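A minimal Python sketch of this formula, assuming the baseball-style reading in which the exponent is applied to each shot total before summing. The function name `pythagorean_tsr` and the shot totals are hypothetical and chosen only for illustration.

```python
def pythagorean_tsr(shots_for: float, shots_against: float, exponent: float = 1.0) -> float:
    """Shots-based Pythagorean ratio; exponent=1 reduces to plain TSR."""
    if shots_for < 0 or shots_against < 0:
        raise ValueError("shot totals must be non-negative")
    sf = shots_for ** exponent
    sa = shots_against ** exponent
    if sf + sa == 0:
        raise ValueError("at least one shot total must be positive")
    return sf / (sf + sa)


# Hypothetical season totals (not real data).
shots_for, shots_against = 600, 400

tsr_1 = pythagorean_tsr(shots_for, shots_against)        # plain TSR
tsr_2 = pythagorean_tsr(shots_for, shots_against, 2.0)   # exponent of two
tsr_24 = pythagorean_tsr(shots_for, shots_against, 2.4)  # exponent of 2.4

print(round(tsr_1, 3), round(tsr_2, 3), round(tsr_24, 3))
```

A larger exponent pushes teams further from 0.5: a side that outshoots its opponents gets a higher value, a side that is outshot gets a lower one, and a team with equal shots for and against stays at exactly 0.5.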
In the earlier study linked above, I only used a Pythagorean exponent (x) of two, and the sample consisted of the top four English leagues. This time I’m going to focus on the records of Premiership teams and vary the exponent to see what happens to the correlation between TSR^x in year 1 and TSR^x in year 2.
The plot reaches a maximum where the exponent is, you guessed it, 2.4. This means that TSR^2.4 is the most repeatable value, year to year, of any of the metrics I’ve proposed so far.
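The exponent sweep can be sketched roughly as follows. This is an illustration under assumptions, not the code behind the plot: the team data are invented, the helper names (`pyth`, `pearson`, `repeatability`) are hypothetical, and the exponent is assumed to apply to each shot total separately. With toy data the best exponent will not necessarily come out at 2.4.

```python
from math import sqrt


def pyth(shots_for, shots_against, exponent):
    """Shots-based Pythagorean ratio (exponent applied to each total)."""
    sf, sa = shots_for ** exponent, shots_against ** exponent
    return sf / (sf + sa)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical data: ((shots for, shots against) in year 1, same in year 2) per team.
team_seasons = [
    ((620, 410), (600, 430)),
    ((540, 480), (555, 470)),
    ((450, 560), (470, 540)),
    ((500, 500), (480, 520)),
    ((580, 450), (590, 465)),
    ((430, 600), (455, 570)),
]


def repeatability(exponent):
    """Correlation between the year-1 and year-2 values for a given exponent."""
    year1 = [pyth(sf, sa, exponent) for (sf, sa), _ in team_seasons]
    year2 = [pyth(sf, sa, exponent) for _, (sf, sa) in team_seasons]
    return pearson(year1, year2)


# Try exponents from 0.2 to 4.0 in steps of 0.2 and keep the most repeatable one.
exponents = [round(0.2 * k, 1) for k in range(1, 21)]
results = {x: repeatability(x) for x in exponents}
best_exponent = max(results, key=results.get)

print(best_exponent, round(results[best_exponent], 4))
```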
One way to show this is graphically. Below I’ve plotted the standard deviation for goals ratio, shots on target ratio, TSR, and TSR^2.4 on a season-by-season basis for the 17 teams that avoid relegation from the Premiership in year ‘1’ (e.g., 2000-01) and the corresponding values for each of these metrics in year ‘2’ (e.g., 2001-02).
Season ‘12’ doesn’t represent a season of games; it is the standard deviation of all 187 data points that went into the plot.
The improvement over the 187-point sample isn’t enormous: TSR^2.4 shows ~2% less regression than TSR. It is better, though, and a series of small steps is still a plausible way to make this metric more powerful.
I actually think this plot provides further evidence that TSR is simply the best season-to-season predictor in terms of a metric that is repeatable.
Let’s tie this in to the plot I showed last week (link), which demonstrated that, at least early in the season, TSR was the best metric we had to predict performance over the remainder of the season. Does TSR^2.4 provide an improvement over TSR?
Again, the improvement isn’t enormous, but it’s marked and observable. Its value is most visible over the first four to six games.
One last thing I want to show is how well these metrics in year ‘1’ translate into points in year ‘2’. Ultimately that isn’t my aim in this exercise, but I’m very well aware that for many of the people reading this, points are all that matter. As such, the plot below follows the same idea as the second figure above, but in this case the standard deviation is between the points that these metrics would expect a team to score in year ‘2’ and the number of points the team actually scores.
As before, season ‘12’ doesn’t represent one season, but the standard deviation of all of the data points for a given metric.
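One way such a comparison could be set up is sketched below. It assumes that "expected points" come from a least-squares straight line fitted between the year-1 metric and year-2 points, which is a guess at the method rather than a description of it. The numbers and the helper `fit_line` are hypothetical.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Hypothetical values: a year-1 metric (e.g. TSR) and year-2 league points per team.
year1_metric = [0.62, 0.55, 0.48, 0.51, 0.44, 0.58]
year2_points = [75, 60, 45, 52, 40, 66]

slope, intercept = fit_line(year1_metric, year2_points)

# Points the metric would "expect" in year 2, and the miss for each team.
expected = [intercept + slope * x for x in year1_metric]
errors = [actual - exp for actual, exp in zip(year2_points, expected)]

# A smaller spread of errors means the metric translates into points more reliably.
error_sd = pstdev(errors)
print(round(error_sd, 2))
```

Running the same calculation for each metric on the same teams gives one standard deviation per metric, which is the kind of number being compared in the plot.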
In this case goals rule the day, and I’m beginning to wonder whether a season’s worth of goals, a total of ~100 in each team’s matches, is actually a big enough sample to allow us to judge a team’s season more heavily on those.
We see the head start that the shot metrics have over goals at the beginning of the season when their sample size is in the low hundreds, but this advantage subsequently fades as the season goes on and the sample size of goals increases. I think it’s an avenue that at least deserves some more consideration.
Anyway, there you have it: TSR^2.4. For now it sits as the gold standard, but am I going to be using it in everyday predictions? Probably not. I think the added complication of understanding the Pythagorean makes it a little less accessible for others, and that the small improvement in predictive ability doesn’t compensate for that. Yet. A couple more small steps and I think we’ll be on to something that is worth using on a more regular basis.
P.S. You may well have read already, but Richard Whittall and I have begun the Soccer Analytics Forum. We’re hoping to gather as much of the community as possible in a place that has room for more coherent discussion than we see on Twitter but is less formal than blogging. I’d encourage anyone interested in the stuff I write to sign up and share their ideas. And as always, feedback and suggestions as to what we can do to improve the forum are more than welcome.
James