A quick cautionary note on predictiveness and R-squared

Yesterday Sander IJtsma published a nice post looking at various metrics and how well they predict future performance (note 1). Using R-squared as a measure of how predictable future performance is, Sander summarises his findings as follows:

“The conclusion from these graphs is quite simple actually. Expected Goals Ratio forms an impressive improvement on raw shot metrics at each and every point in the season. It picks up information much like the raw shot metrics do in the very early stages, then predicts future performance significantly better at early to mid-season, and also holds predictive capacities for longer. It makes sense to use Expected Goals Ratio from as early as four matches played. Even that early, it is as good a predictor for future performance as Points per Game and Goals Ratio will ever be.”

However, I’m not sure that’s necessarily true. Below I’ve reproduced Sander’s sixth plot using my own dataset. As I don’t have data from the Eredivisie the dataset here is that from the ’12-13 and ’13-14 seasons for the big 5 leagues. I’d be happy to include the Eredivisie data if someone were to forward it. I’ve also used a simple Team Rating I made for the European leagues in place of ExG:

[Figure: R-squared between each metric (GR, TSR, STR, Team Rating) and future points per game, by number of games played]

Now it’s more than fair to say that Team Rating (and in Sander’s case Expected Goals) has a markedly stronger correlation with the points per game that a team scores in the future than any of the other metrics. STR outperforms TSR, and by a marked amount in the middle of the season, whilst GR and TSR are essentially equivalent from game 13 onwards. However, if we’re looking at how good a metric is at predicting future points, we should really be looking at the error in those predictions: the smaller the error, the better the metric is at predicting what will happen. Below I’ve plotted the error for each of the metrics above after each match of the season (for methodology see note 2). For reference, if R-squared is doing a good job of outlining how well each metric predicts future performance, here are three features we’d expect to see:

1. Team Rating gets off to a flying start, with a clear lead over the other metrics after just 2-3 games
2. STR catches TSR after 4 games and comfortably surpasses it by the time 8 have been played
3. GR catches TSR after 13 games and remains at least as predictive as TSR for the remainder of the season

So onto the plot:

[Figure: prediction error (STDEV) for each metric after each match of the season]

What do we see? Well, TSR runs the show for the first nine weeks of the season, at which point the Team Rating takes over and holds the lead until week 27, and from there the metrics become fairly interchangeable. At the ‘bad end’, GR is pretty bad for the entire season, finally joining the pack after 28-30 games. This is a pretty different story from the one we’d expect from looking at the plot of correlation coefficients, and not one of the three features outlined above is really present. I’d contend that looking at the error is a better way of assessing which metric is best at predicting future points per game.

Does this mean that ExpG isn’t a better predictor of future points per game than TSR, STR, or GR? Without running the same study I can’t say, but I think this shows that the correlation coefficient doesn’t provide the required evidence to state that conclusively.
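As a toy illustration of why the two measures can disagree, here is a minimal Python sketch. The numbers are invented for the example and are not taken from either dataset, and the helper names `r_squared` and `error_sd` are my own. It compares two hypothetical sets of predictions against the same actual points per game: one tracks the actual values perfectly in rank and spacing but on the wrong scale, the other is slightly noisy but on the right scale.

```python
from statistics import mean, pstdev


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)


def error_sd(predicted, actual):
    """Standard deviation of the errors (predicted minus actual)."""
    return pstdev([p - a for p, a in zip(predicted, actual)])


# Invented rest-of-season points per game for five hypothetical teams.
actual = [0.8, 1.1, 1.4, 1.7, 2.0]

# Predictions A: perfectly correlated with actual, but stretched (wrong scale).
pred_a = [2 * a - 1.4 for a in actual]

# Predictions B: right scale, with a little noise added by hand.
pred_b = [0.9, 1.0, 1.5, 1.6, 2.1]

r2_a, sd_a = r_squared(pred_a, actual), error_sd(pred_a, actual)
r2_b, sd_b = r_squared(pred_b, actual), error_sd(pred_b, actual)

print(f"A: R-squared = {r2_a:.3f}, error SD = {sd_a:.3f}")
print(f"B: R-squared = {r2_b:.3f}, error SD = {sd_b:.3f}")
```

Predictions A have the higher R-squared yet the larger spread of errors, because correlation ignores the scale of the predictions while the error does not. Whether anything like this is happening with ExpG is exactly what can’t be told from the correlation plot alone.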

Finally, to read more on R-squared I suggest any of the following posts by Phil Birnbaum (even with this many I know I’m missing some): 1, 2, 3, 4, 5, 6, 7.

Note 1: I should note that the work by me referenced in Sander’s post specifically looked at ‘how reproducible is a metric from one season to the next’ and focussed solely on the Premiership. Based on that work I suggested that I’d use TSR more than STR, but for the Premiership only; whether that conclusion holds for a markedly different dataset such as this one is unknown. Furthermore, my thinking has changed somewhat in the ensuing 3-4 years and, as outlined above, I no longer think relying on R-squared is the best way to go about such a study; rather, I think looking at the errors produced by a metric is a better method.

Note 2: For each of the metrics (GR/TSR/STR/Team Rating) I’ve determined the relationship between the metric and the points per game each team has scored so far in the season. I’ve then used that relationship to determine the points per game we’d expect each team to score over the remainder of the season, and calculated the difference between that value and the points per game the team actually does score over the remainder of the season. The STDEV reported is that of the difference between the predicted end of season points and the actual end of season points for each team.
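The procedure in Note 2 can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions rather than the exact calculation behind the plot: it assumes the ‘relationship’ is a straight-line least-squares fit, it measures the error in rest-of-season points per game, and the function names `fit_line` and `prediction_error_sd` and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def prediction_error_sd(metric, ppg_so_far, ppg_remaining):
    """STDEV of (predicted - actual) rest-of-season points per game.

    Assumption: the metric-to-PPG relationship is linear and is fitted
    on the season so far, then applied unchanged to the remainder.
    """
    slope, intercept = fit_line(metric, ppg_so_far)
    predicted = [slope * m + intercept for m in metric]
    errors = [p - a for p, a in zip(predicted, ppg_remaining)]
    return stdev(errors)


# Invented values for five hypothetical teams after some number of games.
metric = [0.40, 0.45, 0.50, 0.55, 0.60]     # e.g. a TSR-like ratio
ppg_so_far = [0.8, 1.2, 1.3, 1.6, 2.1]      # points per game to date
ppg_remaining = [1.0, 1.1, 1.4, 1.5, 1.9]   # points per game afterwards

print(prediction_error_sd(metric, ppg_so_far, ppg_remaining))
```

Repeating this after each match, for each metric, would give one error curve per metric of the kind plotted above; the metric with the lowest curve at a given point in the season is the better predictor at that point.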
