How quickly do the ‘advanced’ metrics regress to their final values?

This post has been a long time coming. I think there’s tons of evidence on this site (link) that shots are the best predictor I have of performance in the following season but, because of the way my data was stored, I’ve yet to produce anything that shows how these metrics perform within an individual season. The questions I want to answer are a) are they equally useful within a season and b) how quickly do they resemble their final (end of season) values?

So without further delay: shots, shots on target, and goals. Taking goals ratio as an example, the idea of these plots is to find the correlation between a team’s goals ratio after x games and the same team’s goals ratio at the end of the season. The more predictive power a metric has, the faster it will trend towards an R² of 1. Each dot on these plots represents a sample size of 240 (20 teams for 12 seasons):
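For anyone wanting to reproduce this kind of curve, here is a minimal sketch of the calculation described above. It is not the code behind the plots. It assumes a ratio is defined as for / (for + against), that each team plays 38 games, and it runs on made-up Poisson data; the function name r2_by_game and every variable in it are hypothetical.

```python
import numpy as np

def r2_by_game(made, conceded):
    """R^2 between the ratio after each game and the end-of-season ratio.

    made, conceded: arrays of shape (team_seasons, games) holding per-game
    counts (goals, shots, or shots on target) for and against each team.
    """
    cum_for = np.cumsum(made, axis=1).astype(float)
    cum_all = cum_for + np.cumsum(conceded, axis=1)
    # Assumed definition: ratio = for / (for + against).
    # It stays NaN while both running totals are still zero.
    ratio = np.full(cum_for.shape, np.nan)
    np.divide(cum_for, cum_all, out=ratio, where=cum_all > 0)
    final = ratio[:, -1]
    r2 = np.empty(ratio.shape[1])
    for x in range(ratio.shape[1]):
        # Skip team-seasons whose ratio is undefined at this point.
        ok = ~np.isnan(ratio[:, x]) & ~np.isnan(final)
        r2[x] = np.corrcoef(ratio[ok, x], final[ok])[0, 1] ** 2
    return r2

# Synthetic stand-in for 240 team-seasons of 38 games (not real data).
rng = np.random.default_rng(0)
skill = rng.normal(0.5, 0.08, size=(240, 1)).clip(0.2, 0.8)
shots_for = rng.poisson(26 * skill, size=(240, 38))
shots_against = rng.poisson(26 * (1 - skill), size=(240, 38))
tsr_r2 = r2_by_game(shots_for, shots_against)
print(tsr_r2[[0, 11, 37]])  # R^2 after 1, 12 and 38 games
```

Note that the last value is 1 by construction, because the final game's ratio is being compared with itself. What matters is how quickly the curve gets there.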

So total shots are marginally better than shots on target (most notably after 1 game but also persistently throughout the season) and both are significantly better predictors than goals.

How about some of the metrics I’ve identified as more variable over a season: shooting percentage, save percentage, and PDO?

There’s no real rush towards an R² of 1 here, which fits the previous evidence I’ve presented that these metrics have terrible predictive power and regress very heavily towards the mean (link).

Finally, this table is a quick summary of how long it takes each of the metrics to achieve R² values above certain thresholds.
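A summary like this can be derived from any series of per-game R² values by finding the first game at which the series rises above each threshold. The sketch below is only an illustration: the function name games_to_reach is hypothetical, and the thresholds and example numbers are placeholders, not the ones used in the table.

```python
import numpy as np

def games_to_reach(r2, thresholds=(0.5, 0.7, 0.9)):
    """First game number (1-based) at which r2 rises above each threshold.

    r2: sequence of R^2 values, one per game played so far.
    Returns None for a threshold that is never exceeded.
    """
    r2 = np.asarray(r2, dtype=float)
    result = {}
    for t in thresholds:
        hits = np.flatnonzero(r2 > t)
        result[t] = int(hits[0]) + 1 if hits.size else None
    return result

# Made-up R^2 values for the first four games of a season.
example = games_to_reach([0.2, 0.55, 0.8, 0.95])
print(example)
```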

Basically shots are better than goals and they’re all far better than Sh%, Sv%, and PDO.

What I think is significant about this is that I haven’t factored in quality of competition at all. I’m still contemplating the best way to incorporate that, but TSR (and STR) already provide a good measure very early in the season. By this point in the current season, 12-13 games in, we can be relatively sure that a team’s current TSR is a pretty good reflection of what their TSR will be come the end of the season. There’s something in this; the onus is on finding ways to improve upon it.

4 thoughts on “How quickly do the ‘advanced’ metrics regress to their final values?”

1. Matthias Kullowatz says:

Just trying to get a feel for the data that led to these correlations… Am I correct in thinking that you took all teams’ TSRs through, say, 15 games, then their TSRs through the whole season, and calculated the correlation/R^2 from that?

Did you try splitting the data into two pieces with no games in the intersection? Like for instance, how do the first ten games predict the next ten, or first five predict the next five?

Good stuff. Thanks for sharing :-)