On whether a team drawing games early in the season is a predictor of it drawing games over the rest of the season

Yesterday I showed that there was essentially no season-to-season correlation in the number of games that a team drew. I also showed that there was essentially no correlation between home and away draws for a given team in a given season. I was asked on Twitter whether draws early in the season were predictive of draws over the rest of the season, which is a pretty solid question.

Fortunately I’ve finally got round to building a spreadsheet I’d been thinking of doing for a while, and it allows me to answer the question really quickly (and hopefully effectively).

There are only two plots in this post (and a nota bene to conclude) so I’ll dive straight in. The sample here is the 280 team-seasons played in the Premiership from the ’00-01 to ’13-14 seasons, inclusive. For both plots I’ve plotted the number of games played thus far in the season on the x-axis, and an associated R^2 value on the y-axis. I’ll take an example to describe how the R^2 values were calculated. On the first plot there is a point at (21, 0.008), the highest y value on the plot. The R^2 value for that point was calculated by taking the number of draws each of the 280 teams recorded in games 1 to 21 of their season, and comparing it to the number of draws each team recorded in games 22 to 38 of the same season. Essentially I’m comparing the number of draws in games 1 to ‘x’ with the number of draws in games ‘x+1’ to 38.
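For anyone who wants to try this outside a spreadsheet, here is a minimal Python sketch of the split described above. It is not the spreadsheet I used. It assumes R^2 means the square of the Pearson correlation between the two draw counts, and the function names and the input layout (one list of 0/1 draw flags per team-season, in match order) are made up for illustration.

```python
def pearson_r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # Assumption: with no variation on one side, report no relationship.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def split_r_squared(team_seasons, x):
    """R^2 between draws in games 1..x and draws in games x+1..end.

    team_seasons: hypothetical input, one list per team-season,
    holding 1 for a draw and 0 otherwise, in match order.
    """
    early = [sum(games[:x]) for games in team_seasons]
    late = [sum(games[x:]) for games in team_seasons]
    return pearson_r_squared(early, late)


def r_squared_by_split(team_seasons, season_length=38):
    """One (x, R^2) pair for every split point x = 1 .. season_length - 1."""
    return [(x, split_r_squared(team_seasons, x))
            for x in range(1, season_length)]
```

Calling `split_r_squared(team_seasons, 21)` on the 280 team-seasons would correspond to the point at x = 21 on the first plot, and `r_squared_by_split(team_seasons)` would give the whole curve.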

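[Plot: draws through the season. x-axis: games played so far; y-axis: R^2 between draws to date and draws over the rest of the season.]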

What we see is a terrible correlation between past and future draws, no matter what point in the season we look at. The number of draws that a team has in a given season is largely driven by random variation.

To emphasise my point, I’ve made the same plot, but this time using TSR (total shots ratio) instead of draws.

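[Plot: TSR through the season. x-axis: games played so far; y-axis: R^2 between TSR to date and TSR over the rest of the season.]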

Simply compare the R^2 values on the two graphs and you’ll quickly see my point. One of these metrics is repeatable and predictive – the other is much more noise than signal.

N.B. In both of the plots in this post it looks like something abnormal is occurring between games 10 and 17. I’ve checked the database several times and genuinely don’t think there’s an error in there. The second plot looks roughly symmetrical though, which I’d expect, so I’m pretty confident that this is a real effect and not a data processing error.
