A while back I promised Benjamin Massey (his site can be found here) that I’d post these plots once it stopped being sunny. Then it didn’t rain at all in July. So I’m taking an opportunity whilst there’s a small amount of cloud cover.

There’s been debate about how useful efficiency is. I’ve long argued that sh% (the percentage of shots on target that result in a goal) regresses heavily from season to season, and is a pretty pisspoor measure of how good a team is. Will Morgan has also shown this is likely the case with individual players. As an aside, I applaud Will’s efforts to go and manually grab the data for that piece – I’m way too lazy to have done so. There are also dissenting voices on the subject, but as of yet I haven’t seen anyone explain how there can be a high level of skill involved in shooting if there is no statistically significant difference between players taking penalties (see the first plot here).

Another area I’ve seen debated on twitter is the proportion of a team’s shots that go on target (I’m not going to go back and grab the conversations because they’re old and way down timelines – from memory the main people I’ve seen discussing it are Benjamin Pugsley and Ted Knutson – feel free to let me know if I’ve left anyone out). This seemed to me to have a small bit of merit – teams play different systems that, in theory, will lead them and their opponents to shoot from better/worse positions on the pitch. And obviously this would be a valuable skill – if your opponent takes an off target shot there’s a much lower chance that it will result in a goal than an on target one.

However, I still don’t buy it as a massive skill. In general I think players and managers are smart enough to identify, for example, that shots from a long way out are much more likely to be off target than those from in close, and (score effects aside) I’d be surprised if it was something that a team has a particularly large amount of control over.

So what does the data tell us? I’ve taken the last 13 Premiership seasons, plotted the proportion of a given team’s shots that are on target in year ‘n’ on the x-axis, and the proportion of that same team’s shots that are on target in year ‘n+1’ on the y-axis:

So there’s a slight positive correlation, and an R^{2} value of 0.28. That’s not bad, but not high either – it suggests that, over the course of a season, the ability to convert shots to shots on target is 53% skill and 47% luck. A simple way to estimate the proportion of a team’s shots that will be on target in year ‘n+1’ is therefore to regress their year ‘n’ value half of the way towards the mean.
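In code, that regress-halfway-to-the-mean estimate looks like this (a minimal sketch – the league mean and the team’s year ‘n’ value below are made-up numbers for illustration, not taken from the actual data):

```python
import math

# Made-up illustrative numbers, not from the actual dataset:
league_mean = 0.33   # assumed league-average proportion of shots on target
team_year_n = 0.40   # assumed observed proportion for one team in year 'n'

# With R^2 = 0.28 the year-to-year correlation is r = sqrt(0.28) ~ 0.53,
# so the estimate keeps roughly half of the deviation from the mean.
r = math.sqrt(0.28)
estimate_year_n1 = league_mean + r * (team_year_n - league_mean)

print(round(r, 2))                 # ~0.53
print(round(estimate_year_n1, 3))  # regressed about halfway back to 0.33
```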

So there’s a bit of skill here, but it’s not that high. In terms of repeatability it’s only just closer to sh% (~40% skill) than to the proportion of shots a team takes (~70% skill).

Next, let’s look at the other end of the pitch – how repeatable a skill is preventing your opponents from taking a shot that goes on target?

Similar story, with a lower R^{2} value. In this case the ability of a team to prevent their opponents converting shots to shots on target over the course of a season is 44% skill and 56% luck – its repeatability is much closer to sv% (~30% skill) than to the proportion of shots a team takes (~70% skill).

In short, yes – taking shots that go on target, and preventing your opposition from doing so, are skills. However, they’re still mainly noise, especially when compared with much more reliable measures like TSR.

“So there’s a slight positive correlation, and an R2 value of 0.28. That’s not bad, but not high either – it suggests that, over the course of a season, the ability to convert shots to shots on target is 53% skill, and 47% luck…”

I believe the R2 value is actually the percent of explanation (28%), rather than the correlation.

Nope, sorry. That’s a common misconception. I encourage you to read these posts by Phil Birnbaum:

http://blog.philbirnbaum.com/2006/08/on-correlation-r-and-r-squared.html

http://blog.philbirnbaum.com/2006/11/wages-of-wins-on-r-and-r-squared.html

http://blog.philbirnbaum.com/2006/11/can-money-buy-wins-team-correlation.html

http://blog.philbirnbaum.com/2007/04/can-payroll-buy-wins-percentage-of.html

http://blog.philbirnbaum.com/2007/10/r-squared-abuse.html

http://blog.philbirnbaum.com/2007/11/still-more-on-r-squared.html

Okay, I was referring to the percent of variance explained, but I never put much thought into the “real-world explanation percentage”. It makes sense that focusing on standard deviations would be preferable to variance (what the hell is a win squared, anyway?), which is why, for instance, the MSE isn’t all that useful when it comes to quantifying estimation error while the residual standard error is.

I’m going to digest this thoughtfully. Thank you.