If you want an indication of how good a team is, it's instructive to look at how well it controls play in the attacking areas of the pitch. Fortunately we have a freely available stat that tracks exactly this – shots. The percentage of shots a team takes over a given period, as a proportion of the total shots in its games, is known as the team's total shots ratio (TSR). Not only does TSR correlate well with the number of points a team scores in a given season (R^2 = 0.66), it is also a remarkably repeatable metric.
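As a quick illustration, TSR is just a team's share of all the shots in its matches; a minimal sketch (the shot totals below are hypothetical):

```python
def total_shots_ratio(shots_for, shots_against):
    """Share of all shots in a team's games that the team itself took."""
    return shots_for / (shots_for + shots_against)

# Hypothetical season totals: 540 shots taken, 460 conceded
print(round(total_shots_ratio(540, 460), 3))  # 0.54
```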
So what about United's TSR? Last year I wrote a post arguing that, based on their TSR, United were due a decline in performance. An update of that plot, including this season's data, is below.
There's been a bit of variation, but across two seasons of data this could be down to a number of things (variability in the quality of competition, for one). In all, United have been a 0.540–0.580 team over the last two seasons. That's not what we'd expect from a team that was one goal away from winning both available titles in that span (for reference, the green line indicates the average TSR of a championship-winning team).
Anyway, it's this season that I'm really interested in. Through 32 games they've scored 80 points, despite the fact that with a season TSR of 0.539 we'd expect them to have scored only 53. That's an enormous difference, and we can look at it in a couple of ways. First, I've taken every team that has played a stretch of 32 games with a TSR between 0.529 and 0.549 (n = 97, mean TSR = 0.539). The number of points those teams scored is shown in the distribution plot below.
The maximum points scored by any team in this group is 71, and if I'd run this analysis at the start of the season I'd have told you that a team scoring 80+ points over 32 games whilst having a TSR of 0.539 was a 5,000-1 long-shot. In other words, there's statistical significance here – United are doing something to affect their performance that TSR cannot explain.
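The comparison-group selection above is simple to reproduce given a table of 32-game spans; a toy sketch (the spans here are fabricated purely to show the shape of the data – the real sample has n = 97):

```python
# Fabricated (TSR over a 32-game stretch, points over that stretch) pairs
spans = [
    (0.531, 55), (0.545, 62), (0.610, 78), (0.539, 71), (0.520, 50),
]

# Keep only the spans whose TSR falls in the comparison window
comparable = [pts for tsr, pts in spans if 0.529 <= tsr <= 0.549]
print(len(comparable), max(comparable))  # sample size and best points total
```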
Let's slice the data another way and see what the distribution of TSRs looks like for spans that yielded 70–81 points over 32 games. Obviously in an ideal world I'd use a spread of 70–90 points, but the most points any team has gathered from 32 games is 81 – achieved three times by Chelsea in '04-05 and once by United in '08-09. This time n = 106, and the mean number of points scored is ~74.
United would fit into the bottom bracket, which contains a solitary span – 71 points for Liverpool in 2001-02. This time their TSR would be 2.4 standard deviations below the mean, a smaller deviation than above but, once again, statistically significant (this time the odds of it being due to chance are around 1 in 120).
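For reference, the 1-in-120 figure follows directly from the normal tail probability at 2.4 standard deviations; a minimal check using Python's standard library:

```python
from statistics import NormalDist

# One-tailed probability of a value at least 2.4 SD below the mean,
# assuming the TSRs in the comparison group are roughly normal
p = NormalDist().cdf(-2.4)
print(f"p = {p:.4f}, roughly 1 in {1 / p:.0f}")  # p = 0.0082, roughly 1 in 122
```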
The four spans I mentioned above that yielded 81 points apiece had corresponding TSRs of 0.670, 0.660, 0.657, and 0.651 – and there's only one prior occasion on which a team amassed >78 points with a TSR below 0.628.
I think this demonstrates just how unusual United's season has been thus far. So why isn't TSR capturing the deviation? Well, it certainly doesn't encapsulate a team's playing style – I've noticed for a while that teams with a reputation for free-flowing football under-perform relative to shots metrics, whereas others, such as Stoke, repeatedly outperform what is expected of them.
Furthermore, game state has the potential to be a large factor. Ben Pugsley has been doing excellent work documenting an effect long known in other sports, whereby teams tend to take a larger proportion of the shots when behind than when level or ahead, out of desperation to score. United, unsurprisingly, spend a significant amount of time in the lead, and thus their TSR is likely being suppressed to an extent. I've long been of the opinion that game-tied TSR would be a more effective predictor than the all-game-states measure I use, but the latter is all the data I have.
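A game-state split would be straightforward to compute if each shot event carried the scoreline at the moment it was taken; a minimal sketch with invented events (the data shape is an assumption, not a real feed):

```python
# Each event: (did_our_team_take_the_shot, our_goal_difference_at_the_time).
# Invented events for a team that leads often and sits back when ahead.
shots = [
    (True, 0), (True, 0), (False, 0),   # tied: we out-shoot the opponent
    (False, 1), (False, 1), (True, 1),  # ahead: the opponent shoots more
    (True, -1),
]

def tsr(events):
    """TSR over a list of shot events."""
    return sum(took for took, _ in events) / len(events)

tied_only = [e for e in shots if e[1] == 0]
print(f"all states: {tsr(shots):.3f}, game-tied: {tsr(tied_only):.3f}")
```

In this toy example the all-states TSR (0.571) sits below the game-tied TSR (0.667), which is exactly the kind of suppression described above for a team that spends a lot of time in front.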
Thirdly, as has been examined exhaustively in the NHL, scorers at different venues apply different definitions of certain statistics; some aren't even consistent about where fixed positions on the ice are. This could certainly translate to football, where a shot can take any number of deflections before finding its way into the net, or an errant cross can end up threatening the goal. And, hey, if you're a goalie who knows the scorers and is looking for a better contract, maybe you suggest they be liberal in counting shots against so your sv% increases. (I'm not certain that the same people record stats at each ground in football, but it would make sense as a personnel logistics issue.)
So should the model be bent to incorporate United, is this simply a freak season, or is the integrity of the data we're using fundamentally undermined as an unavoidable consequence of humans making the recordings? This is by no means an exhaustive list, but these are the questions at the forefront of my work right now.
I plan to come back and do a similar analysis of United's sh% and sv%. Both look remarkably high, but should we expect them to be sustained next season and beyond?