Traffic has picked up again recently (much of the credit is due to @SimonGleave and associated Infostrada accounts), so I thought it was a good time to give a quick overview of the blog. A year or so ago I got hold of a big chunk of match-by-match data and started to play with it. Along the way I found out a few things: what I think makes a good team (link), that teams don’t really have a lot of control over their shooting and save percentages (sh% link, link, sv% link, link, both link), and that Blackpool fans shouldn’t be surprised they were relegated, despite their magical run at the start of the ’10-11 season (link).
A lot of the work here was influenced by the superb analysis of the NHL by Gabe Desjardins (@behindthenet). If any of you are interested in the NHL, Gabe has done a lot of the pioneering work and I’d strongly recommend looking at his website (link: his author handle is Hawerchuk) and advanced stats site (link).
In terms of the maths here, I think most of it should be pretty easy to follow. I don’t have any stats qualifications beyond GCSEs and I doubt anything here extends far beyond that level.
What data do I have?
In total: 12 years’ worth from the top four English leagues, totalling 20,000 games and ~500,000 shots. All of this is freely available at football-data.co.uk (link), where there are also numbers for numerous other European leagues.
What do I think makes a good team?
Its ability to control the ball – especially in the attacking areas of the pitch.
Why do I think that shots are the best indicator as to a team’s ability?
To expand on the previous question, I think the best team is the one that is able to control where on the pitch the ball is. A good team will spend a lot of time with the ball in the attacking half and, as a result, will generate more shots than the opposition. As it turns out, the numbers back this up in terms of reproducibility. On a season-by-season basis, shots are the stat (of those I have) that exhibits the least regression towards the mean (link). In other words, a team is much more likely to repeat its shot ratio (its share of all the shots taken in its games) than the number of goals or points it scores.
That being said, the number of shots in any given game can be heavily influenced by score effects, i.e., a team will take a significantly larger proportion of shots when it is one goal behind compared with when the score is tied (link to a great piece by Gabe). Over the course of a season, though, these situations even out to a certain extent, which is why the numbers are so repeatable over a sample size of a season.
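For anyone who wants to try this at home, here is a minimal Python sketch of the shot ratio and its season-to-season repeatability. It is an illustration under stated assumptions, not necessarily how the numbers on this blog were produced: the function names are made up for the example, and the column names `HomeTeam`, `AwayTeam`, `HS` and `AS` are assumed to match the football-data.co.uk CSV headers, so check them against the files you download.

```python
from collections import defaultdict


def shot_ratios(matches):
    """Return {team: shots for / (shots for + shots against)} for one season.

    `matches` is an iterable of dict-like rows, e.g. from csv.DictReader.
    Column names (HomeTeam, AwayTeam, HS, AS) are assumptions; adjust as needed.
    """
    shots_for = defaultdict(int)
    shots_against = defaultdict(int)
    for row in matches:
        home, away = row["HomeTeam"], row["AwayTeam"]
        home_shots, away_shots = int(row["HS"]), int(row["AS"])
        shots_for[home] += home_shots
        shots_against[home] += away_shots
        shots_for[away] += away_shots
        shots_against[away] += home_shots
    ratios = {}
    for team in shots_for:
        total = shots_for[team] + shots_against[team]
        if total > 0:  # skip teams with no recorded shots at all
            ratios[team] = shots_for[team] / total
    return ratios


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def repeatability(ratios_year1, ratios_year2):
    """Correlate shot ratios for teams present in both seasons."""
    common = sorted(set(ratios_year1) & set(ratios_year2))
    return pearson([ratios_year1[t] for t in common],
                   [ratios_year2[t] for t in common])
```

Running `repeatability` on two consecutive seasons, and then doing the same for goals or points, is one simple way to compare how well each stat repeats. A correlation closer to 1 means less regression towards the mean.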
As a result you’ll rarely see me do any predictions for single games, with the Manchester derby being an exception (link).
So do I think that the team taking the highest proportion of shots over a season is automatically the best team?
No, although I think there’s a decent chance they are. Controlling the proportion of shots you take seems to be about 70% skill over the course of a season (link). That’s a good start but it still leaves a pretty large unexplained component. I’d imagine completed passes, completed passes in the final third or a similar metric would do a better job – I simply don’t have the data to find out one way or the other.
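To make the "about 70% skill" figure concrete, here is a small Python sketch of what it implies for a forecast. It rests on two assumptions of mine: that the 70% can be read as the share of a team's gap from average that carries over, and that the league-average shot ratio is 0.5. The function name and parameters are hypothetical.

```python
def regressed_shot_ratio(observed, skill_share=0.7, league_mean=0.5):
    """Shrink an observed shot ratio towards the league mean.

    skill_share=0.7 is the rough "70% skill" figure; treating it as the
    fraction of the gap from average that persists is an assumption.
    """
    return league_mean + skill_share * (observed - league_mean)


# A team that took 60% of the shots in its games this season
estimate = regressed_shot_ratio(0.60)
print(round(estimate, 3))
```

Under those assumptions, a team with a 0.60 shot ratio would be expected to land at about 0.57 next season: still clearly above average, but some of the gap is given back to luck.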
Why haven’t I looked at individual players?
I did once (link, link), but that was using a small sample of easy-to-gather data. Other than that I don’t have any data beyond the team level. If you are willing to send any my way, that would be much appreciated.
And finally: welcome