Traffic has picked up again recently (much of the credit is due to @SimonGleave and associated Infostrada accounts), so I thought it was a good time to give a quick overview of the blog. A year or so ago I got hold of a big chunk of match-by-match data and started to play with it. Along the way I found out a few things: what I think makes a good team (link), that teams don’t really have a lot of control over their shooting and save percentages (sh% link, link, sv% link, link, both link), and that Blackpool fans shouldn’t be surprised they were relegated, despite their magical run at the start of the ’10-11 season (link).

A lot of the work here was influenced by the superb analysis of the NHL by Gabe Desjardins (@behindthenet). If any of you are interested in the NHL, Gabe has done a lot of the pioneering work, and I’d strongly recommend looking at his website (link: his author handle is Hawerchuk) and advanced stats site (link).

In terms of the maths here, I think most of it should be pretty easy to follow. I don’t have any stats qualifications beyond GCSEs and I doubt anything here extends far beyond that level.

What data do I have?

In total: 12 years’ worth from the top four English leagues, totalling 20,000 games and ~500,000 shots. All of this is freely available at football-data.co.uk (link), where there are also numbers for numerous other European leagues.

What do I think makes a good team?

Its ability to control the ball – especially in the attacking areas of the pitch.

Why do I think that shots are the best indicator as to a team’s ability?

To expand on the previous question, I think the best team is the one that is able to control where on the pitch the ball is. A good team will spend a lot of time with the ball in the attacking half and, as a result, will generate more shots than the opposition. As it turns out, in terms of reproducibility, the numbers back this up. On a season-by-season basis, shots are the stat (of those I have) that exhibits the least regression towards the mean (link). In other words, a team is much more likely to repeat its shot ratio than the number of goals or points it scores.

That being said, the number of shots in any given game can be heavily influenced by score effects, i.e., a team will take a significantly larger proportion of shots when it is one goal behind compared with when the score is tied (link to a great piece by Gabe). Over the course of a season, though, these situations even out to a certain extent, which is why the numbers are so repeatable over a season-sized sample.

As a result you’ll rarely see me do any predictions for single games, with the Manchester derby being an exception (link).
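For anyone who wants to try this at home, here is a rough sketch of the two ideas in this answer: a team's shot ratio for a season, and how well that ratio repeats from one season to the next. It is not the code behind the posts. The shot ratio is assumed here to be shots for divided by total shots in a team's games, repeatability is assumed to be a plain Pearson correlation, the column names (`HomeTeam`, `AwayTeam`, `HS`, `AS`) are assumed to match the football-data.co.uk CSV headers and should be checked against the files, and the teams and numbers are made up.

```python
from collections import defaultdict
from math import sqrt


def shot_ratios(matches):
    """Return {team: shots_for / (shots_for + shots_against)} for one season.

    Each match is a dict with assumed keys HomeTeam, AwayTeam, HS (home
    shots) and AS (away shots).
    """
    shots_for = defaultdict(int)
    shots_against = defaultdict(int)
    for match in matches:
        home, away = match["HomeTeam"], match["AwayTeam"]
        home_shots, away_shots = int(match["HS"]), int(match["AS"])
        shots_for[home] += home_shots
        shots_against[home] += away_shots
        shots_for[away] += away_shots
        shots_against[away] += home_shots
    return {
        team: shots_for[team] / (shots_for[team] + shots_against[team])
        for team in shots_for
        if shots_for[team] + shots_against[team] > 0
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


def repeatability(ratios_a, ratios_b):
    """Correlate a stat across two seasons, using teams present in both."""
    teams = sorted(set(ratios_a) & set(ratios_b))
    return pearson([ratios_a[t] for t in teams], [ratios_b[t] for t in teams])


# Made-up data: three teams, two tiny "seasons".
season_1 = [
    {"HomeTeam": "A", "AwayTeam": "B", "HS": 15, "AS": 5},
    {"HomeTeam": "B", "AwayTeam": "C", "HS": 10, "AS": 10},
    {"HomeTeam": "C", "AwayTeam": "A", "HS": 8, "AS": 12},
]
season_2 = [
    {"HomeTeam": "A", "AwayTeam": "B", "HS": 14, "AS": 6},
    {"HomeTeam": "B", "AwayTeam": "C", "HS": 9, "AS": 11},
    {"HomeTeam": "C", "AwayTeam": "A", "HS": 10, "AS": 10},
]

ratios_1 = shot_ratios(season_1)
ratios_2 = shot_ratios(season_2)
r = repeatability(ratios_1, ratios_2)
print(ratios_1)
print(ratios_2)
print(round(r, 3))
```

The same `repeatability` function can be pointed at goals or points per team to compare how strongly each stat carries over between seasons.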

So do I think that the team taking the highest proportion of shots over a season is automatically the best team?

No, although I think there’s a decent chance they are. Controlling the proportion of shots you take seems to be about 70% skill over the course of a season (link). That’s a good start but it still leaves a pretty large unexplained component. I’d imagine completed passes, completed passes in the final third or a similar metric would do a better job – I simply don’t have the data to find out one way or the other.
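One way to read the 70% figure, and this is an assumption for illustration rather than how the linked post derived it, is as a shrinkage weight: keep 70% of a team's gap from the league average and regress the rest towards the mean. The function name, the default `skill_share` of 0.7 and the league mean of 0.5 below are all hypothetical choices for the sketch.

```python
def project_shot_ratio(observed, skill_share=0.7, league_mean=0.5):
    """Regress an observed season shot ratio towards the league mean.

    skill_share is the assumed fraction of the gap from the mean that
    reflects repeatable ability; the remainder is treated as noise.
    """
    if not 0.0 <= skill_share <= 1.0:
        raise ValueError("skill_share must be between 0 and 1")
    return league_mean + skill_share * (observed - league_mean)


# A made-up team that took 60% of the shots in its games this season.
projected = project_shot_ratio(0.60)
print(round(projected, 3))
```

Under those assumptions, a team that took 60% of the shots would be projected at about 57% next time round, which is why a big shot ratio is a decent but not automatic sign of the best team.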

Why haven’t I looked at individual players?

I did once (link, link), but that was using a small sample of easy-to-gather data. Other than that, I don’t have any data beyond the team level. If you are willing to send any my way, that would be much appreciated.

And finally: welcome

## 2 thoughts on “About the blog”

1. Hi,

Recently discovered your blog. Some good stuff here.

Short intro:
I’ve been keeping stats of a La Liga team I follow over the past seasons. I get most of the data from ESPN’s Soccernet … they are quite lacking, but I haven’t found any places with more detailed data (unless I’m willing to fork out a ton of cash for a hobby).

Anyway, I’m in the process of incorporating some of the stats you’ve mentioned on your blog (PDO, TSR) into what I’ve got. Can’t wait to see how that ends up.

What would you consider a decent sample size for soccer stats? … I’ve done the team for the past three years and players for their times at the club for as far back as possible.
Maybe I could mail you a sample of the XLS that I’ve put together … It’d be great if you could give me a tip in which direction to go with regards to individual players.
I’m trying to link (team) results in with (static) tactics and get a better breakdown of positions … more along the lines of baseball, e.g. how effective is an outfielder in CF compared to LF/RF? But I’m not sure how detailed I should go into a position breakdown before it all starts getting too vague (also the sample sizes will get really small) or if this is even an interesting step to make.

It is all fairly basic and my education level isn’t that high (high school to be exact) though my life experience is getting on (mid 30s) and I’m keen on these kinds of things. I’m also trying to figure out what makes me like a player that others might not like on first glance.

Also, maybe you could recommend a book on sports statistics or statistics in general that will help me move a step forwards.

Cheers and hope to hear from you,
b.

• Hi Bart, nice to hear from you.

A relevant sample size is tough to define. I don’t think one season is enough, and I know I’m not alone in that train of thought (I believe Euro Club Index uses three seasons for their rankings, but I stand to be corrected). However, a lot of teams change dramatically over two or three seasons, so going too far down that road isn’t necessarily the right move either.

I think it’ll be tough to derive relative weightings for different positions without the advanced stats that cost cold hard cash, but I’m definitely interested to see what you’ve got and where it can go. If you’d like to send me a copy of the xls files I can be found at jim [underscore] grayson [at] hotmail [dot] com. If you’re looking to get hold of more La Liga stats check out football-data.co.uk – it looks like they have match stats going back six years.

Regards

James