On why you should check your data before postulating a theory

I was grabbing some numbers for another project earlier and Steven Gerrard’s attacking metrics stuck out. It appears that his game changed over the summer of 2008, which is strange: the manager didn’t change, and the most significant arrival around that time, Fernando Torres, had joined a whole year earlier. I’ll show you what I mean in the two tables below (all the metrics are taken from Gerrard’s ESPN player bio). First, let’s lay out the counting numbers:

[Table: Gerrard’s counting numbers (goals, shots on target, total shots), before and since the summer of 2008]

So since the summer of 2008 Gerrard has scored roughly as many goals as he did before it, whilst needing only half as many shots on target to do so. His total number of shots, however, is in the same ballpark across both periods. Below are the same numbers on a per-game basis, with two ratios added: the proportion of shots taken that are on target, and Gerrard’s conversion rate (sh%):

[Table: the same numbers per game, plus the proportion of shots on target and sh%]
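The per-game and ratio columns are simple arithmetic on the counting numbers. A minimal sketch, assuming “proportion on target” means shots on target divided by total shots and sh% means goals divided by shots on target (the sense in which sh% is used in the rest of this post). The `ShootingLine` name and the example figures are hypothetical, not Gerrard’s actual numbers.

```python
from dataclasses import dataclass


@dataclass
class ShootingLine:
    """Counting numbers for one player over one span of games (hypothetical container)."""
    games: int
    goals: int
    shots_on_target: int
    shots: int

    def per_game(self) -> dict[str, float]:
        # Divide each counting stat by games played.
        return {
            "goals": self.goals / self.games,
            "shots_on_target": self.shots_on_target / self.games,
            "shots": self.shots / self.games,
        }

    def on_target_pct(self) -> float:
        # Share of all shots that were classed as on target.
        return self.shots_on_target / self.shots

    def sh_pct(self) -> float:
        # Conversion rate: goals per shot on target.
        return self.goals / self.shots_on_target


# Illustrative numbers only, not taken from ESPN.
example = ShootingLine(games=30, goals=10, shots_on_target=40, shots=100)
print(example.per_game())
print(f"{example.on_target_pct():.0%} on target, sh% {example.sh_pct():.0%}")
```

Note that halving shots on target while holding goals steady doubles sh% by construction, which is why the two ratios have to be read together.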

One more thing before I delve into this: I want to strip out Gerrard’s penalties so that the numbers reflect only what happens in open play. The penalty data I’ve subtracted here is provided thanks to Infostrada Sports for 2001-12, and Brian T for ’12-13:

[Table: Gerrard’s open-play numbers (penalties removed), before and since the summer of 2008]
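Stripping out penalties is a subtraction from each counting stat. A minimal sketch, assuming every penalty attempt was logged as one shot, and that scored and saved penalties were logged as shots on target while ones that missed the frame were not; the post doesn’t say how ESPN treated them, so that mapping is an assumption. The `Penalties` and `open_play` names are hypothetical, and strictly this gives non-penalty numbers (direct free kicks stay in).

```python
from dataclasses import dataclass


@dataclass
class Penalties:
    """Penalty record for a span of games (hypothetical container)."""
    scored: int
    saved: int
    missed: int  # wide, over, or off the woodwork

    @property
    def attempts(self) -> int:
        return self.scored + self.saved + self.missed

    @property
    def on_target(self) -> int:
        # Assumption: scored and saved penalties were both recorded as on target.
        return self.scored + self.saved


def open_play(goals: int, shots_on_target: int, shots: int,
              pens: Penalties) -> tuple[int, int, int]:
    """Remove penalties from the totals, returning (goals, shots on target, shots)."""
    op = (goals - pens.scored, shots_on_target - pens.on_target, shots - pens.attempts)
    if min(op) < 0:
        raise ValueError("penalty counts exceed the totals; check the two sources line up")
    return op


# Illustrative numbers only.
print(open_play(goals=12, shots_on_target=45, shots=110,
                pens=Penalties(scored=4, saved=1, missed=1)))
```

The negative-count check is there because the penalty data and the shot data come from different providers, so a mismatch between them should fail loudly rather than silently.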

This is Gerrard from open play. Since the summer of 2008 he’s taken more shots, but a much smaller fraction of them have gone on target. However, the conversion rate of his post-summer-2008 shots on target has essentially doubled. To me this is counter-intuitive. If Gerrard is getting fewer of his shots on target then presumably he’s shooting from further out, and if he’s further from goal you’d think his ability to pick out the parts of the goal the keeper can’t reach would be diminished, giving him a lower sh% as a result.

This is something I can’t explain. Having discussed it with Mihail Vladimirov on Twitter, it appears that in ’07-08 Gerrard was used predominantly as the first ‘1’ in a 4-4-1-1 formation, whereas in ’08-09 he was more likely to be at the centre of the ‘3’ in a 4-2-3-1, with Kuyt and Benayoun coming in off the wings. I’d suggest that Gerrard would get fewer high-quality chances close to goal in ’08-09 as a result of these tactics, so we’d expect the proportion of shots he put on target to decrease (which it does, from 55% in ’07-08 to 31% in ’08-09), and his sh% to decrease too. However, we actually see the opposite: Gerrard’s sh% rockets from 17% in ’07-08 to 39% in ’08-09.

This is bizarre, but is it just a Gerrard thing, or did it spread to the other cog in Liverpool’s attacking force, Fernando Torres? Let’s split his numbers up the same way and see what happens (this data is, once more, taken from the player’s ESPN data page):

[Table: Torres’ numbers, split before and since the summer of 2008]

So again we see a marked decrease in the proportion of shots that are on target, yet no corresponding drop in sh%. As I said above, this is counter-intuitive to me, and there’s no way I can explain it, so at this point I naturally became suspicious of the data I was using.

Well, let’s check that. I’ve picked five players who’ve spent a significant amount of time between ’04-05 and ’12-13 at the same club, and grabbed their stats from ESPN. Their shots on target and total shots, along with the group’s sh%, are summarised in the table below:

[Table: shots on target, total shots and group sh% by season, ’04-05 to ’12-13, for five players]

And, hey, what do you know – in 2008-09 there’s a massive drop in the proportion of shots that are classed as ‘on target’ across the group of players.
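This kind of check is easy to automate: pool the players’ shots and shots on target by season, then flag any season where the pooled on-target share jumps from the previous one. A minimal sketch; the row layout, the function names and the 10-percentage-point threshold are hypothetical choices, not something taken from the post.

```python
from collections import defaultdict


def pooled_on_target_share(rows):
    """rows: iterable of (season, shots_on_target, shots), one per player-season.

    Returns {season: pooled share of shots on target}. Counts are pooled rather
    than per-player percentages averaged, so high-volume players weigh more.
    """
    sot = defaultdict(int)
    shots = defaultdict(int)
    for season, on_target, total in rows:
        sot[season] += on_target
        shots[season] += total
    # Season labels like "2007-08" sort chronologically as strings.
    return {s: sot[s] / shots[s] for s in sorted(shots)}


def flag_breaks(shares, threshold=0.10):
    """Return (previous season, season, change) wherever the pooled share
    moves by more than `threshold` (as a fraction) between consecutive seasons."""
    seasons = list(shares)
    flags = []
    for prev, cur in zip(seasons, seasons[1:]):
        change = shares[cur] - shares[prev]
        if abs(change) > threshold:
            flags.append((prev, cur, change))
    return flags


# Illustrative numbers only, not the ESPN figures.
rows = [
    ("2007-08", 50, 100), ("2007-08", 30, 60),
    ("2008-09", 30, 100), ("2008-09", 20, 70),
]
shares = pooled_on_target_share(rows)
print(shares)
print(flag_breaks(shares))
```

A flagged season isn’t proof of a recording change on its own, but it tells you which season boundary to compare against an independent source.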

Now, these are five ‘high-event’ players, so they should wash out some of the noise, but they don’t represent the whole league, so I have one last thing to check: if I go through the team data I use from football-data.co.uk, do I see any of this change reflected? The answer is no. Whilst shots are on the rise during those seasons, the ratio of shots on target to total shots is relatively stable, and sh% is falling.
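The league-wide version of the check can be run straight from the team CSVs. A minimal sketch, assuming one football-data.co.uk season file per call with its usual HS/AS (shots), HST/AST (shots on target) and FTHG/FTAG (full-time goals) columns; the `league_rates` name and the sample rows are hypothetical. Team goals include penalties and own goals, so this sh% is an approximation rather than a like-for-like match with the player numbers.

```python
import csv
import io
from typing import TextIO


def league_rates(f: TextIO) -> tuple[float, float]:
    """Return (share of shots on target, goals per shot on target) for one season file."""
    shots = on_target = goals = 0
    for row in csv.DictReader(f):
        if not row.get("HS"):
            continue  # skip blank rows or matches with no shot data
        shots += int(row["HS"]) + int(row["AS"])
        on_target += int(row["HST"]) + int(row["AST"])
        goals += int(row["FTHG"]) + int(row["FTAG"])
    return on_target / shots, goals / on_target


# A tiny made-up sample in the same column layout; with real data you would
# open each downloaded season file with open(path, newline="") and pass it in.
sample = io.StringIO("HS,AS,HST,AST,FTHG,FTAG\n10,8,5,3,2,1\n12,6,4,2,0,0\n")
rates = league_rates(sample)
print(f"{rates[0]:.1%} on target, sh% {rates[1]:.1%}")
```

Running this season by season gives a league baseline: if the player-level on-target share drops sharply in one season while the league figure barely moves, the player-level source is the first suspect.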

Basically, it’s incredibly likely that the changes observed here are due to a change in the way the stats were recorded, and if a shift in Liverpool’s tactics did alter Gerrard’s and Torres’ metrics, there’s simply too much uncertainty to tell.

And there’s the lesson, kids: always make sure your data is at least consistent from season to season before thinking you’ve found something terribly interesting. Thanks, ESPN!

I’d actually like to thank Mihail Vladimirov again for his insight (and Analyse Sport for pointing me in the correct direction) – it’s a shame it couldn’t be put to more use here.


3 thoughts on “On why you should check your data before postulating a theory”

  1. So it seems likely that the change is something to do with shots which never score. My first guess would be that shots which hit the woodwork were originally counted as on target but now aren’t, but there probably aren’t enough of those. So perhaps it was shots which were blocked by a player other than the goalkeeper. If you stop counting them as shots on target then obviously your shots on target will decrease, and since none of the shots you no longer count as on target would have resulted in a goal, your sh% would increase.

  2. hmmm … I’ll have to go back and check the data that I’ve gathered using ESPN’s site … the players aren’t as prolific as those you have above but hey … 🙂 … not that I’ve gone back much to seasons before the 2008-09 one (time constraints).
    Anyway, I hadn’t noticed the above because the numbers were just too small (e.g. shots 10 on target 4 etc.) per season.

    cheers for the heads up!
