A second look at the Infostrada penalty data, and stripping them from the football-data.co.uk numbers (part I)

This was planned to be a short post, but I keep finding interesting nuggets and no-one in their right mind would read the whole thing in one sitting, so I’ll split it up somewhat.

I think we are all aware that the data on record at football-data.co.uk isn’t perfect. I certainly have bigger misgivings about using it for smaller sample sizes, and particularly the numbers from lower leagues, than I would with data from other providers. It does, however, have two things going for it – it’s cheap enough for a grad student to afford (i.e. it’s free), and there’s tons of it. Now, given enough data, I think that a lot of the quirks will work themselves out* – and finding strong season-to-season correlations within the data suggests this is the case. So until something better comes along I’m happy to use it once a sample is large enough.

One thing I find particularly irksome is that penalties don’t count as shots, and in theory this is going to alter a team’s metrics. Whilst it’s only going to alter a team’s TSR by a maximum of 0.5% (e.g. changes from 0.607 to 0.606, or 0.442 to 0.444, were about the most egregious examples I could find in a given season), there’s more scope for the luck-driven metrics, such as sh%, sv%, and PDO (this is the seminal piece of that series), to be shifted.
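To get a feel for how small the TSR shift is, here is a minimal sketch. It assumes TSR is shots for divided by all shots in a team's matches; the season totals are made up for illustration and are not taken from either data set.

```python
def tsr(shots_for, shots_against):
    """Total Shots Ratio: the share of all shots in a team's matches taken by that team."""
    return shots_for / (shots_for + shots_against)

# Hypothetical season totals, invented purely for illustration.
shots_for, shots_against = 600, 400
pens_for, pens_against = 8, 3

# TSR with penalties left out of the shot counts, and with them added back in.
tsr_without_pens = tsr(shots_for, shots_against)
tsr_with_pens = tsr(shots_for + pens_for, shots_against + pens_against)

print(f"TSR without penalties: {tsr_without_pens:.4f}")
print(f"TSR with penalties:    {tsr_with_pens:.4f}")
print(f"Shift:                 {tsr_with_pens - tsr_without_pens:+.4f}")
```

Even with a lopsided split of penalties, the handful of extra shots barely moves a ratio built from a thousand of them.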

Fortunately, the nice people at Infostrada (whose website may be found here) provided me with a ton of penalty data last year that I used for a series of posts (a summary of which may be found here), and I can use that data to strip successful penalties from the football-data.co.uk numbers. (There’s no need to strip missed/saved penalties as they aren’t recorded in either the goal or shot columns.)

Why does this matter? Well, penalties are a different animal to your average shot. According to the football-data.co.uk data, the average shot on target in the Premiership has a ~20% chance of being scored, whereas penalties are scored, pretty uniformly, 78% of the time. So if a team has an unusually high proportion of its shots taken as penalties then it’ll be getting a boost in its sh%. Obviously the opposite will happen to a team’s sv% if a larger number of the shots against it are penalties.
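As a rough sketch of what stripping penalties does to sh%, assume sh% is goals divided by shots on target. The function names and season totals below are hypothetical, and the `pens_counted_as_shots` flag is an assumption added so that either recording convention can be tried.

```python
def sh_pct(goals, shots_on_target):
    """Shooting percentage: goals per shot on target."""
    return goals / shots_on_target

def strip_scored_penalties(goals, shots_on_target, pens_scored,
                           pens_counted_as_shots=False):
    """Remove scored penalties from the goal count.

    If the source data also logged those penalties as shots on target,
    set pens_counted_as_shots=True to remove them from that column too.
    """
    goals_np = goals - pens_scored
    sot_np = shots_on_target - pens_scored if pens_counted_as_shots else shots_on_target
    return goals_np, sot_np

# Hypothetical season totals, invented for illustration.
goals, shots_on_target, pens_scored = 60, 300, 6

raw_sh = sh_pct(goals, shots_on_target)
goals_np, sot_np = strip_scored_penalties(goals, shots_on_target, pens_scored)
stripped_sh = sh_pct(goals_np, sot_np)

print(f"sh% including penalties: {raw_sh:.3f}")
print(f"sh% with penalties out:  {stripped_sh:.3f}")
```

The same function applied to goals conceded and shots on target against gives the corresponding adjustment for sv%.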

I had a good stab at determining whether the ratio of penalties a team received was related to TSR in the first plot here, and found that there was a strong general trend (albeit with a couple of notable exceptions). This time I’m going to go about it in a different way and see how the answers compare.

I’ve taken each of the 38 teams that were in the Premiership between 2001-02 and 2011-12, and ordered them in terms of their TSR over that span. (The sample consists of ~1,000 penalties, ~11,000 goals, ~100,000 shots on target, and ~200,000 shots in total.) I’ve then gone down the list and created four bins of teams, with each bin containing as close to 50,000 shots as possible. As such, bin 1 contains the best teams (those of highest TSR), and bin 4 contains the worst (those of lowest TSR). A table of the shots and penalties for each bin is below. To make the writing large enough to be legible I’ve substituted ‘+’ in for ‘for’, and ‘-’ in for ‘against’ **.

[Table: shots and penalties for (+) and against (-) in each TSR bin]
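The binning described above could be sketched as follows. This is one greedy way of doing it, not necessarily the exact procedure used for the table, and the team names, TSR values and shot counts are placeholders.

```python
def bin_by_shots(teams, n_bins=4):
    """Sort teams by TSR (best first) and split them into n_bins groups
    holding roughly equal numbers of shots.

    teams: list of (name, tsr, shots) tuples.
    Returns a list of n_bins lists of team names.
    """
    ordered = sorted(teams, key=lambda t: t[1], reverse=True)
    target = sum(t[2] for t in ordered) / n_bins
    bins = [[] for _ in range(n_bins)]
    cumulative = 0
    current = 0
    for name, _tsr, shots in ordered:
        boundary = target * (current + 1)
        # Start a new bin if adding this team would leave the running total
        # further from the bin boundary than stopping here would.
        if (current < n_bins - 1 and bins[current]
                and abs(cumulative + shots - boundary) > abs(cumulative - boundary)):
            current += 1
        bins[current].append(name)
        cumulative += shots
    return bins

# Placeholder data: (name, TSR, shots). Not real teams or real totals.
teams = [
    ("A", 0.62, 30), ("B", 0.60, 20), ("C", 0.55, 25), ("D", 0.52, 25),
    ("E", 0.50, 20), ("F", 0.48, 30), ("G", 0.45, 25), ("H", 0.40, 25),
]
bins = bin_by_shots(teams)
for i, members in enumerate(bins, start=1):
    print(f"bin {i}: {members}")
```

Binning on shots rather than on team count keeps the sample in each bin comparable, even though teams that spent only a season or two in the league contribute far fewer shots than the ever-presents.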

To me that table reads pretty interestingly. The best teams take 61% of the shots in their matches, and are awarded 61% of the penalties. That makes perfect sense to me, and means that (outside of possibly United – see the plot in the prior link) you’re going to have to make a hell of a case if you want to argue that big teams get a greater proportion of their penalty appeals awarded than smaller teams.

Bin 3 also lines up nicely (47% in each category), but this isn’t the case for bins 2 and 4. Looking at it you could even suggest, given that the proportion of penalties won by the teams in bin 2 is low by roughly the same amount as bin 4 is high, that the ‘awful’ teams (bin 4) are being awarded penalties at the expense of ‘good’ teams (bin 2).

And that was my first thought too. Plus, a 2-3% difference between TSR and % of pens + seems like it could add up to a significant amount. But is that the case, and how does this translate into points in the league? Let’s calculate how many of the penalties we’d expect the teams in each bin to earn, based solely on their TSR:

[Table: penalties expected for each bin based on TSR, compared with penalties actually awarded]
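A minimal sketch of that calculation: take all the penalties awarded in a bin's matches and give the bin a share equal to its TSR. The function name and the figures are hypothetical and are not the values from the table.

```python
def expected_pens_for(shots_for, shots_against, pens_for, pens_against):
    """Penalties a group of teams would be expected to win if penalties
    were shared out in the same proportion as shots (i.e. by TSR)."""
    tsr = shots_for / (shots_for + shots_against)
    total_pens = pens_for + pens_against
    return tsr * total_pens

# Hypothetical bin totals, invented for illustration.
shots_for, shots_against = 55_000, 45_000
pens_for, pens_against = 260, 240

expected = expected_pens_for(shots_for, shots_against, pens_for, pens_against)
difference = pens_for - expected  # negative means the bin was "sold short"

print(f"Expected penalties for: {expected:.1f}")
print(f"Actual minus expected:  {difference:+.1f}")
```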

Given that each bin is made up of ~55 team seasons we can conclude that, as a group, the teams finishing between 6th and 10th in the table (represented by bin 2) are sold short to the tune of ~2 penalties per season. ~2 penalties × 78% conversion rate × ~0.6 points per goal = ~1 point. Split that five ways between the teams that finish 6th-10th and it comes to ~0.2 points per team. So whilst it may well be a real effect, it’s not something they should be losing sleep over, especially given that most (>66%) of the benefit is felt by teams they aren’t in direct competition with (those at the bottom of the table).
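The arithmetic above, written out using only the rounded figures quoted in the paragraph (the variable names are just labels for this sketch):

```python
pens_short_per_season = 2   # ~2 penalties per season for the group as a whole
conversion_rate = 0.78      # share of penalties that are scored
points_per_goal = 0.6       # approximate league points gained per goal
teams_in_group = 5          # the teams finishing 6th to 10th

points_lost_group = pens_short_per_season * conversion_rate * points_per_goal
points_lost_per_team = points_lost_group / teams_in_group

print(f"Points lost by the group per season: {points_lost_group:.2f}")
print(f"Points lost per team per season:     {points_lost_per_team:.2f}")
```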

I’d like to look into this effect further (i.e. whether a team in bin 4 is more likely to get a penalty in a game against a team in bin 2, and vice versa). However, it’d take a ton of work, and if we slice a sample of 1,000 penalties 16 ways we’re looking at ~60 penalties per slice. A sample that size is unlikely to yield a statistically significant result.
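To put a rough number on that, here is a back-of-the-envelope sketch using the normal approximation to a binomial proportion. The even 50/50 split of penalties between the two sides and the 95% level are assumptions chosen for illustration.

```python
import math

total_pens = 1000
slices = 16                       # 4 bins x 4 bins of opponents
n_per_slice = total_pens / slices

p = 0.5                           # assumed share of penalties won by one side
standard_error = math.sqrt(p * (1 - p) / n_per_slice)
half_width_95 = 1.96 * standard_error

print(f"Penalties per slice:            {n_per_slice:.1f}")
print(f"Approx. 95% margin on a share:  +/- {half_width_95:.1%}")
```

A margin of roughly twelve percentage points either way swamps the 2-3% differences seen between the bins, which is why slicing the sample that finely is unlikely to show anything.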

So what can we conclude? Well, whilst there are outliers in terms of individual teams, penalties are pretty well correlated to TSR. As such we’d expect to see pretty similar drops in sh%/climbs in sv% for each team when the penalties are stripped out of the football-data.co.uk numbers. That’s something I’ll take a look at next time.

*There are innumerable ways to test this but I simply don’t have the time.

**The eagle-eyed amongst you will have spotted that the ‘shots’ columns don’t add up. I’ll go through and find the discrepancy at some point, but it’s <200 shots out of 200,000 – I doubt that <0.1% of the data would shift the results much in this case.

