A second look at the Infostrada penalty data, and stripping them from the football-data.co.uk numbers (part II)

In part I of this series I used some penalty data provided by Infostrada (twitter, website) and found that the distribution of penalties within the Premiership was pretty well linked to TSR. That matches something I’d already found on a team-by-team basis, with a single notable outlier, in the first plot here.

Now I’m going to move forward and look at what effect stripping penalties from a team’s goal totals (both for and against) has on the relationship between TSR and sh%, sv%, and PDO. Why am I using TSR as a measure of how good a team is? Because it gives an indication of how often a team is in control of the ball in the attacking areas of the football pitch, and because it is remarkably repeatable from season to season (it breaks down as 87% skill and 13% luck over the course of a Premiership season), much more so than points (~63% due to skill over the course of a season). That makes TSR a great measure of a team’s true talent.

We know that there is no relationship between how good a team is and the rate at which they convert or save penalties (third and fourth plot here), and that there are no individuals who convert or save penalties at a statistically significant rate. Furthermore, as shown in part I, penalties are pretty well linked to TSR. Given the combination of these two statements, we’d expect the relationship between TSR and the percentage-driven metrics to stay relatively stable when penalties are stripped out. Is this the case?
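To make the comparison concrete, here is a minimal sketch of how the metrics could be computed with and without penalties. It rests on assumptions that aren't spelled out in this post: TSR is taken as shots for divided by total shots, sh% and sv% are taken on a shots-on-target basis, and PDO is 1000 × (sh% + sv%). Only penalty goals are removed from the goal totals, and the shot counts are left alone. The `TeamSeason` fields and the example numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    # Hypothetical field names; the real data sets may label these differently.
    shots_for: int
    shots_against: int
    sot_for: int            # shots on target for
    sot_against: int        # shots on target against
    goals_for: int
    goals_against: int
    pen_goals_for: int      # penalties scored
    pen_goals_against: int  # penalties conceded and scored against


def tsr(ts: TeamSeason) -> float:
    """Share of all shots in a team's matches taken by that team (assumed definition)."""
    return ts.shots_for / (ts.shots_for + ts.shots_against)


def percentages(ts: TeamSeason, strip_penalties: bool = False):
    """Return (sh%, sv%, PDO) as fractions and a 1000-based PDO.

    With strip_penalties=True, penalty goals are subtracted from the goal
    totals for and against; shot counts are not adjusted (an assumption).
    """
    gf = ts.goals_for - (ts.pen_goals_for if strip_penalties else 0)
    ga = ts.goals_against - (ts.pen_goals_against if strip_penalties else 0)
    sh = gf / ts.sot_for
    sv = 1 - ga / ts.sot_against
    pdo = 1000 * (sh + sv)
    return sh, sv, pdo


# Invented example team-season, not real data.
example = TeamSeason(
    shots_for=600, shots_against=400,
    sot_for=200, sot_against=150,
    goals_for=60, goals_against=30,
    pen_goals_for=4, pen_goals_against=3,
)

print(tsr(example))
print(percentages(example))
print(percentages(example, strip_penalties=True))
```

Removing penalty goals lowers sh% and raises sv% for every team, so the interesting question is whether the relationship with TSR shifts, not whether the raw numbers do.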

The following plots feature one point for each team-season in the Premiership between ’00-01 and ’11-12 (n=240).


Firstly, a plot of ‘traditional’ sh% vs TSR, with penalties included.

There’s a slight upward trend present here: linear regression suggests that the very best teams have a sh% of ~23%, compared to ~20% for the very worst teams. The takeaway point, however, is that the correlation is very poor. Let’s see if this persists when we remove penalties:


So the correlation improves, but by a pretty tiny amount. In this case the very best teams have a sh% of ~22%, compared to ~18% for the very worst teams.


Next, a plot of ‘traditional’ sv% vs TSR, with penalties included.

The upward trend persists but the difference between the best and worst teams is, once again, small (81% vs 77%). The correlation between the two is, again, minuscule.


Once more, the correlation is improved, but marginally, and is still exceptionally low. This time the very best teams have a sv% of 82%, compared to 79% for the very worst teams.


‘Traditional’ (penalties included) PDO

So in this case we have a terrible correlation coefficient. However, it is significantly higher than those registered by both sh% and sv% (someone should do a study as to why that is). The very worst teams are racking up a PDO of 970, vs 1040 for the very best teams. And finally, PDO with penalties stripped out:

No real surprise here. We saw small shifts upwards in the correlation coefficients of sh% and sv% when penalties were removed, and this is reflected here. It’s still pretty damn low though. In this case the PDOs of the very best and very worst teams remain unchanged compared to when penalties are included, at 1040 and 970.

So what’s the takeaway message here? Well, although removing penalties gives us a slightly better idea as to a team’s true sh%, sv%, and PDO, it doesn’t actually achieve all that much, and all three remain very poorly correlated with a team’s true talent even once penalties are removed.
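For anyone wanting to repeat the exercise, here is a rough sketch of how the trend lines and correlations could be produced. It assumes the "very best" and "very worst" figures come from evaluating a least-squares line at the highest and lowest TSR in the sample, which is my reading rather than something stated above. The function name `fit_line` and the three data points are invented purely to show the mechanics; they are not taken from the 240 team-seasons.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Invented illustration data: TSR and sh% for three imaginary team-seasons.
tsr_values = [0.40, 0.50, 0.60]
sh_values = [0.20, 0.21, 0.22]

slope, intercept, r = fit_line(tsr_values, sh_values)

# Fitted sh% at the lowest and highest TSR in the sample.
worst = intercept + slope * min(tsr_values)
best = intercept + slope * max(tsr_values)

print(slope, intercept, r)
print(worst, best)
```

Running the same fit once on the penalties-included percentages and once on the stripped ones, then comparing the two values of r, is the comparison made in each pair of plots above.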

