## A second look at the Infostrada penalty data, and stripping them from the football-data.co.uk numbers (part II)

So in part I of this series I used some penalty data provided by Infostrada (twitter, website) and found that the distribution of penalties within the Premiership was pretty well linked to TSR, which is also something I’d found on a team-by-team basis, with a single notable outlier, in the first plot here.

Now I’m going to move forward and look at what effect stripping penalties from a team’s goal totals (both for and against) has on the relationship between TSR and sh%, sv%, and PDO. Why am I using TSR as a measure of how good a team is? Because it gives an indication of how often a team is in control of the ball in the attacking areas of the football pitch, and because it is remarkably repeatable from season to season (it breaks down as 87% skill and 13% luck over the course of a Premiership season), much more so than points (~63% due to skill over the course of a season). That makes TSR a great measure of a team’s true talent.

We know that there is no relationship between how good a team is and the rate at which they convert or save penalties (third and fourth plot here), and that there are no individuals who convert or save penalties at a rate that differs significantly from the average. Furthermore, as shown in part I, penalties are pretty well linked to TSR. Given a combination of these two statements we’d expect that the relationship between TSR and the percentage-driven metrics should stay relatively stable when penalties are stripped out. Is this the case?
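For anyone wanting to follow along, here is a rough sketch of how the metrics could be computed with and without penalties. The definitions are my assumptions rather than anything spelled out above: TSR as shots for divided by total shots, sh% and sv% based on shots on target, and PDO as 1000 times their sum. It also assumes only the goal totals are adjusted, with shot counts left alone. All function and argument names are made up for illustration.

```python
def tsr(shots_for, shots_against):
    """Total shots ratio: share of all shots in a team's games that the team took."""
    return shots_for / (shots_for + shots_against)


def sh_pct(goals_for, sot_for, pen_goals_for=0):
    """Shooting percentage: goals per shot on target.

    Pass pen_goals_for to strip penalty goals from the goal total
    (assumption: the shot-on-target count is left unchanged).
    """
    return (goals_for - pen_goals_for) / sot_for


def sv_pct(goals_against, sot_against, pen_goals_against=0):
    """Save percentage: share of shots on target against that were not goals.

    Pass pen_goals_against to strip penalty goals conceded.
    """
    return 1 - (goals_against - pen_goals_against) / sot_against


def pdo(sh, sv):
    """PDO: sum of sh% and sv%, scaled so that league average is about 1000."""
    return 1000 * (sh + sv)


# Hypothetical team-season, not real data.
sh_with = sh_pct(goals_for=60, sot_for=200)
sh_without = sh_pct(goals_for=60, sot_for=200, pen_goals_for=5)
```

Calling each function with and without the penalty argument gives the ‘traditional’ and penalty-stripped versions of the same metric for a team-season.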

The following plots feature a point for each team-season in the Premiership between ’00-01 and ’11-12 (n=240).

Sh%

Firstly, a plot of ‘traditional’ sh% vs TSR, with penalties included.

There’s a slight upward trend present here: linear regression suggests that the very best teams have a sh% of ~23%, compared to ~20% for the very worst teams. The takeaway point, however, is that the correlation is very poor. Let’s see if this persists when we remove penalties.

So the correlation improves, but by a pretty tiny amount. In this case the very best teams have a sh% of ~22%, compared to ~18% for the very worst teams.
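The ‘best team’ and ‘worst team’ figures quoted here come from reading a fitted line off at the extremes of TSR. A minimal sketch of that kind of fit, assuming ordinary least squares with TSR on the x-axis, might look like the following. The function names are hypothetical and the three data points are toy values chosen to sit on a perfect line, not the real team-seasons.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def predict(slope, intercept, x):
    """Value of the fitted line at x (e.g. the highest or lowest TSR observed)."""
    return slope * x + intercept


# Toy data only: TSR values and matching sh% values.
tsr_values = [0.4, 0.5, 0.6]
sh_values = [0.18, 0.20, 0.22]
slope, intercept, r_squared = fit_line(tsr_values, sh_values)
best = predict(slope, intercept, max(tsr_values))
worst = predict(slope, intercept, min(tsr_values))
```

Running the same fit on the with-penalties and without-penalties versions of a metric lets you compare both the slope and the r-squared, which is the comparison being made in these plots.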

Sv%

Once more, a plot of ‘traditional’ sv% vs TSR.

The upward trend persists but the difference between the best and worst teams is, once again, small (81% vs 77%). The correlation between the two is, again, minuscule.

Once more, the correlation is improved, but marginally, and is still exceptionally low. This time the very best teams have a sv% of 82%, compared to 79% for the very worst teams.

PDO