How repeatable are goals?

This is the latest in a series of posts looking at which team level metrics are repeatable from season-to-season in the Premiership. The next three paragraphs are generic for the series, feel free to skip.

As I’ve said before, it’s all well and good knowing that team ‘x’ took 20 shots in the first half against team ‘y’, but unless we know whether the number of shots a team takes is repeatable over time then trying to put that number into context is essentially useless (a point that is made far more eloquently by Richard Whittall in this column).

Ultimately, determining how repeatable a metric is allows it to be broken down into a ‘skill’ component – something that teams can control – and a ‘luck’ component – something teams have no control over. (For the story so far, I’ve placed a summary table at the bottom of this post, which I’ll continue to update in the future.) Whilst metrics that are dominated by luck are wonderful insofar as it’s funny to watch the media play out narratives that can be explained very simply by regression towards the mean over time, those that are dominated by skill tell us something useful about the team posting the numbers, which will be repeated season after season, and thus are the metrics we should truly be interested in.

The theory in this series is pretty simple. I take a big group of teams and compare how well the value they record for a given metric in one season correlates with the same metric the following season. Determining the correlation coefficient (R value) of a plot with year ‘n’ on the x axis and year ‘n+1’ on the y axis allows the metric to be broken down into skill and luck components. The sample comprises the 204 pairs of ‘back-to-back’ team Premiership seasons that have occurred since the beginning of the ’00-01 Premiership season (17 non-relegated teams per season x 12 back-to-back seasons).
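For anyone who wants to try this at home, here is a minimal sketch of that calculation in Python. The goal totals are made up purely for illustration (they are not the real sample), the function names are hypothetical, and the skill/luck split assumes the skill share is simply R expressed as a percentage with luck as the remainder, which is an assumption rather than something spelled out above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def skill_luck_split(r):
    """ASSUMPTION: skill % is R as a percentage (clamped to 0..1), luck % is the rest."""
    skill = round(100 * min(max(r, 0.0), 1.0))
    return skill, 100 - skill


# Sample size quoted in the post: 17 non-relegated teams x 12 season pairs.
sample_size = 17 * 12

# Made-up goals-for totals for six teams in year n and year n+1 (illustration only).
year_n = [45, 60, 72, 38, 55, 49]
year_n_plus_1 = [48, 58, 68, 44, 51, 47]

r = pearson_r(year_n, year_n_plus_1)
skill, luck = skill_luck_split(r)
print(f"pairs in the real sample: {sample_size}")
print(f"R = {r:.2f} -> {skill}% skill, {luck}% luck (toy data)")
```

With the real data you would feed in all 204 year ‘n’ and year ‘n+1’ values rather than six invented ones.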

This time I’m focussing on goals. Goals are clearly important in football – as I showed a couple of years back they’re highly correlated to points – and it’d be ideal if they were highly repeatable from season-to-season. At the end of the season they’re easily found and freely available in the league table, so the only extra information we’d need to build a predictive model is the relationship between points and goals, and how far we should regress the value posted in year ‘n’ to get the best prediction for year ‘n+1’. So let’s see if we can go about doing that.

First up, this is the number of goals scored by a given team in year ‘n’, and year ‘n+1’.

[Plot: goals for, year ‘n’ (x axis) vs. year ‘n+1’ (y axis)]

That’s a promising start – the breakdown is 75% skill and 25% luck. The best we’ve seen so far in this series.

Next, let’s take a look at goals against.

[Plot: goals against, year ‘n’ (x axis) vs. year ‘n+1’ (y axis)]

Not quite the same, at 66% skill and 34% luck – and if you check out the table below it’s a pattern that has been emerging in this series: the metric that is controlled by defense is less repeatable than the equivalent controlled by a team’s attacking ability.

Next up let’s take a look at goal difference – it’d be really useful if the correlation here was good – it’s included in every league table, almost regardless of how simple they are.

[Plot: goal difference, year ‘n’ (x axis) vs. year ‘n+1’ (y axis)]

So yer, that’s kind of handy. We’re up to 83% skill and 17% luck. Basically if we look at the league table and a team seems way out of place given their goal difference (hey Newcastle ’11-12) we can reasonably expect some regression the following season.
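To put a rough number on ‘some regression’, here is a small sketch. It assumes the 83% skill figure can be used directly as the weight on a team’s own goal difference, and that the value to regress towards is a league-average goal difference of zero; both are simplifying assumptions, and the +20 team is hypothetical rather than a real side.

```python
def regress_to_mean(value_year_n, league_mean, r):
    """Shrink a year-n value towards the league mean, keeping a share r of the gap."""
    return league_mean + r * (value_year_n - league_mean)


# ASSUMPTION: the 83% 'skill' share is used directly as the regression weight.
R_GOAL_DIFFERENCE = 0.83
# ASSUMPTION: regress towards a league-average goal difference of zero.
LEAGUE_MEAN_GD = 0.0

hypothetical_gd = 20  # made-up team finishing year n on +20
expected_next = regress_to_mean(hypothetical_gd, LEAGUE_MEAN_GD, R_GOAL_DIFFERENCE)
print(f"year n: {hypothetical_gd:+d}, best guess for year n+1: {expected_next:+.1f}")
```

The further a team sits from average, the bigger the expected pull back towards the pack.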

One last thing, goal ratio, defined as goals for/(goals for + goals against).

[Plot: goal ratio, year ‘n’ (x axis) vs. year ‘n+1’ (y axis)]

So there’s a slight improvement here, but it’s minuscule. The split remains 83/17 and, unless you have a spreadsheet that will do it for you at the click of a button, it’s probably not worth the extra effort it takes to calculate.
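If you’d rather not build the spreadsheet, both numbers are a one-liner each. This is just a sketch of the definitions used in this post; the function names and the 60-for, 40-against team are hypothetical.

```python
def goal_difference(goals_for, goals_against):
    """Goal difference as shown in the league table."""
    return goals_for - goals_against


def goal_ratio(goals_for, goals_against):
    """Goal ratio: goals for / (goals for + goals against)."""
    total = goals_for + goals_against
    if total == 0:
        raise ValueError("goal ratio is undefined when no goals were scored or conceded")
    return goals_for / total


# Made-up season: 60 scored, 40 conceded.
gf, ga = 60, 40
print(f"goal difference: {goal_difference(gf, ga):+d}")
print(f"goal ratio: {goal_ratio(gf, ga):.3f}")
```

A team that scores exactly as many as it concedes has a goal ratio of 0.5, the equivalent of a goal difference of zero.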

In short though, goals are really repeatable, and really useful. Given how well they’re correlated with points, they stand a solid chance of being the basis of a great seasonal predictive model.

Finally, below is a table summarising this series so far, with each metric broken down into its skill and luck components. Skill and luck are defined therein in the context of ‘the repeatability of metric ‘x’ is ‘y%’ skill driven, and ‘z%’ luck driven’ at the team level over the course of a Premiership season. Click on the names of any of the metrics to be taken to the post with the relevant plots posted.

Metric % skill % luck
Goal ratio 83 17
Goal difference 83 17
Goals for 75 25
Goals against 66 34
% of total shots that are on target (%TSOT) for 53 47
%TSOT for + %TSOT against 52 48
PDO (penalties excluded) (1) 46 54
% of total shots that are on target (%TSOT) against 44 56
PDO 44 56
sh% 43 57
sv% 38 62
sh% on shots from inside the box (2) 37 63
sh% (penalties excluded) (1) 36 64
sv% (penalties excluded) (1) 32 68
sv% on shots from inside the box (2) 24 76
sv% on shots from outside the box (2) 23 77
Penalties awarded differential (penalties awarded for minus penalties awarded against) (1) 9 91
Having penalties awarded against (1) 9 91
Penalty differential (penalty goals for minus penalty goals against) (1) 8 92
sh% on shots from outside the box (2) 8 92
Being awarded penalties (1) 4 96
Penalty goals conceded (1) 3 97
Penalty goals scored (1) <1 >99

(1) = A special thanks to Infostrada for the data that made the original post possible
(2) = A special thanks to Dan Kennett for the data that made the original post possible


One thought on “How repeatable are goals?”

  1. Have been away for a while … first thing I read of yours in the past couple of months … quick reply …

    So, does this mean that +/- is the best repeatable metric when taking into account skill?
    Hence, as you know I’ve been keeping data on a team I follow over the past few years and I’ve been keeping an eye on an individual player’s +/- (like they do in ice hockey) so an example …

    A team has a +/- of +8 after 8 games and Player A has a +/- of +10 over that spread of games (assuming constant 11v11 situations).
    Can I thus tentatively draw the conclusion that when Player A is on the field he raises (together with the other 10 players when he’s on the field) the skill of the team to raise the overall quality of the team? How much of that +2 difference is down to luck? Is it +0.34?

    I’ve incorporated the following adjustments to +/- numbers for each player …
    thus related the +/- to 90 minutes and to the expected +/- value (relating to quality of teams faced).

    Can I thus slowly start distilling an idea about player quality from the team quality aspects that you’ve covered? Thus maybe moving towards an ideal starting 11 when comparing players per position and then comparing that with what I’ve observed on the field and what my gut instinct is telling me?
    Or maybe even comparing players over the various seasons in a specific position (e.g. all right backs over the past 4 seasons, who has performed best) by comparing the stats and deviations for each season?

    A lot of questions, but if so, then I’m slowly getting towards my original goal … are my observations based on emotion (i.e. I think a player is good because he comes across as friendly and doesn’t whine to the ref too much or dive etc.) and how close do the observations/opinions approach the actual (numerically based) quality of that player?

