Excellent question from Jacob Frankel on Twitter:
@JamesWGrayson question you might be able to answer: are goals less predictive of points if you exclude penalty goals, or more?
— Jacob Frankel (@jacob_frankel) November 2, 2013
Given the data I have there are three ways I can think of to study this:
1. Look at how goals this season are correlated to points this season
2. Look at how goals this season are correlated to goals next season
3. Look at how goals this season are correlated to points next season
To cover all of the bases I went ahead and looked at all three. In each of those statements ‘goals’ is a placeholder that may be replaced by goals for, goals against, goal difference or goal ratio, each of which may be calculated with or without own goals and goals from penalties.
I’m utilising a dataset given to me by Dan Kennett. It contains team numbers for the Premiership between the ’08-09 and ’11-12 seasons – giving a total of 80 team seasons. As intimated above I’m going to look at 16 metrics in total – goals for, goals against, goal difference, and goal ratio (the % of goals a team scores in its matches), with each of those being further split into their raw total, their raw total minus goals scored from penalties, their raw total minus own goals, and their raw total minus both goals from penalties and own goals.
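As a rough sketch of how those 16 metrics could be derived from one team season: the field names below are hypothetical (the dataset's actual columns aren't shown here), and it assumes that "goals against minus penalties" means goals conceded minus penalties scored against the team, with own goals handled the same way.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    # Hypothetical field names; the real dataset's columns are not known.
    goals_for: int
    goals_against: int
    pens_for: int           # penalties scored by the team
    pens_against: int       # penalties scored against the team
    own_goals_for: int      # opposition own goals credited to the team
    own_goals_against: int  # own goals the team put into its own net


def goal_metrics(s: TeamSeason) -> dict:
    """Return the 4 x 4 grid of metrics: for/against/difference/ratio,
    each as raw, excluding penalties, excluding own goals, excluding both."""
    variants = {
        "raw": (0, 0),
        "ex_pens": (s.pens_for, s.pens_against),
        "ex_own_goals": (s.own_goals_for, s.own_goals_against),
        "ex_both": (s.pens_for + s.own_goals_for,
                    s.pens_against + s.own_goals_against),
    }
    out = {}
    for name, (drop_for, drop_against) in variants.items():
        gf = s.goals_for - drop_for
        ga = s.goals_against - drop_against
        out[f"gf_{name}"] = gf
        out[f"ga_{name}"] = ga
        out[f"gd_{name}"] = gf - ga
        # Goal ratio: share of all goals in the team's matches that it scored.
        out[f"gr_{name}"] = gf / (gf + ga) if (gf + ga) else float("nan")
    return out


# Made-up numbers, purely for illustration.
example = goal_metrics(TeamSeason(60, 40, 5, 3, 2, 1))
print(len(example), example["gd_raw"], example["gr_raw"])
```

Keeping the four "exclusion" variants in one table means the four goal measures are computed by the same code path each time, which makes it harder for the definitions to drift apart.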
1. How well are goals this season correlated to points this season?
This one should be the most obvious – I think. Goals, however they are scored, directly translate into points in the table – we’re not looking at anything predictive here, simply what has happened in the past. In theory, the more we know about the number/ratio of goals being scored in a given team’s matches, the more we should know about the number of points that team has scored.
Now because there are 16 metrics I’m not going to publish the plots – what’s recorded in the table below is the R^2 when I plot ‘points in season n’ against ‘metric in season n’. To make it a little easier on the eye I’ve colour coded each individual column – obviously a higher correlation (green) is best, and lower correlations (yellow – orange – red) are less so.
So what to take from this table? Moving from left to right we can see that goal difference and goal ratio both have a stronger correlation with points than goals for and against do on their own. I think we’d have guessed that but it’s good to establish.
Next let’s move down the rows. Firstly – taking out penalties doesn’t make a huge difference to the correlations, but certainly doesn’t improve them. Removing own goals is even more detrimental, and taking away both generally results in the worst correlation. As I mentioned above – I think this makes perfect sense.
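For anyone wanting to reproduce this kind of table, here is a minimal sketch of the R^2 calculation. It assumes a straight-line fit of points against a single metric, in which case R^2 is just the squared Pearson correlation. The function name and the numbers are made up for illustration and are not taken from the dataset.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    For a one-variable straight-line fit this equals the R^2 of the fit.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one sequence is constant")
    return (sxy * sxy) / (sxx * syy)


# Illustrative, invented values: goal difference and points for five teams.
goal_difference = [-30, -10, 0, 15, 40]
points = [30, 42, 50, 61, 80]
print(round(r_squared(goal_difference, points), 3))
```

Running the same function once per metric column, always against points in the same season, would fill in one cell of the table at a time.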
2. How well are goals this season correlated to goals next season?
To me this is where the study gets more useful – namely how predictive these metrics are of themselves from season to season. The table below is once again a summary of R^2 values, but in this case it’s a correlation of ‘metric in season n’ against ‘metric in season n+1’. The results here are interesting. First I’m going to present the table coloured by individual columns as above.
For now let’s ignore goals against and compare what we see in the other three columns. In each case removing goals from penalties is a bad thing, whereas removing own goals doesn’t really have an impact. I think I can rationalise this – penalties are awarded when the attacking team has the ball in a dangerous position on the pitch, one in which a high quality chance to score is likely to originate. This doesn’t necessarily have to be true for own goals – sure, there are plenty of examples where dangerous crosses are deflected into a defender’s own net, but there are also the back-passes that bounce over the ’keeper’s foot, or the speculative ball over the top that Richard Dunne’s unwitting knee will knock past David James. In short I think winning penalties is driven more by attacking skill than getting the opposition to score own goals for you is. There’s some other interesting stuff I’ve dug up about own goals but it doesn’t really belong in this post so I’ll save it for now.
Back to the table above, and goals against kind of stands out – in particular the fact that if own goals are removed then goals against become more predictable than the raw number. I think if we can agree that own goals are at least relatively randomly distributed then, by removing own goals, we’re seeing a more accurate reflection of a team’s true defensive ability – it’s interesting that this is more apparent when looking at goals against than goals for though. Suggestions as to why are welcome. Also – at some point I should see what drops out when own goals are stripped from sh%/sv%.
Finally, the table below is the same as the one above, except with the colouring no longer being column specific, to show the relative strengths of the correlation.
Once again goal ratio and difference sit as better predictors than simply goals for or against, which marries with what I’ve seen previously when I used a larger sample size to look at the repeatability of goals.
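The ‘season n against season n+1’ pairing behind these tables can be sketched roughly as below. This is an illustration under assumptions rather than the exact procedure used here: the row layout and key names are hypothetical, seasons are identified by their starting year, and a team only contributes a pair when it appears in two consecutive seasons (so relegated and newly promoted sides drop out of the relevant pairs).

```python
def season_pairs(rows, x_key, y_key):
    """Pair x_key in season n with y_key in season n+1 for the same team.

    rows: list of dicts with 'team', 'season' (int start year) and metric
    values. All key names are hypothetical.
    """
    by_team_season = {(r["team"], r["season"]): r for r in rows}
    xs, ys = [], []
    for (team, season), row in by_team_season.items():
        following = by_team_season.get((team, season + 1))
        if following is not None:  # skip teams absent the next season
            xs.append(row[x_key])
            ys.append(following[y_key])
    return xs, ys


# Invented rows: team "B" is missing in 2009, so it contributes no pairs.
rows = [
    {"team": "A", "season": 2008, "gd": 10, "points": 60},
    {"team": "A", "season": 2009, "gd": 14, "points": 65},
    {"team": "A", "season": 2010, "gd": 5, "points": 55},
    {"team": "B", "season": 2008, "gd": -20, "points": 34},
    {"team": "B", "season": 2010, "gd": -12, "points": 40},
]
print(season_pairs(rows, "gd", "gd"))
print(season_pairs(rows, "gd", "points"))
```

Passing the same key twice gives the metric-against-itself pairing used in this section, while setting `y_key` to points gives the metric-this-season against points-next-season pairing. Either set of pairs can then be fed into any squared-correlation routine to get an R^2.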
3. How well are goals this season correlated to points next season?
The initial question did ask about how predictive goals were of points, so let’s use a final ‘this-season-next-season’ regression to see whether own goals and penalties alter the predictive ability of goals this season on points scored next season.
*As an aside I’ve adopted the ‘this-season-next-season’ phrase used in Phil Birnbaum’s latest post. There are many things I enjoy about his blog, but his ability to use very accessible language to discuss complex mathematical ideas is right up there.
The table below is the resulting R^2 when the metric this season is plotted against points next season. The colouring is across the whole table to show the relative strengths of the correlations.
The first thing to say is that goals do a pretty good job of predicting the number of points a team will score next season. Beyond that there may be a tiny bit of information added if you strip goals from penalties and own goals from goals for and goals against, but they’re marginal improvements – if I can find a more compelling case maybe I will make the effort to look into it in the future.
Secondly, the numbers for ‘excluding penalties’ and ‘excluding both’ here appear to run against those reported in section 2. If removing goals from penalties is bad if we want to predict a metric next season, and that metric is highly correlated to points, then how come excluding penalty goals is useful when predicting next season’s points? I’ve turned this over in my head a couple of times and I’m not sure I have a compelling answer. I’ve previously shown that goals from penalties are very unrepeatable from season to season – maybe that has something to do with it? I’m open to suggestions here – it’d be nice to have a rational explanation for what we’re seeing.
In summary there are occasions where removing own goals would be useful – but only really if we’re looking at simply goals for or against. Removing goals from penalties seems less useful in all cases; they’re simply indicative of an attacking action by the team that was awarded and scored the penalty. Removing both is a little less certain. It still removes the attacking actions required to win penalties, and impacts the predictive ability of a metric from this season to next, but doesn’t appear to massively impair the prediction of points next season.