I’ve borrowed the title from this piece in the Guardian today, written by Alistair Tweedale of whoscored.com. The article looks at the number of points that teams reaching the FA Cup final score in the Premiership before and after reaching the 5th round of the FA Cup (taken as 22 games into the season). In the past five seasons (the sample looked at), 8 teams scored points at a slower rate in the final 16 games of the season, whereas only 2 improved. From that the author states:
“In recent seasons in particular, it seems that having a run in the FA Cup has had an adverse effect on clubs’ league form. Only two of the 10 FA Cup finalists in the last five years have averaged more points per Premier League game after having made it into the fifth round – when we can reasonably start to call it a cup run – than they did before that stage. Those two teams were Wigan Athletic in 2012-13 and Portsmouth in 2009-10. Both were relegated and Portsmouth would have gone down even without their nine-point deduction.”
The link to how the article is marketed on Twitter by the Guardian can be found here.
Here are the issues I have with the maths used (or not) in the article:
1. Use of a binary scale. The article judges teams based on the question ‘did they score fewer points per game than before’, and essentially assigns a 1/0 value based on whether the answer is yes/no, regardless of how many fewer (or more) points per game the team scored. It finds an 8/2 split, which is interesting and could well be something, but it’s not statistically significantly different from a 5/5 split (p = 0.18).
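For anyone who wants to check this sort of split themselves: here's a minimal sketch of an exact two-sided binomial test using only the standard library. Note this is just one reasonable choice of test; a different choice will give a somewhat different p-value, though the conclusion (nowhere near significance) is the same.

```python
from math import comb

def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes at least as unlikely as the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # small tolerance guards against floating-point ties
    return sum(q for q in probs if q <= observed * (1 + 1e-9))

# 8 of 10 finalists slowed down; is that different from a 5/5 split?
print(round(binom_two_sided(8, 10), 3))  # prints 0.109
```

With only ten teams, even an 8/2 split is comfortably compatible with a coin flip.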
What happens if we do this more scientifically and compare the points per game scored in the first 22 games to the points per game scored in the final 16 (I’d argue this is a much more sensible method)? The teams score an average of 1.63 points per game in the first 22 games of the season and an average of 1.40 points per game in the final 16, though the result is further still from being statistically significant (p = 0.36).
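Since each team is compared with itself before and after the fifth round, this is a paired comparison, and an exact sign-flip permutation test is easy to run on ten numbers without any distributional assumptions. The differences below are made up for illustration; the real ones come from my spreadsheets.

```python
from itertools import product

def sign_flip_p(diffs):
    """Exact paired sign-flip test: under the null hypothesis the sign
    of each before-minus-after difference is arbitrary, so enumerate
    all 2^n sign assignments and count those whose mean is at least as
    extreme as the observed one."""
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    count = 0
    for signs in product((1, -1), repeat=n):
        mean = sum(s * d for s, d in zip(signs, diffs)) / n
        if abs(mean) >= observed - 1e-12:
            count += 1
    return count / 2**n

# hypothetical before-minus-after ppg differences for ten finalists
diffs = [0.45, 0.30, -0.20, 0.15, 0.60, 0.05, -0.10, 0.25, 0.40, 0.35]
print(round(sign_flip_p(diffs), 3))
```

With n = 10 the full enumeration is only 1,024 sign patterns, so the exact test is instant; for much larger samples you'd sample the sign patterns instead.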
2. Regression exists. Whilst points scored in the first 22 games of the season are highly predictive of points scored in the final 16 games of the season (R^2 = 0.80), we’d still expect to see some regression towards the mean of 1.37 points per game. In other words, our best guess is that any team scoring >1.37 points per game will do worse in the final 16 games of the season, whilst any team scoring <1.37 points per game will do better.
What do we see? Well, exactly that for 8 of the 10 teams in the sample (note 1).
Let's go a step further and use this expected regression to generate an expected points per game for each of the 10 teams in the final 16 games of their season, based on the points per game each one scored in the first 22 games. We find that 8 of the 10 teams under-perform their expected points total, but when regression is taken into account the statistical significance of the observed drop in points drops further (p = 0.40).
3. Outliers. Almost half of the difference between the aggregate points per game the teams scored in their first 22 games and in their last 16 is due to just two of the ten teams. Arsenal in '13-14 lost Walcott, Ramsey, and Wilshere for a significant portion of the last 16 games, whilst I can't find as clear an explanation for Liverpool. This isn't suggested as a possibility in the article (though apparently "it is no coincidence that both [Arsenal and Hull] fell away in the league as they went on their cup runs").
4. Sample size. In the paragraph I quoted near the beginning of this piece, the author states:
"In recent seasons in particular, it seems that having a run in the FA Cup has had an adverse effect on clubs’ league form."
I think it's reasonable to read that and think the author is suggesting this wasn't the case in the past. And lo and behold, we only need to go back one year further than the study in the article to see that both Chelsea and Everton performed better in the final 16 games of the season than they did in the first 22. In fact, if the sample is extended to 10 seasons (note 2), then the split is 10 teams that performed better in the first 22 games of the season and 9 that performed better in the last 16 (the 20th team were Swansea, who were in the Championship at the time). Obviously that is not a statistically significant result, either in terms of the number of teams that do better or in the difference in points per game (p = 0.61). Further, as a group those teams scored an average of 1.93 points per game in the first 22 games of the season, and 2.01 points per game over the final 16.
Finally, the article includes the quote "Aside from clubs near the foot of the table who are fighting for their lives (and perhaps gain confidence from a Cup run), Premier League form tends to suffer from progress to the final." Amongst the teams that improved in terms of points per game in the final 16 games of the season during those extra five seasons were '08-09 CFC, '06-07 CFC, '05-06 LFC, '04-05 AFC, '04-05 MUFC.
5. Cumulative effects? If finalists see their points per game go down a lot, I'm assuming the effect would apply at least somewhat uniformly. That is, we'd see the largest effect on teams that reach the final, but also some effect on teams that reach the semi-final, a smaller effect on teams that reach the sixth round, and so on. That isn't looked at or addressed in the article. I'm not going to address it here as it's a lot of work and I doubt we'd be able to discern much (if any) effect, but it would provide solid backup evidence for the original article if it were shown.
6. Summary. So I guess the takeaway from the original article is that in the past five seasons an FA Cup run has been bad for teams. The work here suggests that, if there's such an effect (and it seems fairly unlikely), it's a recent one, because 6-10 seasons ago the teams that did well in the cup did better in the league in the last 16 games of the season. And the sample size in the original study is really small, and it's not that much bigger here. So take from it what you will, I guess.
For the record, this is all stuff that is fairly easy to clean up. It took me maybe 20 minutes to do the numbers side of this post, and I’d hope the database available to the author is more user-friendly than the Excel sheets that I’ve fairly messily merged together over the years.
Note 1. In the past five seasons, 57 of 100 teams have demonstrated this behaviour, which is almost exactly the same proportion as over the past 14 seasons. So eight is maybe a couple more than we'd expect to see, but this group appears to feature a higher proportion of teams that score an extreme number of points than an average group of teams, so it doesn't really surprise me.
Note 2. For transparency: I picked five further seasons because it gave me a sample of ten seasons, which is a round number (as I suspect the author of the original piece did with five). However, lest it look like I cherry-picked, I went back and checked: in what would be the 11th season of the analysis, MUFC scored more points per game in their first 22 games than in their final 16. That doesn't really change any of the conclusions in this post, unlike the effect the sixth season would have had on the original piece.