There are no plots ahead to look at, so I’ve bolded the important stuff. Feel free to skim.
Last year Ravi Ramineni used the MCFC analytics dataset to look at the relationship between final third passes and goals scored in the Premiership. Yesterday Mark Taylor replicated that work and then took it one step further, looking at how well final third touches* in the ’11-12 season predicted goals scored in the ’12-13 season. At first glance it looked like they did a pretty good job, but it’s worth putting that into context.
Yesterday I looked at how closely we’d be able to predict the number of goals a team would score in a given season, and found that the random variation of goals in the Premiership was 7.63 goals per season. As we’re only looking at the ’12-13 season here, I calculated the variation for just that season, and it’s 7.77 goals.
In other words, if we’re building a model by which to determine the number of goals a team will score in season ‘n+1’ then we want to be looking for one that gives a standard deviation as low as possible, and the closer to 7.77 goals the better.
So let’s start from no knowledge and work our way up. First, let’s look at a model that assumes that the goals in ’12-13 would be evenly distributed among all of the teams in the league – truly the dumbest model we could put together. The standard deviation in that case comes to 14.46 goals. In other words, we should expect a standard deviation below 14.46 goals for every model into which we input some knowledge.
OK, so now that the bounds are set, let’s start with some knowledge, but the minimal amount – by assuming that each team scores the same number of goals in year ‘n+1’ as in year ‘n’. In that case the standard deviation is 11.42 goals, meaning that it bridges 45% of the gap between zero knowledge (14.46 goals) and a perfect model (7.77 goals). Not a bad start.
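For anyone wanting to try the two naive models themselves, here is a minimal sketch. It assumes the “standard deviation” quoted for each model is the root-mean-square difference between predicted and actual goals, which the post doesn’t spell out. The goal totals are made up for illustration and are not real Premiership numbers, and the function name rms_error is my own.

```python
import math

def rms_error(predicted, actual):
    """Root-mean-square difference between predicted and actual goal totals."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical goal totals for four teams (NOT real Premiership data).
goals_year_n = [80, 60, 50, 40]    # season 'n'
goals_year_n1 = [70, 65, 45, 40]   # season 'n+1'

# Zero-knowledge model: goals are spread evenly, so every team
# is predicted to score the league average.
league_mean = sum(goals_year_n1) / len(goals_year_n1)
even_split = [league_mean] * len(goals_year_n1)

# Minimal-knowledge model: every team repeats last season's total.
same_as_last_year = list(goals_year_n)

print("even split:       ", round(rms_error(even_split, goals_year_n1), 2))
print("same as last year:", round(rms_error(same_as_last_year, goals_year_n1), 2))
```

With real data you would swap in the twenty ’11-12 and ’12-13 goal totals, leaving out promoted sides that have no previous Premiership season to compare against.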
Next, let’s regress those goal totals towards the mean. The R² value between goals scored in year ‘n’ and goals scored in year ‘n+1’ is 0.868, meaning that we should regress the goals in year ‘n’ 24.7% of the way towards the mean for year ‘n+1’. Adding this knowledge reduces the standard deviation to 9.79 goals, meaning that it bridges 70% of the gap between zero knowledge and a perfect model. That’s a nice little improvement considering we haven’t added all that much knowledge.
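As a rough illustration of the regression step, the sketch below pulls each team’s total 24.7% of the way towards the league average. The 24.7% figure is taken from the post. It happens to match 1 − 0.868², though the calculation isn’t shown here, so treat that as my reading. The goal totals are hypothetical, and I’m assuming the mean being regressed towards is the league average of the year ‘n’ totals.

```python
def regress_to_mean(goals, league_mean, fraction):
    """Move each total `fraction` of the way towards `league_mean`."""
    return [g + fraction * (league_mean - g) for g in goals]

REGRESSION_FRACTION = 0.247  # figure quoted in the post

# Hypothetical season 'n' goal totals (NOT real Premiership data).
goals_year_n = [80, 60, 50, 40]
league_mean = sum(goals_year_n) / len(goals_year_n)

predicted_year_n1 = regress_to_mean(goals_year_n, league_mean, REGRESSION_FRACTION)

for before, after in zip(goals_year_n, predicted_year_n1):
    print(f"{before} goals -> predicted {after:.1f}")
```

High-scoring teams get pulled down and low-scoring teams get pulled up, while the league average itself is unchanged.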
So finally, let’s look at the discrepancy between expected goals and actual goals from final third touches* as reported by Mark Taylor yesterday. Well, now the standard deviation comes down to 6.50 goals. In other words it bridges more than the entire gap between zero knowledge and a perfect model, 119% of the gap in total. In short, this is a remarkably promising start for a metric.
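The “percentage of the gap bridged” figures are simple arithmetic on the standard deviations quoted in this post, and can be checked with a few lines. The function name gap_bridged is my own, and the formula is my reading of how the percentages were derived.

```python
ZERO_KNOWLEDGE_SD = 14.46  # even-split model
PERFECT_MODEL_SD = 7.77    # random variation in '12-13

def gap_bridged(model_sd, zero_sd=ZERO_KNOWLEDGE_SD, perfect_sd=PERFECT_MODEL_SD):
    """Share of the zero-knowledge-to-perfect gap that a model closes."""
    return (zero_sd - model_sd) / (zero_sd - perfect_sd)

models = [
    ("same as last year", 11.42),
    ("regressed to the mean", 9.79),
    ("final third touches", 6.50),
]

for name, sd in models:
    print(f"{name}: {gap_bridged(sd):.0%}")
```

Anything above 100% means the model’s standard deviation came in below the 7.77 goals estimated for a perfect model.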
How to explain that the model appears to be better than a perfect model? Well, luck means that occasionally this will happen; there’s an explanation of why in one of TangoTiger’s comments here. Furthermore, as mentioned in my piece yesterday, home advantage means that the 7.77 goals standard deviation of a perfect model is higher than the true value for a perfect model. Either way, Mark’s post has given a surprisingly impressive finding.
*17AUG13 – as pointed out by Mike Goodman – Mark Taylor looked at final third touches, not final third passes. Post updated as such.