Peter LeÖnard sent me some plots on Twitter, and I invited him to turn the numbers into a guest post:
It has been shown extensively that a team’s share of total shots, the total shot ratio (TSR), is one of the most repeatable parameters in football, with only around 14% regression towards the mean from season to season.
What this means is that knowing a team’s previous TSR gives a good indication of their TSR in the near future. However, does a high TSR relate to a successful match outcome, or is it simply a repeatable statistic?
To investigate this I have taken data from the past 14 years, giving over 5000 matches (or samples) to analyse. First of all I separated the data based on whether the match outcome was a Home Win, Away Win or Draw. I then calculated the TSR (from the home team’s perspective) for every game and created a histogram of the TSR distribution using equal-width buckets, each covering a TSR range of 0.05.
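As a rough illustration of the bucketing step, here is a minimal Python sketch. The function names, the `"H"`/`"D"`/`"A"` result codes and the sample shot counts are all made up for the example; the real analysis used the full data set, and this is not necessarily how the original plots were produced.

```python
from collections import Counter, defaultdict

N_BUCKETS = 20  # 20 equal-width buckets, each covering a TSR range of 0.05


def home_tsr(home_shots, away_shots):
    """Home team's share of all shots taken in the match."""
    return home_shots / (home_shots + away_shots)


def bucket_index(home_shots, away_shots):
    """Index of the 0.05-wide bucket (0 to 19) containing the home TSR.

    Integer arithmetic avoids floating-point trouble at bucket edges;
    a TSR of exactly 1.0 is placed in the top bucket.
    """
    total = home_shots + away_shots
    return min(home_shots * N_BUCKETS // total, N_BUCKETS - 1)


def outcome_percentages(matches):
    """Map bucket index -> percentage of Home wins, Draws and Away wins."""
    counts = defaultdict(Counter)
    for home_shots, away_shots, result in matches:
        if home_shots + away_shots == 0:
            continue  # TSR is undefined when neither side has a shot
        counts[bucket_index(home_shots, away_shots)][result] += 1
    table = {}
    for bucket, tally in sorted(counts.items()):
        n = sum(tally.values())
        table[bucket] = {r: 100.0 * tally[r] / n for r in ("H", "D", "A")}
    return table


# Made-up shot counts, purely for illustration (not the real data set).
sample_matches = [(12, 8, "H"), (6, 4, "D"), (13, 7, "H"), (5, 15, "A")]
table = outcome_percentages(sample_matches)
for bucket, shares in table.items():
    low = bucket / N_BUCKETS
    print(f"TSR {low:.2f}-{low + 1 / N_BUCKETS:.2f}: {shares}")
```

Each match is a `(home_shots, away_shots, result)` tuple, so the same function works on any source of match data once it is put in that shape.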
The result, shown in the figure above, is the percentage of matches at a given TSR that end in a Home Win, an Away Win or a Draw.
On initial inspection, the linear correlation between an increase in the home team’s TSR and the number of home wins looks promising (Home win Trendline Regression R² = 0.31). However, the values at the extremes of the TSR range (towards 0 and 1) are clearly unreliable, most likely because there are so few matches there.
In fact, our data only shows four matches in which the Home team have registered a TSR of less than 0.1:
• 25/11/2000 : Derby 0 – 3 Man United
• 18/11/2006 : Middlesbrough 0 – 0 Liverpool
• 23/02/2008 : Birmingham 2 – 2 Arsenal – (The game with that tackle on Eduardo)
• 20/11/2010 : Birmingham 1 – 0 Chelsea
The standard error of a percentage can be estimated from the sample size; I have calculated it and shown it as error bars in the graph below. The majority of matches have a TSR of between 0.5 and 0.7, so the sample sizes there are large enough for the percentages to be reliable.
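For anyone wanting to reproduce the error bars, here is a minimal sketch that assumes the usual binomial formula for the standard error of a proportion, sqrt(p(1 - p) / n). The post does not spell out the exact formula used, so treat that choice and the function name as assumptions.

```python
import math


def percentage_standard_error(wins, n):
    """Standard error, in percentage points, of the share wins / n.

    Assumes each match is an independent trial (binomial model).
    """
    if n <= 0:
        raise ValueError("need at least one match")
    p = wins / n
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# The four matches with a home TSR below 0.1 include one home win.
print(percentage_standard_error(1, 4))
```

With only four matches the error comes out at over 20 percentage points, which is why the extremes of the graph cannot be trusted.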
Removing any data points with an estimated error of greater than 10% gives a much better-looking graph (Home win Trendline Regression R² = 0.97).
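The filter-then-fit step can be sketched as follows. The bucket values below are invented to show the mechanics and are not the real figures; each tuple is (TSR bucket midpoint, home win %, estimated error in percentage points). R² is computed here as the squared Pearson correlation, which is what a straight-line trendline reports.

```python
MAX_ERROR = 10.0  # percentage points, the cut-off described above


def keep_reliable(buckets, max_error=MAX_ERROR):
    """Drop buckets whose estimated error exceeds the cut-off."""
    return [b for b in buckets if b[2] <= max_error]


def r_squared(xs, ys):
    """R-squared of a least-squares straight line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Invented example values: two noisy extremes and three well-sampled buckets.
buckets = [
    (0.075, 50.0, 25.0),
    (0.325, 30.0, 4.0),
    (0.525, 45.0, 2.0),
    (0.725, 60.0, 4.0),
    (0.975, 0.0, 30.0),
]

kept = keep_reliable(buckets)
r2_all = r_squared([b[0] for b in buckets], [b[1] for b in buckets])
r2_kept = r_squared([b[0] for b in kept], [b[1] for b in kept])
print(f"R² with all buckets: {r2_all:.2f}, after filtering: {r2_kept:.2f}")
```

The point of the toy data is only that a handful of poorly sampled buckets at the extremes can drag the fit down a long way.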
There are a couple of interesting points to note from this graph. The first is that, for the Away Team to have an equal chance of winning the match, they need a greater share of the total shots. This could be a statistical anomaly, but I postulate that it is the mentality of the away team which causes them to take their shots from poorer positions than the home team might create for themselves.
The other interesting thing to note is the huge peak in Away wins when the TSR is around 0.25-0.3. At first it would seem that this could be a statistical anomaly; however, there have been 120 matches within this TSR range and 73 of them have ended in an Away Team win. This is statistically significant. It is also clear, noting the corresponding dip in Draws, that almost all of these additional wins are coming from matches which we would otherwise expect to be a Draw. This suggests that a TSR of 0.25-0.3 is the critical point where an Away Team is dominating but the Home Team (probably because they are at home, and you must win your home matches don’t you know!) still chases the win. If the Away Team dominates even more then the Home Team parks the bus, and if the Away Team dominates less then the Home Team can compete.
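As a quick sanity check on the 73-from-120 figure, here is a sketch of an approximate 95% interval for the away-win share. It uses the normal approximation with z = 1.96, which is my assumption for illustration; the post does not say which test was used.

```python
import math

Z_95 = 1.96  # normal-approximation multiplier for a 95% interval


def share_interval(wins, n, z=Z_95):
    """Return (share, low, high) in percent, using the normal approximation."""
    p = wins / n
    margin = z * math.sqrt(p * (1.0 - p) / n)
    return 100.0 * p, 100.0 * (p - margin), 100.0 * (p + margin)


share, low, high = share_interval(73, 120)
print(f"Away win share: {share:.1f}% (approx. 95% interval {low:.1f}% to {high:.1f}%)")
```

That works out at roughly 61% of matches, with an interval running from about 52% to about 70%, so even the lower end has the Away Team winning more than half of these games.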
Of course, this is postulation and it could be something else entirely (please do share your thoughts!).