Methodology and validation of the Team Ratings


1. Introduction
2. Data
3. Calculations and Results
3.1 Determining the coefficients
3.2 How does the Team Rating perform compared to TSR?
3.3 How does the Team Rating perform compared to xG?
3.4 How well does the Team Rating represent the table at the end of the season?
4. Discussion
4.1 How early in the season can the Team Rating be used?
4.2 Outliers
4.3 The effect of a shot on a team’s rating
5. Appendix

1. Introduction

For a while now I’ve been playing with ways of improving raw TSR to find a metric that is more predictive of what will happen in the future by capturing the parts of the game that TSR misses. That being said, TSR on its own has proven very useful in a predictive capacity – based solely on knowing two numbers about a team from a prior season it does a good job of estimating the number of points that team will score next season. It does, however, have obvious flaws. Firstly, as has been mentioned innumerable times, it treats all shots as equally useful, and this patently isn’t true. Secondly, it takes no account whatsoever of score effects, which both Benjamin Pugsley and Sander IJtsma have shown skew the number of shots taken by each team, as well as the quality of those shots.

One answer to this has been the development of Expected Goals (notated as xG from here on). Michael Caley here and Richard Whittall here do a solid job of summarising the current state of the work being done in that area. Michael has demonstrated well that his xG model does a better predictive job than TSR, whilst also publishing his methodology (see here). Other models also have plots published showing their superiority to TSR in various ways; however, the methodology behind those models isn’t available for peer review. For example, there’s a post from Sander IJtsma here and a tweet from Daniel Altman here.

I think with increasing knowledge xG has the potential to be a great metric, and I encourage the continuing developments and refinements (as well as the publishing of work for peer review). However, I don’t think it’s reached the stage yet where doing anything other than xG has no value. I also think there is probably a simpler method to improve upon TSR without the complexity that the calculation of xG brings. Thus I’m setting out to see how good a predictive metric can be put together using just six pieces of information from each Premiership game played.

Really what we’re looking for is a measure of the proportion of goals scored by a team in its games. If a team increases the proportion of the goals it scores in its matches over the long run it will score more points. To my mind there are three components that go into whether a team scores more goals than its opposition over the course of a given number of games. I’ll define each of those components now, and then discuss how important each of them is later in the post. I’ll also add that I’m aware that each of these three statistics can be split up into attacking and defensive components, and it’s something I may look into in the future.

TSR – the proportion of shots a team takes in the matches it plays. As mentioned above it’s a very useful predictive measure in its own right, despite its obvious flaws. It is calculated as Total shots for / (Total shots for + Total shots against). A league average team would have a TSR of 0.500.

%TSoTt – it’s a horrible acronym that I’d really like suggestions for changing, but this measures the relative propensity for a team to take shots that are on target and force its opponents to take shots that don’t go on target. A team may take more shots on target because it gets into better positions to shoot or because it has better players. The calculation is as follows: (Shots on target for / Total shots for) + (Shots off target against / Total shots against). A league average team would have a %TSoTt of 1.000.

PDO – this measures the relative propensity of a team to score from the shots on target that it takes, and to save the shots on target that it concedes. As with %TSoTt, a higher PDO may be due to the quality of chances that a team generates or the goal-scoring quality of the players it has. It is calculated as 1000*(goals for/shots on target for + (shots on target against – goals against)/shots on target against). A league average team would have a PDO of 1000.

So we’re basically combining ‘team A takes x% of the shots in its games’, ‘when team A takes/concedes a shot it has a y% chance of going on target’, and ‘when team A takes/concedes a shot on target there’s a z% chance that it results in a goal’. Or, mathematically, the Team Rating is expressed as TSR*%TSoTt*PDO.
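As a rough illustration of the three definitions above, here is a minimal Python sketch. The `MatchCounts` container, the function names, and the example counts are illustrative assumptions and are not taken from the data set; only the three formulas come from the text.

```python
from dataclasses import dataclass


@dataclass
class MatchCounts:
    """The six pieces of information, summed over any run of games (hypothetical container)."""
    shots_for: int
    shots_against: int
    sot_for: int        # shots on target for
    sot_against: int    # shots on target against
    goals_for: int
    goals_against: int


def tsr(c: MatchCounts) -> float:
    # Share of all shots in a team's games that the team itself took.
    return c.shots_for / (c.shots_for + c.shots_against)


def tsott(c: MatchCounts) -> float:
    # On-target rate for, plus off-target rate against.
    off_target_against = c.shots_against - c.sot_against
    return c.sot_for / c.shots_for + off_target_against / c.shots_against


def pdo(c: MatchCounts) -> float:
    # Scoring rate on shots on target for, plus save rate on shots on target against.
    saves = c.sot_against - c.goals_against
    return 1000 * (c.goals_for / c.sot_for + saves / c.sot_against)


# Made-up season totals for a strong team, purely for illustration.
example = MatchCounts(600, 400, 210, 120, 70, 35)
print(tsr(example), tsott(example), pdo(example))
```

A team whose 'for' and 'against' counts are identical comes out at 0.500, 1.000, and 1000, which matches the league average values quoted above.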

2. Data

The source of the data is and I’m using data for the 2000-01 – 2013-14 Premiership seasons. I’ve taken the matches for each of the teams to have played in the Premiership in that period and ordered them chronologically. From there the methods diverge depending upon the study, and I’ll go into more detail at the beginning of each section. One thing to mention is that a fair number of these studies will include samples that cross seasons. For example, let’s take a team that is in the Premiership for multiple seasons. For a study comparing how well a team does ‘x’ in a preceding set of ten games to how well it does ‘x’ in the succeeding ten games, there’ll be times when a sample will contain, for example, games 31-40 and games 41-50. For the first set of numbers games 31-38 are in season 1 and games 39-40 are in season 2. I expected this to lead to weaker correlations; however, the effect is surprisingly minimal for the three metrics that comprise the Team Rating (see Appendix 1).

3. Calculations and Results

As mentioned in the introduction the basic equation to calculate the Team Rating is expressed as TSR*%TSoTt*PDO. However, these three components shouldn’t necessarily be weighted equally (the weightings will also vary by league but here I’m going to concentrate on the Premiership). For example, we know that TSR is a metric that is highly repeatable (and thus skill based), whilst both %TSoTt and PDO have a smaller skill component. As such TSR should be weighted more heavily in our calculation of a team’s skill than the other two metrics.

For example (see Section 3.1 for working) it turns out that a team’s rating over a 38 game sample is calculated as follows:

Rating = (0.5+(TSR-0.5)*0.732^0.5)*(1.0+(%TSoTt-1.0)*0.166^0.5)*(1000+(PDO-1000)*0.176^0.5)

So to determine the Team Rating for a team over its past 38 games we keep about 85% of the deviation from the league average value of its TSR, but just 41% and 42% of the deviation from the league average values for %TSoTt and PDO, respectively. (This marries well with work that I’ve done previously).
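The 38 game formula can be sketched in Python as follows. The function and constant names are illustrative assumptions; the coefficients 0.732, 0.166, and 0.176 are the ones quoted above.

```python
import math

# 38 game coefficients quoted in the text (TSR, %TSoTt, PDO).
COEFFS_38 = {"tsr": 0.732, "tsott": 0.166, "pdo": 0.176}


def team_rating(tsr, tsott, pdo, coeffs=COEFFS_38):
    """Raw Team Rating: each metric is pulled towards its league average value."""
    tsr_part = 0.5 + (tsr - 0.5) * math.sqrt(coeffs["tsr"])
    tsott_part = 1.0 + (tsott - 1.0) * math.sqrt(coeffs["tsott"])
    pdo_part = 1000 + (pdo - 1000) * math.sqrt(coeffs["pdo"])
    return tsr_part * tsott_part * pdo_part


# Fraction of each metric's deviation from average that is kept.
for name, value in COEFFS_38.items():
    print(name, round(math.sqrt(value), 3))

# A league average team (0.5, 1.0, 1000) has a raw rating of 500.
print(team_rating(0.5, 1.0, 1000))
```

The square roots come out at roughly 0.86, 0.41, and 0.42, which is where the 'about 85%', '41%', and '42%' figures come from.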

Over a 38 game sample Team Ratings range from 363 (Derby ’07-08) to 695 (Chelsea ’09-10). To make them easier to understand I’ve rescaled these linearly to a 0-10 scale, through the calculation:

10 * (Team Rating – 363) / (695 – 363)

As such, the Derby ’07-08 season has a Team Rating of 0.00, whilst the Chelsea ’09-10 season has a Team Rating of 10.00.
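A minimal sketch of the rescaling, assuming a factor of 10 so that the two extreme seasons land on 0.00 and 10.00 as stated. The function name is an illustrative assumption, and the sketch does not clamp values outside the observed range.

```python
RAW_MIN = 363  # lowest 38 game raw rating in the sample
RAW_MAX = 695  # highest 38 game raw rating in the sample


def rescale(raw_rating):
    """Map a raw 38 game Team Rating linearly onto the 0-10 scale."""
    return 10 * (raw_rating - RAW_MIN) / (RAW_MAX - RAW_MIN)


print(rescale(363), rescale(695), round(rescale(500), 3))
```

A raw rating below 363 or above 695 would map below 0 or above 10, so the scale is only anchored to the seasons in this sample.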

3.1 Determining the coefficients

All of the Premiership games played by a given team are ordered chronologically. The TSR, %TSoTt, and PDO are calculated after each game. The correlation between the value of the metric for all of the teams in games ‘1-x’ and the value of the metric in games ‘x+1 – 2x’ (e.g., TSR in games 1 – 3 and TSR in games 4 – 6) was determined. The correlation coefficients (R values) are plotted for each of the metrics individually below:

[Figure: correlation coefficients (R) for TSR, %TSoTt, and PDO between games 1 to x and games x+1 to 2x]

The Team Rating is calculated in the form (0.5+(TSR-0.5)*x^0.5)*(1.0+(%TSoTt-1.0)*y^0.5)*(1000+(PDO-1000)*z^0.5). The relevant values of x, y, and z after a specific number of games are read off the plot.
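The split-sample correlation behind these coefficients could be sketched as below. This is only one reading of the method: it assumes every consecutive, non-overlapping pair of x game windows for every team goes into the correlation, and it shows TSR only. The data layout (a dict of chronological `(shots_for, shots_against)` tuples per team) and all names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def window_tsr(games):
    """TSR over a run of games, each game being (shots_for, shots_against)."""
    shots_for = sum(g[0] for g in games)
    shots_against = sum(g[1] for g in games)
    return shots_for / (shots_for + shots_against)


def split_sample_r(teams, x, metric=window_tsr):
    """Correlate the metric in one x game window with the next x game window."""
    first, second = [], []
    for games in teams.values():
        # Assumption: step through each team's games in blocks of 2x.
        for start in range(0, len(games) - 2 * x + 1, 2 * x):
            first.append(metric(games[start:start + x]))
            second.append(metric(games[start + x:start + 2 * x]))
    return pearson_r(first, second)


# Tiny made-up data set: three teams, six games each.
demo = {
    "A": [(15, 5)] * 6,
    "B": [(5, 15)] * 6,
    "C": [(10, 10)] * 6,
}
print(split_sample_r(demo, x=3))
```

Under this reading, the R value obtained for a given x would be the number plugged in for x, y, or z in the formula above.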

3.2 How does the Team Rating perform compared to TSR?

As TSR is the first bar to clear I’ll quickly show why the Team Rating is an improvement on TSR. As in Section 3.1, here’s the correlation (R^2) between the value of a metric in games 1 to x compared to the value of the metric in games x+1 to 2x:

[Figure: correlation (R^2) between games 1 to x and games x+1 to 2x for the Team Rating and TSR]

Secondly, let’s look at a plot that correlates TSR and the Rating to points scored in the remainder of the season, i.e. TSR/Team Rating in a team’s first 3 games of the season vs. the number of points that team scores in the final 35 games of the season:

[Figure: correlation of TSR and the Team Rating early in the season with points scored in the remainder of the season]

Thirdly, the table below shows the correlations (R^2) between both points/goal difference and Team Rating/TSR.

[Table: correlations (R^2) of the Team Rating and TSR with points and goal difference]

So to summarise – the first plot here tells us that the Team Rating is more predictive of itself over time than TSR is predictive of TSR, the second plot tells us that the Team Rating is more predictive of the number of points that a team will score over the remainder of the Premiership season than TSR is, and the table tells us that a team’s Team Rating this season is a) more strongly correlated to the points and goal difference that the team records this season, and b) more strongly correlated to the points and goal difference that the same team will record next season.

3.3 How does the Team Rating perform compared to xG?

Now that the first hurdle has been cleared let’s move on to look at how the Team Rating performs in relation to Expected Goals. Michael Caley has done a good job of validating his Expected Goals model, and the results are published here and here. Essentially for each of his studies he’s taken 9 samples of ‘half Premiership seasons’ (2 half seasons in each of the ’09-10, ’10-11, ’11-12 and ’12-13 seasons, and the first half of the ’13-14 season). The value of a metric in ‘this sample’ has then been compared to the value of the metric in the ‘next sample’ (as per my methodology in Sections 3.1 and 3.2).

First a note about Michael’s studies – rather than splitting the season by the number of games played by each team, it has been done by date. As a result some teams play, for example, 18 games in the first sample and 20 games in the second sample. For my studies each season has been split so that each team has exactly 19 games in each sample. As such our two studies don’t agree exactly on a couple of numbers even though it would appear we’re doing the same thing. Thus for clarity I’ve reported all of the coefficients from Michael’s studies and all of the results from mine.

Essentially Michael’s studies can be split into two parts. The first is the correlation of Goals, TSR, and xG to Goals. I’ve repeated the study and looked at the correlation of Goals, TSR, and Team Rating to Goals.

NB: GF = Goals For, GA = Goals Against, GR = Goals Ratio (GF / (GF + GA))

[Table: R^2, RMSE, and MAE for Goals, TSR, xG, and the Team Rating in one sample against GF, GA, and GR in the next sample]

First a quick jargon buster – broadly speaking the better the predictive power of a metric the higher the R^2 value, and the lower the values of the RMSE (root mean square error) and MAE (mean absolute error). To take an example from the top left of the above table, Goals For in ‘sample 1’ have an R^2 to Goals For in ‘sample 2’ of 0.371. This is lower than both the R^2 of TSR in ‘sample 1’ to Goals For in ‘sample 2’ (0.421), and xG in ‘sample 1’ to Goals For in ‘sample 2’ (0.422). In turn the RMSE and MAE are higher for Goals For in ‘sample 1’ to Goals For in ‘sample 2’ than those for TSR in ‘sample 1’ to Goals For in ‘sample 2’, and xG in ‘sample 1’ to Goals For in ‘sample 2’. In other words Goals For isn’t as good a predictor of future Goals For as either TSR or xG.
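For readers who want to see the three measures side by side, here is a small Python sketch. It assumes the prediction is a simple least-squares line from the ‘sample 1’ metric to the ‘sample 2’ outcome, which is an assumption about how the errors are produced rather than something stated in either study. All names and the toy numbers are illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def evaluate(metric_sample_1, outcome_sample_2):
    """R^2, RMSE, and MAE of a fitted line predicting sample 2 from sample 1."""
    slope, intercept = fit_line(metric_sample_1, outcome_sample_2)
    errors = [slope * x + intercept - y
              for x, y in zip(metric_sample_1, outcome_sample_2)]
    n = len(errors)
    mean_y = sum(outcome_sample_2) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((y - mean_y) ** 2 for y in outcome_sample_2)
    return {
        "r2": 1 - ss_res / ss_tot,                 # higher is better
        "rmse": math.sqrt(ss_res / n),             # lower is better
        "mae": sum(abs(e) for e in errors) / n,    # lower is better
    }


# Toy numbers only: a metric in sample 1 and goals for in sample 2.
print(evaluate([0.40, 0.45, 0.50, 0.55, 0.60], [18, 24, 23, 30, 33]))
```

RMSE penalises large misses more heavily than MAE does, which is why the two can rank metrics slightly differently.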

So what does this table tell us? Well for both of the studies xG and the Team Rating are better predictive metrics than TSR, but how about how xG and the Team Rating compare to one another? Interestingly it seems like the Team Rating is a fair amount better than xG at predicting future goals against, whilst xG appears to be marginally better than the Team Rating at predicting future goals for. When it comes to goal ratio (the one I’d be most interested in) the two metrics appear to be essentially equally good.

The second of Michael’s studies looked at how predictive each of four metrics (points, goal difference, TSR, and xG) were of future points scored by a given team. Again I’ve repeated that study, except replaced xG with the Team Rating. The results are summarised in the Table below:

[Table: predictive performance (including RMSE and MAE) of points, goal difference, TSR, xG, and the Team Rating for future points]

As before we’re ideally looking to minimise the RMSE and MAE values. Given the disparity of the RMSE/MAE values for goal difference and points I don’t think too strong a conclusion can be derived from this table, but it appears that at most there’s likely to only be a negligible difference in the predictive ability of the two metrics. The main take away is really that, whilst TSR is more predictive of future points scored than either points or goal difference, xG and the Team Rating are in turn both more predictive of future points scored than TSR.

3.4 How well does the Team Rating represent the table at the end of the season?

Below is a plot of the average Team Rating recorded by each team to finish in a given position in the Premiership (the y-error bars for each position represent 1 standard deviation in the Team Rating of the teams that finished the season in that position).

[Figure: average Team Rating by final league position, with error bars of 1 standard deviation]

There’s a good trend here – on the whole a lower Team Rating results in a finishing position that’s lower in the Premiership table. Perhaps unsurprisingly the decrease essentially halts between the 9th and 16th placed teams – and a plot of Points scored this season vs. Team Rating this season (R^2 = 0.80) features a lot more spread around the line of best fit in that region.

Secondly, let’s group the teams in terms of their Team Rating at the end of the season and see what we find. If the Team Rating makes sense then teams with a lower Team Rating should finish the season in a lower position in the table, have a worse goal difference, be more likely to be relegated and less likely to finish the season in the top four:

[Figure: teams grouped by end-of-season Team Rating, with finishing position, goal difference, relegations, and top four finishes]

And that’s exactly what we see. One team (Everton ’04-05) is a major outlier to this plot (see Section 4.2 for discussion), but other than that we can draw some pretty broad conclusions:

1. If your team has a Team Rating greater than 6 at the end of the season there’s a good chance that they’ll have finished in the top four.
2. If your team has a Team Rating greater than 2 at the end of the season then the odds that they’ve been relegated are pretty slim.

4. Discussion

4.1 How early in the season can the Team Rating be used?

Methodology: I’ve taken the Premiership table after ‘x’ games and calculated the rating of each team using the relevant coefficients in Section 3.1. From this I’ve calculated the number of points that the team’s rating would expect them to score in the remaining ’38 – x’ games, and added that to the points that each team has already scored to give a projected final league table. I’ve then compared where teams are projected to finish in this table to where they actually finished that season.
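The projection step could look something like the sketch below. The post does not give the conversion from a Team Rating to expected points, so the conversion is passed in as a function, and the `placeholder_points_per_game` mapping here is a made-up linear stand-in used only so that the example runs. The team names and figures are also invented.

```python
def project_table(standings, points_per_game_from_rating, season_length=38):
    """Project final points: points so far plus rating-based points for the rest.

    `standings` maps team -> (games_played, points_so_far, raw_team_rating).
    """
    projected = []
    for team, (played, points, rating) in standings.items():
        remaining = season_length - played
        expected_rest = points_per_game_from_rating(rating) * remaining
        projected.append((team, points + expected_rest))
    # Highest projected points total first.
    return sorted(projected, key=lambda row: row[1], reverse=True)


def placeholder_points_per_game(rating):
    # HYPOTHETICAL mapping, not from the post: 1.4 points per game for an
    # average raw rating of 500, rising linearly with the rating.
    return 1.4 + 0.005 * (rating - 500)


# Invented table after 3 games: (played, points, raw rating).
after_three = {"A": (3, 9, 650), "B": (3, 0, 400), "C": (3, 4, 500)}
for team, points in project_table(after_three, placeholder_points_per_game):
    print(team, round(points, 1))
```

Comparing the top four and bottom three of such a projected table with the real final table gives the kind of hit rate discussed in this section.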

[Figure: share of top four and relegated teams correctly identified by the projected table after x games]

So a mere 3 games into the season the projected table correctly identifies almost 2/3rds of teams that will go on to finish in the top four, and half of the teams that will be relegated at the end of that season. After 19 games 90% of the top four teams and 2/3rds of the relegated teams are correctly identified.

Furthermore, a team’s rating in the first 19 games of the season is able to predict how many points that team will score in the final 19 games of the season with an RMSE of 6.2 points.

4.2 Outliers

For the plot below I’ve calculated the number of points that a team would be expected to score based on its Team Rating. This number of Expected Points has then been subtracted from the number of points that the team actually scored.

[Figure: actual points minus Expected Points from the full season Team Rating]

95% of teams finish the season within 14.1 points of the points total expected by their full season rating. That’s good to know, but for now I’m interested in the other 5% of teams – those that deviate further from the points that they were expected to score. If there’s a pattern to those teams it may be possible to use that to improve the Team Rating in the future:

[Figure: the teams that deviated furthest from the points expected by their Team Rating]
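Picking out these teams amounts to a residual check, sketched below. It assumes an outlier is a team whose actual points differ from its expected points by more than two standard deviations of all the differences; the function name and the toy data are illustrative, and the expected points themselves would come from a rating-to-points conversion that is not reproduced here.

```python
import statistics


def flag_outliers(teams, n_sd=2.0):
    """Return teams whose (actual - expected) points exceed n_sd standard deviations.

    `teams` maps a team-season label -> (actual_points, expected_points).
    """
    residuals = {name: actual - expected
                 for name, (actual, expected) in teams.items()}
    # Population standard deviation of the residuals across all team-seasons.
    sd = statistics.pstdev(list(residuals.values()))
    return {name: res for name, res in residuals.items() if abs(res) > n_sd * sd}


# Invented example: ten team-seasons all expected to score 50 points.
toy_residuals = [0, 1, -1, 2, -2, 0, 1, -1, 0, 20]
toy = {f"T{i}": (50 + r, 50) for i, r in enumerate(toy_residuals)}
print(flag_outliers(toy))
```

With a roughly normal spread of residuals, a two standard deviation cut-off leaves about 5% of team-seasons flagged, in line with the 95% figure above.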

Now that we’ve identified the teams, let’s take a look at whether this was a one-off or something that they repeated in the following season.

[Figure: how the same teams performed against their expected points in the following season]

So we began with 14 teams. 3 of those teams were relegated and one is ’13-14 Arsenal, who haven’t yet played a follow-up season. That leaves us with 10 teams. Three of those changed their managers between seasons – with two regressing to the expected number of points the following season and Liverpool underperforming their expected points in the first season under Rodgers.

That leaves us with seven teams that are outliers (to 2 STDEV) compared to the points expected from their rating in the first season and that didn’t change their manager before the following season. Of these, Newcastle regressed so far that in ’12-13 they scored one standard deviation fewer points than expected from their ’12-13 Team Rating. West Ham regressed almost exactly to the mean. So let’s focus on the other five teams and managers and see whether they had a track record of scoring fewer or more points than their Team Rating suggests that they should.

In the plot below the solid red line represents United under Ferguson, the dashed red line represents Liverpool under Benitez, the dark blue line represents Stoke under Pulis, the light blue line represents City under Keegan, and the dashed blue line represents Everton under Moyes.

[Figure: points scored relative to the Team Rating expectation, by season, for the five team and manager pairings]

Ferguson’s United outperformed the points expected by their rating for each of his final 8 seasons in charge, and by an average of 8 points per season over the full 13 year sample. That’s a large amount – I think I can say with a solid amount of certainty that Ferguson was getting his team to do something that was conducive to scoring points but isn’t captured by the Team Rating.

Liverpool under Benitez consistently scored fewer points than would have been expected according to their Team Rating. I’ve said a few times that I think Liverpool were unlucky not to win a title under Benitez. I think they were a good team who had some bad luck, but I also think they were somewhat unlucky with their timing in that they happened to be a good team at the same time that United were also good and Chelsea were tremendous. Here though it appears that Benitez was doing something that led to his team having high Team Ratings but wasn’t conducive to them scoring points.

Everton under Moyes have a solid sample size that appears to be split into two groups across three periods. Between ’04-05 and ’08-09 Everton were doing something that was conducive to scoring points that the Team Rating doesn’t capture (in those seasons they scored an average of ~11 points per season more than their Team Rating would have suggested). In the other six seasons in the sample (’03-04 – ’04-05 and ’09-10 – ’12-13), however, Everton scored an average of ~1 point fewer per season than their Team Rating would suggest. So they had something special there for a while that seems real given the length of time that it lasted. Furthermore it would be interesting to see what changed in their play between ’04-05 and ’05-06.

Stoke under Pulis were more of a mixed bag. On average they outperformed their Team Rating by 9 points. To the eye it looks like there was a downward trend and whatever Pulis was initially doing was being eroded, but it can also be pretty dangerous to draw conclusions from five data points. The same can also be said about City under Keegan, who scored fewer points than expected in each of the three seasons Keegan was in charge, but again the sample size is only three.

This is maybe something that I’ll look into further, but there doesn’t really appear to be an easily discernible pattern where we can look at teams in advance and say ‘we’d expect teams ‘x’ and ‘y’ to score fewer/more points than their Team Rating suggests because of ‘z’’. It would, however, be great to sit down and contrast a significant number of games that Liverpool played under Benitez in, say, ’06-07 to United under Ferguson in ’12-13 or ’13-14 or Everton under Moyes in ’04-05.

4.3 The effect of a shot on a team’s rating

I thought this would be a nice way to round off the post – just to give an idea of the effect that a single shot would have on the rating of the average team. In the ’13-14 Premiership the average team took 511 shots, 170 of which were on target, and 53 of which resulted in a goal. So below is a summary of the effect of a shot off target, a shot on target, and a goal, on a team’s rating:

[Figure: effect of a shot off target, a shot on target, and a goal on the average team’s rating]
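The same exercise can be approximated in Python. This sketch assumes the average team concedes exactly what it takes (511 shots, 170 on target, 53 goals), that the extra shot is added to the ‘for’ side only, and that the 38 game coefficients from Section 3 apply. Because of those assumptions the output need not match the summary above exactly, and all names are illustrative.

```python
import math

COEFFS_38 = (0.732, 0.166, 0.176)  # TSR, %TSoTt, PDO coefficients from Section 3
AGAINST = (511, 170, 53)           # assumed: shots, on target, goals conceded


def raw_rating(shots, sot, goals):
    """Raw 38 game Team Rating for the given 'for' counts and fixed 'against' counts."""
    shots_a, sot_a, goals_a = AGAINST
    tsr = shots / (shots + shots_a)
    tsott = sot / shots + (shots_a - sot_a) / shots_a
    pdo = 1000 * (goals / sot + (sot_a - goals_a) / sot_a)
    x, y, z = COEFFS_38
    return ((0.5 + (tsr - 0.5) * math.sqrt(x))
            * (1.0 + (tsott - 1.0) * math.sqrt(y))
            * (1000 + (pdo - 1000) * math.sqrt(z)))


baseline = raw_rating(511, 170, 53)
effects = {
    "shot off target": raw_rating(512, 170, 53) - baseline,
    "shot on target, no goal": raw_rating(512, 171, 53) - baseline,
    "goal": raw_rating(512, 171, 54) - baseline,
}
for event, delta in effects.items():
    rescaled = 10 * delta / (695 - 363)  # same linear 0-10 rescaling as Section 3
    print(f"{event}: {delta:+.3f} raw, {rescaled:+.4f} on the 0-10 scale")
```

Under these assumptions every extra shot nudges the rating up a little, and a goal moves it several times further than a shot that is saved or missed.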

5. Appendix

Appendix 1: I took all of the Premiership teams for a pair of consecutive seasons (n = 221). The first thing that I did was split up each season into the first 19 games and the second 19 games and determine the correlations between TSR, %TSoTt, and PDO in those samples. Next I took the final 19 games of this season and looked at how well the metrics in those 19 games compared with the metrics recorded by the same team in the first 19 games of next season. The R (not R^2) values are shown below.

[Figure: R values for TSR, %TSoTt, and PDO within a season and across consecutive seasons]

8 thoughts on “Methodology and validation of the Team Ratings”

  1. Pingback: 2014-15 Premiership Predictions | James' Blog

  2. Really interesting idea, but I’m not a fan of the way you’ve scaled the rating from 0 to 10. I don’t like how the Team Rating of an average team gets mapped from a nice round 500 to an ugly 4.127. Also what happens if a team manages to get a raw Team Rating of less than 363 or greater than 695 over a season? Given the 38 game values of x, y, and z, the raw Team Rating could theoretically be anywhere from 25 to 1854 (I’m assuming the displayed formula should say y^0.5, not y^2). I don’t know how this model would handle a rating of 40 or -10.
    It might be better to scale it more like: (2 * ((Rating-25) / (1854-25)) – 1) to get a score from -1 to +1, and then apply a sinusoid like: sin((pi*Rating)/2) to spread the values out a bit.
    Can’t wait to start playing around with this though, good job

  3. Hi James,

    May have missed a step there but …

    would expected points be:

    xP = Team rating/1000 * #games * 3

    Thus if a team played 38 games it would be … team rating/1000 * 38 * 3

    Is this right?


  4. re: a new acronym for %TSoTt :- because you’ve shown them to regress similar amounts I’ve been calling %TSoTt ‘AccuracyPDO’ and the original PDO ‘ConversionPDO’

    Dunno if it’s a good idea

  5. Two things I wasn’t sure if I don’t understand or you made small mistakes:
    – You say Chelsea 08/09 are your 10.0 team, but Chelsea 09/10 look better, with a 0.69 TSR, I think you meant 09/10?
    – In the very first plot, you call the red square SoTr but I think it’s actually the tSoTt% measure?
