Predicting future performance

In my last post I plotted team points in consecutive seasons to illustrate how to calculate regression to the mean. It also serves a purpose in this post, so I’ve included it again, with points scored in year 1 (e.g. 2000-01) plotted on the x-axis and points scored in year 2 (e.g. 2001-02) plotted on the y-axis.

The R² value is 0.608*, which corresponds to a correlation of r = √0.608 ≈ 0.78, meaning that points regress ~22% (1 − 0.78) to the mean on a yearly basis. I now want to move on from points and see if there’s anything that shows a better correlation from season to season.
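The step from R² to a regression percentage can be sketched as below. This assumes the percentage is taken as 1 − r, with r the square root of R², which reproduces the ~22% quoted for points; the function name is purely illustrative.

```python
import math


def regression_to_mean(r_squared: float) -> float:
    """Fraction by which a value is expected to regress toward the mean.

    Assumption: the fraction is 1 - r, where r = sqrt(R^2) and the
    season-to-season correlation is positive.
    """
    return 1.0 - math.sqrt(r_squared)


# R^2 for points in consecutive seasons, as quoted in the post.
points_regression = regression_to_mean(0.608)
print(f"Points regress about {points_regression:.0%} to the mean")
```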

I’m going to compare points to goals, shots on target and total shots, and I’m going to evaluate each one in three different ways.

First is the differential, which is simply:
Events for − events against

Second is the ratio:
Events for / (Events for + events against)

And the third way is using Pythagoras:
(Events for)² / (Events for + events against)²
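As a rough sketch, the three measures can be written as small functions. The Pythagoras version follows the formula exactly as it is written in this post, and the example figures (60 events for, 40 against) are made up purely for illustration.

```python
def differential(events_for: float, events_against: float) -> float:
    """Events for minus events against."""
    return events_for - events_against


def ratio(events_for: float, events_against: float) -> float:
    """Share of all events that went in the team's favour."""
    return events_for / (events_for + events_against)


def pythagoras(events_for: float, events_against: float) -> float:
    """Squared events for over squared total events, as written in the post."""
    return events_for ** 2 / (events_for + events_against) ** 2


# Hypothetical team: 60 events for, 40 against (not real data).
example_for, example_against = 60, 40
print(differential(example_for, example_against))
print(ratio(example_for, example_against))
print(pythagoras(example_for, example_against))
```

The same functions apply unchanged whether the events are goals, shots on target or total shots.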

For each variable I’ve determined the correlation between performance in year 1 and year 2, as shown above in the plot for points. The R² values are summarised in the table below.
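For anyone wanting to repeat the exercise, a minimal sketch of the year-on-year calculation is below. It assumes R² is the squared Pearson correlation between a team’s value in year 1 and the same team’s value in year 2; the five pairs of numbers are invented placeholders, not the data behind this post.

```python
def r_squared(year1: list[float], year2: list[float]) -> float:
    """Squared Pearson correlation between paired season values."""
    if len(year1) != len(year2) or len(year1) < 2:
        raise ValueError("need two equally sized lists with at least 2 teams")
    n = len(year1)
    mean_x = sum(year1) / n
    mean_y = sum(year2) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(year1, year2))
    sxx = sum((x - mean_x) ** 2 for x in year1)
    syy = sum((y - mean_y) ** 2 for y in year2)
    return sxy ** 2 / (sxx * syy)


# Placeholder ratios for five hypothetical teams (not real data).
ratio_year1 = [0.60, 0.55, 0.50, 0.45, 0.40]
ratio_year2 = [0.58, 0.52, 0.51, 0.47, 0.42]
print(f"R^2 = {r_squared(ratio_year1, ratio_year2):.3f}")
```

Only teams present in both seasons can be paired up, which is where the survivor bias mentioned in the footnote comes from.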

                 Differential  Ratio  Pythagoras
Goals            0.692         0.689  0.659
Shots on target  0.750         0.761  0.750
Total shots      0.738         0.750  0.741

The table tells us two things. Firstly, shots on target are the most reproducible event on a season-to-season basis. Secondly, option two, the ratio, gives the most repeatable results for both shot measures (for goals the differential is fractionally higher, 0.692 against 0.689). Whilst using ratios is only marginally better than using the differential, the difference is enough to be significant.

The best overall is shots on target ratio (SOTR), with an R² of 0.761 and less than 13% regression to the mean from season to season.
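The numbers in the table can be checked with a short script. It picks out the highest R² and converts it to a regression figure, again assuming that figure is 1 − √R²; the variable names are just for illustration.

```python
import math

# R^2 values copied from the table in this post.
r2_table = {
    "Goals": {"Differential": 0.692, "Ratio": 0.689, "Pythagoras": 0.659},
    "Shots on target": {"Differential": 0.750, "Ratio": 0.761, "Pythagoras": 0.750},
    "Total shots": {"Differential": 0.738, "Ratio": 0.750, "Pythagoras": 0.741},
}

# Flatten to (event, method, R^2) and take the cell with the largest R^2.
cells = [
    (event, method, r2)
    for event, row in r2_table.items()
    for method, r2 in row.items()
]
best_event, best_method, best_r2 = max(cells, key=lambda cell: cell[2])

# Assumed conversion: regression to the mean = 1 - sqrt(R^2).
best_regression = 1.0 - math.sqrt(best_r2)
print(best_event, best_method, f"{best_regression:.1%}")
```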

In the future I’m going to look at how predictive each of these is over a smaller sample of games (from 2 up to 38) and get a definitive result, but for the time being I’m going to use SOTR to predict future performance.

*There is some survivor bias here, which is likely to be skewing the correlation. Only the 17 teams that aren’t relegated can be used here. Teams are relegated through a combination of being terrible and bad luck. Regression to the mean (e.g. of PDO) is likely to cause their points total to increase the following year, thus reducing the observed R² value.


