Benjamin Massey has basically done all of the legwork for me to write this. In this post Massey put together five seasons’ worth of team shooting percentages and looked at how they regressed cumulatively over time. With permission I’m going to take those numbers and essentially repeat the methodology of this post from April 2011, where I looked at the season-to-season regression of team shooting percentage in the Premiership.

In short I’m going to take a team’s shooting percentage in year 1 and plot it on the y axis. On the x axis is the corresponding team’s shooting percentage in year 2. The resulting plot is shown below (n=56).

The amount of regression is represented by the R value of the best fit line, which in this case is ~0.26. So if we want a best guess as to the value of a team’s shooting percentage in year 2, we should regress their year 1 value 74% of the way towards the league mean shooting percentage of 28.0%.
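As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name `regress_to_mean` and the 32.0% example team are my own assumptions for illustration and are not taken from Massey's data; only the R value of 0.26 and the league mean of 28.0% come from the post.

```python
def regress_to_mean(year1_value, league_mean, r):
    """Best guess for year 2: keep a fraction r of the gap to the league mean.

    Equivalent to regressing the year 1 value (1 - r) of the way to the mean.
    """
    return league_mean + r * (year1_value - league_mean)


LEAGUE_MEAN = 28.0  # league mean shooting percentage, from the post
R = 0.26            # R value of the best fit line, from the post

# Hypothetical team (not from Massey's data) that shot 32.0% in year 1.
year2_guess = regress_to_mean(32.0, LEAGUE_MEAN, R)
print(f"Year 2 best guess: {year2_guess:.2f}%")
```

For the hypothetical 32.0% team, the gap to the mean is 4 points, 26% of which is kept, so the best guess for year 2 comes out at 29.04%.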

In short, a season’s worth of team shooting percentage in the MLS is ~26% skill and ~74% luck, or one part skill to three parts luck. Does this mean that shooting has a lower skill component in the MLS than in the Premiership, given that I found the component due to luck in the Premiership was only 61%? Well, not necessarily. In the sample Massey used, the MLS season contained fewer games than a Premiership season (30 in 2008-10 and 34 in ’11-12, compared to 38 per Premiership season), and it makes sense that a shorter season will have a larger luck component (for example, imagine how much more likely it is to flip 4 heads in a row than it is to flip 50 in a row). Is that enough to explain the difference? I’m not sure, and don’t really want to do the maths right now, but it makes sense that the gap would at least be narrowed, if not overcome, were the MLS season 38 games long.
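To put some numbers on the coin-flip intuition, here is a small Python sketch. It does not settle the MLS versus Premiership question. It only shows how pure sampling noise in a percentage shrinks as the sample grows, under a simple binomial model that is my assumption rather than anything fitted to Massey's data. The figure of 4 shots per game is purely hypothetical and is only there to give the sample sizes a scale.

```python
import math


def binomial_standard_error(p, n_shots):
    """Standard error of an observed proportion from n_shots independent tries."""
    return math.sqrt(p * (1 - p) / n_shots)


# The coin-flip comparison from the text: 4 heads in a row vs 50 in a row.
p_four_heads = 0.5 ** 4
p_fifty_heads = 0.5 ** 50
print(f"P(4 heads in a row)  = {p_four_heads}")
print(f"P(50 heads in a row) = {p_fifty_heads:.3e}")

P = 0.28            # league mean shooting percentage from the post, as a proportion
SHOTS_PER_GAME = 4  # hypothetical value, NOT from the data

se_30 = binomial_standard_error(P, 30 * SHOTS_PER_GAME)  # 30-game MLS season
se_38 = binomial_standard_error(P, 38 * SHOTS_PER_GAME)  # 38-game Premiership season

# The ratio does not depend on the hypothetical shots-per-game figure.
noise_ratio = se_30 / se_38
print(f"Noise in a 30-game season relative to a 38-game one: {noise_ratio:.3f}")
```

Under this model the noise ratio is just the square root of 38/30, whatever shot volume is assumed, so a 30-game season is noisier than a 38-game one by a fixed factor. Whether that is enough to close the gap between 74% and 61% is a separate calculation.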


Perhaps MLS teams change their composition more than EPL teams, and that might account for the weaker correlation?

We have also played with these auto-correlation models for different stats, but in my experience comparing one season to the next is not very enlightening, for two reasons. First, there are normally big changes in team composition from one year to the next. Second (at least in the EPL, I don’t know if that’s the case in MLS as well), as some teams get promoted or relegated, the opposition faced can be radically different.

What has worked better for us is comparing values in the first half of the season with values in the second half. In the EPL, 19 games is long enough for skill to take over from luck, and the opposition faced is the same (save the home/away factor, which we can take care of separately anyway).

Talent overtakes random variation after 16-18 games – https://jameswgrayson.wordpress.com/2013/04/07/after-how-many-premiership-games-does-talent-become-more-important-than-random-variation/

I’d be very surprised if you’ve found a good correlation between sh% in games 1-19 and games 20-38 – https://jameswgrayson.wordpress.com/2011/05/14/sh-sv-pdo-part-n/

There won’t be a good correlation for sure, as sh% is not a very good performance indicator, but the correlation between games 1-19 and 20-38 will certainly be better than across seasons. This principle holds true for most of the stats I have checked, which is why I claim it provides a better test to assess the suitability of any indicator. Teams simply change too much from one year to the next to draw any meaningful conclusions.

Hey James –

Your last post [Season to season regression of team shooting percentage in the MLS] was freaking awesome. I have gone ahead and added your stuff to my Feedly account. Please keep me updated if you post anywhere else.

Keep rocking –

Jon

“The amount of regression is represented by the R value of the best fit line, which in this case is ~0.26. So if we want a best guess as to the value of a team’s shooting percentage in year 2, we should regress their year 1 value 74% of the way towards the league mean shooting percentage of 28.0%.”

Hi James,

How do you come up with that 0.26 number? I’m trying to figure out how much I should regress goal differential from one season to the next. If I have an R value of 0.3457, how do I get the skill and luck % from that?

In this case 0.26 is the R-value, derived by taking the square root of the R^2 value that any analytical software (Excel etc.) will give you if you plot a line of best fit.

I’m assuming two things. One is that your R-value is derived from a monovariate regression (the same variable is on the x- and y-axes), and the other is that what you’re referring to as an R-value is actually an R^2 value.

Taking the R-value (which is the root of an R^2 value), crudely speaking, gives the measure of skill in a given metric (the variable you have on the x- and y-axes). If R^2 = .3457, then R = .3457^.5 = .588. In other words the metric you’re measuring is ~59% skill, and ~41% luck.
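The same conversion as a minimal Python sketch, in case it helps. The function name `skill_luck_split` is just something I've made up for illustration, and it treats R as the "skill" share and 1 - R as the "luck" share in the same crude sense as above.

```python
import math


def skill_luck_split(r_squared):
    """Turn an R^2 from a line of best fit into crude skill and luck shares.

    Taking the square root drops the sign of R, so this assumes the
    relationship is positive (as it is for a metric regressed on itself).
    """
    r = math.sqrt(r_squared)
    return r, 1.0 - r


skill, luck = skill_luck_split(0.3457)
print(f"skill ~{skill:.0%}, luck ~{luck:.0%}")
```

Run the other way, the ~0.26 in the post corresponds to an R^2 of roughly 0.07 on the plot.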