Simple one this – firstly let’s take the last twelve Premiership seasons and look at how the number of goals scored and allowed by a team are distributed.

So the distribution of goals conceded somewhat resembles a normal distribution – with an equal mean and median (50). The distribution of goals scored is more interesting though. For one thing it really doesn’t look all that ‘normal’ in shape – there’s a lot more weighting to the low side of the mean than the high. In fact the median of the group is 47 goals, ~6% fewer than the mean. The second thing to notice is that the spread appears naturally wider for goals scored than allowed – and if we look at the most ‘extreme’ seasons in terms of goals scored and allowed then it appears this observation may be true:

The proper way to measure the spread, though, is via standard deviation. So let’s take that group of 240 teams and calculate the standard deviation for the number of goals, shots on target, and total shots that they score and concede.

So the observations from the plot are correct – the spread is wider in terms of attacking events than defensive ones. One thing that sticks out here is that the gap between STDEV of goals F/A is ~10% of their mean value, as opposed to ~5% and ~4% for ST and TS – it’d be great if someone who knows statistics better than I do could explain whether that tells us something about the game or whether it’s because STDEV, by its nature, takes square roots. I think it’s the former but I’m not going to comment on it until I’m sure.
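For anyone who wants to repeat the comparison on their own numbers, here is a minimal sketch of one way to express the gap between the two spreads as a share of the mean. It assumes that "their mean value" means mean goals per team, and the six-team league below is made-up toy data, not real Premiership figures; `spread_gap_share` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Hypothetical toy league, NOT real Premiership data: six made-up teams whose
# goals for and against both total 300, as they must across a whole league.
goals_for = [90, 60, 45, 40, 35, 30]
goals_against = [35, 45, 50, 50, 55, 65]

def spread_gap_share(for_vals, against_vals):
    """(STDEV for - STDEV against) as a fraction of the shared mean.

    Assumption: the gap is scaled by mean goals per team, which is the
    same for goals for and goals against over a full league.
    """
    shared_mean = mean(for_vals + against_vals)
    return (stdev(for_vals) - stdev(against_vals)) / shared_mean

gap = spread_gap_share(goals_for, goals_against)
print(f"STDEV for:     {stdev(goals_for):.1f}")
print(f"STDEV against: {stdev(goals_against):.1f}")
print(f"Gap as share of mean: {gap:.1%}")
```

Scaling by the mean puts goals, shots on target and total shots on a comparable footing, since the three have very different raw counts per season.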

Anyway this, I think, is the conclusion we’d draw solely by looking at the transfer market where, in general, strikers cost more money than defenders. It seems intuitive to me that you’d spend more money on players who can have the largest effect on your team and, if there’s a larger variation in the number of goals scored, then that’s where your resources should be directed.


Hi James,

Seems like the right post to throw this question at you.

Interestingly, this past week, I had some spare time and was looking at what I could see in the GF/GA numbers for the second division in Spain … I saw this post and was intrigued … here’s my question (or maybe more, we’ll see) …

Last season (2011/12) there was a team that came in second and posted the following numbers …

Pts 85, GF 83, GA 37, GD +46 (PAR 51.9)

It should be noted that the division contains 22 teams.

Anyway, I then looked at what the difference was between those numbers and what average 2nd placed teams had posted over the past 15 years, this is what I got …

dPts 9.4, dGF 21.1, dGA -3.0, dGD 24.1.

The dGA is negative as they had 3 goals less against during the season than the average 2nd placed team, thus were slightly better.

I have all the numbers for all 22 teams … and looked at “d”s for each of the 21 other teams to see how each team ranked against their position’s history … I then did the following …

I took the STDEV of all 22 teams regarding Pts, goals etc. WITHIN the season … thus, for example STDEV for GF over 2011/12 is 5.56.

If I understand it correctly, this means that that 2nd place team is in the 99.7% group for the 2011/12 season (2 teams fell in the 95% group, both with better than historical average GF) …

Now … if I find the STDEV of GF (and the other numbers) of 2nd place teams over the past 15 years (thus not WITHIN the season) … if this team also falls within the 95%/99.7% range does this then point to that team having a good/exceptional attack over the 2011/12 season?

And then … if I know the PAR that this team posted (and the dPAR being the same as the dPts) … can I somehow link how the offense of this team has contributed to the PAR that the team achieved? If I then take a basic view and look at the goal scorers, can I start to tentatively look at a basic indication of the goal scorers’ quality (e.g., a “good” striker)?

I hark back to your statements that GD is a good indicator of quality.

Anyway, it is all very basic and unfortunately I do not have shot stats.

Basically, what I hope to do in the future is that IF what I’ve written above is sound … I can hopefully link PAR with GD … If I then look at individual teams and see the +/- and minutes stats of individual players (due to results and team line ups) I can start getting a rough indication of whether a player has been beneficial (or not) for the team with regards to GD and thus get a rough indication of player quality based on that.

There are a million things I’m glossing over for the time being (e.g. line up and quality of opposition) but I can come back to that later.

Anyway, if I’m doing things wrongly or making assumptions that aren’t feasible please say so … I can stop wasting my time. 🙂

cheers.

Well, let’s go through it step by step.

On average we’d expect a team with +46 goal difference over 42 games to score 87 points, so we can tentatively assume that this was in the ballpark of their true talent.

That’s a very low # for the STDEV of goals for – do you mean that you took the STDEV of the total goals scored, or the STDEV of the dGF?

If it’s for total goals then, as 99.7% corresponds to 2.75 STDEV above the mean, the average team would score 68 goals per season (83 − 2.75 × 5.56 ≈ 68). Intuitively it doesn’t seem likely that you’d have a league with 15% more goals than the Premiership but a STDEV 1/3 the size. Unless you’re using the 99.7% group to encapsulate any team of >99.7% chance? Either way, 5.56 is an incredibly small number given that I assume the title winning team scored roughly the same number of goals, if not more – thus out of a group of 22 teams there’d be at least two such teams when we’d expect to see one every twelve seasons.
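The arithmetic behind that 2.75 figure can be checked with Python's standard library. This is only a sketch assuming goals scored are roughly normally distributed, which the post itself suggests is not quite true; it also shows where the more familiar "3 STDEV" version of 99.7% comes from, since that one counts both sides of the mean.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

team_gf = 83      # goals scored by the 2nd placed team (from the comment)
stdev_gf = 5.56   # the STDEV quoted in the comment

# One-sided: how many STDEVs above the mean puts a team in the top 0.3%?
z_one_sided = std_normal.inv_cdf(0.997)

# League mean implied if that team sits exactly at the 99.7th percentile
implied_mean = team_gf - z_one_sided * stdev_gf

# Two-sided: share of teams within 3 STDEVs either side of the mean
within_three = std_normal.cdf(3) - std_normal.cdf(-3)

print(f"One-sided 99.7% cut-off: {z_one_sided:.2f} STDEV")
print(f"Implied league mean GF:  {implied_mean:.1f}")
print(f"Share within +/-3 STDEV: {within_three:.4f}")
```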

If it’s STDEV of the dGF then 1) I don’t think that’s a calculation that has a solid mathematical justification and 2) 21.1/5.56 = 3.79 STDEV above the mean, which is about a 13,000 to 1 long-shot.
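As a rough check on the long-shot figure, again assuming a normal distribution (a simplification), the z-score and one-sided tail odds come out as follows:

```python
from statistics import NormalDist

d_gf = 21.1        # dGF of the 2nd placed team (from the comment)
stdev_dgf = 5.56   # within-season STDEV of dGF (from the comment)

# Number of STDEVs above the mean
z = d_gf / stdev_dgf

# Probability of landing at least this far above the mean by chance
tail = 1.0 - NormalDist().cdf(z)

# Expressed as "1 in N"
one_in = 1.0 / tail

print(f"z = {z:.2f}, roughly 1 in {one_in:,.0f}")
```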

I’d forget dPAR – as reasonable as it is to say that ‘this was a good second placed team’ if you start getting into territory where you aren’t comparing apples to apples then the danger is that you’ll lose sight of the bigger picture.

That being said, if you can find the components that go into building a PAR model then certainly these are the first steps towards doing so. I find it tough to believe you’d be able to do so without passing/shots data though. It’s also tough to do with or without you (WOWY) analysis for football because you rarely have a player who plays between 30 and 70% of his team’s minutes, and you need a good sample with him on and off the pitch to begin to determine his value.

sorry … yes I mean the STDEV for dGF within the season … so, how does each team relate to the average for that position over the past 15 years.

For example …

For the team in first the line looks like this …

Pts 91, dPts 11.1, GF 76, dGF 10.1, GA 45, dGA 8.1, GD 31, dGD 2.0

As a comparison, the team in 12th …

Pts 52, dPts -2.1, GF 54, dGF 9.9, GA 64, dGA 16.7, GD -10, dGD -6.9

and the team in last …

Pts 31, dPts -2.1, GF 37, dGF 0.6, GA 58, dGA -7.1, GD -21, dGD 7.7

Again a NEGATIVE dGA means the team is BETTER than the average for that position over the past 15 yrs and a POSITIVE dGA means a team is WORSE than the average.

As for falling into the 95% categories … for the 2011/12 season only #1’s dGF falls into it (in a positive sense) and #12’s dGA falls into it in a negative sense. The rest of the “d” numbers fall within the 68%.

1) Why does the calculation not have solid mathematical justification?

My reasoning (though it may be waaay off 🙂 ) … the dGF (and others) show how teams relate to their positions historically … now if, in a specific season, all/most teams tend to be off by quite a bit then maybe that season was weird …

For example … for the 2011/12 season the STDEV dGF was thus 5.56 … for the previous season (2010/11) the STDEV dGF was 8.34. For those two seasons the STDEV dGA was 8.18 and 8.58 (2010/11). For 2010/11 that 2nd placed team mentioned previously came in 6th, had a dGF of 8.5 (compared to all 6th placed teams historically).

What I thus have above is STDEV dGF(specific season) … I thus need to figure out STDEV dGF(historical).
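To make the difference between the two STDEVs concrete, here is a minimal sketch. The three-team, three-season table is made-up toy data rather than the real second division numbers, the function names are hypothetical, and it assumes the sample standard deviation (what Excel's STDEV computes).

```python
from statistics import mean, stdev

# Hypothetical toy data: goals scored by finishing position (index 0 = 1st)
# for three made-up seasons of a three-team league. Not real league numbers.
goals_for = {
    "season_a": [70, 55, 40],
    "season_b": [80, 50, 45],
    "season_c": [75, 60, 35],
}

def position_means(table):
    """Average value for each finishing position across all seasons."""
    return [mean(vals) for vals in zip(*table.values())]

def d_values(table, season):
    """Each team's value minus the historical mean for its position."""
    means = position_means(table)
    return [v - m for v, m in zip(table[season], means)]

def within_season_stdev(table, season):
    """Spread of the d-values across all positions in ONE season."""
    return stdev(d_values(table, season))

def historical_stdev(table, position):
    """Spread of ONE position's values across all seasons.

    Subtracting the position mean shifts every value by the same constant,
    so this equals the STDEV of that position's d-values too.
    """
    return stdev([season_vals[position] for season_vals in table.values()])

print(d_values(goals_for, "season_b"))
print(round(within_season_stdev(goals_for, "season_b"), 2))
print(round(historical_stdev(goals_for, 0), 2))
```

The within-season figure mixes all finishing positions in a single year, whereas the historical figure follows one position through time, so the two answer different questions.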

Season to season they thus went from a 68% team to a 99.7% team with respect to dGF for their positions based on a seasonal view.

I’ve been using that Excel file I sent you.

If thus a GD of +46 gives an expected points total of 87 … over the past 15 years no team has had such a high GD, second highest is +41 and they got 81 pts, and came in 1st. No other teams have managed a GD of over 40 over the past 15 years.

On the flip side, there have been 2 teams with a GD of -60 and one at -47, no other under -40.

It is all copy/paste, so will do that later today. I’ll get back to you on the STDEV dGF(historical).

phew, need to sit down with someone (who knows this stuff) with paper in front of us and chat about things … hate communicating like this. 🙂

Thanks for the reply, illuminating.

cheers.

done … the results …

I calculated all the historical STDEVs for all “d” numbers … interesting fact …

last season (2011/2012) there were 11 STDEVs that were in the 95% group! they are:

– dPts for the teams in 1st, 2nd, 3rd, 6th and 7th (all in a positive aspect, i.e. scored more points than what one would normally expect).

– dGF the 2nd placed team just squeaked into the 95% category. Scored way more goals than one would normally expect of a 2nd place team.

– dGA the 12th, 16th and 21st all made the 95% group, all let in a lot more goals than normal for those positions.

– dGD the 2nd and 18th placed team jump out. 2nd have a very high dGD compared to what is normal and the 18th placed team are more than 2 STDEVs below historical average of dGD.

2010/11 have 5 95% stats (all in the dGF and dGA categories).

The other 13 seasons have only 8 instances of teams getting into the 95% bracket for a certain category (I should say that I still have to double check, may have missed out on one or two close calls).

What does this mean? Both seasons had 133+ more goals scored than the total league goal average (1084.7). Last season also saw the highest league points total (1274 pts scored by all teams combined, average is 1244.3) recorded over the past 15 years (2nd is 2006/07 with 1267pts.)

I’ll have to do this for the primera división too … can get shot totals for that division. Wanted to start out basic so that I don’t get swamped with (crap/needless) info. 🙂

cheers.

oops … it should read … 11 “d” numbers