The pointless search for a metric that predicts the outcome of a single match

As with the recent rash of articles talking about the state of analytics in football and bemoaning the lack of numbers available or being used (seriously, is it some secret rite of passage that I’m missing out on?), I’ve noticed a big increase in the number of discussions on twitter trying to find which metrics are the most important when trying to determine which team is most likely to win a given game. Case in point, from yesterday:

Screen Shot 2013-02-25 at 11.57.32 PM

And this, from the start of the season.

Look, I think it’s great that people are searching for these things, but the reasons they don’t align 100% are so easily explained that I’m not sure why we don’t think about them for a couple of minutes. The people posting this stuff aren’t like the people at certain sites who post easily dissected rubbish in the name of making money; they (as far as I know) have little in the way of a monetary angle to spin, and they seem pretty intelligent.

Aside from being at the very top of the table when it comes to hockey analytics, Gabe Desjardins did some work on football three years ago that, honestly, wipes the floor with anything that I’ve seen before or since. To make things really easy, this link will take you straight to the archive of his football articles; go there and read them. Why am I telling you to do this? Well, primarily because he’s a hell of a talented guy and a good read, but also in the hope that you’ll stumble across this piece about score effects (link).

Score effects are easy to explain and impossible to ignore whilst watching a game once you know they exist. When one team has the lead, the onus is on the trailing team to take the risks necessary to get into a goal scoring position. As the game wears on they become more willing to a) take risks which leave them vulnerable defensively, and b) take shots from worse scoring positions. The result is that the losing team takes more shots, but from worse positions, than it would have done with the game tied, whilst the team in the lead can bide its time and wait for better opportunities to present themselves, meaning it takes fewer shots than it would with the game tied. Thus it makes sense that teams that get outshot still win a decent proportion of the games they play. And that’s the crux for me here – why does anyone care that shots and results don’t always line up when there’s a damn good reason why they shouldn’t? And if you’re using that as justification as to why shots on target are a better predictor of success than total shots, then I think you’re building a house of cards on very shaky foundations.
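To put rough numbers on that argument, here is a minimal sketch of the expected shots and goals for the hour after one team takes the lead. Every rate in it is invented purely for illustration (the names `StateRates`, `LEADING` and `TRAILING` are hypothetical too); none of it comes from Gabe's work or from real match data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateRates:
    """Shot rate and conversion for one game state (all values hypothetical)."""
    shots_per_min: float
    conversion: float  # goals per shot


# Made-up numbers chosen only to mirror the argument above:
# the trailing team shoots more often, but from worse positions.
LEADING = StateRates(shots_per_min=0.10, conversion=0.12)
TRAILING = StateRates(shots_per_min=0.18, conversion=0.07)


def expected_shots(segments):
    """Sum expected shots over (minutes, StateRates) segments."""
    return sum(minutes * rates.shots_per_min for minutes, rates in segments)


def expected_goals(segments):
    """Sum expected goals over (minutes, StateRates) segments."""
    return sum(minutes * rates.shots_per_min * rates.conversion
               for minutes, rates in segments)


# Team A goes ahead after 30 minutes; look only at the remaining 60.
team_a = [(60, LEADING)]
team_b = [(60, TRAILING)]

print(f"A: {expected_shots(team_a):.1f} shots, {expected_goals(team_a):.2f} goals")
print(f"B: {expected_shots(team_b):.1f} shots, {expected_goals(team_b):.2f} goals")
```

With these assumed rates the trailing team takes far more shots over the hour while expecting roughly the same number of goals, so the leading team is comfortably outshot and still wins more often than not.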

Two asides here. 1) I’d instinctively suggest that total shots would be affected more by score effects than shots on target, because, as the trailing team is shooting from worse positions, it follows that they’d be less likely to hit the target. But, hey, that’s not the point of this post. 2) Possession has long been derided as a pointless stat, the adage being that it’s where on the pitch you have the ball that matters. I’m partly in bed with both of those ideas, and a strong proponent of the weight-adjusted measure that Gabe developed, which credits teams only a tiny amount for completing a pass between two centre halves, but much more for passes within the attacking penalty area.
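For anyone who hasn't seen a weighted possession measure, the idea can be sketched like this. The zones and weights below are placeholders I've made up for illustration, not Gabe's actual values, and `weighted_possession` is a hypothetical helper rather than anything from his articles.

```python
# Hypothetical zone weights: NOT the values Gabe used, just an illustration
# of crediting a pass more the closer it is completed to the opposition goal.
ZONE_WEIGHTS = {
    "own_third": 0.1,
    "middle_third": 0.5,
    "final_third": 1.0,
    "opposition_box": 3.0,
}


def pass_score(passes_by_zone):
    """Zone-weighted total of completed passes for one team."""
    return sum(ZONE_WEIGHTS[zone] * count for zone, count in passes_by_zone.items())


def weighted_possession(passes_a, passes_b):
    """Team A's share (0 to 1) of the zone-weighted completed passes."""
    score_a, score_b = pass_score(passes_a), pass_score(passes_b)
    total = score_a + score_b
    if total == 0:
        raise ValueError("no completed passes recorded")
    return score_a / total


# Invented match: A knocks it around at the back, B plays higher up the pitch.
team_a = {"own_third": 300, "middle_third": 150, "final_third": 40, "opposition_box": 5}
team_b = {"own_third": 100, "middle_third": 120, "final_third": 80, "opposition_box": 20}

raw_share_a = sum(team_a.values()) / (sum(team_a.values()) + sum(team_b.values()))
print(f"raw pass share for A:      {raw_share_a:.1%}")
print(f"weighted pass share for A: {weighted_possession(team_a, team_b):.1%}")
```

In this made-up match A completes the majority of the passes but ends up with the minority of the weighted share, which is exactly the distinction the raw possession number hides.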

Let me demonstrate further why this is a pointless exercise. The table below contains data from the 3,358 Premiership matches that have produced a winner since the beginning of the ’00-01 season; the final row gives the percentage of those games won by the team with the higher value of a given metric.

Screen Shot 2013-02-26 at 12.13.09 AM
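A figure like the one in the final row can be computed along these lines. This is a sketch under my own assumptions: the `Match` record and its field names are hypothetical, draws are dropped, and a game where the two teams tie on the metric counts against the "higher metric" side (the same convention as dividing by all games won).

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One match; `*_metric` is whatever stat is being tested (hypothetical layout)."""
    home_goals: int
    away_goals: int
    home_metric: float
    away_metric: float


def win_rate_of_higher_metric(matches):
    """Share of decided matches won by the team with the strictly higher metric."""
    decided = [m for m in matches if m.home_goals != m.away_goals]
    if not decided:
        raise ValueError("no decided matches")
    hits = 0
    for m in decided:
        home_won = m.home_goals > m.away_goals
        if home_won and m.home_metric > m.away_metric:
            hits += 1
        elif not home_won and m.away_metric > m.home_metric:
            hits += 1
    return hits / len(decided)


# Invented sample using shots on target as the metric.
sample = [
    Match(2, 1, 6, 3),  # winner had more: counts
    Match(0, 1, 8, 2),  # winner had fewer: does not count
    Match(1, 1, 5, 5),  # draw: excluded
    Match(3, 0, 4, 4),  # metric tied: does not count
    Match(0, 2, 1, 5),  # winner had more: counts
]
print(win_rate_of_higher_metric(sample))
```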

Hopefully this sufficiently highlights the asininity – if you were to ask me which single metric a team should do better than its opponent at to win a match, I would tell you that it is PDO. The metric that is > 90% luck over the course of a season (it’s got to be damn near 100% luck over a single game). To put this another way, we know that the more dominant a team is in terms of shots, the more points it will score (link); however, that barely holds true for PDO (link – see plot c), even though, by its nature, each bounce you get going your way is beneficial to you points-wise.
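For reference, PDO is usually defined as a team's shooting percentage plus its save percentage. The sketch below assumes shots on target as the denominator and a scale where 1000 is average; the exact definition behind the table may differ, and the function name and arguments are my own.

```python
def pdo(goals_for, sot_for, goals_against, sot_against):
    """PDO on a 1000 scale, assuming shots on target (sot) as the denominator."""
    if sot_for == 0 or sot_against == 0:
        raise ValueError("PDO is undefined without shots on target at both ends")
    shooting_pct = goals_for / sot_for
    save_pct = 1 - goals_against / sot_against
    return 1000 * (shooting_pct + save_pct)


# Invented single match: home wins 2-1 with 5 shots on target to the visitors' 4.
home = pdo(goals_for=2, sot_for=5, goals_against=1, sot_against=4)
away = pdo(goals_for=1, sot_for=4, goals_against=2, sot_against=5)
print(round(home), round(away))
```

Under this definition the two teams' values in a single match always sum to 2000, so whichever side converts a higher share of its shots on target has the higher PDO. That goes a long way towards explaining why it lines up with the winner so often while telling you next to nothing about who will win the next one.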

Show me a model that works over 38 games, where there’s a ton less random variation, then let’s start thinking about utilising the best bits of that to get one that works over 20 games, 10 games, 5 games, and finally within a single game. Right now, to steal the analogy from Simon Gleave (although not necessarily about this specific subject), a lot of us are trying to run before we can walk. Personally I’m not convinced we’re at the crawling stage yet.

4 thoughts on “The pointless search for a metric that predicts the outcome of a single match”

  1. “As with the recent rash of articles talking about the state of analytics in football and bemoaning the lack of numbers available or being used (seriously, is it some secret rite of passage that I’m missing out on?)”

    *Stares at the ground in shame*

  2. I agree with nearly everything you say James, especially the TSR and PDO stuff. However, it’s slightly harsh to pick out Andrew Beasley’s tweet as a “Case in point” of people “on twitter trying to find which metrics are the most important when trying to determine which team is most likely to win a given game.”

    That one tweet was followed by a link to an article by Guardian journalist Sean Ingle. It was taken from an article in the Guardian called ‘Football is a numbers game but only Gary Neville seems to realise it’

    http://www.guardian.co.uk/football/blog/2013/feb/24/football-numbers-game-gary-neville

    Here’s the quote:

    “When you look at the teams with the best possession stats in Europe this season – Barcelona 69.76%, Bayern Munich 63.91%, Manchester City 58.44%, Arsenal 58.24%, Juventus 58.23%, Liverpool 58.1%, Lille 58.01% – it is clear that it does not guarantee success (even Barcelona have lost 35 times since the start of the 2007-08 season).

    Nor does it necessarily tell you which team are on top. Last season Swansea had lots of the ball but little in the attacking third. It is telling that Rob Mastrodomenico of Global Sports Statistics, which uses data and advanced models to help predict future matches, says: “From a purely modelling point of view we don’t use possession. Shot-based stats are more relevant if you are looking for a team to score.”

    Opta’s figures back that up. Of the 181 games won in the Premier League before last weekend, the team who had the most possession only won 103 – 57% in total. The team who had more shots on target than their opponents won 128 matches – 71% of the total.”

    So it was a stat Sean Ingle used when arguing that possession data is of little use and that shots on target data is the way to go when predicting success. And the figures were from Opta, not Andrew.

  3. In retrospect I could have clicked the RT that came up in my timeline, followed it through to that Guardian piece, and realised very quickly that it wasn’t a stat Andrew had derived. However, I also think that if you’re taking a stat from someone/somewhere else then the burden is on you to credit them within that tweet, as it’s perfectly reasonable for someone reading a RT to assume the author of the tweet is the owner of that stat. Also, I imagine you could get in a fair amount of trouble for lack of attribution if twitter one day decided to police such things.
