As with the recent rash of articles talking about the state of analytics in football and bemoaning the lack of numbers available or being used (seriously, is it some secret rite of passage that I’m missing out on?), I’ve noticed a big increase in the number of discussions on Twitter trying to find which metrics matter most when determining which team is most likely to win a given game. Case in point, from yesterday:
And this, from the start of the season.
Look, I think it’s great that people are trying to search for these things, but the reasons they don’t align 100% are so easily explained that I’m not sure why we don’t think about them for a couple of minutes. The people posting this stuff aren’t like the people at certain sites who post easily dissected rubbish in the name of making money. They (as far as I know) have little in the way of a monetary angle to spin, and they seem pretty intelligent.
Aside from being at the very top of the table when it comes to hockey analytics, Gabe Desjardins did some work on football three years ago that, honestly, wipes the floor with anything that I’ve seen before or since. To make things really easy, this link will take you straight to the archive of his football articles. Go there and read them. Why am I telling you to do this? Well, primarily because he’s a hell of a talented guy and a good read, but also in the hope that you’ll stumble across this piece about score effects (link).
Score effects are easy to explain and impossible to ignore whilst watching a game once you know they exist. When one team has the lead, the onus is on the trailing team to take the risks necessary to get into a goal scoring position. As the game wears on they become more willing to a) take risks which leave them vulnerable defensively, and b) take shots from worse scoring positions. The combined result is that the losing team takes more shots, but from worse positions, than it would have done when the game was tied, whilst the team in the lead can bide its time and wait for better opportunities to present themselves, meaning it takes fewer shots than it would when the game was tied. Thus it makes sense that teams that get outshot still win a decent proportion of the games they play. And that’s the crux for me here: why does anyone care that shots and results don’t always correlate when there’s a damn good reason why they shouldn’t? And if you’re using that as justification as to why shots on target are a better predictor of success than total shots, then I think you’re building a house of cards on very shaky foundations.
Two asides here. 1) I’d instinctively suggest that total shots would be affected more by score effects than shots on target, because, as the trailing team is shooting from worse positions, it follows that they’d be less likely to hit the target. But, hey, that’s not the point of this post. 2) Possession has long been derided as a pointless stat, with the adage being that it’s where on the pitch you have the ball that matters. I’m partly in bed with both of those ideas, and a strong proponent of the weight-adjusted measure that Gabe developed, which credits teams only a tiny amount for completing a pass between two centre halves, but much more for passes within the attacking penalty area.
Let me demonstrate further why this is a pointless exercise. The table below contains data from the 3,358 Premiership wins since the beginning of the ’00-01 season, with the final row showing, for each metric, the percentage of those games won by the team with the higher value of that metric.
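For anyone wanting to reproduce that final row on their own data, here is a rough sketch of the calculation in Python. The match records, the field names (`winner`, `loser`, `shots`, `shots_on_target`) and the numbers are all made up for illustration, and skipping games where the two teams are level on a metric is my assumption here rather than a description of how the table was built.

```python
from typing import Dict, List

# Hypothetical structure: one record per decided match (draws excluded),
# holding each side's value for every metric of interest.
Match = Dict[str, Dict[str, float]]


def higher_metric_win_share(matches: List[Match], metric: str) -> float:
    """Share of decided matches won by the team with the higher `metric`.

    Matches where both teams have the same value are skipped (an assumption).
    """
    counted = 0
    won_by_higher = 0
    for match in matches:
        winner_value = match["winner"][metric]
        loser_value = match["loser"][metric]
        if winner_value == loser_value:
            continue  # no team "had the higher value", so leave it out
        counted += 1
        if winner_value > loser_value:
            won_by_higher += 1
    return won_by_higher / counted if counted else float("nan")


# Illustrative data only, not taken from the Premiership sample.
sample_matches: List[Match] = [
    {"winner": {"shots": 10, "shots_on_target": 5},
     "loser": {"shots": 14, "shots_on_target": 3}},
    {"winner": {"shots": 12, "shots_on_target": 6},
     "loser": {"shots": 8, "shots_on_target": 2}},
    {"winner": {"shots": 9, "shots_on_target": 4},
     "loser": {"shots": 9, "shots_on_target": 5}},
]

for name in ("shots", "shots_on_target"):
    print(name, round(100 * higher_metric_win_share(sample_matches, name), 1))
```

The first sample match is a score-effects style game: the winner is outshot in total but still has more shots on target.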
Hopefully this sufficiently highlights the asininity: if you were to ask me which single metric a team should do better than its opponent at to win a match, I would tell you that it is PDO, the metric that is > 90% luck over the course of a season (it’s got to be damn near 100% luck over a single game). To put this another way, we know that the more dominant a team is in terms of shots, the more points it will score (link). However, that barely holds true for PDO (link – see plot c), even though, by its nature, each bounce you get going your way is beneficial to you points-wise.
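To see why PDO comes out on top within a single game, it helps to write the calculation down. PDO is the sum of a team's shooting percentage and its save percentage. The sketch below assumes both are measured against shots on target and scaled to 1000, which is one common convention; the function names and the handling of a game with no shots on target are my own illustrative choices.

```python
def _rate(numerator: int, denominator: int) -> float:
    """Ratio that falls back to 0.0 when there were no shots on target (an assumption)."""
    return numerator / denominator if denominator else 0.0


def pdo(goals_for: int, sot_for: int, goals_against: int, sot_against: int) -> float:
    """PDO = shooting % + save %, scaled so that an average team sits at 1000."""
    shooting_pct = _rate(goals_for, sot_for)            # goals per shot on target
    save_pct = 1.0 - _rate(goals_against, sot_against)  # shots on target kept out
    return 1000.0 * (shooting_pct + save_pct)


# Illustrative 2-1 win in which the winner had fewer shots on target (4 vs 6).
winner_pdo = pdo(goals_for=2, sot_for=4, goals_against=1, sot_against=6)
loser_pdo = pdo(goals_for=1, sot_for=6, goals_against=2, sot_against=4)
print(round(winner_pdo, 1), round(loser_pdo, 1))
```

Within one match the two teams' PDOs always sum to 2000 under this definition, so whichever side converted its shots on target at the better rate has the higher PDO. That is usually, though not always, the team that scored more goals, which is why the metric lines up so well with the result while telling you next to nothing about who played better.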
Show me a model that works over 38 games, where there’s a ton less random variation, then let’s start thinking about utilising the best bits of that to get one that works over 20 games, 10 games, 5 games, and finally within a single game. Right now, to steal the analogy from Simon Gleave (although not necessarily about this specific subject), a lot of us are trying to run before we can walk. Personally I’m not convinced we’re at the crawling stage yet.