## Brief rumination on PDO

James Yorke has a post here today about PDO. I have thoughts on several of the points raised, but it’s simpler to write them here than to send a long string of tweets. It’ll be in a quote/response format.

 “…then you get a grey area where some people multiply the derived figure by 1000, some by 100 or others, well, me at least, leave it as a decimal. … Why isn’t there a standardised numerical format? I asked Ben Pugsley, because he knows a ton more than I do: “Why is it rated to 100? Or 1000?” and he said that it was a “long held thing in analytics (…) from baseball (…) 100 is defined as an “average” ” and that the usage of 1000 was purely a method to add detail and see 4 digits.”

I’ll explain quickly why I go with multiplying by 1000. It’s quicker to say and quicker for at least my mind to process – I suspect I’m not the only one. For example, let’s take a PDO of 1028.

For number x 1000: “Ten twenty eight”
For number x 100: “One oh two point eight”
For raw number: “One point oh two eight”

The argument holds for basically every value of PDO. You can still have all of the digits in whatever format you choose; the first one is just easier for my mind to process. And, as there’s never going to be a team on the x 1000 scale who could be confused for a team on the x 100 scale (i.e., no team registers a PDO of, say, 300 on either the x 100 or the x 1000 scale), it’s relatively intuitive to figure out what the average should be (I’ll come back to this later).

 “I propose this, and I propose it with goodwill but little expectation: (goals for divided by shots on target for) minus (goals against divided by shots on target against) AKA: (shooting % For) minus (shooting% Against) This does two things: We are no longer orbiting around an arbitrary number, we are centering around positive or negative. A high PDO will be positive and a low PDO will be negative. Understanding is relatable here, we have the universal law of goal difference: positive is good, negative is bad. It makes sense.”

OK, positive is good, negative is bad. That’s reasonable. It’s probably easier than “something with a 1 in front of it is above average, something with a 9 or an 8 in front of it is below average”.

 “We are combining the For and Against aspect of the same metric. Just as we subtract Goals Against from Goals For to create Goal Difference, we do the same to create PDO or “Shooting% difference”.”

I’m actually not sure what the point is here, as PDO also combines the for and against aspect of shooting %. Maybe the point is that both do this but PDO doesn’t have the positive good/negative bad aspect so this is a better method? I’m not sure.

 “And it is entirely related why? The eagle-eyed will have noticed, we’ve essentially derived the same number as PDO, we’ve just decluttered it a bit. The PDO of 107 or 1070 is now defined as 0.07. A PDO of 982 is now -0.18. Average is zero.”

First, I don’t really understand how this chimes with this, from earlier in the piece:

 “I am presuming there was a comfort found in adding your team’s shooting percentage to it’s save percentage; you have built a single figure for your team and you are defining what your team is doing but I feel there is more clarity in the entirely related but subtly different: “What is my team doing and what is the opposition doing against us?””

If two things are entirely related, give a number that tells you the same thing, and let you intuit the same information regardless of which way it is reported, then I don’t see any subtle difference. In other words, you could apply the last line of reasoning from this second quote to PDO and it wouldn’t change a thing. For all we know, that is how some people think about PDO right now. Am I missing something?

Second, the first quote there sounds reasonable, but it leaves out a large number of people, in my experience the vast majority, who report shooting percentage as a percentage (say 30), rather than a decimal (0.30). So now we run into exactly the problem that came before, but here there’s actually potential for it to be much worse.

Team A has a sh% of 30%, and a sv% of 70.1%.

Depending on your preference, their PDO would be calculated to be 1001, 100.1, or 1.001. However you calculate it, that is relatively simple to intuit as slightly higher than average.
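To make the arithmetic concrete, here is a minimal sketch of Team A’s PDO on each of the three scales. The function name `pdo` and its `scale` parameter are my own, purely for illustration, and it assumes sh% and sv% are supplied as percentages (30 and 70.1) rather than decimals.

```python
def pdo(sh_pct, sv_pct, scale=1000):
    """PDO from shooting % and save %, both given as percentages (e.g. 30 and 70.1).

    `scale` picks the reporting convention: 1000, 100, or 1 (raw decimal).
    """
    raw = (sh_pct + sv_pct) / 100  # decimal form, where average is 1.0
    return round(raw * scale, 3)   # rounding only hides floating-point noise


# Team A from the example above
team_a = {"sh_pct": 30.0, "sv_pct": 70.1}

for scale in (1000, 100, 1):
    print(f"x {scale}: {pdo(**team_a, scale=scale)}")
```

Whichever scale you pick, the value sits just above that scale’s average (1000, 100, or 1), so there is no ambiguity about which convention is in use.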

Using the same numbers with the new method (your sh% minus your opposition’s sh%, which here is 100 minus 70.1, or 29.9%) you’d get a value of 0.001 if you used decimals and 0.1 if you used percentages. The two scales overlap one another, and whilst 0.1 would be high on the decimal scale it’s essentially nothing on the percentage scale. So whilst we now have a common centre for all values, unless you explicitly state each time whether you’re using the percentage or the decimal system, the data is left open to misinterpretation (guess who’s had that happen before). I should add that James suggests calculating this as a decimal, to 2 decimal places, and if that convention holds then that’s great.
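The overlap is easy to see with a quick sketch of the same Team A numbers. The function name `shooting_pct_difference` is my own label for James’s proposed metric, not something from his post, and the opposition’s sh% is assumed to be 100 minus Team A’s sv%.

```python
def shooting_pct_difference(sh_for, sh_against):
    """Shooting % for minus shooting % against; both inputs must use the same units."""
    return round(sh_for - sh_against, 4)  # rounding only hides floating-point noise


# Team A: sh% of 30%, sv% of 70.1%, so the opposition's sh% is 100 - 70.1 = 29.9%
as_percent = shooting_pct_difference(30.0, 29.9)     # inputs as percentages
as_decimal = shooting_pct_difference(0.300, 0.299)   # inputs as decimals

print("percentage inputs:", as_percent)
print("decimal inputs:   ", as_decimal)
```

Both results describe the same team, yet they differ by a factor of 100, and nothing in the number itself tells you which convention produced it.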