The aim is to show graphically how regression to the mean occurs. The method I’ve used is as follows. First, I determined each team’s sh% for games 1-19 of the season. I then took various samples, such as the ‘top 25%’ (the 50 teams with the highest sh% out of the sample of 200), and determined their sh% for each successive 19-game window (i.e. games 2-20, 3-21, ..., 20-38). I then repeated the whole process for sv% and PDO. The resulting plot for sh% is below.
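For anyone who wants to try the same approach, here is a minimal sketch of the rolling-window method described above. It is not the code behind the plots: the function names `rolling_pct` and `group_path` are hypothetical, the data are randomly generated pure-luck numbers rather than real results, and averaging each group's windowed percentages (rather than pooling their shots) is an assumption.

```python
import numpy as np

def rolling_pct(successes, attempts, window=19):
    """Windowed percentage per team. Rows are team-seasons, columns are games."""
    # A leading zero column turns each window sum into a difference of cumulative sums.
    cs = np.cumsum(np.pad(successes, ((0, 0), (1, 0))), axis=1)
    ca = np.cumsum(np.pad(attempts, ((0, 0), (1, 0))), axis=1)
    win_s = cs[:, window:] - cs[:, :-window]
    win_a = ca[:, window:] - ca[:, :-window]
    return win_s / win_a

def group_path(pct, lo_q, hi_q):
    """Mean windowed percentage over time for the teams whose first-window
    value lies between the lo_q and hi_q quantiles."""
    first = pct[:, 0]
    lo, hi = np.quantile(first, [lo_q, hi_q])
    mask = (first >= lo) & (first <= hi)
    return pct[mask].mean(axis=0)

# Synthetic example (assumption): 200 team-seasons of 38 games,
# every shot scored with the same 10% probability, so sh% is pure luck.
rng = np.random.default_rng(0)
shots = rng.poisson(12, size=(200, 38))
goals = rng.binomial(shots, 0.10)

pct = rolling_pct(goals, shots)          # 20 windows: games 1-19, 2-20, ..., 20-38
top = group_path(pct, 0.75, 1.00)        # 'top 25%' by games 1-19 sh%
bottom = group_path(pct, 0.00, 0.25)     # 'bottom 25%' by games 1-19 sh%
```

Plotting `top` and `bottom` against the window index gives lines of the kind described in this post. With pure-luck data like this, both should drift back to roughly 10% by the final window.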
As you can see, every line regresses towards the mean to some extent. The R² value between sh% in games 1-19 and games 20-38 is 0.121, suggesting ~65% regression towards the mean (which marries nicely with the results from part II).
How about for sv%?
This actually regresses more strongly towards the mean than sh%. In fact, the teams that make up the bottom 5% in the first half of the season regress past the mean in their final 19 games. The R² value in this case is a mere 0.066, considerably lower than for sh% (again, this fits well with my findings in part II). I think the plot, coupled with the poor correlation, nicely highlights just how random sv% actually is.
Finally, for PDO:
Again, the regression towards the mean is strong. The R² value in this case is a mere 0.083 (close to the value I observed in part II).
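As a quick check on the arithmetic, the R² values quoted above can be turned into an approximate amount of regression. This sketch assumes the regression figure is taken as 1 - r, where r is the positive square root of R². That assumption reproduces the ~65% quoted for sh%. The helper name `regression_fraction` is hypothetical.

```python
import math

def regression_fraction(r_squared):
    """Approximate share of regression towards the mean, taken as 1 - r.
    Assumes the underlying correlation r is positive."""
    return 1.0 - math.sqrt(r_squared)

# R² values as quoted in this post.
quoted = {"sh%": 0.121, "sv%": 0.066, "PDO": 0.083}

for name, r2 in quoted.items():
    r = math.sqrt(r2)
    print(f"{name}: r = {r:.3f}, regression = {regression_fraction(r2):.0%}")

# sh%: r = 0.348, regression = 65%
# sv%: r = 0.257, regression = 74%
# PDO: r = 0.288, regression = 71%
```

On that reading, sv% gives up roughly three quarters of its distance from the mean between the two halves, with PDO sitting between sv% and sh%.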
I think this takes sh%, sv% and PDO about as far as possible with the data I have. If I get more, I’ll probably come back and take another look, but if I have yet to convince you that they regress towards the mean, I’m guessing I never will.