Sunday, October 6, 2013

Clean Sheet Conversion Rates

Alright, enough is enough. While I'm concerned about delving into data too soon and reaching all sorts of ridiculous small-sample-driven conclusions, I'm equally conscious that people want to start making big decisions with their respective teams and thus it's time to launch the weekly rankings and forecasts (still with that small sample asterisk though).

Before that, let's look at a new addition to the weekly forecasts. With most of the forecasts we do on this site, there are two distinct parts to the puzzle:
  1. What is the expected volume of an underlying event (normally we focus on shots, shots on target etc)?
  2. What is the impact of those events on actual footballing events (i.e. goals, assists and clean sheets)?
With goals we've spent a reasonable amount of time talking about how we forecast shots and how we convert those expected totals into goals, but I've tended to neglect the defensive side of the game. This has led to the unfortunate position where team totals are given each week but you end up having a tough time comparing, say, your 4th defender and your 4th midfielder when deciding whether to go 4-3-3 or 3-4-3. With that in mind, this post will attempt to lay out some analysis of how different defensive shot totals are converted to clean sheets, and provide an example of how future forecasts will be made for the defensive side of the pitch.

Converting shots into goals
This is the reverse analysis of the time we've spent looking at player shots becoming goals, though we're going to be looking at it slightly differently as we're dealing with game totals for a team rather than an individual player. The below data is taken from the 2012-13 season only (plus a little bit of the new season), which isn't an ideal sample size, but is all I have to work with for now. If anyone has any differing conclusions stemming from prior years, please share them in the comments below or via email.

The below tables show the percentage of games which became clean sheets after a team conceded the noted number of shots inside the box (SiB) and shots on target (SoT).


The first point to note of course is that (thankfully) the data makes sense - especially in the larger 2012-13 sample - and a team's chance of keeping a clean sheet falls as they concede more shots. Further on that point, you can see that, generally, the first few chances you concede greatly diminish your chance at a clean sheet, whereas once you get into the higher ranges, each additional shot makes much less of a difference. As an example then, if a team is forecast to concede five SiB and two SoT, the data suggests they have a 43% - 50% (35% - 50% based on prior year) chance at a clean sheet, depending on which metric you look at.
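For anyone who wants to build a similar table from their own match logs, here is a minimal sketch of the calculation. The function name and the sample rows are purely illustrative (the rows are made up, not real match data), and it assumes one record per team per game.

```python
from collections import defaultdict


def clean_sheet_rates(games):
    """Return {shots_conceded: share of games that ended as a clean sheet}.

    `games` is an iterable of (shots_conceded, goals_conceded) pairs,
    one pair per team per match. Works the same for SiB or SoT counts.
    """
    played = defaultdict(int)
    clean = defaultdict(int)
    for shots, goals in games:
        played[shots] += 1
        if goals == 0:
            clean[shots] += 1
    return {shots: clean[shots] / played[shots] for shots in sorted(played)}


# Made-up illustrative rows, NOT real match data.
sample_games = [
    (2, 0), (2, 0), (2, 1), (2, 0),
    (5, 0), (5, 2), (5, 1), (5, 0),
    (9, 1), (9, 3),
]
rates = clean_sheet_rates(sample_games)
for shots, rate in rates.items():
    print(f"{shots} shots conceded: {rate:.0%} clean sheets")
```

With real data you would run this once on SiB conceded and once on SoT conceded to get the two tables.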

Stop the presses! Fewer shots conceded equals a better chance at a clean sheet! Admittedly this isn't new news, but by putting some actual percentages to the problem, we can hopefully get to a point where we're getting a usable number each week. Right now, the forecast shot data is converted into goals and we'll say a team is forecast to concede 1.2 goals, but as we know, for fantasy purposes, defensive outcomes are very much all or nothing (plus some small impact of playing someone who concedes a lot of goals, but you will rarely be in that circumstance). A forecast will never suggest a team will score zero goals so you end up with weekly rankings which are useful when determining whether to play goalkeeper A or B but not so good when assessing whether to play defender A or midfielder B or whether or not to sign a defender for the next five weeks.

Team specific data
As we've seen with converting shots into goals, different teams achieve this at different rates, and based on yearly data, there's reason to believe that the difference is sustainable (better players create better chances which lead to better shots and thus a higher chance of conversion). A similar hypothesis can also be put forward then for goals conceded (better teams limit opponents to either shots outside the box, from tight angles or contested efforts and thus will see them converted at a lower rate). Thus, we should look at how different teams saw their surrendered chances converted into goals, before assuming we can use a standard league-wide average. This analysis is prone to small sample size issues, given that a team might only register a particular event once all season. For example, in the last game of last season, Fulham surrendered 16 SiB yet were still able to keep a clean sheet. Given that this was the only time they conceded such a total all year, the data would suggest they have a 100% conversion rate to apply to future games where they concede such a huge haul of shots. This is clearly a perverse situation but is combated by:

  1. Grouping shot totals together into ranges, such as 0-4, 5-7 and 8+ SiB conceded, rather than looking at each total individually
  2. Regressing totals back to league average based on the number of occurrences in the population. If, for example, Arsenal have 10 games where they've conceded 4 SiB and posted a clean sheet conversion rate of 50% in those games, we have a lot more confidence in that total than the fact that West Ham are 1 for 2 when conceding 8+ shots. We will therefore use a weighted average between the team rate and the league rate to get our overall expected conversion rate.
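To make those two steps concrete, here is a rough sketch of the bucketing and the weighted average. The post doesn't spell out the exact weighting, so the n / (n + k) form and the `prior_games` constant of 10 are my assumptions for illustration, as is the 40% league rate used in the example.

```python
def sib_bucket(sib):
    """Map a SiB total onto the ranges named above: 0-4, 5-7 and 8+."""
    if sib <= 4:
        return "0-4"
    if sib <= 7:
        return "5-7"
    return "8+"


def regressed_rate(team_clean_sheets, team_games, league_rate, prior_games=10):
    """Weighted average of the team rate and the league rate.

    The weight on the team's own rate grows with the number of games
    observed. `prior_games` is a hypothetical tuning constant, not a
    value taken from the actual model.
    """
    weight = team_games / (team_games + prior_games)
    team_rate = team_clean_sheets / team_games if team_games else 0.0
    return weight * team_rate + (1 - weight) * league_rate


league_rate = 0.40  # assumed league-wide rate, for illustration only

# Arsenal: 5 clean sheets in 10 games, so half the weight stays on their own 50%.
arsenal = regressed_rate(5, 10, league_rate)

# West Ham: 1 for 2, so the result is pulled most of the way back to the league rate.
west_ham = regressed_rate(1, 2, league_rate)

print(sib_bucket(16), round(arsenal, 3), round(west_ham, 3))
```

Both teams have a raw rate of 50%, but the smaller sample ends up much closer to the league figure, which is exactly the behaviour we want for one-off results like Fulham's 16 SiB clean sheet.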

For reference, the conversion totals (regressed) for each team in each SiB and SoT range for the current year to date are as below:



After just seven weeks, it's to be expected that the team-by-team rates are all very close, as we simply don't have enough data for anyone to really distinguish themselves from average. These rates will likely diverge a bit as the season progresses and we can check back in a few weeks to identify any trends or issues here.

A worked example
Let's look at this week's contest at the Etihad to put the above into context:
  1. As in past projections, let's try and get a forecast for the number of SiB and SoT for the coming week. Going into the week City had registered a -32% +/- score (holding opponents to 32% fewer SiB than they have averaged elsewhere), so with Everton averaging 8 SiB on the road, we get an expected SiB total of 5.4. On the other side, Everton had posted a +/- of 4% on their travels and City had conceded just 4 SiB at home. This gives an expected SiB total of 4.5. Taking an average of these two totals gives an expectation of very close to 5.
  2. Using the data from above we see that an expected SiB total of 5-7 is converted into a clean sheet 38% of the time (specific to Man City).
  3. Moving on to SoT, we see that City had posted a very impressive -50% SoT +/- mark at home and so with Everton averaging a healthy 4 SoT away from Goodison, we'd give them an expectation of 2 SoT for this game. On the reverse, City have surrendered just 2 SoT per game and Everton's +/- rate of 5% doesn't have much impact here. Again, taking an average of the two marks we get an expectation of 2 SoT for the game.
  4. Using the aforementioned tables again, we see that 2 SoT converts to a clean sheet at a 44% clip.
  5. Long term we can look at which of these two forecasts correlates better to the actual data observed, but for now we will simply take a crude average and get a mark of 41%.
  6. For individual players, the final step is simply to say that a 41% chance of a clean sheet is worth 3.6 points (2 appearance points plus 1.6 for the clean sheet) and that is the number which will then be added to their offensive threat to be included in the eight week forecast / captain rankings. 
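The arithmetic in the steps above can be sketched as follows. The function names are mine, and the 4 points for a clean sheet is the value implied by the 1.6 figure in step 6 (0.41 x 4); the inputs quoted in the steps are rounded, so recomputing from them won't reproduce every intermediate number exactly.

```python
def adjusted_total(average, plus_minus):
    """Scale one team's average by the opponent's +/- rate (e.g. -0.32 for -32%)."""
    return average * (1 + plus_minus)


def defensive_points(p_sib, p_sot, appearance_points=2, clean_sheet_value=4):
    """Crude average of the two clean sheet estimates, turned into points.

    clean_sheet_value=4 is an assumption inferred from step 6.
    """
    p_clean_sheet = (p_sib + p_sot) / 2
    points = appearance_points + clean_sheet_value * p_clean_sheet
    return p_clean_sheet, points


# Step 1: Everton's 8 SiB on the road against City's -32% mark.
everton_sib = adjusted_total(8, -0.32)

# Step 1 (continued): average the two expected totals quoted in the text.
expected_sib = (5.4 + 4.5) / 2

# Step 3: Everton's 4 SoT away against City's -50% mark.
everton_sot = adjusted_total(4, -0.50)

# Steps 2, 4, 5 and 6: 38% from the SiB table, 44% from the SoT table.
p_clean_sheet, points = defensive_points(0.38, 0.44)

print(round(everton_sib, 2), round(expected_sib, 2), round(everton_sot, 2))
print(round(p_clean_sheet, 2), round(points, 2))
```

This only covers clean sheet and appearance points; anything else (cards, goals conceded deductions and so on) would need to be layered on top.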
So that's the new defensive forecasting system in a nutshell and I'd appreciate any comments / feedback in the comments below or via email / Twitter / Facebook. Regularly scheduled programming will start to resume this week. Thanks for sticking around!

6 comments:

TheD said...

Welcome back... Please don't be away for that long next time.

All the best

Kalix said...

Huzzah!

David Spriggs said...

Surely 3.6 points before offensive considerations is too high in your example as there are minus points for conceding 2,4,6 as well as for cards?

Mike Welsh said...

My fantasy premier league commissioner game disagree. That's my 2 cents anyway.

Try to join my EPL league at http://fanxtpremiership.epl.fanxt.com or just go to http://epl.fanxt.com to see for yourself

HiphopAndBasketball said...

check out my fantasy blog

mobi-ebook-conversion said...

Hey I am so grateful I found your site, i just like to say thank you for a incredible post and a all round enjoyable blog
