Thursday, August 10, 2017

Model behaviour

As democracies teeter on the edge of existence around the world and once-eradicated diseases return due to an apparent loss-of-uptake of vaccinations, we appear to be in a time where the masses are losing faith in experts. Scientists, journalists and those steeped in the scientific process are being drowned out by those who "go with their gut" and follow their id. I am here to stand up for the experts, though alas, I am not one. Those more learned than I will quickly realise that this blog is the work of a child in his father's suit trying to close business deals (actually, that might work these days).

Still, you've made it this far so you might as well stay for a bit of analysis, even if my statistical base knowledge is formed from watching archived Harvard lectures and reading old Fangraphs posts. 

For those just joining us, I ran this blog for a number of seasons but took last season off. I have built a reasonable, though hardly sophisticated, model to try to project fantasy football scores and I hope you'll follow along this season as we see where we get things right, and where it all goes horribly wrong.

In advance of the new season, I posted the projections for the first 12 gameweeks and wanted to discuss a few of the names to give everyone a flavour of where the numbers come from (clue: not my gut). A few readers were kind enough to share some of the names which jumped off the page as odd, so we'll start there then I'll add my own concerns:

New signings and injuries
I'll get a couple of quick ones out of the way first - h/t to @JoseMourinhoIND who asked about Mohamed Salah. All new arrivals to the league are missing from my projections for now as I don't have a statistical baseline on which to form their forecast. I have toyed in the past with trying to translate stats from other leagues - and that could be a worthy project again - but with Opta stats hard to come by for other leagues and the small sample of players who move from, say, Serie A to the Premier League, it's tough to get a forecast that I really feel good about. The new players will be added to the model as soon as they enter the Opta database but we'll obviously need to tread carefully and not overreact to a week or two's worth of data.

With regard to injuries, @GoalscorerC notes that the list includes players like Hazard and Sanchez who won't play in the first couple of weeks. This is indeed a problem but one I have just accepted I have to live with. Other responsibilities, coupled with no longer being in the UK, just make it too hard to follow all the team news and keep an up-to-date list of who's in and who's out. I therefore decided it's better to permanently include everyone and outsource the team news to our friends at Fantasy Football Scout or Sky Sports.

Alli vs Eriksen vs De Bruyne
@mpok3_fpl asks why Alli is rated so much higher than teammate Eriksen and why both are ahead of assist-God De Bruyne. Eriksen and De Bruyne are relatively close with 53 and 49 points respectively projected for the first 12 gameweeks of the season. This pair are very similar across the board and two of the key metrics in the model - the share of a team's total shots inside the box (ShareSiB) and the share of its created chances (ShareCC) a player accounts for - are almost exactly the same (Eriksen has a ShareSiB of 10% and ShareCC of 24% versus 9% and 23% for De Bruyne). Neither player takes penalties and both provide a good threat from corners so the conclusion is basically that Eriksen enjoys a slightly bigger piece of a slightly bigger pie.

Alli is a different profile of player and the model loves him. Among first team regulars, his ShareSiB of 18% is bested by only Antonio (25%), Sanchez (24%), Redmond (20%), Arnautovic (20%) and Hazard (18%). Sanchez and Hazard are obviously elite fantasy options (and priced accordingly) and while the other names on this list are reasonably priced, remember that they enjoy a slightly larger share of a significantly smaller expected goal haul (Southampton and Stoke are forecast to score 20 goals between them in the first 12 gameweeks, compared to Spurs' 21 goal total). Alli isn't just a one-dimensional player either, accounting for around 10% of his side's created chances.
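The "larger share of a smaller pie" point can be made concrete with a rough sketch. This is not the model itself: it treats a player's share of shots inside the box as a crude stand-in for his share of team goals, and it assumes Southampton and Stoke split their combined 20-goal forecast evenly (10 each), which the projections do not actually state. The function name is made up for illustration.

```python
# Rough illustration only, NOT the blog's model.
# ShareSiB is used as a crude proxy for a player's share of team goals.
# ASSUMPTION: Southampton and Stoke are given 10 goals each; the post only
# says 20 between them. Spurs' 21 comes from the projections.

def rough_goal_slice(share_sib: float, team_goals: float) -> float:
    """Player's slice of the team's forecast goals over the 12 gameweeks."""
    return share_sib * team_goals

players = {
    "Alli": (0.18, 21.0),
    "Redmond": (0.20, 10.0),     # team total assumed
    "Arnautovic": (0.20, 10.0),  # team total assumed
}

for name, (share, goals) in players.items():
    print(f"{name}: {rough_goal_slice(share, goals):.1f} goals")
```

Even with a smaller share, Alli's slice comes out well ahead under these assumptions, simply because the Spurs pie is roughly twice the size.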

A further consideration for this trio is their shot profiles. Last season Eriksen managed to easily eclipse his counterparts with 133 shots compared to Alli's 94 and De Bruyne's 86. But Eriksen took 94 of those efforts from outside the box and converted them to goals at a 3% clip, which is right around league average for long rangers. Alli meanwhile managed 58 SiB and converted these efforts into goals at an excellent rate of 28%. In fact, this rate is so good that it might be the cause of the most concern for Alli. His GiB/SiB rate of 28% and G/SoT rate of 40% are both well above league average and might suggest regression this year. It sounds obviously true that some players can convert chances into goals at a sustainably higher-than-average rate, but there is at least some doubt about it. That said, I am largely convinced that goals per overall chance is more controllable than pure goals per SoT, which does seem to have an element of luck involved. As I continue to refine the model in the coming weeks I might look to regress these high GiB/SiB rates more than I currently do, which would hurt elite players like Alli (28%) or Sanchez (24%) and boost players like De Bruyne (6%) or Sigurdsson (5%) who were less clinical last season.
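For anyone wondering what "regressing" a conversion rate might look like in practice, here is a minimal sketch of one common approach: shrink the observed rate towards the league average by adding a number of phantom shots converted at the league rate. This is not necessarily how the model does it, and both the league-average rate and the number of phantom shots below are placeholder assumptions rather than figures from the post.

```python
# Minimal shrinkage sketch, NOT the blog's actual method.
# ASSUMPTIONS: LEAGUE_RATE and PRIOR_SHOTS are illustrative placeholders.

LEAGUE_RATE = 0.15  # assumed league-average GiB/SiB rate
PRIOR_SHOTS = 50    # assumed strength of the pull towards the average

def regress_rate(observed_rate: float, shots: int,
                 league_rate: float = LEAGUE_RATE,
                 prior_shots: float = PRIOR_SHOTS) -> float:
    """Blend the observed rate with the league rate, weighted by sample size."""
    return (observed_rate * shots + league_rate * prior_shots) / (shots + prior_shots)

# Alli: 58 shots in the box converted at 28% (figures from the post)
alli = regress_rate(0.28, 58)
print(f"Alli: 28.0% observed, {alli:.1%} after regression")
```

The more shots a player has taken, the less his rate moves, and raising PRIOR_SHOTS is the knob that regresses everyone harder.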

How about Liverpool
Our friends over at @The_First_Touch ask about Firmino and Coutinho being surprisingly low and wonder if it's again to do with shot profile. When I first looked I assumed that the model wasn't overly impressed by Liverpool's prospects as a whole but, au contraire, they are actually forecast for the 5th best attack through the first 12 gameweeks. The problem for this pair is that they just weren't very efficient with their chances last year, even accounting for shots taken outside the box. Their GiB/SiB rates were 13% for Coutinho and 16% for Firmino, which are solid but well below the sky-high rates mentioned earlier for the likes of Alli and Sanchez. So again, how you feel about this pair probably comes down to how much you think these shot rates regress. For what it's worth, if we jump back another season we see rates of 23% for Alli, 21% for Firmino and just 10% for Coutinho, perhaps suggesting Firmino has some room to rise, but maybe Coutinho is right about where he is going to settle.

How to value defenders
This is the big issue for me and one I do not have a satisfactory answer to. I am fairly confident that the model does a decent job of forecasting goals conceded per game, which allows for a reasonable ranked assessment of whether, say, Southampton or Swansea are more likely to keep a clean sheet this week. However, when it comes to converting this probability into points, the model (i.e. my small brain) struggles. It tends to undervalue clean sheets and thus players with good attacking stats become overvalued, especially those who have enjoyed attacking success in limited playing time (and whose rates aren't sufficiently regressed by the model).

One option would be to simply forecast defenders' attacking numbers, which allows for comparability between teammates but is useless when deciding whether to go with defenders from different teams or whether to go with a back three or four. The other is to keep going with the deeply flawed version and try to find a better way to convert predicted shot data into clean sheet probability (suggestions in the comments!). I therefore ask that if you see the odd weird name in the defensive listing you take it with a pinch of salt and instead primarily focus on the team defensive forecast if you want a bit of help setting your weekly lineup or planning a team's defense to back. More on this topic in coming weeks.
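As one possible starting point for that conversion (a textbook approach, not what the model currently does), goals conceded can be treated as Poisson-distributed, in which case the clean sheet probability is simply exp(-expected goals conceded). The sketch below assumes the standard Fantasy Premier League award of 4 points for a defender's clean sheet and ignores the 60-minute requirement; the team names and forecasts are hypothetical.

```python
import math

# Sketch under a Poisson assumption, NOT the blog's current method.
# ASSUMPTION: 4 points per defender clean sheet (standard FPL scoring),
# with the player assumed to complete 60+ minutes.
CLEAN_SHEET_POINTS = 4

def clean_sheet_probability(expected_goals_conceded: float) -> float:
    """P(zero goals conceded) if goals conceded follow a Poisson distribution."""
    return math.exp(-expected_goals_conceded)

def expected_clean_sheet_points(expected_goals_conceded: float) -> float:
    """Clean sheet points weighted by the chance of keeping one."""
    return CLEAN_SHEET_POINTS * clean_sheet_probability(expected_goals_conceded)

# Hypothetical forecasts of goals conceded in a single match
forecasts = {"Team A": 0.8, "Team B": 1.4}

for team, xga in forecasts.items():
    print(f"{team}: P(clean sheet) = {clean_sheet_probability(xga):.0%}, "
          f"expected points = {expected_clean_sheet_points(xga):.2f}")
```

Whether real goals conceded are close enough to Poisson is an open question, but this at least puts defenders from different teams on the same points scale as their attacking numbers.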
