Sunday, August 13, 2017

Model Behaviour: Converting shots into goals

Thursday, August 10, 2017

Model behaviour

As democracies teeter on the edge of existence around the world and once-eradicated diseases return due to an apparent loss-of-uptake of vaccinations, we appear to be in a time where the masses are losing faith in experts. Scientists, journalists and those steeped in the scientific process are being drowned out by those who "go with their gut" and follow their id. I am here to stand up for the experts, though alas, I am not one. Those more learned than I will quickly realise that this blog is the work of a child in his father's suit trying to close business deals (actually, that might work these days).

Still, you've made it this far so you might as well stay for a bit of analysis, even if my statistical base knowledge is formed from watching archived Harvard lectures and reading old Fangraphs posts. 

For those just joining us, I ran this blog for a number of seasons but took last season off. I have built a reasonable, though hardly sophisticated model to try and project fantasy football scores and I hope you'll follow along this season as we see where we get things right, and where it all goes horribly wrong.

In advance of the new season, I posted the projections for the first 12 gameweeks and wanted to discuss a few of the names to give everyone a flavour of where the numbers come from (clue: not my gut). A few readers were kind enough to share some of the names which jumped off the page as odd, so we'll start there then I'll add my own concerns:

New signings and injuries
I'll get a couple of quick ones out of the way first - h/t to @JoseMourinhoIND who asked about Mohamed Salah. All new arrivals to the league are missing from my projections for now as I don't have a statistical baseline on which to form their forecast. I have toyed in the past with trying to translate stats from other leagues - and that could be a worthy project again - but with Opta stats hard to come by for other leagues and the small sample of players who move from, say, Serie A to the Premier League, it's tough to get a forecast that I really feel good about. The new players will be added to the model as soon as they enter the Opta database but we'll obviously need to tread carefully and not overreact to a week or two's worth of data.

With regard to injuries, @GoalscorerC notes that the list includes players like Hazard and Sanchez who won't play in the first couple of weeks. This is indeed a problem but one I have just accepted I have to live with. Other responsibilities, coupled with no longer being in the UK, just make it too hard to follow all the team news and keep an up-to-date list of who's in and who's out. I therefore decided it's better to permanently include everyone and outsource the team news to our friends at Fantasy Football Scout or Sky Sports.

Alli vs Eriksen vs De Bruyne
@mpok3_fpl asks why Alli is rated so much higher than teammate Eriksen and why both are ahead of assist-God De Bruyne. Eriksen and De Bruyne are relatively close with 53 and 49 points respectively projected for the first 12 gameweeks of the season. This pair are very similar across the board and two of the key metrics in the model - the share of a team's total shots inside the box (ShareSiB) and the share of its created chances (ShareCC) a player accounts for - are almost exactly the same (Eriksen has a ShareSiB of 10% and ShareCC of 24% versus 9% and 23% for De Bruyne). Neither player takes penalties and both provide a good threat from corners so the conclusion is basically that Eriksen enjoys a slightly bigger piece of a slightly bigger pie. 

Alli is a different profile of player and the model loves him. Among first team regulars, his SiB share of 18% is bested by only Antonio (25%), Sanchez (24%), Redmond (20%) and Arnautovic (20%), and matched by Hazard (18%). Sanchez and Hazard are obviously elite fantasy options (and priced accordingly) and while the other names on this list are reasonably priced, remember that they enjoy a slightly larger share of a significantly smaller expected goal haul (Southampton and Stoke are forecast to score 20 goals between them in the first 12 gameweeks, compared to Spurs' 21 goal total). Alli isn't just a one-dimensional player either, accounting for around 10% of his side's created chances. 
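To put a rough number on the "share of the pie" idea, here is a minimal sketch in Python. It is not the model itself: it simply multiplies a player's SiB share by his team's forecast goals, which ignores created chances, conversion rates and everything else the model uses. The shares and the Spurs total are the ones quoted above; the 10-goal figure assumes Southampton and Stoke split their combined 20 goals evenly, which is my assumption for illustration only, and the function name is made up.

```python
# Crude proxy only: SiB share multiplied by forecast team goals.
# This is an illustration of "share of the pie", not the actual model.

def share_of_pie(share_sib: float, team_goals: float) -> float:
    """Return a player's slice of his team's forecast goals."""
    return share_sib * team_goals

# Shares and the Spurs total (21) are quoted in the post.
# ASSUMPTION: Southampton and Stoke split their combined 20 goals evenly.
alli = share_of_pie(0.18, 21)
redmond = share_of_pie(0.20, 10)
arnautovic = share_of_pie(0.20, 10)

print(f"Alli {alli:.2f} | Redmond {redmond:.2f} | Arnautovic {arnautovic:.2f}")
```

Even with a smaller share, Alli's slice comes out well ahead because the pie itself is roughly twice the size.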

A further consideration for this trio is their shot profiles. Last season Eriksen easily eclipsed his counterparts with 133 shots compared to Alli's 94 and De Bruyne's 86. But Eriksen took 94 of those efforts from outside the box and converted them to goals at a 3% clip, which is right around league average for long rangers. Alli meanwhile managed 58 SiB and converted these efforts into goals at an excellent rate of 28%. In fact, this rate is so good that it might be the cause of the most concern for Alli. His GiB/SiB rate of 28% and G/SoT rate of 40% are both well above league average and might suggest regression this year. It sounds obviously true that some players are simply better finishers, but there is at least some doubt about a player's ability to convert chances into goals at a sustainably higher than average rate. Still, I am largely convinced that goals per overall chance is more controllable than pure goals per SoT, which does seem to have an element of luck involved. As I continue to refine the model in the coming weeks I might look to regress these high GiB/SiB rates more than I currently do, which would hurt elite players like Alli (28%) or Sanchez (24%) and boost players like De Bruyne (6%) or Sigurdsson (5%) who were less clinical last season.
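For anyone wondering what "regressing" a conversion rate might look like in practice, here is one common approach, sketched in Python: pad the player's record with a number of imaginary league-average shots, so small samples get pulled harder toward the mean. This is a generic shrinkage sketch, not necessarily how the model does it. The league-average rate and the size of the padding are placeholder assumptions (neither figure appears in this post); only Alli's 58 SiB and 28% rate come from the numbers above.

```python
def regress_rate(goals: float, shots: float,
                 league_rate: float, prior_shots: float) -> float:
    """Shrink an observed conversion rate toward the league rate by
    adding `prior_shots` imaginary shots converted at `league_rate`."""
    return (goals + league_rate * prior_shots) / (shots + prior_shots)

# ASSUMPTIONS (placeholders, not taken from the model or from Opta):
LEAGUE_GIB_SIB = 0.15   # hypothetical league-average GiB/SiB rate
PRIOR_SHOTS = 50        # hypothetical weight given to the league average

# From the post: Alli took 58 SiB and converted 28% of them.
alli_sib = 58
alli_gib = 0.28 * alli_sib  # goals implied by the quoted rate

alli_regressed = regress_rate(alli_gib, alli_sib, LEAGUE_GIB_SIB, PRIOR_SHOTS)
print(f"Observed 28.0% -> regressed {alli_regressed:.1%}")
```

With these placeholder values Alli's rate lands at roughly 22%. Raising PRIOR_SHOTS regresses harder, which is the lever I would be pulling if I decide the current version is too generous to last season's hot finishers.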

How about Liverpool
Our friends over at @The_First_Touch ask about Firmino and Coutinho being surprisingly low and wonder if it's again to do with shot profile. When I first looked I assumed that the model wasn't overly impressed by Liverpool's prospects as a whole but, au contraire, they are actually forecast for the 5th best attack through the first 12 gameweeks. The problem for this pair is that they just weren't very efficient with their chances last year, even accounting for shots taken outside the box. Their GiB/SiB rates were 13% for Coutinho and 16% for Firmino, which are solid but well below the sky high rates mentioned earlier for the likes of Alli and Sanchez. So again, how you feel about this pair probably comes down to how much you think these conversion rates regress. For what it's worth, if we jump back another season we see rates of 23% for Alli, 21% for Firmino and just 10% for Coutinho, perhaps suggesting Firmino has some room to rise, but maybe Coutinho is right about where he is going to settle.

How to value defenders
This is the big issue for me and one I do not have a satisfactory answer to. I am fairly confident that the model does a decent job at forecasting goals conceded per game, which allows for a reasonable ranked assessment of whether, say, Southampton or Swansea are more likely to keep a clean sheet this week. However, when it comes to converting this probability into points, the model (i.e. my small brain) struggles. It tends to undervalue clean sheets and thus players with good attacking stats become overvalued, especially those who have enjoyed attacking success in limited playing time (and whose rates aren't sufficiently regressed by the model). 

One option would be to simply forecast defenders' attacking numbers, which allows for comparability between teammates but is useless when deciding whether to go with defenders from different teams or whether to go with a back three or four. The other is to keep going with the deeply flawed version and try to find a better way to convert predicted shot data into clean sheet probability (suggestions in the comments!). I therefore ask that if you see the odd weird name in the defensive listing you take it with a pinch of salt and instead focus primarily on the team defensive forecast if you want a bit of help setting your weekly lineup or choosing which team's defense to back. More on this topic in coming weeks.
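One textbook way to turn a goals-conceded forecast into a clean sheet probability is to treat goals conceded as Poisson distributed, in which case the chance of conceding zero is simply exp(-expected goals conceded). To be clear, this is a candidate approach and not what the model currently does; whether real matches are close enough to Poisson is exactly the kind of thing that would need testing. The sketch below is in Python, the function names are made up, the 4 points per clean sheet is an assumption about the scoring system, and it ignores the minutes a defender needs to play to earn the points.

```python
import math

def clean_sheet_probability(expected_goals_conceded: float) -> float:
    """P(0 goals conceded), ASSUMING goals conceded follow a Poisson
    distribution with the forecast as its mean."""
    return math.exp(-expected_goals_conceded)

def expected_clean_sheet_points(expected_goals_conceded: float,
                                points_per_clean_sheet: float = 4.0) -> float:
    """Expected clean sheet points for a defender who plays the full game.
    The default of 4 points is an assumption about the scoring rules."""
    return points_per_clean_sheet * clean_sheet_probability(expected_goals_conceded)

# Hypothetical forecasts (goals conceded per game), purely for illustration.
for team, forecast in [("Team A", 0.8), ("Team B", 1.4)]:
    prob = clean_sheet_probability(forecast)
    points = expected_clean_sheet_points(forecast)
    print(f"{team}: P(clean sheet) = {prob:.2f}, expected CS points = {points:.2f}")
```

The appeal of something like this is that it puts clean sheets on the same expected-points scale as the attacking forecasts, which is precisely where the current version falls down.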

Player Forecast: First Draft

With the new season upon us I wanted to get out some quick and dirty forecasts for the first few weeks of the year. The forecast data is almost entirely based on prior year data so I have not taken into account players changing teams (Lukaku, Walker) or those with new opportunities and teammates. By definition, a lack of historic data means that the new arrivals (Morata, Lacazette) are also missing, but they will be added as soon as we have a small amount of data to go on. The defensive data looks a bit wacky at the moment so for now I would base defensive decisions more on the team projections.

I will also update and share some workings around how the model is going to work this year as I try to improve on the decent, but far from perfect version from a couple of seasons back.

It's great to be back and I thank everyone reading this for giving the blog another chance. Please share your thoughts in the comments or on Twitter and I will make changes in the coming weeks.

Good luck for gameweek one!

Sunday, August 6, 2017

Hello again, old friends

I hope that the old adage about absence and fondness holds true and anyone reading this will still have some time for this little blog in this obscure corner of the internet. I took last season off from fantasy football - basically all football in fact - and now I'm back, armed with a pile of Opta stats and an online subscription to access all the live and archived footage anyone would wish to ingest.

Since I last wrote on these pages I've changed careers and I am delighted to be expecting twins in a few weeks. These two factors have come together to shape the direction of the blog for the upcoming season. I am now working in the international development sector and one of the lessons I have learned so far is around comparative advantage - no one is best placed to do everything on their own. Thus I am not going to try and do everything on this site. I am not really going to cover injuries, suspensions or even expected playing time as many other sites are better at this and have greater resources to do a better job than I ever could (simply not being able to leave Sky Sports News on for 6 hours a day is a big disadvantage!). The expected arrivals next month will also mean that regular, weekly posts might be more tricky to deliver on a firm schedule.

So, I am going to focus on creating usable statistics and particularly data visualizations that I hope can help folks select their team and maximize their transfer budgets while providing some counterpoints to the stampede to judgement that generally occurs every Saturday afternoon. I still hope to write pieces too, but these will be more focused on particular points of interest, rather than a weekly roundup.

The first thing added to the site today is the new Team Snapshot page, which is currently loaded with data from last season. This will of course be migrated over to the new season after a couple of gameweeks, but for now I hope it will be useful when reminding yourself about how teams performed last season and where there is potential for them to regress.

Saturday, September 26, 2015

Players' share of team totals

It wasn't really my intention to roll out pieces of the model in various stages but I've been a bit slower than I hoped in finalising this year's version so wanted to at least present the different pieces as they're available. We first looked at the team +/-, which gives an indication of how a team might perform in future weeks beyond a simple shots/game type metric which fails to adjust for strength of schedule. Next up is to look at the players' share of their team totals, which will help turn the forecast team data into something we can use for individuals.

This isn't a complex calculation, but a couple of points are worth noting:
  1. The calculation excludes any games the player misses and only uses team data from games they appear in. It isn't, therefore, the same as simply looking at a player's shots to date for the season divided by his team's total. 
  2. I do not make an adjustment for minutes played, so the picture is clouded for players who make a lot of substitute appearances. If, for example, a player comes on for 10 minutes and registers his team's only shot inside the box while he is on the pitch, but the team had four before he came on, his percentage for the day will be registered as 20% even though in reality it should be 100%. This is an unfortunate drawback of relying on aggregated data which doesn't include sufficient data tags to identify when individual events happened.
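For the programmers among you, the two points above can be sketched in a few lines of Python. The Match structure and its field names are hypothetical and only there for illustration; the real data obviously looks different. The numbers reproduce the substitute example from point 2, plus one made-up game the player missed to show that it is ignored.

```python
from typing import Iterable, NamedTuple

class Match(NamedTuple):
    team_sib: int     # team's shots inside the box for the whole game
    player_sib: int   # player's shots inside the box in that game
    appeared: bool    # did the player feature at all?

def share_of_sib(matches: Iterable[Match]) -> float:
    """Player SiB divided by team SiB, using only games he appeared in.
    No adjustment is made for minutes played (caveat 2)."""
    played = [m for m in matches if m.appeared]
    team_total = sum(m.team_sib for m in played)
    if team_total == 0:
        return 0.0
    return sum(m.player_sib for m in played) / team_total

# Substitute example: the team had four SiB before he came on, then he
# took the only one in his 10 minutes, so the game total is five.
cameo = Match(team_sib=5, player_sib=1, appeared=True)
# Hypothetical game the player missed entirely: ignored by the calculation.
missed = Match(team_sib=10, player_sib=0, appeared=False)

print(f"Share of SiB: {share_of_sib([cameo, missed]):.0%}")
```

Fixing the cameo problem would need event-level timestamps so that only the team's shots taken while the player was on the pitch go into the denominator, which the aggregated data doesn't allow.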
With those caveats in mind, here is a quick visualisation of players' share of team totals to date.