Friday, August 25, 2017

Some help to pick your defenders

Quick note: One of the reasons I restarted the blog this year was that my wife and I were expecting twins in late September so I knew I would be home a lot more with plenty of sleepless hours available to mindlessly consume football matches. Well, the twins didn't think this was a great idea and decided to convert xTwins into Twins on August 13th. This means I've been a bit delayed rolling out some of the models and ideas I had to kick off the season but once we get into a routine I will ramp up the content a bit. I like to think that they are already concerned about the potential issues around small sample sizes and wanted to force me to wait until the season was a few weeks old before overreacting.

Selecting defenders and 'keepers requires a different strategy to your attacking options, given that so much of their value is locked up in their team's performance. Of course, playing on a good team will help any player but while it's possible to have a lot of success as a forward or midfielder on a mediocre team, it's really hard to do so as a defender, barring a freaky season where you get extremely lucky with goal conversion or are fortunate enough to be played higher up the pitch.

Thus, with a few exceptions over the years, I tend to pick the team first and then figure out who offers the best way to access that unit. This will require a weighting of three factors:

  • playing time
  • price, and
  • attacking threat

Generally you will want to ensure the playing time factor first and then weigh the attacking threat against price tag and see how much bang for your buck you can get. Hopefully the below viz allows this to be done quickly for each team. Let's run through a couple of teams below to get an idea of how I hope it might work.

Note: The data within is based on the 2017-18 season only and thus needs to be taken with all the relevant small sample warnings. If you have a significantly higher or lower opinion of a player's attacking ability then don't totally discount it, but this might serve to adjust your prior, especially once we get another week or two under our proverbial belts.

Let's start with an easy example. I must concede here that I haven't seen Brighton play a full game this season, so this is a theoretical pick for illustrative purposes. Chris Hughton has picked a consistent back four in the first two games, all of whom cost 4.5m, so if you wanted a Brighton defender you should simply be looking to maximise attacking threat, which to date points to Lewis Dunk by a reasonable distance. Suttner's 11 passes in the final third point towards a player who might create chances in the future but otherwise the pick here should be Dunk. If these numbers hold up then it would start to suggest that holding other Brighton defenders would be a mistake.

The first thing to note here is that Bertrand comes at a premium to his teammates yet, at least so far this season, actually has a lower attacking threat than both Cedric and Yoshida. Now, in this particular case I think one might want to exercise a bit of caution as Bertrand has a decent history of delivering solid attacking returns, but if this trend continues then it makes little sense to own the former Chelsea man. The data would suggest that Yoshida is an interesting differentiating option as he's provided the same attacking threat as Cedric but is owned by just 2% of managers compared to Cedric's 12%.

David Luiz's performance last week in midfield received good reviews which makes him well placed to earn minutes in this team. At just 6.0m he brings with him a good attacking threat which in theory then renders Cahill, Azpilicueta and Rudiger as poor investments. The question then becomes how much better Alonso is and whether that gap justifies the extra 1.0m. If you are aiming for a 2,200 point season then that 1.0m will need to earn you somewhere in the vicinity of 22 points (this gets more complicated later in the year, but bear with me for now) which is a goal and a handful of assists or 3-4 goals. That's a fairly significant haul and so while everyone else dives on Alonso after last week's heroics, I would suggest that the decision is closer than it looks.
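For anyone who wants to check the 22 point figure, here is a minimal sketch of the arithmetic. It assumes the standard 100.0m starting budget and the usual scoring of 6 points for a defender's goal and 3 for an assist, and it ignores bonus points and clean sheets. The function names are made up for illustration.

```python
# Assumptions (not taken from the viz): 100.0m starting budget,
# 6 points per defender goal, 3 points per assist, no bonus points.
BUDGET_M = 100.0
DEFENDER_GOAL_PTS = 6
ASSIST_PTS = 3


def points_needed(target_points, extra_cost_m, budget_m=BUDGET_M):
    """Points an extra spend must return to keep pace with a season target."""
    return target_points / budget_m * extra_cost_m


def attacking_points(goals, assists):
    """Attacking points for a defender from goals and assists alone."""
    return goals * DEFENDER_GOAL_PTS + assists * ASSIST_PTS


print(points_needed(2200, 1.0))   # 22.0
print(attacking_points(1, 5))     # 21 (a goal and a handful of assists)
print(attacking_points(3, 0))     # 18
print(attacking_points(4, 0))     # 24
```

The same function works for any price gap, so a 0.5m premium under the same target would need roughly 11 points to pay for itself.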

With just two gameweeks in the bag, it's hard for players to really start to distinguish themselves from their teammates but I hope this little viz will be useful in the coming weeks as you look to structure your defense, perhaps during a wildcard international break.

Sunday, August 13, 2017

Model Behaviour: Converting shots into goals

Thursday, August 10, 2017

Model behaviour

As democracies teeter on the edge of existence around the world and once-eradicated diseases return due to an apparent loss-of-uptake of vaccinations, we appear to be in a time where the masses are losing faith in experts. Scientists, journalists and those steeped in the scientific process are being drowned out by those who "go with their gut" and follow their id. I am here to stand up for the experts, though alas, I am not one. Those more learned than I will quickly realise that this blog is the work of a child in his father's suit trying to close business deals (actually, that might work these days).

Still, you've made it this far so you might as well stay for a bit of analysis, even if my statistical base knowledge is formed from watching archived Harvard lectures and reading old Fangraphs posts. 

For those just joining us, I ran this blog for a number of seasons but took last season off. I have built a reasonable, though hardly sophisticated model to try and project fantasy football scores and I hope you'll follow along this season as we see where we get things right, and where it all goes horribly wrong.

In advance of the new season, I posted the projections for the first 12 gameweeks and wanted to discuss a few of the names to give everyone a flavour of where the numbers come from (clue: not my gut). A few readers were kind enough to share some of the names which jumped off the page as odd, so we'll start there then I'll add my own concerns:

New signings and injuries
I'll get a couple of quick ones out of the way first - h/t to @JoseMourinhoIND who asked about Mohamed Salah. All new arrivals to the league are missing from my projections for now as I don't have a statistical baseline on which to form their forecast. I have toyed in the past with trying to translate stats from other leagues - and that could be a worthy project again - but with Opta stats hard to come by for other leagues and the small sample of players who move from, say, Serie A to the Premier League, it's tough to get a forecast that I really feel good about. The new players will be added to the model as soon as they enter the Opta database but we'll obviously need to tread carefully and not overreact to a week or two's worth of data.

With regard to injuries, @GoalscorerC notes that the list includes players like Hazard and Sanchez who won't play in the first couple of weeks. This is indeed a problem but one I have just accepted I have to live with. Other responsibilities coupled with no longer being in the UK just make it too hard to follow all the team news and keep an up to date list of who's in and who's out. I therefore decided it's better to permanently include everyone and outsource the team news to our friends at Fantasy Football Scout or Sky Sports.

Alli vs Eriksen vs De Bruyne
@mpok3_fpl asks why Alli is rated so much higher than teammate Eriksen and why both are ahead of assist-God De Bruyne. Eriksen and De Bruyne are relatively close with 53 and 49 points respectively projected for the first 12 gameweeks of the season. This pair are very similar across the board and two of the key metrics in the model - the share of a team's total shots inside the box and created chances a player accounts for - are almost exactly the same (Eriksen has a ShareSiB of 10% and ShareCC of 24% versus 9% and 23% for De Bruyne). Neither player takes penalties and both provide a good threat from corners so the conclusion is basically that Eriksen enjoys a slightly bigger piece of a slightly bigger pie.

Alli is a different profile player and the model loves him. Among first team regulars, his share of SiB rate of 18% is bested by only Antonio (25%), Sanchez (24%), Redmond (20%), Arnautovic (20%) and Hazard (18%). Sanchez and Hazard are obviously elite fantasy options (and priced accordingly) and while the other names on this list are reasonably priced, remember that they enjoy a slightly larger share of a significantly smaller expected goal haul (Southampton and Stoke are forecast to score 20 goals between them in the first 12 gameweeks, compared to Spurs' 21 goal total). Alli isn't just a one-dimensional player either, accounting for around 10% of his side's created chances.
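To make the "share of the pie" idea concrete, here is a rough sketch rather than the model itself. It simply multiplies a team's forecast goals by a player's share of shots inside the box, which treats share of SiB as if it were share of goals (a simplification). Spurs' 21 goals and the 18% and 20% shares come from the paragraph above; the 10 goals for the smaller side is an assumption that splits the combined Southampton and Stoke total evenly.

```python
def slice_of_pie(team_goals, share):
    """Crude goal estimate: team forecast multiplied by the player's share."""
    return team_goals * share


# Alli: 18% share of SiB, Spurs forecast for 21 goals over 12 gameweeks.
alli = slice_of_pie(21, 0.18)

# A 20% share on a side forecast for 10 goals.
# The 10 is an assumed even split of the combined 20, not a real forecast.
smaller_side = slice_of_pie(10, 0.20)

print(round(alli, 2))          # 3.78
print(round(smaller_side, 2))  # 2.0
```

Even with the bigger share, the player on the lower scoring side ends up with a noticeably smaller slice, which is the point being made about Redmond and Arnautovic.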

A further consideration for this trio is their shot profiles. Last season Eriksen easily eclipsed his counterparts with 133 shots compared to Alli's 94 and De Bruyne's 86. But Eriksen took 94 of those efforts from outside the box and converted them to goals at a 3% clip, which is right around league average for long rangers. Alli meanwhile managed 58 SiB and converted these efforts into goals at an excellent rate of 28%. In fact, this rate is so good that it might be the cause of the most concern for Alli. His GiB/SiB rate of 28% and G/SoT rate of 40% are both well above league average and might suggest regression this year. It sounds obviously true that some players can convert chances into goals at a sustainably higher than average rate, yet there is at least some doubt about it. I am largely convinced, though, that goals per overall chance is more controllable than pure goals per SoT, which does seem to have an element of luck involved. As I continue to refine the model in the coming weeks I might look to regress these high GiB/SiB rates more than I currently do, which would hurt elite players like Alli (28%) or Sanchez (24%) and boost players like De Bruyne (6%) or Sigurdsson (5%) who were less clinical last season.
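For a feel of what regressing a conversion rate looks like, here is a minimal sketch of one common approach: blend the observed rate with a league average, weighted by the number of shots. This is not how the model currently does it. The league rate of 15% and the 50 "prior" shots are placeholder assumptions, and Alli's 16 goals is back-calculated from 28% of 58 SiB and rounded.

```python
def regressed_rate(goals, shots, league_rate, prior_shots):
    """Shrink an observed conversion rate towards a league average.

    prior_shots acts like a number of imaginary league-average shots
    added to the player's record; bigger values mean heavier regression.
    """
    return (goals + league_rate * prior_shots) / (shots + prior_shots)


ALLI_SIB = 58
ALLI_GIB = 16        # approximation: 28% of 58, rounded
LEAGUE_RATE = 0.15   # assumed placeholder, not a measured league average
PRIOR_SHOTS = 50     # assumed placeholder for strength of regression

raw = ALLI_GIB / ALLI_SIB
regressed = regressed_rate(ALLI_GIB, ALLI_SIB, LEAGUE_RATE, PRIOR_SHOTS)

print(round(raw, 3))        # 0.276
print(round(regressed, 3))  # 0.218
```

Raising the prior shot count pulls every player closer to the league rate, which would trim the clinical finishers and lift those who converted poorly last season.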

How about Liverpool
Our friends over at @The_First_Touch ask about Firmino and Coutinho being surprisingly low and wonder if it's again to do with shot profile. When I first looked I assumed that the model wasn't overly impressed by Liverpool's prospects as a whole but, au contraire, they are actually forecast for the 5th best attack through the first 12 gameweeks. The problem for this pair is that they just weren't very efficient with their chances last year, even accounting for shots taken outside the box. Their GiB/SiB rates were 13% for Coutinho and 16% for Firmino, which are solid but well below the sky high rates mentioned earlier for the likes of Alli and Sanchez. So again, how you feel about this pair probably comes down to how much you think these shot rates regress. For what it's worth, if we jump back another season we see rates of 23% for Alli, 21% for Firmino and just 10% for Coutinho, perhaps suggesting Firmino has some room to rise, but maybe Coutinho is right about where he is going to settle.

How to value defenders
This is the big issue for me and one I do not have a satisfactory answer to. I am fairly confident that the model does a decent job at forecasting goals conceded per game which allows for a reasonable ranked assessment of whether, say, Southampton or Swansea are more likely to keep a clean sheet this week. However, when it comes to converting this probability into points, the model (i.e. my small brain) struggles. This tends to undervalue clean sheets and thus players with good attacking stats become overvalued, especially those who have enjoyed attacking success in limited playing time (and whose rates aren't sufficiently regressed by the model).

One option would be to simply forecast defenders' attacking numbers, which allows for comparability between teammates but is useless when deciding whether to go with defenders from different teams or whether to go with a back three or four. The other is to keep going with the deeply flawed version and try and find a better way to convert predicted shot data into clean sheet probability (suggestions in the comments!). I therefore ask that if you see the odd weird name in the defensive listing you take it with a pinch of salt and instead primarily focus on the team defensive forecast if you want a bit of help setting your weekly lineup or planning a team's defense to back. More on this topic in coming weeks.
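As a starting point for that conversion, here is a minimal sketch of one common approach, not something the model currently does. It assumes goals conceded in a match follow a Poisson distribution, so the chance of conceding zero is exp(-expected goals conceded). It also assumes the usual 4 points for a defender's clean sheet and ignores the minutes requirement. The two expected goals conceded values are made up for illustration.

```python
import math

CLEAN_SHEET_PTS = 4  # assumed standard defender clean sheet value


def clean_sheet_probability(expected_goals_against):
    """P(conceding zero) if goals conceded are Poisson distributed."""
    return math.exp(-expected_goals_against)


def expected_clean_sheet_points(expected_goals_against):
    """Expected clean sheet points for a defender who plays the full match."""
    return CLEAN_SHEET_PTS * clean_sheet_probability(expected_goals_against)


# Hypothetical forecasts of goals conceded for two sides this week.
for label, xga in [("tighter defence", 0.8), ("leakier defence", 1.4)]:
    print(label, round(expected_clean_sheet_points(xga), 2))
# tighter defence 1.8
# leakier defence 0.99
```

Adding these expected clean sheet points to a defender's forecast attacking points would at least put players from different teams on the same scale, though the Poisson assumption would need checking against real results.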

Player Forecast: First Draft

With the new season upon us I wanted to get out some quick and dirty forecasts for the first few weeks of the year. The forecast data is almost entirely based on prior year data so I have not taken into account players changing teams (Lukaku, Walker) or those with new opportunities and teammates. By definition, a lack of historic data means that the new arrivals (Morata, Lacazette) are also missing, but they will be added as soon as we have a small amount of data to go on. The defensive data looks a bit wacky at this stage so I would base my defensive decisions more on the team projections for now.

I will also update and share some workings around how the model is going to work this year as I try to improve on the decent, but far from perfect version from a couple of seasons back.

It's great to be back and I thank everyone reading this for giving the blog another chance. Please share your thoughts in the comments or on Twitter and I will make changes in the coming weeks.

Good luck for gameweek one!

Sunday, August 6, 2017

Hello again, old friends

I hope that the old adage about absence and fondness holds true and anyone reading this will still have some time for this little blog in this obscure corner of the internet. I took last season off from fantasy football - basically all football in fact - and now I'm back, armed with a pile of Opta stats and an online subscription to access all the live and archived footage anyone would wish to ingest.

Since I last wrote on these pages I've changed careers and I am delighted to be expecting twins in a few weeks. These two factors have come together to shape the direction of the blog for the upcoming season. I am now working in the international development sector and one of the lessons I have learned so far is around comparative advantage - no one is best placed to do everything on their own. Thus I am not going to try and do everything on this site. I am not really going to cover injuries, suspensions or even expected playing time as many other sites are better at this and have greater resources to do a better job than I ever could (simply not being able to leave Sky Sports News on for 6 hours a day is a big disadvantage!). The expected arrivals next month will also mean that regular, weekly posts might be more tricky to deliver on a firm schedule.

So, I am going to focus on creating usable statistics and particularly data visualizations that I hope can help folks select their team and maximize their transfer budgets while providing some counterpoints to the general stampede to judgement that generally occurs every Saturday afternoon. I still hope to write pieces too, but these will be more focused on particular points of interest, rather than a weekly roundup.

The first thing added to the site today is the new Team Snapshot page, which is currently loaded with data from last season. This will of course be migrated over to the new season after a couple of gameweeks, but for now I hope it will be useful when reminding yourself about how teams performed last season and where there is potential for them to regress.