Tuesday, September 19, 2017

Elite forwards: a response to City's past week

Over an eight-day period last week, Man City thumped their three opponents by a combined score of 15-0, with Liverpool, Feyenoord and Watford unable to contend with Guardiola's men. It goes without saying that City were extremely impressive in these games and fantasy football managers are obviously taking note, with Aguero and Jesus shooting to the top of many transfer target lists (at the time of writing some 440,000 managers had already brought in Aguero and another 90,000 had targeted Jesus). The point of this post is not to suggest that these transfers are incorrect - indeed I might follow suit myself - but I did want to run over a few facts to maybe turn the temperature down on the need to make these moves right now.

The chart below shows the expected points each of the listed elite forwards has registered through the first 5 gameweeks. You can see Aguero's terrific GW5 effort eclipsing his rivals this past week, but you will also note that Kane has three gameweeks with an expected points total close to or above Aguero's game against Watford. Now, it should be noted here that the expected points number I am using is not as complex as some models, and indeed is a simplified version of my own, but it does a good enough job of highlighting that Kane and Lukaku have been racking up very solid shot and created chance numbers week on week. One very good game from Aguero, and to a lesser extent Jesus, therefore does not need to completely change your transfer plans.

Note: Firmino's sky high xP in GW1 is due to his penalty which is scored in the model as essentially 4 guaranteed points on top of his other goal and assist potential.

The model forecast still likes Kane quite a bit more than any other forward using either the blended or prior-season conversion rates, and ranks Lukaku just ahead of Jesus and Aguero in terms of projected goals. Aguero tops all his rivals when it comes to assist threat, which is a useful tool to have and would push him sufficiently far ahead of Jesus to justify the extra million or so pounds.

To me, there remains a relatively clear hierarchy of Kane at the top, Lukaku and Aguero in a near tie for second and then Jesus, Morata and to an extent Firmino following in the lower tier. Therefore the switch from Lukaku to Aguero makes good sense but is not suddenly a "must do" transaction if there are other areas of the team you need to address.

Friday, September 15, 2017

Gameweek 5 Projection

Note: I have still not figured out a method to allocate clean sheet points to players that I'm happy with, so for now the below projection is for attacking points only (plus two for playing time). Clean sheet forecasts can be found in the team projections here.

Friday, September 8, 2017

Revised player forecast

One of the key complications with player forecasting - or indeed I imagine any forecasting - is deciding which data set to use. When it comes to fantasy football, we obviously want to include as much recent data as possible, but the issue is when we can rely exclusively on this season's data and when we need to look to past seasons for guidance. I am generally quite happy to rely solely on this season's raw event data, such as shots or created chances, fairly early on as these events occur with relative frequency and thus stabilise in a short time frame. How these events get converted into goals can fluctuate a lot more though, as the key driver there - goals - happens much less frequently. With this in mind, the revised player projection table below allows you to choose how you are converting the raw chances into goals and assists:

  • Past season - uses the benefit of having a 38 game sample to see how different teams convert chances into goals. The negative, of course, is that teams have changed since the previous season, both in terms of personnel and with the arrival of three new promoted teams (who use a historical average for promoted sides in this forecast);
  • Current season - the most up-to-date conversion rates will help to spot players and teams who have genuinely improved since last season but will be subject to a much greater sample size risk so will kick out some unusual results;
  • Blended - this rate uses a combination of historic and current data, increasingly weighted towards the latter as the season progresses.
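As a rough illustration of the blended option, here is a minimal sketch of one way such a weighting could work. The linear weighting, the 19-game cutoff and the example rates are all assumptions made for illustration and are not the actual weights used in the forecast.

```python
def blended_rate(prior_rate: float, current_rate: float,
                 games_played: int, full_weight_at: int = 19) -> float:
    """Blend prior-season and current-season conversion rates.

    The weight on current data grows linearly with games played and is
    capped at 1.0 once `full_weight_at` games are reached. Both the linear
    shape and the cutoff are assumptions, not the blog's actual scheme.
    """
    if games_played < 0:
        raise ValueError("games_played must be non-negative")
    weight_current = min(games_played / full_weight_at, 1.0)
    return weight_current * current_rate + (1.0 - weight_current) * prior_rate


# Hypothetical rates: 12% last season, 20% over the first 4 games this season.
early = blended_rate(prior_rate=0.12, current_rate=0.20, games_played=4)
late = blended_rate(prior_rate=0.12, current_rate=0.20, games_played=30)
print(f"after 4 games: {early:.3f}, after 30 games: {late:.3f}")
```

Early in the season the blended rate stays close to last season's number, and it drifts towards the current rate as the sample grows.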
A couple of highlights below that jump out as of gameweek 4:
  • I generally dislike the term "must own" as there are thousands of ways to build a successful team but there is no denying that Kane's data so far this season is simply incredible. The fact that his ownership percentage has fallen to 26% - including a 0.1m price reduction - represents a great opportunity and it's hard to see how one could select a wildcard team now without the Spurs man.
  • Mahrez is an interesting case of low ownership with just 5% of managers fancying the former Champion. Perhaps this was due to the presumption he would leave Leicester during the transfer window, but now that is closed, a lot more attention needs to be directed his way. Using the prior season conversion rate cuts his xG significantly which reflects the fact that Leicester have been so clinical (lucky?) this term, but nevertheless he remains a legitimate elite midfielder who at 8.5m isn't too badly priced.
  • The xA listing offers more moderately priced options than the xG list, which makes me wonder if it will be better to focus resources on elite forwards and midfielders who can score goals, knowing that you can find low-risk-high-reward midfielder picks who contribute assists with cheaper options like Brady, Carroll or Loftus-Cheek.
  • With Spurs offering about as good defensive prospects as any team, the fact that Ben Davies ranks so highly in xA while also being cheaper than most of his teammates makes him a very promising prospect.

Select tabs below for goal and assist forecasts for the next 6 and 12 gameweeks:

Saturday, September 2, 2017

Expected goals, assists and points

The visualization below plots the expected fantasy points arising from expected goals versus those arising from expected assists. The idea here is that this is a quick snapshot of how a player has performed to date and where their points are coming from. It should be noted that the xG and xA numbers used to generate the xP are based on shot, created chance and possession data for the 2017-18 season only, but the conversion rates used to turn those raw events into goals are regressed using team and league rates for both the current and prior seasons. I hope this eliminates some noise from the small sample sizes of the early season, but it's still worth noting that this is a snapshot based on three games, so it should act as a data point for your transfer assessments and not an all-encompassing answer.
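For anyone wanting to reproduce the split between goal points and assist points, a minimal sketch follows. The points values are the standard FPL scoring rules (an assumption, as they are not restated here), and the xG and xA inputs are made-up numbers rather than real player data.

```python
from typing import Tuple

# Points per goal by position and per assist, taken from the standard
# FPL scoring rules (an assumption; the post does not restate them).
GOAL_POINTS = {"GK": 6, "DEF": 6, "MID": 5, "FWD": 4}
ASSIST_POINTS = 3


def expected_attacking_points(position: str, xg: float, xa: float) -> Tuple[float, float]:
    """Return (points expected from goals, points expected from assists)."""
    return GOAL_POINTS[position] * xg, ASSIST_POINTS * xa


# Hypothetical three-game totals for a forward and a midfielder.
fwd_goal_pts, fwd_assist_pts = expected_attacking_points("FWD", xg=2.0, xa=0.5)
mid_goal_pts, mid_assist_pts = expected_attacking_points("MID", xg=1.0, xa=1.5)
print(fwd_goal_pts, fwd_assist_pts)
print(mid_goal_pts, mid_assist_pts)
```

Plotting the first value against the second for each player gives the kind of two-axis snapshot described above.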

Friday, August 25, 2017

Some help to pick your defenders

Quick note: One of the reasons I restarted the blog this year was that my wife and I were expecting twins in late September so I knew I would be home a lot more with plenty of sleepless hours available to mindlessly consume football matches. Well, the twins didn't think this was a great idea and decided to convert xTwins into Twins on August 13th. This means I've been a bit delayed rolling out some of the models and ideas I had to kick off the season but once we get into a routine I will ramp up the content a bit. I like to think that they are already concerned about the potential issues around small sample sizes and wanted to force me to wait until the season was a few weeks old before overreacting.

Selecting defenders and 'keepers requires a different strategy to your attacking options, given that so much of their value is locked up in their team's performance. Of course, playing on a good team will help any player but while it's possible to have a lot of success as a forward or midfielder on a mediocre team, it's really hard to do so as a defender, barring a freaky season where you get extremely lucky with goal conversion or are fortunate enough to be played higher up the pitch.

Thus, with a few exceptions over the years, I tend to pick the team first and then figure out who offers the best way to access that unit. This will require a weighting of three factors:

  • playing time
  • price, and
  • attacking threat

Generally you will want to ensure the playing time factor first and then weigh the attacking threat against price tag and see how much bang for your buck you can get. Hopefully the below viz allows this to be done quickly for each team. Let's run through a couple of teams below to get an idea of how I hope it might work.

Note: The data within is based on the 2017-18 season only and thus needs to be taken with all the relevant small sample warnings. If you have a significantly higher or lower opinion of a player's attacking ability then don't totally discount it, but this might serve to adjust your prior, especially once we get another week or two under our proverbial belts.

Let's start with an easy - if somewhat theoretical - example. I must concede here that I haven't seen Brighton play a full game this season so this is a theoretical pick for illustrative purposes. Chris Hughton has picked a consistent back four in the first two games, all of whom cost 4.5m so if you wanted a Brighton defender you should simply be looking to maximise attacking threat, which to date points to Lewis Dunk by a reasonable distance. Suttner's 11 passes in the final 3rd point towards a player who might create chances in the future but otherwise the pick here should be Dunk. If these numbers hold up then it would start to suggest that holding other Brighton players would be a mistake.

The first thing to note here is that Bertrand comes at a premium to his teammates yet, at least so far this season, actually has a lower attacking threat than both Cedric and Yoshida. Now, in this particular case I think one might want to exercise a bit of caution as Bertrand has a decent history of delivering solid attacking returns, but if this trend continues then it makes little sense to own the former Chelsea man. The data would suggest that Yoshida is an interesting differentiating option as he's provided the same attacking threat as Cedric but is owned by just 2% of managers compared to Cedric's 12%.

David Luiz's performance last week in midfield received good reviews, which makes him well placed to earn minutes in this team. At just 6.0m he brings with him a good attacking threat, which in theory then renders Cahill, Azpilicueta and Rudiger poor investments. The question then becomes how much better Alonso is and whether that gap justifies the extra 1.0m. If you are aiming for a 2,200 point season then that 1.0m will need to earn you somewhere in the vicinity of 22 points (this gets more complicated later in the year, but bear with me for now), which is a goal and a handful of assists or 3-4 goals. That's a fairly significant haul and so while everyone else dives on Alonso after last week's heroics, I would suggest that the decision is closer than it looks.
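The 22-point figure falls out of simple division, sketched below. The 100.0m starting budget and the defender scoring values (6 per goal, 3 per assist) are assumptions taken from the standard FPL rules rather than from this post, and bonus and clean sheet points are ignored.

```python
def points_needed_per_million(target_points: float, budget_m: float = 100.0) -> float:
    """Points each 1.0m of budget must return to hit a season target."""
    return target_points / budget_m


# Standard FPL attacking points for a defender (assumed, not stated in the post).
DEF_GOAL_POINTS = 6
ASSIST_POINTS = 3

per_million = points_needed_per_million(2200)

# Two ways a defender could roughly repay an extra 1.0m through attacking returns.
one_goal_five_assists = 1 * DEF_GOAL_POINTS + 5 * ASSIST_POINTS
three_goals = 3 * DEF_GOAL_POINTS
four_goals = 4 * DEF_GOAL_POINTS

print(per_million, one_goal_five_assists, three_goals, four_goals)
```

Three goals falls a little short of the target and four goals clears it, which is where the "3-4 goals" range comes from.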

With just two gameweeks in the bag, it's hard for players to really start to distinguish themselves from their teammates but I hope this little viz will be useful in the coming weeks as you look to structure your defense, perhaps during a wildcard international break.

Sunday, August 13, 2017

Model Behaviour: Converting shots into goals

Thursday, August 10, 2017

Model behaviour

As democracies teeter on the edge of existence around the world and once-eradicated diseases return due to an apparent loss-of-uptake of vaccinations, we appear to be in a time where the masses are losing faith in experts. Scientists, journalists and those steeped in the scientific process are being drowned out by those who "go with their gut" and follow their id. I am here to stand up for the experts, though alas, I am not one. Those more learned than I will quickly realise that this blog is the work of a child in his father's suit trying to close business deals (actually, that might work these days).

Still, you've made it this far so you might as well stay for a bit of analysis, even if my statistical base knowledge is formed from watching archived Harvard lectures and reading old Fangraphs posts. 

For those just joining us, I ran this blog for a number of seasons but took last season off. I have built a reasonable, though hardly sophisticated model to try and project fantasy football scores and I hope you'll follow along this season as we see where we get things right, and where it all goes horribly wrong.

In advance of the new season, I posted the projections for the first 12 gameweeks and wanted to discuss a few of the names to give everyone a flavour of where the numbers come from (clue: not my gut). A few readers were kind enough to share some of the names which jumped off the page as odd, so we'll start there then I'll add my own concerns:

New signings and injuries
I'll get a couple of quick ones out of the way first - h/t to @JoseMourinhoIND who asked about Mohamed Salah. All new arrivals to the league are missing from my projections for now as I don't have a statistical baseline on which to form their forecast. I have toyed in the past with trying to translate stats from other leagues - and that could be a worthy project again - but with Opta stats hard to come by for other leagues and the small sample of players who move from, say, Serie A to the Premier League, it's tough to get a forecast that I really feel good about. The new players will be added to the model as soon as they enter the Opta database but we'll obviously need to tread carefully and not overreact to a week or two's worth of data.

With regard to injuries, @GoalscorerC notes that the list includes players like Hazard and Sanchez who won't play in the first couple of weeks. This is indeed a problem but one I have just accepted I have to live with. Other responsibilities, coupled with no longer being in the UK, just make it too hard to follow all the team news and keep an up-to-date list of who's in and who's out. I therefore decided it's better to permanently include everyone and outsource the team news to our friends at Fantasy Football Scout or Sky Sports.

Alli vs Eriksen vs De Bruyne
@mpok3_fpl asks why Alli is rated so much higher than teammate Eriksen and why both are ahead of assist-God De Bruyne. Eriksen and De Bruyne are relatively close with 53 and 49 points respectively projected for the first 12 gameweeks of the season. This pair are very similar across the board and two of the key metrics in the model - the share of a team's total shots inside the box and created chances a player accounts for - are almost exactly the same (Eriksen has a ShareSiB of 10% and ShareCC of 24% versus 9% and 23% for De Bruyne). Neither player takes penalties and both provide a good threat from corners so the conclusion is basically that Eriksen enjoys a slightly bigger piece of a slightly bigger pie.

Alli is a different profile player and the model loves him. Among first team regulars, his share of SiB rate of 18% is bested by only Antonio (25%), Sanchez (24%), Redmond (20%), Arnautovic (20%) and Hazard (18%). Sanchez and Hazard are obviously elite fantasy options (and priced accordingly) and while the other names of this list are reasonably priced, remember that they enjoy a slightly larger share of a significantly smaller expected goal haul (Southampton and Stoke are forecast to score 20 goals between them in the first 12 gameweeks, compared to Spurs' 21 goal total). Alli isn't just a one dimensional player either, accounting for around 10% of his side's created chances. 
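To put the "share of the pie" point into numbers, here is a deliberately crude sketch. It treats a player's share of shots inside the box as if it were his share of the team's forecast goals, which ignores conversion rates and shots from outside the box, so it is not how the model itself works. The even 10-goal split for Southampton is an assumption, as only the combined Southampton and Stoke total of 20 is given above.

```python
def rough_goal_share(share_sib: float, team_forecast_goals: float) -> float:
    """Crude proxy: a player's SiB share applied directly to team goals."""
    return share_sib * team_forecast_goals


# Alli: 18% of Spurs' 21 forecast goals over 12 gameweeks (figures from the post).
alli = rough_goal_share(0.18, 21)

# Redmond: 20% share; Southampton assumed to get half of the combined
# 20 goals forecast for Southampton and Stoke (the even split is an assumption).
redmond = rough_goal_share(0.20, 10)

print(f"Alli: {alli:.2f}, Redmond: {redmond:.2f}")
```

Even with the slightly smaller share, Alli's slice comes out well ahead because the pie is roughly twice the size.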

A further consideration for this trio is their shot profiles. Last season Eriksen easily eclipsed his counterparts with 133 shots compared to Alli's 94 and De Bruyne's 86. However, Eriksen took 94 of those efforts from outside the box and converted them to goals at a 3% clip, which is right around league average for long rangers. Alli meanwhile managed 58 SiB and converted these efforts into goals at an excellent rate of 28%. In fact, this rate is so good that it might be the cause of the most concern for Alli. His GiB/SiB rate of 28% and G/SoT rate of 40% are both well above league average and might suggest regression this year. It sounds obviously true that some players are simply better finishers, but there is at least some doubt about a player's ability to convert chances into goals at a sustainably higher than average rate. I am largely convinced that goals per overall chance is more controllable than pure goals per SoT, which does seem to have an element of luck involved. As I continue to refine the model in the coming weeks I might look to regress these high GiB/SiB rates more than I currently do, which would hurt elite players like Alli (28%) or Sanchez (24%) and boost players like De Bruyne (6%) or Sigurdsson (5%) who were less clinical last season.
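To show what regressing a conversion rate more heavily might look like, here is a minimal sketch using a simple shrinkage towards the league average. This is one common technique and not necessarily what the model does. Alli's 58 SiB and 28% rate come from the paragraph above, while the 15% league rate and the two `prior_shots` strengths are made-up values for illustration.

```python
def regress_rate(observed_rate: float, shots: int,
                 league_rate: float, prior_shots: float) -> float:
    """Shrink an observed conversion rate towards the league average.

    `prior_shots` acts like a number of phantom league-average shots added
    to the player's sample: the larger it is, the harder the regression.
    """
    return (observed_rate * shots + league_rate * prior_shots) / (shots + prior_shots)


LEAGUE_RATE = 0.15  # hypothetical league-average GiB/SiB rate

# Alli: 58 shots inside the box converted at 28% (figures from the post).
light = regress_rate(0.28, 58, LEAGUE_RATE, prior_shots=50)
heavy = regress_rate(0.28, 58, LEAGUE_RATE, prior_shots=150)

print(f"light regression: {light:.3f}, heavy regression: {heavy:.3f}")
```

Raising `prior_shots` pulls a hot finisher further back towards the pack, and by the same logic it would lift a player whose rate sat below the league figure.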

How about Liverpool
Our friends over at @The_First_Touch ask about Firmino and Coutinho being surprisingly low and wonder if it's again to do with shot profile. When I first looked I assumed that the model wasn't overly impressed by Liverpool's prospects as a whole but, au contraire, they are actually forecast for the 5th best attack through the first 12 gameweeks. The problem for this pair is that they just weren't very efficient with their chances last year, even accounting for shots taken outside the box. Their GiB/SiB rates were 13% for Coutinho and 16% for Firmino, which are solid but well below the sky high rates mentioned earlier for the likes of Alli and Sanchez. So again, how you feel about this pair probably comes down to how much you think these shot rates regress. For what it's worth, if we jump back another season we see rates of 23% for Alli, 21% for Firmino and just 10% for Coutinho, perhaps suggesting Firmino has some room to rise, but maybe Coutinho is right about where he is going to settle.

How to value defenders
This is the big issue for me and one I do not have a satisfactory answer to. I am fairly confident that the model does a decent job at forecasting goals conceded per game, which allows for a reasonable ranked assessment of whether, say, Southampton or Swansea are more likely to keep a clean sheet this week. However, when it comes to converting this probability into points, the model (i.e. my small brain) struggles. This tends to undervalue clean sheets and thus players with good attacking stats become overvalued, especially those who have enjoyed attacking success in limited playing time (and whose rates aren't sufficiently regressed by the model).

One option would be to simply forecast defenders' attacking numbers, which allows for comparability between teammates but is useless when deciding whether to go with defenders from different teams or whether to go with a back three or four. The other is to keep going with the deeply flawed version and try to find a better way to convert predicted shot data into clean sheet probability (suggestions in the comments!). I therefore ask that if you see the odd weird name in the defensive listing you take it with a pinch of salt and instead primarily focus on the team defensive forecast if you want a bit of help setting your weekly lineup or planning which team's defense to back. More on this topic in coming weeks.
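As one possible starting point, and to be clear this is a sketch rather than what the model currently does, goals conceded could be treated as Poisson distributed so that the chance of a clean sheet is exp(-expected goals conceded). The Poisson assumption, the clean sheet points values (standard FPL rules) and the example forecasts are all assumptions made for illustration.

```python
import math

# Clean sheet points by position under the standard FPL rules (assumed here).
CLEAN_SHEET_POINTS = {"GK": 4, "DEF": 4, "MID": 1, "FWD": 0}


def clean_sheet_probability(expected_goals_conceded: float) -> float:
    """P(zero goals conceded) if goals conceded follow a Poisson distribution."""
    return math.exp(-expected_goals_conceded)


def expected_clean_sheet_points(position: str, expected_goals_conceded: float,
                                prob_60_minutes: float = 1.0) -> float:
    """Expected clean sheet points, scaled by the chance of playing 60+ minutes."""
    return (CLEAN_SHEET_POINTS[position]
            * clean_sheet_probability(expected_goals_conceded)
            * prob_60_minutes)


# Hypothetical goals-conceded forecasts for two teams this week.
strong_defence = expected_clean_sheet_points("DEF", 0.8)
weak_defence = expected_clean_sheet_points("DEF", 1.4)
print(f"{strong_defence:.2f} vs {weak_defence:.2f} expected clean sheet points")
```

Adding this number to a defender's attacking projection would put players from different teams on a common scale, though a plain Poisson may well misjudge how often low-scoring games actually finish at zero.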

Player Forecast: First Draft

With the new season upon us I wanted to get out some quick and dirty forecasts for the first few weeks of the year. The forecast data is almost entirely based on prior year data so I have not taken into account players changing teams (Lukaku, Walker) or those with new opportunities and teammates. By definition, a lack of historic data means that the new arrivals (Morata, Lacazette) are also missing, but they will be added as soon as we have a small amount of data to go on. The defensive data looks a bit wacky so I would base my defensive decisions more on the team projections for now.

I will also update and share some workings around how the model is going to work this year as I try to improve on the decent, but far from perfect version from a couple of seasons back.

It's great to be back and I thank everyone reading this for giving the blog another chance. Please share your thoughts in the comments or on Twitter and I will make changes in the coming weeks.

Good luck for gameweek one!

Sunday, August 6, 2017

Hello again, old friends

I hope that the old adage about absence and fondness holds true and anyone reading this will still have some time for this little blog in this obscure corner of the internet. I took last season off from fantasy football - basically all football in fact - and now I'm back, armed with a pile of Opta stats and an online subscription to access all the live and archived footage anyone would wish to ingest.

Since I last wrote on these pages I've changed careers and I am delighted to be expecting twins in a few weeks. These two factors have come together to shape the direction of the blog for the upcoming season. I am now working in the international development sector and one of the lessons I have learned so far is around comparative advantage - no one is best placed to do everything on their own. Thus I am not going to try and do everything on this site. I am not really going to cover injuries, suspensions or even expected playing time as many other sites are better at this and have greater resources to do a better job than I ever could (simply not being able to leave Sky Sports News on for 6 hours a day is a big disadvantage!). The expected arrivals next month will also mean that regular, weekly posts might be more tricky to deliver on a firm schedule.

So, I am going to focus on creating usable statistics and particularly data visualizations that I hope can help folks select their team and maximize their transfer budgets while providing some counterpoints to the general stampede to judgement that generally occurs every Saturday afternoon. I still hope to write pieces too, but these will be more focused on particular points of interest, rather than a weekly roundup.

The first thing added to the site today is the new Team Snapshot page, which is currently loaded with data from last season. This will of course be migrated over to the new season after a couple of gameweeks, but for now I hope it will be useful when reminding yourself about how teams performed last season and where there is potential for them to regress.