Thursday, November 16, 2017

Bench pressing

As alluded to in last week's post about Guardiola's "wild" rotation (or lack thereof), I am currently of the opinion that the various City options that are available (plus a couple of other elite players around the league) are so good that it is worth absorbing the inevitable rotation hits that come, even as those punches start to increase in number over the busy holiday period. One of the risks and possible downsides here is that with rotation generally not being something one can plan around (unlike, say, actual injuries or suspensions), the characteristics of your bench need to be tweaked somewhat.

Two key points need to change in my own personal bench strategy. First, I need to pay more attention to it. I am generally pretty focused on maintaining a good substitute keeper who rotates nicely with the other option, plus one other sub, but after that I do not prioritise having a deep bench. Second, because you won't necessarily be able to choose when to deploy your bench options, I think you are better off focusing on higher-frequency scoring events such as clean sheets or bonus points rather than just goals. For example, suppose two players are each expected to play 90 minutes in each of the next six games, with one projected for 3 - 3 - 5 - 2 - 3 - 3 points whereas the other is projected for 2 - 3 - 4 - 7 - 2 - 2. While the second player is expected to score an extra point overall, one can make the argument that, seeing as you expect to deploy this player just once or twice, you are better off with the 5/6 chance of earning more than 2 points with player one than the 3/6 chance with player two. Of course, this only applies to an extent and if there is a bargain player who offers significant scoring potential then they are obviously the better pick.
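
To make that comparison concrete, here is a minimal Python sketch using the two hypothetical projections above. The function name and the default 2-point threshold are my own choices for illustration, and it assumes each of the six gameweeks is equally likely to be the one where the bench player gets called upon.

```python
from fractions import Fraction

# Projected points for the two hypothetical players in the example
player_one = [3, 3, 5, 2, 3, 3]
player_two = [2, 3, 4, 7, 2, 2]

def chance_of_beating(points, threshold=2):
    """Share of gameweeks in which the projection exceeds the threshold,
    assuming every week is equally likely to be the one you need."""
    hits = sum(1 for p in points if p > threshold)
    return Fraction(hits, len(points))

for name, pts in [("player one", player_one), ("player two", player_two)]:
    print(name, "total:", sum(pts), "chance of >2 points:", chance_of_beating(pts))
```

Player two wins on total points (20 against 19) but player one clears the 2-point bar far more often, which is the property a bench player needs.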

The below table shows midfielders and forwards available for under 5.0m (though you can set the threshold as you wish) and plots the percentile they fall into with regards to:

  • the share of their team's BPS they account for (which I am using as a proxy for bonus point potential),
  • clean sheets (for midfielders only)
  • expected goals
  • expected assists

I have highlighted a few names below the chart that I will personally be considering as stocking stuffers for the holiday season.

Tom Carroll
Carroll has a bit of a cult following in fantasy circles though despite his promising underlying numbers he hasn't really been able to produce much by way of concrete returns. Here though we're interested not just in his assist potential but also Swansea's reasonable clean sheet record and his solid share of BPS which suggest bonus points could flow his way, especially given Swansea's lack of established stars. However, the model doesn't really like Swansea's defense going forward and their opponents in January - TOT, @NEW, LIV, ARS - make me nervous. Still, he's played every game and doesn't look likely to lose his place to the ineffective Sanches and so for 4.5m Carroll remains a reasonable bench option.

Nemanja Matic
Matic was one of the players I had in mind when considering this strategy and the data supports this thought process to a degree. What you're buying here is a very high chance of a clean sheet to accompany an almost guaranteed appearance, though bonus points are always going to be hard to come by with several other good players in this United side. I would prefer Matic at 4.5m as you are really not getting a great deal of offensive threat here but the fixtures from mid-December to mid-January are so good that one could easily see him racking up five clean sheets in eight games, which is just as good as a goal but spread evenly over the fixture list.

N'Golo Kante
Kante was the other player I had in mind when I started this and he too looks like a decent fit here. Chelsea's defense isn't quite as good as United's to date, although the model likes them slightly more in the coming weeks. Kante also provides a fairly significant increase in attacking potential, though again doesn't offer much by way of bonus point potential with other players like Morata and Azpilicueta attracting the lion's share to date. Given his increased attacking threat and Chelsea's incredibly favourable run of fixtures in December and January, Kante is my number one pick here, even at a slight premium to some of the other bargain basement picks.

Tom Cleverley
Richarlison has rightly won the plaudits in this Watford midfield and Doucoure has received serious backing having notched 4 goals to date already, but I actually quite like the unheralded Cleverley as a bench option here. The former United man actually has a slightly higher xG than Doucoure (though neither offers much to get overly excited about, with Doucoure's 4 goals from 5 SoT incredibly unsustainable), though once again it's his ability to chip in across the categories we're interested in that appeals. Watford's defense projects reasonably well despite being leaky of late and another healthy run of fixtures over the holidays looks promising.

I'm still not entirely convinced by this strategy, with the other alternative being to grab 13 or 14 affordable players to build a balanced squad where your bench options can offer more than just scraps. With Aguero's potential injury I was considering this alternative strategy but I can't see not owning the Argentine, Kane and Salah when healthy, which pretty much rules out a totally balanced side. Thus I will try and spend the next couple of weeks strengthening my awful bench for the inevitable rash of rotation frustration coming our way.

Tuesday, November 7, 2017

On Guardiola's rotation

There seems to be a growing chorus that Man City players are becoming harder and harder to own due to constant rotation from Pep Guardiola. There are two key pieces of information that are useful to assess how damaging rotation can be to a player:
  • The predictability of when the rotation happens (before or after Champions League games, away from home etc),
  • If not selected for the first team, how often do they come on as a sub. If you have a decent bench then - while frustrating - you can deal with players not playing at all. However, if they consistently come off the bench for only 15 minutes or so then you are lowering your odds at success.

Predicting team sheets is not really my forte (nor a particular interest) so we're going to focus on the second point here:

We can see four of the City options have spent time on the bench this season while De Bruyne and Silva have been ever present to date. The calculation for this pair is slightly different as you're getting increased certainty but with lower upside (their 5.8 and 5.5 points per appearance trail their teammates). De Bruyne also comes at a fairly significant premium which personally excludes him from consideration for my own team, in which I am looking to spread the wealth more evenly, but I can see the appeal of the Belgian's overall package.

Let's look at the other two midfielders first: Sane and Sterling. Each has spent some time on the bench, though Sane is coming off 6 straight starts and has clearly won the confidence of Guardiola despite his young age. The first point to note is how incredible this pair have been to date. For simplicity's sake I am going to continue to use points per game (PPG) numbers here (whereas I would usually prefer underlying stats) but I do so with the confidence that their success has been anything but a fluke.

In his 7 starts, Sane has averaged 8.1 PPG, while Sterling slightly eclipses him with 8.2 PPG in his 6 starts. For comparison, the 25% owned Eriksen is averaging 5.7 PPG while his 21% owned teammate Alli is further behind with just 4.9 PPG. Thus, while it is extremely frustrating to have your player on the bench, you need to keep in mind that when they play, they are really operating on another level of production so far this season.

On the subject of being benched, between them, this duo have been benched 7 times, getting some playing time on all but one occasion. With the busy holiday period on the horizon I think these rest numbers could increase somewhat and there might be occasions where these players are completely left out of the day's action, but when they have been on the field for a short cameo, the results have been far from a disaster. In their 7 sub appearances, Sterling and Sane have averaged just 24 minutes but a frankly absurd 4.1 PPG. This number is of course skewed by Sane's brace against Liverpool but still, the pair have delivered returns in 3 of their 7 sub appearances. For reference, that 4.1 PPG compares favorably to how players like Firmino (4.3) or Ozil (3.3) have been performing while playing full matches.

Let's put together a scenario to see how this could play out, using Sterling as an example (who my model likes marginally more but you could replace with Sane without changing the conclusion):

In 10 games over the holidays, Sterling will get 7 starts, 2 sub appearances and will be dropped totally for one game. If he continues to average 7 PPG in those starts and 3 points coming off the bench then he'd accumulate 55 points. We'll then throw in another 2 points for your bench player who will replace him when he misses out. So a total of 57 points for an investment of 13.2m (8.2m for Sterling (or less if you've held him for a while) plus a 5.0m scrub on the bench who just needs to show up).
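
The arithmetic behind that scenario can be written as a small Python sketch. The function and parameter names are mine, introduced only for illustration; the rates (7 points per start, 3 per sub appearance, 2 for the bench replacement) are the assumptions stated in the scenario.

```python
def holiday_points(starts, sub_apps, missed, ppg_start=7, ppg_sub=3, bench_ppg=2):
    """Points from the rotated player plus the bench player who
    steps in for the games he misses entirely."""
    player = starts * ppg_start + sub_apps * ppg_sub
    bench = missed * bench_ppg
    return player + bench

# 7 starts, 2 sub appearances, 1 game dropped entirely
total = holiday_points(starts=7, sub_apps=2, missed=1)

# Sterling at 8.2m plus a 5.0m bench player
cost = round(8.2 + 5.0, 1)

print(total, cost)
```

Changing the three counts lets you test more pessimistic rotation scenarios without redoing the sums by hand.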

How else can we piece together such a return? Assuming we could find a player who will start every game over the holidays (doubtful) we're still struggling. Mahrez? 5.1 PPG. Mkhitaryan? 4.7 PPG. The pricier Alli? 4.9 PPG. Eriksen's 5.7 PPG would get you to the 57 points but you've spent 1.5m more and need him to play every week just to break even. If you played the fixtures well, you might be able to do something with Richarlison and another mid-level option, but then, you can easily afford those players alongside Sterling anyway and it isn't like there are five of them to build a balanced, mid-range midfield (in fact, the list gets extremely limited after Richarlison: maybe Ramsey, I suppose Groß).

Let's build this the other way. Let's assume one of the above options can get you 50 points. At 7 points a start and 3 points per sub appearance, Sterling would need just 5 starts and 4 sub appearances to get within a point or so of that output (47 points himself, plus 2 from your bench player in the tenth game). Then you've got to factor in the huge upside of him playing more due to injuries to others, being more productive off the bench, or getting extended sub minutes after City wrap up the game in the first 45 minutes.
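
As a rough check on that break-even line, here is a short Python sketch under the same assumed rates (7 per start, 3 per sub appearance, 2 for the bench fill-in). The helper name is hypothetical. It also looks for the fewest starts that would reach 50 if every non-start were a complete absence covered by the bench.

```python
GAMES = 10

def scenario_total(starts, sub_apps, ppg_start=7, ppg_sub=3, bench_ppg=2):
    """Combined points from the player and the bench cover over GAMES fixtures."""
    missed = GAMES - starts - sub_apps
    return starts * ppg_start + sub_apps * ppg_sub + missed * bench_ppg

# The case in the text: 5 starts, 4 sub appearances, 1 game missed
player_only = 5 * 7 + 4 * 3
with_bench = scenario_total(starts=5, sub_apps=4)

# Fewest starts needed to reach 50 with no sub appearances at all
min_starts = next(s for s in range(GAMES + 1) if scenario_total(s, 0) >= 50)

print(player_only, with_bench, min_starts)
```

On these numbers the 5 starts and 4 cameos land a point short of 50 once the bench player is included, so the comparison is approximate rather than exact.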

Up front, one could conclude that the presence of Lukaku, Morata and Kane makes Aguero easier to ignore as they are ready-made options who provide a great balance of upside and certainty. But, if you believe in his numbers to date, Aguero is operating on another level, averaging over 9 PPG compared to the ~6 PPG offered by his peers. This means that if Lukaku continues his 5.8 PPG rate over the full 10 games (58 points), 5 games of Aguero plus 5 games of a bench scrub would come to nearly the same total (5x9 + 5x2 = 55 points). Throw in the fact that most people feel you need two of these elite forwards, and you now need to find two of these options who can match Aguero's prowess if you are going to ignore him.
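
The same comparison can be sketched in Python. The variable names are mine, and the rates (9 PPG for Aguero, 5.8 PPG for an ever-present Lukaku, 2 points for a bench fill-in) are the ones quoted above; treat it as an illustration of the reasoning, not a projection.

```python
GAMES = 10
AGUERO_PPG = 9      # points per start, from the text
BENCH_PPG = 2       # assumed return from the bench fill-in
LUKAKU_PPG = 5.8    # assumed to hold for all 10 games

lukaku_total = round(LUKAKU_PPG * GAMES, 1)

def aguero_total(starts):
    """Aguero's starts plus bench cover for the games he misses."""
    return starts * AGUERO_PPG + (GAMES - starts) * BENCH_PPG

five_starts = aguero_total(5)

# Fewest Aguero starts that at least match the ever-present Lukaku
break_even = next(s for s in range(GAMES + 1) if aguero_total(s) >= lukaku_total)

print(lukaku_total, five_starts, break_even)
```

Five starts gets you most of the way there, and anything beyond that is pure upside over the ever-present alternative.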

Jesus is arguably the easiest player of the bunch to ignore given his more modest PPG total (6 in starts, 5.4 overall) and his relative lack of stature in the team. It's true that City lack great depth up top, but Guardiola has also only deployed both his front men in 4 of 11 fixtures this season, and not since GW5. One has to feel that Aguero gets slightly more games than his young teammate, all else being equal (Sterling or Sane could even be deployed up top if needed). If you are looking to really spread your money across the team though, I can still see a scenario in which the Brazilian makes sense.

Risks to this analysis
There are, of course, risks to this analysis. The most obvious is probably that City slow down and the gap their players enjoy over their peers is cut. This is definitely possible but I haven't seen many flags in their data to suggest it's on the horizon. You're also not locking anything in either, so if that does happen you are still well placed to jump ship for the other options discussed above. The next risk is that by using such a crude measure (PPG) we are being too simplistic and not accounting for quality of opponent.

However, over a 10 game run you are going to get a reasonable averaging out of opponents and in this particular case, City's 10 game stretch from Dec 3 - Jan 20 is arguably no harder than their fixtures to date, with only three fixtures looking really tough on paper (@MUN, TOT and @LIV although the latter might even be a stretch given the way City massacred Klopp's men in GW4).

My own approach
Personally I am happy to absorb the risk associated with Aguero, Sterling or Sane but will place extra emphasis on sorting out my bench to ride out the inevitable benchings. This City side is special and with several mid-priced options available to get access to it, I simply think the opportunity is too good to pass up.

One small side note is that when selecting my bench I am going to lean more towards certainty of playing time and look to players who might log consistent extra points from clean sheets or bonus points, knowing that they will be called into action but without me being able to choose when to deploy them. This means that players like Kante and Matic will come into play, more so than the likes of Choupo-Moting or Ince who project well but lack the consistent returns to justify the higher price tag given their specific role in my team.

Wednesday, November 1, 2017

Expected Goals - a comparison with Opta

The popularity of "expected goals" as a metric has exploded over the past year or so, with mainstream TV broadcasts now starting to dip their toes in the water of advanced analytics. One inevitable, if slightly unfortunate, consequence is that there are now multiple xG models, which could potentially disagree by a reasonable amount. To those who need a bit more persuasion as to the merits of statistical analysis, that disagreement might suggest a lack of accuracy. This has somewhat been the case in baseball with the two big "Wins Above Replacement" (WAR) metrics sometimes disagreeing by a relatively large amount, especially when it comes to valuing pitchers. There is sound methodology behind each metric, of course, but for those who aren't well versed in the intricacies of the debate, the differences can be distracting and serve as fuel for those who want to dismiss analytics and focus on old fashioned "eyeball tests" etc.

I, of course, have my own model which probably predates a lot that are currently around but also lacks some of the complexity that is now possible with the proliferation of individual event-by-event data from Opta, which allows one to do a better job at understanding the likelihood of a goal based on the exact location of a shot. My concern is that this model is confusing issues, although with my readership of 15 people against the millions of people exposed to Opta stats, this isn't a huge concern!

Nevertheless, let's first take a look at how different my model is from Opta's, then have a look at a few examples of the kind of player my model gets wrong, and then finally a few words on why you should continue (or start!) to care about the projected data in these electronic pages.

The above data is only based on 2017-18 data and the correlation looks very promising (an 88% r-squared is likely distorted a bit by a high volume of players with very low xG, but still, you can see a strong correlation between the two models and not many wild outliers), especially as there is still something of a small sample size issue with just 10 games in the bag. In terms of looking historically to analyse a game that has happened there is no argument that the Opta model is more sophisticated than mine but I am happy that the data here shows that what I've been offering is at least based in science and comparable to those more learned than I (thanks to ill advised transfers and a failure to heed my own advice, the league position of my own fantasy team doesn't always give the same assurances).

We can however see a few names whose xG varies quite significantly between the models and I want to highlight a couple of these to illustrate where my numbers need to be taken with a pinch of salt:

Harry Kane vs Romelu Lukaku
Readers will know that my model loves Kane and he's been the top ranked player for much of the season. However the Opta model likes Lukaku a little more to date, with the United man being worth almost a half goal more than his Spurs rival. In terms of sheer volume of shots, this outcome is hard to compute, with Kane taking 20 more shots than Lukaku and outshooting him 38 to 31 inside the box (all in one game fewer after this week's hamstring issue). If you want to take one step further away from shots then Kane still enjoys an advantage with 67 touches in the box to Lukaku's 54. Opta's model therefore must see some additional value in the quality of Lukaku's chances, which is somewhat hinted at in his 12 big chances compared to Kane's 10. United have been more clinical, converting 18/29 (62%) of their big chances and 21/94 (22%) of their SiB compared to marks of 11/20 (55%) and 16/98 (16%) respectively for Spurs, though this argument gets a bit circular as others would argue that this efficiency is an effect of Lukaku's presence rather than Lukaku's conversion rate being a product of the team.
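
For anyone who wants to check those team conversion rates, a small Python sketch follows. The counts are the ones quoted above; the dictionary layout and labels are just my own way of organising them.

```python
# (goals, attempts) for each team and chance type, as quoted in the text
conversions = {
    ("Man Utd", "big chances"): (18, 29),
    ("Man Utd", "SiB"): (21, 94),
    ("Spurs", "big chances"): (11, 20),
    ("Spurs", "SiB"): (16, 98),
}

# Conversion rate as a whole-number percentage
rates = {key: round(100 * goals / attempts)
         for key, (goals, attempts) in conversions.items()}

for (team, kind), pct in rates.items():
    goals, attempts = conversions[(team, kind)]
    print(f"{team} {kind}: {goals}/{attempts} = {pct}%")
```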

In short, the difference here isn't huge but is noteworthy and I will try and tweak my model a bit to increase the weighting given to team conversion rates (this data is factored in, but especially at this early stage of the season I heavily regress it back to league averages).
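
To illustrate what regressing a team conversion rate back to the league average can look like, here is a minimal Python sketch. It is not my actual model: the blending formula is a generic shrinkage estimate, and both LEAGUE_RATE and PRIOR_SHOTS are hypothetical placeholder values chosen only to show the mechanics. Only the United and Spurs SiB counts come from the numbers above.

```python
LEAGUE_RATE = 0.13   # hypothetical league-average SiB conversion rate
PRIOR_SHOTS = 200    # hypothetical weight: how many "league average" shots to blend in

def regressed_rate(goals, shots, league_rate=LEAGUE_RATE, prior_shots=PRIOR_SHOTS):
    """Blend the observed rate with the league rate. A larger prior_shots
    pulls the estimate harder towards the league average."""
    return (goals + league_rate * prior_shots) / (shots + prior_shots)

united_raw = 21 / 94
spurs_raw = 16 / 98
united_adj = regressed_rate(21, 94)
spurs_adj = regressed_rate(16, 98)

print(f"United: raw {united_raw:.3f}, regressed {united_adj:.3f}")
print(f"Spurs:  raw {spurs_raw:.3f}, regressed {spurs_adj:.3f}")
```

Increasing the weighting given to team conversion rates, as described above, would correspond to lowering PRIOR_SHOTS in this sketch so that the observed rate counts for more.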

Man City
I feel like everything I write this season includes a section on Harry Kane and Man City and so here we go again. We can see that the Opta model rates Sane and Jesus's seasons significantly higher than mine, though interestingly Sterling and particularly Aguero are much closer (with the Argentine being almost 1:1). Like Lukaku above, this is almost certainly a recognition of the quality of City's chances, although watching the games I would have thought Sterling would really benefit from this as he seems to have gotten multiple tap ins this year with his pace enabling him to catch up with counter attacks and arrive at the far post for a square ball and easy goal. Still, this one makes total sense and will again be helped by an increased weighting in team conversion rates. Although, I don't think folks really need a model to tell them that this team is red hot, and the decision is really between Sane, Silva and Sterling or Aguero and Jesus, and adjusting the team rate wouldn't help you there.

Dominic Calvert-Lewin, Eric Maxim Choupo-Moting and Andy Carroll
This unlikely group of players is a cautionary tale of over-valuing solid stats from players on poor teams and could possibly be awarded the Adel Taarabt Memorial Trophy. I will focus on Calvert-Lewin because I own him and have thus paid most attention to his games (enduring 90 minutes of Everton with regularity is true dedication). The leaderboard among forwards in SiB goes Kane, Lukaku, Aguero, Morata, Lacazette, Calvert-Lewin, Jesus. One of those things is not like the others. That group ranks 1st, 2nd, 3rd, 4th, 6th and 10th among forwards in fantasy points and Lacazette has the worst goal haul with five. Calvert-Lewin is 25th among forwards with a measly 25 points and zero goals. Here I think the issue is both a team problem (Everton have converted just 6/66 (9%) SiB) and Calvert-Lewin himself, who just doesn't seem to be taking quality shots. Watching the games, he is lively but really doesn't seem like someone with crippling bad luck. He's hit the target six times, which in a vacuum would suggest more like 1-2 goals rather than zero, and there remain solid reasons for not overly focusing on SoT over SiB due to small sample size noise, but still, I think there's probably a gap in the model and also in my common sense in overly relying on it when it comes to players racking up shot totals without digging a little deeper as to their quality. This one is harder to fix without the advanced data so it might just be a case of raising those flags before highlighting this kind of player for potential success.

Looking forward
The pleasing thing about this bit of analysis is that while we can definitely identify blind-spots in my model, it's close enough to the Opta version to suggest we're on the right path. The reason this is exciting is because by basing the xG on simple events like shots and SiB, I feel we have a good chance at predicting future xG, which becomes trickier if you are trying to forecast not only how many shots Harry Kane will get, but where he will take them from, where the defenders will be, who passed to him etc. I hope that team shot data stabilises relatively quickly and is less impacted by individual idiosyncratic events and thus we can use it with some certainty to predict team totals which can be allocated to each player to give us our predicted shots to form the basis of xG.

I am not of course suggesting in any way that my model is as good as some of the others out there at determining why what happened, happened, but in terms of translating that data into predictive information I think we're in a good place and I'm fairly happy with how the model is working for now.

Gameweek 11 Projections