Saturday, December 16, 2017

Raising their game or feasting on the weak (or, which teams do players do well against)

However good a player projection system is, it will always have certain assumptions built in which require some judgement on the part of the model's creator. This might include how much weight to put on recent games versus historic data or how much to regress team or player conversion rates back to a league or historic average. One such assumption I do not currently factor into my own model is which kind of games a particular player performs well in. For example, if Harry Kane accounts for 30% of his team's shots inside the box (SiB) and Spurs are forecast for 10 SiB then his forecast will be three SiB regardless of who the opponents are. The strength of those opponents is of course somewhat baked into how we get to the 10 SiB projection in the first place, but no attention is paid to whether Kane has tended to over- or underperform expectations against weaker or stronger teams, or whether he's struggled against teams who deploy three centre backs.
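
The share-based forecast described above can be sketched in a few lines. This is only an illustration, not the model itself: the function name is hypothetical, and it assumes the player's share of team SiB is the same against every opponent, which is exactly the simplification this post questions.

```python
def project_player_sib(team_sib_forecast: float, player_share: float) -> float:
    """Allocate a team's forecast shots inside the box (SiB) to one player.

    Assumes the player's share of team SiB is constant regardless of opponent.
    """
    if not 0.0 <= player_share <= 1.0:
        raise ValueError("player_share must be between 0 and 1")
    return team_sib_forecast * player_share


# Kane example from the text: 30% of a 10 SiB team forecast.
kane_sib = project_player_sib(team_sib_forecast=10, player_share=0.30)
print(round(kane_sib, 2))  # 3.0
```

An opponent-aware version would replace the fixed share with one that varies by fixture difficulty, which is what the data below starts to explore.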

The data below takes the first steps to trying to factor that data in. Let's look at Harry Kane as an example:


I think there is a perception among some that Kane hasn't punished weaker opponents this year, which probably arose from his failure to net against Burnley, Swansea and Bournemouth at home to start the season. In terms of expected goals though, we can see that he has actually excelled in all but one of his easiest fixtures (GW11 vs Crystal Palace). Generally his chart is exactly as an independent observer would expect - with him putting weaker opponents to the sword but having less success against stronger teams. In Kane's particular case there aren't any huge lessons here - you're not dropping him against anyone - but I do think this is a data point against captaining him without much thought to Spurs' opponent.

Let's check in with a couple of other players who have interesting profiles, then the visualization is at the bottom of the post for you to play around with.

Lukaku
Lukaku is probably the name most synonymous with feasting on weaker opponents, and this is indeed borne out in this data. In 10 fixtures which rank as easier than average, the United front man has averaged 0.8 expected goals per game, compared to seven trickier fixtures in which he's averaged just 0.3 expected goals per game. In real life this is a problem for Jose Mourinho, but in fantasy terms it's arguably beneficial to have someone who is predictably good against weaker sides and struggles against the top sides as he becomes easier to transfer in and out of your team (or captain). From GW19-24 United face only one fixture that ranks significantly harder than average in terms of xG conceded, making the Belgian a better target than I think most people realise.

Richarlison
Richarlison is generally one of the most interesting stories of the season so far, and his data here doesn't disappoint. The Brazilian has absolutely dominated in his side's tougher games, amassing 4.5 xG in 9 games against above average opponents (scoring 3 goals in the process). Yet in eight easier games he's still netted twice, but based on significantly worse underlying data (just 1.7 xG). This could potentially be a case of Richarlison excelling away from home, where his pace can be better utilized in that inside forward role, whereas at home he is perhaps getting isolated out wide as teams sit back and make Watford break them down. I don't know if this would encourage me to drop Richarlison in GW20 versus a shaky Leicester side, but it would definitely make me feel better about starting him during his away games over the holiday break (perhaps even at Man City depending on the strength of your squad).

Lacazette
This is someone I have paid attention to for the past few weeks after making a panicked transfer to bring him in when I had a big pile of cash to use on a forward (but not enough to snag Lukaku or Kane). His data isn't super interesting but it does tell a cautionary tale about relying on a single data point to make a decision. If you sort all forwards by Opta's xG then through 17 weeks you'll see Lacazette sat nicely in fifth place behind Kane, Lukaku, Aguero and Jesus i.e. right where his price tag says he should be. However, teasing the data out a bit shows that he amassed 3.0 of his 8.4 expected goals in a single fixture (GW15 vs Man Utd), without which his total for the season would be down with Okazaki and Abraham. Now, it isn't totally fair to start cherry picking games to remove from a player's season but it's worth remembering to dig into the data a bit more as we progress into the halfway point of the season and not be overly reliant on season totals on their face. On the positive side, this data shows that for the most part Lacazette has performed well in his easier fixtures, making him somewhat worth keeping in mind should Arsenal enjoy a couple of good fixtures and you need a short term fix.

I wanted to share this viz now to try and help with transfers over the busy period but will come back to it with any further players of interest I find and a corresponding viz for assist potential.


Friday, December 8, 2017

The case for the other Liverpool wide man

If you are reading this then you almost certainly own Mohamed Salah. The Egyptian winger has been sensational for Liverpool this season and fantasy managers have responded by adding him to their teams by the thousand. At the time of writing his ownership is up to 51% and his value has increased by almost a million pounds since the game started. Today's piece is not about him though, but his colleague who hopes to operate on the other side of the field - Sadio Mane. A red card, injury and the form of Salah have pushed Mane out of many managers' minds, but as teams begin to converge and differentiation becomes increasingly difficult, the Senegal star is a promising option.

First, a quick word on Liverpool. I think many people might be hesitant to invest close to 20 million on two Liverpool players (assuming you already have Salah) but I might suggest that fear is misplaced. They've scored just two goals fewer than Man United, four more than Arsenal, five more than Chelsea and ten more than Tottenham (based on Kane's 37% ownership and Eriksen / Alli's combined 30% ownership I'm assuming many managers out there have over 20 million invested in a Spurs duo). Liverpool are 3rd in shots inside the box, 3rd in total shots, 2nd in shots on target and 3rd in created chances. Opta's xG metric also places them 3rd with 29.3 expected goals to date.

While they can't match Man City's firepower, one major advantage to this Liverpool side is that there is no real question as to how the attacking options will line up when all healthy and not being rested. Salah, Mane, Coutinho and Firmino are a terrific foursome - as evidenced by some of the football on show against Spartak Moscow this week - and there's really no one knocking at the door and demanding minutes to replace them, other than when they simply need a break. There is a perception that Klopp is rotation happy, but putting aside his injuries and suspension, Mane has only been benched twice, and in one of those games he didn't feature at all, which would have at least allowed your sub to come in. With Champions League football now behind us for a couple of months, I would be very surprised to see Mane miss more than the odd game over the next 6 weeks or so, and the fact he sat out against Brighton could actually be seen as a good thing, as he's less likely to be on the rotation block in the next couple of weeks.

Contrast that with Man City where one of Aguero and Jesus needs to sit almost every week and the likes of Bernardo Silva and Gundogan are waiting in the wings to steal minutes from your fantasy men, or even across the city at United where only Lukaku and Pogba (when available again) are really locked into that team.

Let's have a quick look at some data for the elite midfield options, alongside a chart showing how Mane's ownership is lagging behind his production and certainly his potential:


Mane isn't in the same class as the top three picks here, although they each come with a drawback, be it saturation of ownership (Salah), price tag (Sanchez) or rotation threat (Sterling). Among the rest of the group though, Mane can certainly hold his own, with my biggest concern being the lack of xA, which might suggest he's a little one dimensional with his fantasy production. That said, his 1.8 CC per 90 mins and his 13% share of Liverpool's CC when he plays aren't bad numbers so there is hope that his assist threat might tick up a bit as he plays a few more games. I think the biggest shock name on here might be Hazard, who doesn't seem to get much love from xG models. I suspected this was a flaw in my own numbers (my player projections don't tend to rate him as elite either) but it looks like he suffers under Opta's model too. It is worth digging into why Chelsea (and Hazard specifically) might be outperforming their underlying numbers, but however much you inflate his numbers, it will be tough to justify the extra 1.4m investment required to get the Belgian over Mane, based on these numbers alone (strength of fixtures and team coverage also need to be considered of course).

In the remaining six fixtures of 2017, Liverpool are set to welcome Everton, West Brom, Swansea and Leicester to Anfield, with a tricky trip to the Emirates being the only obvious concern, although even then Arsenal only rank 9th in xG conceded at home. After that there are a few tricky games over a relatively short period, but that's also getting into the new transfer period when wildcards are in play so I am personally happy to try and milk this underrated Liverpool side for the next month or so, letting everyone else put all their money into Chelsea or keep it in a Spurs side who have failed to hit the expected heights so far this year.

Thursday, November 16, 2017

Bench pressing

As alluded to in last week's post about Guardiola's "wild" rotation (or lack thereof), I am currently of the opinion that the various City options that are available (plus a couple of other elite players around the league) are so good that it is worth absorbing the inevitable rotation hits that come, even as those punches start to increase in number over the busy holiday period. One of the risks and possible downsides here is that with rotation generally not being something one can plan around (unlike, say, actual injuries or suspensions), the characteristics of your bench need to be tweaked somewhat.

Two key points need to be changed in my own personal bench strategy. First, I need to pay more attention to it. I am generally pretty focused on maintaining a good substitute keeper that rotates nicely with the other option and then one other sub, but after that I do not prioritise having a deep bench. Second, because you won't necessarily be able to choose when to deploy your bench options, I think you are better off focusing on higher frequency scoring events such as clean sheets or bonus points rather than just goals. For example, if two players are each expected to play 90 minutes in each of the next six games, with one projected for 3 - 3 - 5 - 2 - 3 - 3 points whereas the other is projected for 2 - 3 - 4 - 7 - 2 - 2, then while the second player is expected to score an extra point overall, one can make the argument that, seeing as you expect to deploy this player just once or twice, you are better off with the 5/6 chance of earning more than 2 points with player one than the 3/6 chance with player two. Of course, this only applies to an extent and if there is a bargain player who offers significant scoring potential then they are obviously the better pick.
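
The comparison in that example can be checked with a short sketch. The function name and the "more than 2 points" threshold parameter are just illustrative choices; the projections are the two sequences quoted above.

```python
def summarise(projections, threshold=2):
    """Return (total points, number of games scoring above the threshold)."""
    total = sum(projections)
    games_above = sum(1 for pts in projections if pts > threshold)
    return total, games_above


player_one = [3, 3, 5, 2, 3, 3]
player_two = [2, 3, 4, 7, 2, 2]

for name, proj in (("player one", player_one), ("player two", player_two)):
    total, games_above = summarise(proj)
    print(f"{name}: {total} points, {games_above}/{len(proj)} games above 2")
# player one: 19 points, 5/6 games above 2
# player two: 20 points, 3/6 games above 2
```

If you cannot pick the week your bench player is called upon, the count of useful weeks matters more than the total, which is the argument being made here.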

The below table shows midfielders and forwards available for under 5.0m (though you can set the threshold as you wish) and plots the percentile they fall into with regards to:

  • the share of their team's BPS they account for (which I am using as a proxy for bonus point potential),
  • clean sheets (for midfielders only)
  • expected goals
  • expected assists

I have highlighted a few names below the chart that I will personally be considering as stocking stuffers for the holiday season.



Tom Carroll
Carroll has a bit of a cult following in fantasy circles though despite his promising underlying numbers he hasn't really been able to produce much by way of concrete returns. Here though we're interested not just in his assist potential but also Swansea's reasonable clean sheet record and his solid share of BPS which suggest bonus points could flow his way, especially given Swansea's lack of established stars. However, the model doesn't really like Swansea's defense going forward and their opponents in January - TOT, @NEW, LIV, ARS - make me nervous. Still, he's played every game and doesn't look likely to lose his place to the ineffective Sanches and so for 4.5m Carroll remains a reasonable bench option.

Nemanja Matic
Matic was one of the players I had in mind when considering this strategy and the data supports this thought process to a degree. What you're buying here is a very high chance of a clean sheet to accompany an almost guaranteed appearance, though bonus points are always going to be hard to come by with several other good players in this United side. I would prefer Matic at 4.5m as you are really not getting a great deal of offensive threat here but the fixtures from mid-December to mid-January are so good that one could easily see him racking up five clean sheets in eight games, which is just as good as a goal but spread evenly over the fixture list.

N'Golo Kante
Kante was the other player I had in mind when I started this and he too looks like a decent fit here. Chelsea's defense isn't quite as good as United's to date, although the model likes them slightly more in the coming weeks. Kante also provides a fairly significant increase in attacking potential, though again doesn't offer much by way of bonus point potential with other players like Morata and Azpilicueta attracting the lion's share to date. Given his increased attacking threat and Chelsea's incredibly favourable run of fixtures in December and January, Kante is my number one pick here, even at a slight premium to some of the other bargain basement picks.

Tom Cleverley
Richarlison has rightly won the plaudits in this Watford midfield and Doucoure has received serious backing having notched 4 goals to date already, but I actually quite like the unheralded Cleverley as a bench option here. The former United man actually has a slightly higher xG than Doucoure (though neither offers much to get overly excited about, with Doucoure's 4 goals from 5 SoT incredibly unsustainable) though it's once again his ability to chip in across the categories we're interested in that appeals. Watford's defense projects reasonably well despite being leaky of late and another healthy run of fixtures over the holidays looks promising.

I'm still not entirely convinced by this strategy, with the other alternative being to grab 13 or 14 affordable players to build a balanced squad where your bench options can offer more than just scraps. With Aguero's potential injury I was considering this alternative strategy but I can't see not owning the Argentine, Kane and Salah when healthy, which pretty much rules out a totally balanced side. Thus I will try and spend the next couple of weeks strengthening my awful bench for the inevitable rash of rotation frustration coming our way.

Tuesday, November 7, 2017

On Guardiola's rotation

There seems to be a growing chorus that Man City players are becoming harder and harder to own due to constant rotation from Pep Guardiola. There are two key pieces of information that are useful to assess how damaging rotation can be to a player:
  • The predictability of when the rotation happens (before or after Champions League games, away from home etc),
  • If not selected for the first team, how often do they come on as a sub. If you have a decent bench then - while frustrating - you can deal with players not playing at all. However, if they consistently come off the bench for only 15 minutes or so then you are lowering your odds of success.

Predicting team sheets is not really my forte (nor a particular interest) so we're going to focus on the second point here:


We can see four of the City options have spent time on the bench this season while De Bruyne and Silva have been ever present to date. The calculation for this pair is slightly different as you're getting increased certainty but with lower upside (their 5.8 and 5.5 points per appearance trail their teammates). De Bruyne also comes at a fairly significant premium which personally excludes him from consideration for my own team, in which I am looking to spread the wealth more evenly, but I can see the appeal of the Belgian's overall package.

Let's look at the other two midfielders first: Sane and Sterling. Each has spent some time on the bench, though Sane is coming off 6 straight starts and has clearly won the confidence of Guardiola despite his young age. The first point to note is how incredible this pair have been to date. For simplicity's sake I am going to continue to use points per game (PPG) numbers here (whereas I would usually prefer underlying stats) but I do so with the confidence that their success has been anything but a fluke.

In his 7 starts, Sane has averaged 8.1 PPG, while Sterling slightly eclipses him with 8.2 PPG in his 6 starts. For comparison, the 25% owned Eriksen is averaging 5.7 PPG while his 21% owned teammate Alli is further behind with just 4.9 PPG. Thus, while it is extremely frustrating to have your player on the bench, you need to keep in mind that when they play, they are really operating on another level of production so far this season.

On the subject of being benched, between them this duo have been benched 7 times, getting some playing time on all but one occasion. With the busy holiday period on the horizon I think these rest numbers could increase somewhat and there might be occasions where these players are completely left out of the day's action, but when they have been on the field for a short cameo, the results have been far from a disaster. In their 7 sub appearances, Sterling and Sane have averaged just 24 minutes but a frankly absurd 4.1 PPG. This number is of course skewed by Sane's brace against Liverpool but still, the pair have delivered returns in 3 of their 7 sub appearances. For reference, that 4.1 PPG compares favorably to how players like Firmino (4.3) or Ozil (3.3) have been performing while playing full matches.

Let's put together a scenario to see how this could play out, using Sterling as an example (who my model likes marginally more but you could replace with Sane without changing the conclusion):

In 10 games over the holidays, Sterling will get 7 starts, 2 sub appearances and will be dropped totally for one game. If he continues to average 7 PPG in those starts and 3 points coming off the bench then he'd accumulate 55 points. We'll then throw in another 2 points for your bench player who will replace him when he misses out. So a total of 57 points for an investment of 13.2m (8.2m for Sterling (or less if you've held him for a while) plus a 5.0m scrub on the bench who just needs to show up).
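
The scenario arithmetic can be laid out as a small sketch. The function and parameter names are hypothetical; the inputs are the figures from the scenario above (7 points per start, 3 per sub appearance, 2 from the bench fill-in).

```python
def holiday_points(starts, sub_apps, missed, pts_per_start, pts_per_sub, bench_pts):
    """Total points from a rotated player plus the bench cover for missed games."""
    return starts * pts_per_start + sub_apps * pts_per_sub + missed * bench_pts


# Sterling scenario: 7 starts, 2 sub appearances, 1 game missed entirely.
sterling_total = holiday_points(
    starts=7, sub_apps=2, missed=1,
    pts_per_start=7, pts_per_sub=3, bench_pts=2,
)
print(sterling_total)  # 57

# Combined cost of Sterling (8.2m) plus a 5.0m bench player.
print(round(8.2 + 5.0, 1))  # 13.2
```

Changing the split of starts and sub appearances shows how many benchings the player can absorb before an ever-present alternative catches up.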

How else can we piece together such a return? Assuming we could find a player who will start every game over the holidays (doubtful) we're still struggling. Mahrez? 5.1 PPG. Mkhitaryan? 4.7 PPG. The pricier Alli? 4.9 PPG. Eriksen's 5.7 PPG would get you to the 57 points but you've spent 1.5m more money and need him to play every week just to break even. If you played the fixtures well, you might be able to do something with Richarlison and another mid-level option, but then, you can easily afford those players alongside Sterling anyway and it isn't like there are five of them to build a balanced, mid-range midfield (in fact, the list gets extremely limited after Richarlison, maybe Ramsey, I suppose Groß).

Let's build this the other way. Let's assume one of the above options can get you 50 points. At 7 points a start and 3 points per sub appearance, Sterling would need just 5 starts and 4 sub appearances (47 points, or 49 with a 2-point bench fill-in for the one game he misses) to roughly match that output. Then you've got to factor in the huge upside of him playing more due to injuries to others, being more productive off the bench, or getting extended sub minutes after City wrap up the game in the first 45 minutes.

Up front, one could conclude that the presence of Lukaku, Morata and Kane makes Aguero easier to ignore as they are ready made options who provide a great balance of upside and certainty. But, if you believe in his numbers to date, Aguero is operating on another level, averaging over 9 PPG compared to the ~6 PPG offered by his peers. This means that if Lukaku continues his 5.8 PPG rate over the full 10 games (58 points), 5 games of Aguero plus 5 games of a bench scrub would come to roughly the same total (5x9 + 5x2 = 55 points). Throw in the fact that most people feel you need two of these elite forwards, and you now need to find two of these options who can match Aguero's prowess if you are going to ignore him.

Jesus is arguably the easiest player of the bunch to ignore given his more modest PPG total (6 in starts, 5.4 overall) and his relative lack of stature in the team. It's true that City lack great depth up top, but Guardiola has also only deployed both his front men in 4 of 11 fixtures this season, and not since GW5, and one has to feel that Aguero gets slightly more games than his young teammate, all else being equal (Sterling or Sane could even be deployed up top if needed). If you are looking to really spread your money across the team though, I can still see a scenario in which the Brazilian makes sense.

Risks to this analysis
There are, of course, risks to this analysis. The most obvious is probably that City slow down and the gap their players enjoy over their peers is cut. This is definitely possible but I haven't seen many flags in their data to suggest it's on the horizon. You're also not locking anything in either, so if that does happen you are still well placed to jump ship for the other options discussed above. The next risk is that by using such a crude measure (PPG) we are being too simplistic and not accounting for quality of opponent.

However, over a 10 game run you are going to get a reasonable averaging out of opponents and in this particular case, City's 10 game stretch from Dec 3 - Jan 20 is arguably no harder than their fixtures to date, with only three fixtures looking really tough on paper (@MUN, TOT and @LIV although the latter might even be a stretch given the way City massacred Klopp's men in GW4).

My own approach
Personally I am happy to absorb the risk associated with Aguero, Sterling or Sane but will place extra emphasis on sorting out my bench to ride out the inevitable benchings. This City side is special and with several mid-priced options available to get access to it, I simply think the opportunity is too good to pass up.

One small side note is that when selecting my bench I am going to lean more towards certainty of playing time and look to players who might log consistent extra points from clean sheets or bonus points, knowing that they will be called into action but without me being able to choose when to deploy them. This means that players like Kante and Matic will come into play, more so than the likes of Choupo-Moting or Ince who project well but lack the consistent returns to justify the higher price tag given their specific role in my team.

Wednesday, November 1, 2017

Expected Goals - a comparison with Opta

The popularity of "expected goals" as a metric has exploded over the past year or so, with mainstream TV broadcasts now starting to dip their toes in the water of advanced analytics. One inevitable, if slightly unfortunate, consequence is that there are now multiple xG models, which could potentially disagree by a reasonable amount, which to those who need a bit more persuasion as to the merits of statistical analysis, might suggest a lack of accuracy. This has somewhat been the case in baseball with the two big "Wins Above Replacement" (WAR) metrics sometimes disagreeing by a relatively large amount, especially when it comes to valuing pitchers. There is sound methodology behind each metric, of course, but for those who aren't well versed in the intricacies of the debate, the differences can be distracting and serve as fuel for those who want to dismiss analytics and focus on old fashioned "eyeball tests" etc.

I, of course, have my own model which probably predates a lot that are currently around but also lacks some of the complexity that is now possible with the proliferation of individual event-by-event data from Opta, which allows one to do a better job of understanding the likelihood of a goal based on the exact location of a shot. My concern is that this model is confusing issues, although with my readership of 15 people against the millions of people exposed to Opta stats, this isn't a huge concern!

Nevertheless, let's first take a look at how different my model is than Opta's, then have a look at a few examples of the kind of player my model gets wrong, and then finally a few words on why you should continue (or start!) to care about the projected data in these electronic pages.

The above data is only based on 2017-18 data and the correlation looks very promising (an 88% r-squared is likely distorted a bit by a high volume of players with very low xG, but still, you can see a strong correlation between the two models and not many wild outliers), especially as there is still something of a small sample size issue with just 10 games in the bag. In terms of looking historically to analyse a game that has happened there is no argument that the Opta model is more sophisticated than mine but I am happy that the data here shows that what I've been offering is at least based in science and comparable to those more learned than I (thanks to ill advised transfers and a failure to heed my own advice, the league position of my own fantasy team doesn't always give the same assurances).

We can however see a few names whose xG varies quite significantly between the models and I want to highlight a couple of these to illustrate where my numbers need to be taken with a pinch of salt:

Harry Kane vs Romelu Lukaku
Readers will know that my model loves Kane and he's been the top ranked player for much of the season. However the Opta model likes Lukaku a little more to date, with the United man being worth almost a half goal more than his Spurs rival. In terms of sheer volume of shots, this outcome is hard to compute, with Kane taking 20 more shots than Lukaku and outshooting him 38 to 31 inside the box (all in one game fewer after this week's hamstring issue). If you want to take one step further away from shots then Kane still enjoys an advantage with 67 touches in the box to Lukaku's 54. Opta's model therefore must see some additional value in the quality of those chances, which is somewhat hinted at in Lukaku's 12 big chances compared to Kane's 10. United have been more clinical, converting 18/29 (62%) of their big chances and 21/94 (22%) of their SiB compared to marks of 11/20 (55%) and 16/98 (16%) respectively for Spurs, though this argument gets a bit circular as others would argue that this efficiency is an effect of Lukaku's presence rather than Lukaku's conversion rate being a product of the team.

In short, the difference here isn't huge but is noteworthy and I will try and tweak my model a bit to increase the weighting given to team conversion rates (this data is factored in, but especially at this early stage of the season I heavily regress it back to league averages).
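
To make the idea of regressing team conversion rates concrete, here is a minimal sketch. It is not the formula used in my model: the shrinkage approach (adding pseudo-attempts at the league rate), the `league_rate` of 0.12 and the `prior_weight` of 100 are all placeholder assumptions chosen purely for illustration. Only the SiB counts for United and Spurs come from the figures quoted above.

```python
def raw_rate(goals: int, attempts: int) -> float:
    """Observed conversion rate."""
    return goals / attempts


def regressed_rate(goals, attempts, league_rate, prior_weight):
    """Shrink a team's conversion rate toward the league rate.

    prior_weight acts like that many extra attempts taken at the league
    rate, so a larger value means heavier regression (assumed method).
    """
    return (goals + prior_weight * league_rate) / (attempts + prior_weight)


LEAGUE_RATE = 0.12   # placeholder, not a real league figure
PRIOR_WEIGHT = 100   # placeholder; lowering it increases the team weighting

# SiB conversion counts quoted in the text.
teams = {"Man United": (21, 94), "Spurs": (16, 98)}

for team, (goals, sib) in teams.items():
    raw = raw_rate(goals, sib)
    reg = regressed_rate(goals, sib, LEAGUE_RATE, PRIOR_WEIGHT)
    print(f"{team}: raw {raw:.0%}, regressed {reg:.1%}")
```

Lowering `PRIOR_WEIGHT` is the equivalent of the tweak described above: the team's own conversion rate counts for more and the league average for less.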

Man City
I feel like everything I write this season includes a section on Harry Kane and Man City and so here we go again. We can see that the Opta model rates Sane and Jesus's seasons significantly higher than mine, though interestingly Sterling and particularly Aguero are much closer (with the Argentine being almost 1:1). Like Lukaku above, this is almost certainly a recognition of the quality of City's chances, although watching the games I would have thought Sterling would really benefit from this as he seems to have gotten multiple tap ins this year with his pace enabling him to catch up with counter attacks and arrive at the far post for a square ball and easy goal. Still, this one makes total sense and will again be helped by an increased weighting in team conversion rates. Although, I don't think folks really need a model to tell them that this team is red hot, and the decision is really between Sane, Silva and Sterling or Aguero and Jesus, and adjusting the team rate wouldn't help you there.

Dominic Calvert-Lewin, Eric Maxim Choupo-Moting and Andy Carroll
This unlikely group of players is a cautionary tale of over-valuing solid stats from players on poor teams and could possibly be awarded the Adel Taarabt Memorial Trophy. I will focus on Calvert-Lewin because I own him and have thus paid most attention to his games (enduring 90 minutes of Everton with regularity is true dedication). The leaderboard among forwards in SiB goes Kane, Lukaku, Aguero, Morata, Lacazette, Calvert-Lewin, Jesus. One of those things is not like the others. That group ranks 1st, 2nd, 3rd, 4th, 6th and 10th among forwards in fantasy points and Lacazette has the worst goal haul with five. Calvert-Lewin is 25th among forwards with a measly 25 points and zero goals. Here I think the issue is both a team problem (Everton have converted just 6/66 (9%) SiB) and Calvert-Lewin himself, who just doesn't seem to be taking quality shots and, watching the games, he is lively but really doesn't seem like someone with crippling bad luck. He's hit the target six times, which in a vacuum would suggest more like 1-2 goals rather than zero, and there remain solid reasons for not overly focusing on SoT over SiB due to small sample size noise, but still, I think there's probably a gap in the model and also in my common sense in overly relying on it when it comes to players racking up shot totals without digging a little deeper as to their quality. This one is harder to fix without the advanced data so it might just be a case of raising those flags before highlighting this kind of player for potential success.

Looking forward
The pleasing thing about this bit of analysis is that while we can definitely identify blind-spots in my model, it's close enough to the Opta version to suggest we're on the right path. The reason this is exciting is because by basing the xG on simple events like shots and SiB, I feel we have a good chance at predicting future xG, which becomes trickier if you are trying to forecast not only how many shots Harry Kane will get, but where he will take them from, where the defenders will be, who passed to him etc. I hope that team shot data stabilises relatively quickly and is less impacted by individual idiosyncratic events and thus we can use it with some certainty to predict team totals which can be allocated to each player to give us our predicted shots to form the basis of xG.

I am not of course suggesting in any way that my model is as good as some of the others out there at determining why what happened, happened, but in terms of translating that data into predictive information I think we're in a good place and I'm fairly happy with how the model is working for now.