Friday, October 18, 2013

Gameweek forecasts: a couple of case studies

Producing forecasts is a tricky business. Even with hindsight it is tough to predict the expected outcome of a given game (i.e. how shots transform into goals) and the problem gets considerably harder when you also need to forecast the underlying data. Throw in uncertainty around how much players will play and the issue of small sample sizes and you have a recipe for some funky results over these early weeks of the season. First, to make sure the model isn't totally off track, let's look at how it performs retrospectively over the first seven weeks of the season (using actual shot totals as inputs):

Though we can see outliers in the above chart (especially at the top end of the market), the overall trend is promising and the r-squared of 57% for players with a risk factor of 2.5 or less is encouraging enough. That's not to say it's infallible, but it's a good start and for the majority of the extreme outliers we can point to specific factors which have led to their strong results (Yaya Toure is unlikely, for example, to convert his shots on target into goals at a 66% clip for the rest of the season).
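
For anyone curious how an r-squared figure like this can be checked, here is a minimal sketch in Python. It is not the model's actual code: the function names, field names and player rows are all made up for illustration, and it assumes the r-squared is the squared correlation between forecast and actual points (the figure a spreadsheet trendline reports).

```python
def r_squared(forecast, actual):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_a = sum(actual) / n
    cov = sum((f - mean_f) * (a - mean_a) for f, a in zip(forecast, actual))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov ** 2 / (var_f * var_a)


def low_risk_r_squared(players, max_risk=2.5):
    """r-squared over players whose risk factor is at or below max_risk."""
    kept = [p for p in players if p["risk"] <= max_risk]
    return r_squared([p["forecast"] for p in kept], [p["actual"] for p in kept])


# Hypothetical rows, not real data from the model.
players = [
    {"name": "Player A", "risk": 1.0, "forecast": 20.0, "actual": 22.0},
    {"name": "Player B", "risk": 2.0, "forecast": 30.0, "actual": 27.0},
    {"name": "Player C", "risk": 2.5, "forecast": 40.0, "actual": 45.0},
    {"name": "Player D", "risk": 3.9, "forecast": 50.0, "actual": 10.0},  # excluded by the risk cut
]

print(round(low_risk_r_squared(players), 2))
```

Restricting the calculation to low-risk players matters because a handful of small-sample players (like the made-up Player D here) can drag the fit around on their own.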

So if we're happy that the model is working relatively well over a medium length period, let's get into the forecast side (which includes predicting how many shots a player will get and how they will be converted) and look at a couple of actual examples from this week's forecast:

Loic Remy 7.3 points (you'll need to slide the risk slider to 2.7 or more to see him included)
Remy represents one of the dangers of forecasting in the early weeks, a problem compounded by the fact that he has only started four times. When he has played he's been nothing short of spectacular, averaging 4.4 shots per 90 minutes and hitting the target 50% of the time (a very useful rate). Despite this success, I imagine most people who asked about this forecast were confused given Newcastle's opponents. With just four goals conceded on the season, Liverpool have appeared to be a very useful defensive side and thus don't immediately jump out as a team you want to start your forwards against. However, digging a bit deeper we see a team with a +/- SiB rate of 22% away from home, surrendering the same number of SiB as teams like Villa and Cardiff (this despite playing AVL, SWA and SUN). This isn't to say, of course, that Newcastle are set to put Liverpool to the sword this week, but looking purely at the data, the Magpies' prospects are better than most would probably think (the model estimates them to notch around 9 SiB this week). With Remy accounting for 40%+ of his team's SiB, the model likes his chances, even if the intangible factors (playing time risk, wide role) suggest more caution.
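
To make the arithmetic behind that last point concrete, here is a rough Python sketch. It only uses the two numbers quoted above (around 9 team SiB and a 40% share); the function name and the `minutes_fraction` knob for playing time risk are my own illustrative assumptions, not how the model is actually built.

```python
def expected_player_sib(team_sib, player_share, minutes_fraction=1.0):
    """Expected shots in the box for one player.

    team_sib: forecast team shots in the box for the game.
    player_share: fraction of the team's SiB the player accounts for.
    minutes_fraction: hypothetical discount for playing time risk
        (1.0 means a full 90 minutes).
    """
    return team_sib * player_share * minutes_fraction


# Newcastle at around 9 SiB, Remy at a 40% share, full game assumed.
remy_sib = expected_player_sib(team_sib=9, player_share=0.40)
print(round(remy_sib, 1))  # 3.6
```

Dropping `minutes_fraction` below 1.0 is one simple way to express the caution the intangibles suggest, since the expected SiB scales down in proportion.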

Daniel Sturridge 4.6 vs Luis Suarez 4.1
If Remy is giving us trouble with five appearances, things get even tougher with Suarez, who has played just twice. I suppose we could lean on prior year data but, without wanting to reinvent the whole model for every idiosyncrasy, we'll just live with it. Long term I would personally back Suarez to top his English colleague in the scoring charts, though as this is a stat-based site and I've nothing to base that on other than past events, it doesn't make it into the model. As always, these forecasts should support our decisions and if something seems off then we should simply ignore it.

Stevan Jovetic 7.3 (risk rating 3.9)
This is something of a "damned if you do, damned if you don't" situation: if I exclude players from the weekly listings I inevitably get questions about where Player X is, and when they're in, it seems ridiculous to rank the scarcely-used Jovetic as the top option. For clarity, all players make the listings and you can filter down to more reliable options using the risk slider. Jovetic's sample size is almost certainly too small to be particularly reliable and thus the 7.3 points ranking isn't wholly useful, but if you think this then simply ignore him. I prefer this approach to deciding myself who is or isn't relevant and having readers miss out on a sleeper prospect they are targeting. This is the same approach used by publications like Baseball Prospectus, who generate PECOTA forecasts for minor league players on the assumption they get major league playing time (and it's up to the manager, player and luck whether the youngster gets his shot). For what it's worth, in his limited time Jovetic has been excellent (6 shots, 4 SiB and 2 SoT in just 85 minutes) and thus the model likes him to succeed were he to be given time in this talented City side.
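
The risk slider idea can be sketched in a few lines of Python. This is only an illustration of the filtering, not the site's code: the function name is made up, Jovetic's numbers are the ones quoted above, Remy's risk is assumed to sit exactly at the 2.7 slider setting mentioned earlier, and Speroni's risk of 1.0 is a placeholder standing in for "low".

```python
def filter_by_risk(players, max_risk):
    """Keep players at or below max_risk, best forecast first."""
    kept = [p for p in players if p["risk"] <= max_risk]
    # sorted() is stable, so players tied on points keep their input order.
    return sorted(kept, key=lambda p: p["points"], reverse=True)


players = [
    {"name": "Remy", "points": 7.3, "risk": 2.7},     # risk assumed from the slider note
    {"name": "Jovetic", "points": 7.3, "risk": 3.9},  # as quoted in the post
    {"name": "Speroni", "points": 5.7, "risk": 1.0},  # placeholder risk value
]

for setting in (2.5, 2.7, 4.0):
    names = [p["name"] for p in filter_by_risk(players, setting)]
    print(setting, names)
```

Nobody is ever deleted from the underlying list; the slider just decides how much small-sample noise you are willing to look at.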

Julian Speroni 5.7
This is probably the strangest forecast for the week and I must admit, I went back to recheck the data myself before posting these numbers. The fact that it's Speroni (and hence Crystal Palace) is somewhat surprising, though when you consider their opponents' pathetic efforts away from home to date, it doesn't seem overly unusual to see Palace ranked well this week (indeed they would rank 3rd in the standard "goals per game" projections from last season). The issue, then, is more that a goalkeeper is the week's highest ranked player with a low risk rating, which doesn't feel quite right. Part of this is simply the perception of fantasy players and part of it is a lack of sophistication in the model.

On the first point, Artur Boruc ranks 9th among all players this year and three 'keepers place in the top 15, so in reality it isn't particularly surprising to suggest a 'keeper will score well (remember that they tend to earn more points than defenders per clean sheet due to the presence of saves). They lack the upside of defenders who can also score or notch assists, but for most players the chance of those events is fairly low and thus isn't a huge factor in a weekly ranking. The second point is a bigger issue and that's how 'keeper points are calculated. Right now, saves are awarded on a very crude average basis which doesn't take into account the propensity for earning saves in a given game. Thus, while Speroni has racked up decent save totals to date, if his chances of a clean sheet are higher this week then his save totals should go down, but the model doesn't make such an adjustment. This is unlikely to result in more than a half point variance in a given week though.
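
As a rough illustration of the adjustment the model is missing, here is a Python sketch. Everything in it is an assumption on my part: the function names, the idea of scaling saves by expected shots on target faced, the input numbers, and the simplification of treating the game's one-point-per-three-saves rule as if it were linear.

```python
SAVES_PER_POINT = 3  # assumed scoring rule: one point per three saves


def crude_save_points(avg_saves_per_game):
    """Current approach: season-average saves, same every week."""
    return avg_saves_per_game / SAVES_PER_POINT


def adjusted_save_points(avg_saves_per_game, avg_sot_faced, expected_sot_faced):
    """Hypothetical fix: scale saves by how busy the 'keeper should be.

    avg_sot_faced: shots on target faced per game so far.
    expected_sot_faced: forecast shots on target faced this week.
    """
    scaled_saves = avg_saves_per_game * expected_sot_faced / avg_sot_faced
    return scaled_saves / SAVES_PER_POINT


# Made-up numbers: a 'keeper averaging 3 saves from 6 SoT faced,
# up against an opponent expected to manage only 3 SoT.
print(crude_save_points(3.0))               # 1.0
print(adjusted_save_points(3.0, 6.0, 3.0))  # 0.5
```

With these invented inputs the gap is half a point, which is the sort of size the paragraph above suggests, but the real variance depends entirely on the fixture.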

The wider point on 'keepers, and to a lesser degree defenders, is that the forecast is more likely to be wrong on them in a given week as their chances of success are a little bit "all or nothing" (whereas midfielders can earn points for clean sheets, assists and goals, and are more likely to get the bonus nod). I still feel fairly confident in using this data for ranking purposes but wouldn't suggest captaining Speroni (5.7) over someone like Michu (4.9) given the much higher upside enjoyed by the Swansea man. To repeat, this data is based on logic but a simple model cannot account for every possibility, so personal judgement is still required.

Hopefully these examples give a bit more detail on how the weekly rankings are made and we'll continue to check back to see how they are performing as the season goes on (we'll hopefully see some of the stranger outliers disappear as sample sizes start to increase). Thanks for reading and for sticking with the blog during the quiet opening weeks, and please continue to send your questions to @plfantasy, on Facebook or in the comments below.

1 comment:

James Richard Klien Jr said...

A great article that has been doing the rounds in parts of FPL blogland. I thought the more stat-savvy readers of this blog might enjoy it.