Thursday, October 18, 2012

Forecasting player performance: assists

Let's try this again. The first draft of this post was erased when Blogger assumed that Ctrl-Z meant "select all, delete, save" rather than the more standard "undo". Anyway, after a few choice words were screamed at any - and all - inanimate objects in sight, the end result is probably positive for you, the reader, as the second time round is often more concise with less - though still considerable amounts of - needless waffle.

We started this series with some tentative steps into forecasting individual player data by looking at goals, on the presumption that we can surely do better than "he looks in form" or "he has an easy opponent this week". This post on predicting assists is going to follow a similar path, though we will find a couple of different issues along the way that will need to be expanded on in future posts.

Before we get too deep, we need to throw out a quick caveat and acknowledge the proverbial elephant: the definition and categorisation of 'assists' is itself pretty lame and would need serious work if we were going to be signing actual players based on this data. A mazy run followed by a reverse pass which takes a deflection in the box before being tapped home is not an assist per the premierleague.com definition, whereas touching your foot on an indirect free kick before your teammate curls home from 35 yards is. The good news for us is that for fantasy purposes we don't need to be overly concerned with this. Getting in the right position and playing dangerous balls into the box should lead to success in any circumstance, accepting of course that we will likely miss out on the odd assist due to the weird rules. Over a large enough sample size though, I'm okay ignoring the limitations of the 'assist' classification for now.

What then has an impact on the number of assists we would forecast for a given player, whether for this week alone or for a coming run of games? As with the goals analysis, we could potentially dig much deeper than I will here, but to keep things simple we will look at four separate topics:

1. The positional deployment and assignment of a player
If I asked a group of readers to describe the skill set and average pitch position of players most likely to create plenty of assists, we'd get a number of different answers. Some might go to a Gareth Bale wide man capable of whipping in crosses which should lead to goals on volume of chances if nothing else. Others may look to a David Silva type, lurking in gaps between midfield and defense, ready to split the opposition open with a pinpoint pass. Some might even look to a forward within a deadly front two, possibly in the false-nine role, dropping off a front man, or even as a Peter Crouch type, winning flick-ons and nod-downs to be converted by others. As you may have guessed, the data doesn't really narrow this field down, with stats such as touches in the final third, total touches or crosses not having a particularly strong correlation to assists:

Correlation to assists               MID   FWD
Percentage Passes in Opp Half        38%   25%
Percentage Passes in Final Third     35%   21%
Total Successful Passes              39%   57%
Successful Passes - Opp Half         56%   63%
Crosses                              57%   55%
Successful Passes - Final Third      69%   68%
Chances Created                      77%   76%

Those first two are particularly noteworthy to me, as I'll often throw an average position chart into a weekly post suggesting that player x's new advanced position might lead to more chances. Of course, this isn't to say that playing further upfield won't improve a player's fortunes, only that it shouldn't be taken as a single indicator of success. 

So we're not seeing data which correlates to assists as well as shots on target do to goals, but 'chances created' clearly stands out as a useful measure and it's here we're going to focus much of our attention from now on. I think there's work to be done to possibly fill in the gap between chances created and assists, perhaps looking at the type of created chance (cross, through ball), which side of the field it came from, etc., but for now we'll stick with Opta's basic definition.

2. How influential a player is within his team
Following a similar trend to the goal analysis, we'll now look at how influential a player is within his own team. Unlike goals however, this is very simple for assists: simply defined as a player's chances created as a percentage of his team's total. This feels too simplistic, and it obviously is, but given the above correlations, I don't think factoring in things like total touches or passes in advanced parts of the field would really help. For some players it might not result in a huge difference, but for players who are often rotated, we will need to take care to make sure we only look at chances created while that player is on the field when determining his share. Note that I will also differentiate between stats accrued at home and on the road.
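As a rough sketch of the share calculation described above, the snippet below counts only the team's chances created while the player was on the field, split by venue. The record layout, the field names and the numbers are all made up for illustration; they are assumptions, not the actual data or spreadsheet behind this post.

```python
from dataclasses import dataclass


@dataclass
class MatchLine:
    """One match for one player (hypothetical record layout)."""
    venue: str                  # "home" or "away"
    player_chances: int         # chances the player created
    team_chances_on_pitch: int  # team chances created while he was on the field


def chance_share(matches, venue):
    """Player's share of his team's chances at a venue, on-pitch minutes only."""
    rows = [m for m in matches if m.venue == venue]
    team_total = sum(m.team_chances_on_pitch for m in rows)
    if team_total == 0:
        return 0.0  # no data yet, so no share to report
    return sum(m.player_chances for m in rows) / team_total


# Made-up numbers purely for illustration.
matches = [
    MatchLine("away", 3, 12),
    MatchLine("away", 1, 7),   # came on as a sub, so fewer team chances count
    MatchLine("home", 4, 15),
]
away_share = chance_share(matches, "away")  # 4 of 19 team chances
home_share = chance_share(matches, "home")  # 4 of 15 team chances
```

Dividing by on-pitch team chances rather than the full-match total is what stops a frequently rotated player's share from being understated.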

As an example, if we wanted to forecast Arsenal's chances created at Norwich, we'd take Arsenal's historic performance on the road to date to get a +/- premium/discount showing how much they are out/underperforming the league against their opponents to date. We would then apply that premium/discount to the average chances conceded by Norwich at home to date:


In this case, Norwich concede 9 chances per game at home, so applying Arsenal's 35% premium would give us 12.1 forecast chances for the week.
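For anyone who wants to play with the numbers, here is a minimal sketch of that premium/discount step. How the premium itself is built (total chances created divided by the total the same opponents usually concede) is my assumption of one reasonable approach, and the function names and the three-game sample are hypothetical.

```python
def attack_premium(chances_created, opponents_avg_conceded):
    """Premium (+) or discount (-) versus what the same opponents usually concede.

    Both arguments are per-game lists covering the same matches, in the same order.
    """
    expected = sum(opponents_avg_conceded)
    return sum(chances_created) / expected - 1.0


def forecast_chances(opponent_avg_conceded, premium):
    """Scale the opponent's average chances conceded by the attacker's premium."""
    return opponent_avg_conceded * (1.0 + premium)


# Made-up away games: chances created, and what each host normally concedes.
sample_premium = attack_premium([14, 11, 13], [10.0, 9.0, 9.5])

# The Norwich example from the post: 9 chances conceded at home, 35% premium.
norwich_forecast = forecast_chances(9.0, 0.35)
print(round(sample_premium, 3), round(norwich_forecast, 2))
```

With the rounded inputs quoted above this comes out at 12.15; the 12.1 in the post presumably reflects unrounded inputs.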

One area I would like to investigate is how a player performed against different teams, though I have struggled to date to come up with scientific criteria which avoid my own biases. For example, if we want to know whether or not Kevin Nolan performs equally well against 'good' teams as 'bad' teams, we need a way to come up with those definitions, at which point it gets a bit tricky. If we look at league position we could be classifying a team as 'good' when in reality they have a shaky defense but an elite attack. Having success against such a team would hardly be an indicator that Nolan was dominating everyone he faced. The best solution I have for now would be to rank teams based on their chances surrendered per game, but even then we get into issues of teams faced to date, small sample sizes and even deeper problems like teams being particularly vulnerable against certain opponent formations. For now then, I am going to omit this adjustment completely from the analysis, but welcome any suggestions on how to improve on this.

3. The quality of a player's teammates
This is a key point for this analysis - perhaps the key point - and will hopefully differentiate this forecast from what anyone could get by glancing at the Opta data and seeing that a given player has created a lot of chances.

With the ever-changing landscape of football, along with the inherent limitation of a sport with 38 games a season (rather than, say, 162 in baseball), our sample sizes are often going to be small, and I think we need to just live with that and consider regressing data to a mean rate rather than getting overly married to unusually high or low conversion rates. With that in mind, the below graph shows the prior year data only and attempts to classify the different attacking types in the league, to illustrate that a chance created by a Man Utd player does not have the same value as one created by a Liverpool player:


So the two Manchester teams along with Arsenal and, to an extent, Chelsea are the league's elite attacking units. Thanks for reading: analysis over! Okay, that statement is hardly revolutionary, but I do believe this chart is useful as it highlights just how large the gap potentially is. Consider that at these conversion rates, in order to generate five assists, a United player would need to create just 37 chances while his colleagues from Tottenham and Liverpool would need 57 and 100 respectively.

For fantasy purposes, the second most valuable group could be argued to be the top-left 'efficient' quadrant, as while they clearly create far fewer chances than some of their counterparts, so long as these are concentrated towards one or two players, the efficiency rates could well make for more reliable and consistent fantasy options.

Anyway, I believe this data gives us two options:
  1. We can take each team's average conversion rate and apply it to an individual player's forecast chances created to get his expected assists. For example, if we forecast a Chelsea midfielder to create 30 chances over the next 10 gameweeks, we'd expect three assists to be earned (30 x 10% conversion rate).
  2. If we're not convinced about the reliability of the data, we could first regress it to the mean conversion rate of 9%, before then applying it to the individual player. This would give the advantage of removing any extremes we might be seeing due to a small sample size, but also risks losing some of the potential value of noticing true differences if we partly treat all teams as equal.
For now I am going to stick with the simpler option one and then reassess in a few weeks to see how the conversion rates have varied from prior year. If we see teams' rates varying greatly then we'll have to either abandon the team rate completely or, more likely, regress the rates to the mean to reduce some of the extremes.
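The two options can be compared with a few lines of code. This is only a sketch: the 9% league mean and the 10% Chelsea rate come from the text above, but the blending weight of 0.5 is an arbitrary assumption, since how hard to regress has not been decided here.

```python
LEAGUE_MEAN_RATE = 0.09  # mean conversion rate quoted in the post


def regress_rate(team_rate, weight, mean_rate=LEAGUE_MEAN_RATE):
    """Blend a team's rate with the league mean.

    weight=1.0 trusts the team rate fully (option one);
    weight=0.0 treats every team as league average.
    """
    return weight * team_rate + (1.0 - weight) * mean_rate


def forecast_assists(forecast_chances, conversion_rate):
    """Expected assists from forecast chances created."""
    return forecast_chances * conversion_rate


# Chelsea midfielder forecast to create 30 chances at a 10% team rate.
option_one = forecast_assists(30, 0.10)
# Same player, with the team rate pulled halfway to the mean (assumed weight).
option_two = forecast_assists(30, regress_rate(0.10, weight=0.5))
print(round(option_one, 2), round(option_two, 2))
```

For a team this close to the mean the difference is small; the choice matters far more at the extremes, which is exactly where small samples are most likely to mislead.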

4. His opponents in the coming games
This one is pretty simple as we'll once again create a +/- premium/discount showing how the opponent have performed at home/away to date and then apply that to the average chances created by the team we are forecasting. Sticking with our Norwich vs Arsenal game:


In this case, Norwich have been slightly above average defensively, surrendering 4% fewer chances than other teams in the same games to date. That means in our example, Arsenal, who average 11.8 chances per game on the road, are downgraded slightly to 11.3.

One issue with this is again our sample size so we're left with another choice:
  1. Use current year data only, and accept the fact that, at best, we'll have just 19 games of data to rely on
  2. Use a weighted average of both home and away games. This improves on the size of sample but could dilute it with unhelpful data.
  3. Use home/away splits only but include prior year data. Again, this helps reduce the unreliability of a small sample size but risks incorporating out of date data.
I'm conscious that this might lead to fluctuations early on, but I believe it's reasonable to use current year data only, which also makes the model easier to manage. As the season progresses I will track how team averages compare to prior year and whether any significant differences can be reasonably explained. If it starts to look like there is significant variance here, we might have to consider using data from multiple years to level things out.

Making the forecast
So what does this give us in total? Let's run through a quick example to see the kind of calculations that will go into version one of the assist model:

Santi Cazorla vs Norwich (at Carrow Road):

Norwich avg chances conceded at home (table 1)                   9
Arsenal score +/- away (table 1)                                 35%
Forecast chances conceded (table 1)                              12.1
Average Arsenal chances away (table 2)                           11.8
Norwich concede +/- home (table 2)                               -4%
Forecast chances created (table 2)                               11.3
Average forecast chances (avg of the two forecasts)              11.7

Cazorla share of chances away from home                          21%
Cazorla's forecast chances created (total x Cazorla's share)     2.5
Arsenal conversion rate away from home                           13%
Forecast assists for the gameweek                                0.32

Table 3 - Example assist forecast.
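The arithmetic in the Cazorla example can be strung together in a short script. This is a sketch of the calculation as laid out in the post rather than the actual spreadsheet behind it; the function and argument names are made up, and the inputs are the rounded figures quoted in the example.

```python
def adjust(base, premium):
    """Apply a +/- premium/discount to a per-game average."""
    return base * (1.0 + premium)


def forecast_player_assists(opp_avg_conceded, team_attack_premium,
                            team_avg_created, opp_defence_premium,
                            player_share, conversion_rate):
    """Return (team chances, player chances, player assists) for one game."""
    # Table 1 view: what the opponent concedes, scaled by our attack.
    from_opponent_view = adjust(opp_avg_conceded, team_attack_premium)
    # Table 2 view: what our team creates, scaled by the opponent's defence.
    from_team_view = adjust(team_avg_created, opp_defence_premium)
    # The post takes a simple average of the two forecasts.
    team_chances = (from_opponent_view + from_team_view) / 2.0
    player_chances = team_chances * player_share
    return team_chances, player_chances, player_chances * conversion_rate


team_chances, cazorla_chances, cazorla_assists = forecast_player_assists(
    opp_avg_conceded=9.0,       # Norwich, at home
    team_attack_premium=0.35,   # Arsenal, away
    team_avg_created=11.8,      # Arsenal, away
    opp_defence_premium=-0.04,  # Norwich, at home
    player_share=0.21,          # Cazorla's share away from home
    conversion_rate=0.13,       # Arsenal conversion rate away from home
)
print(round(team_chances, 1), round(cazorla_chances, 1), round(cazorla_assists, 2))
```

Rounded the same way as the table, this gives 11.7 team chances, 2.5 chances for Cazorla and 0.32 forecast assists.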

So there it is. It's dirty, simplistic and may ultimately fail. But it's also a starting point and it's the best I've got so far. As always, suggestions and revisions are welcome below in the comments or in the new FPL Analytical forum, which is currently hosted at Shots on Target, but will also be integrated on this site soon. Regularly scheduled programming will now resume including the weekly forecast and a new Moneyball piece at FFS, due in the next day or so.

4 comments:

John Doe, 2008 said...

Brilliant as always. Questions/comments:

1. I hate the Blogger interface. It is truly horrendous. I have done the same thing albeit in a much smaller post. I've gone to writing the post in text pad and copying it over when finished.

2. Are you using FPL's assists or Opta's assists?

3. One area I would like to investigate is how a player performed against different teams, though I have struggled to date to come up with scientific criteria which avoid my own biases. For example, if we want to know whether or not Kevin Nolan performs equally well against 'good' teams as 'bad' teams, we need a way to come up with those definitions at which point it gets a bit tricky.

I think what I really want is to understand how players perform against types of opposition. For example, Bale and his pace might be a huge problem for a team with a huge and powerful yet slow backline (e.g. Stoke) but might not be such an issue with a team playing with 3 defenders and 2 wingbacks. Or maybe it's vice versa. Whatever the case, I think we can, at some point, get to an understanding of how players perform against different formations and player types, even if that understanding comes with a sizable margin of error.

4. Re: teammate quality. Insightful section here and one that I probably have not investigated enough.

I think that we should use a hybrid model using current and previous season data as you suggested in the next section. Using current season data is simply too prone to scheduling incongruencies and the one-off poor performance or 10-men game. It does complicate things, but necessarily so in my opinion.

5. I think the model is sound and in use by others as well (s_o_t definitely uses something similar).

Good stuff.

@shots_on_target said...

I do JD! I do a pretty similar thing, but I don't adjust the player's form using the discount/premium, only the team's, and the player takes his share from this. I am also using a static shot:goal conversion rating at the mo. I need to amend this in the same way you have, Chris. I have been working quite hard on unpicking this and will have a theory to publish soon.

Something you need to know, if you don't already, is that from opta data key passes exclude actual assists.

I think the aspect of building in a team's and player's style of play will have to wait for now but I believe it can be done, and will give this kind of modelling a real edge.

Great stuff Chris. Can't wait to see to what degree it affects your captain picks for this weekend. I would be dead keen to know if there has been any significant change in the forecast between your new model and old.

And finally, I don't mind the Blogger interface, had no hiccups yet. I do mind these captchas though. It takes me at least 3 goes to get one right!

AnonCargoCult said...

I was wondering whether the Key Passes included or excluded the actual assists, and had assumed it had (based on SoT & Goals). But the numbers and percentages in the graph seem to suggest otherwise.

Do we have a consensus?

Gummi said...

Excellent again Chris.

One particularly tough variable is how do we calculate for reactive managers.

For example, both Lambert and Moyes are known to adjust their tactics dramatically based on the opposition. E.g. against Liverpool, Everton pressed the fullbacks to prevent Liverpool from playing their passing game.

I think simple is the way forward, at least as a reference point, and small adjustments should be made in a way that we can compare it to the basic model.

I'm clearly not as "learned" as John Doe, 2008 and @shots_on_target, but I hope I can contribute as the season progresses.