Monday, October 29, 2012

Individual forecasts: Historic player data

One of the issues with the new forecast model is deciding which kind of shots to use to forecast player success: all of them, only those in the box, or only those on target. Shots on target have the best correlation with goals scored, but there is a slight concern with limiting ourselves to that data alone. Consider the example below:

Robin van Persie: Appearances 10, Total Shots 55, Shots On Target 10, Goals 3

If we only look at his shots on target we see that he is averaging just one a game, with 30% of them hitting the back of the net. The issue is that he has historically hit the target at a much better rate than 10/55 (18%), with a success rate closer to 44%. We therefore need to adjust for the fact that we believe 24 of his next 55 shots (44%) will hit the target, and thus his expected goals will be higher.
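As a rough sketch of the adjustment described above (the function name is made up for illustration, and the 44% figure is just the approximate historic rate quoted in the example), the arithmetic looks something like this:

```python
def expected_shots_on_target(total_shots: int, historic_accuracy: float) -> float:
    """Shots on target we would expect if the player shot at his historic accuracy."""
    return total_shots * historic_accuracy


# Figures from the van Persie example above.
total_shots = 55
shots_on_target = 10
historic_accuracy = 0.44  # approximate historic on-target rate quoted above

# What he has actually done so far this season.
current_accuracy = shots_on_target / total_shots

# What we would expect from the same shot volume at his historic accuracy.
expected_sot = expected_shots_on_target(total_shots, historic_accuracy)

print(f"Current accuracy: {current_accuracy:.0%}")      # 18%
print(f"Expected shots on target: {expected_sot:.1f}")  # 24.2
```

The gap between 10 and roughly 24 shots on target is what the forecast would otherwise miss by looking at this season's shots on target alone.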

We still have some issues to resolve about how to generate this historic rate, especially with regard to what data to use, but for now I'm happy to look at such data for players like van Persie who have a proven history in the Premier League, even if, in his case, it was with another team (someone like Berbatov, who has moved to a weaker team, might be different).

The next issue is whether we should adjust the rate at which shots on target are converted into goals. This one is much less clear. The table below shows all the significant fantasy players and the rate at which their shots on target have become goals. At the bottom of the table you will note an average conversion rate, and the colouring shows whether players are above or below that average. The keen eye will note that only five players (Balotelli, Berbatov, P Cisse, Fletcher and Nolan) have exceeded the league average every year, while only four (Bale, Fellaini, Santi Cazorla and Silva) have consistently been below average.

My (admittedly early) conclusion is that there is little reason to think that SoT conversion is consistent year on year; frankly, it is probably a combination of luck, strength of the opposing 'keeper, and of course some skill. Issues like where shots are taken from, quality of teammates, etc. can also be factored in, but for now I'm happy to apply a standard league average rate to all players (probably split by midfielders/forwards once I run the actual numbers). So if you hit the target 10 times, we're going to give you credit for ~3 goals.

We wouldn't want to expand that rate to include defenders of course, or possibly even the league's weaker players, but given that we're really only looking at the top 100 or so attacking players, I'm comfortable that such a generalisation will do more good than harm (are you, for instance, ready to give Cisse a goal for 53% of all his SoTs, when so far this year that rate is 20%?). As always I welcome any objections/suggestions below.


Pulma said...

Great work!
I love this, I do most of my picks based on data like this. I think you can even predict assists that way, at least in the long or medium term (if the situation around the players is fairly similar and they are also playing in their usual position). You can tell if someone is over or underperforming and therefore if they have a strong chance of doing well in the next weeks or are bound to drop off.

One thing you should take note of is that some conversion rates of past years are a bad indicator, in case a player has taken a lot of penalties and is maybe not on them any more. I like to exclude penalty shots and goals from this analysis just in case unless you are 100% sure the players will take the penalties again this year.

SuperGrover said...

So Suarez isn't a significant fantasy player now? And I thought the media was tough on him! :)

Interesting stuff. Do you have the data on shot accuracy as well (i.e., shots on target per total shots)? You seem convinced accuracy is somewhat predictable by player so it would be nice to see that data as well.

Also, I noticed you have Cazorla in here. Do you feel comfortable interspersing data from major European leagues? I am not sure I am there yet.

Jimbo said...

Great stuff, love the site. I have a question about this. You've said you think it best to use the league average for each player because their SoT conversion rate varies each year. Isn't the same thing going to be true for the number of shots someone is getting? It might be high after 9 games but then revert to the mean over the season, and if you looked over 3 years it would be up and down. Would it maybe be better to use an average of the 3 years for each player instead?

JT said...

I've pondered this very conundrum many a time when doing my own personal projections. I've come to the conclusion that some adjustment should be made for individual players based on personal historic rates where I think they're sustainable, i.e. with elite players such as RVP.

As has been discussed before on these pages I also like to look at the player's proportion of shots from inside/outside of the box and this slightly influences the rate at which I predict the player will convert shots in the future.

I'm therefore happy to predict that RVP will consistently have a higher conversion rate compared to Bale, partly due to the higher proportion of RVP's shots consistently being inside the box and partly due to his consistently higher conversion of shots on target.

I do however adjust an individual's conversion rate to bring it closer in line with the league average, even more so when I'm dealing with a player who hasn't had a couple of seasons of consistent numbers. I think some consideration of historic rates is important but I also think it shouldn't have too much of an influence on the final goal projections.
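One possible way to sketch this kind of pull towards the league average is below. To be clear, this is not JT's actual method (the comment doesn't give a formula): the blend, the function name and the `prior_weight` value are all assumptions made for illustration, and the sample figures are invented.

```python
def shrunk_conversion_rate(goals: int, shots_on_target: int,
                           league_rate: float, prior_weight: float = 50.0) -> float:
    """Blend a player's own SoT conversion rate with the league average.

    prior_weight is a hypothetical tuning knob: it acts like that many
    league-average shots on target added to the player's own record, so a
    small sample is pulled harder towards the league rate than a large one.
    """
    return (goals + league_rate * prior_weight) / (shots_on_target + prior_weight)


league_rate = 0.30  # illustrative league-average conversion rate

# Invented samples: the same raw rate (8/15 = 80/150) over different sample sizes.
small_sample = shrunk_conversion_rate(8, 15, league_rate)
large_sample = shrunk_conversion_rate(80, 150, league_rate)

print(f"Small sample: {small_sample:.3f}")
print(f"Large sample: {large_sample:.3f}")
```

The player with only 15 shots on target ends up much closer to the league rate than the one with 150, which matches the idea of trusting a personal rate more once there are a couple of seasons of consistent numbers behind it.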

JT said...

Oh, and I also agree with Pulma in that I'm always sure to remove shots resulting from penalties and have a separate way of projecting increased point returns for penalty takers based on the league average of penalties awarded and converted/not converted. It's possible to argue that different players are more/less likely to convert penalties but I decided that this is a little too much as predicting penalties is hardly an exact science as it is!

For example, on top of my projections for a player, I add up to 0.48 points per game for strikers who take penalties and up to 0.6 points per game for midfielders. I reduce this number if I feel there is more than one penalty taker at the club.

Keep up the good work! :-)

gurka said...

Chris, I am trying to make a bit of a case for Steven Gerrard in the 9+ midfield bracket.

I noticed he is returning more shots on target than in the past. Is he still getting forward though? Or just having fewer shots but more on target? (I have not watched Liverpool much.)

Keen regular reader, keep up the good work.


Chris Glover said...

Pulma - It's a good point on penalties but the key is that I WON'T be using these conversion rates, as my central thesis is that they are essentially random every year. For now, I'm just using a conversion rate of 35% for forwards and 30% for midfielders until I see strong suggestions as to what else to do (those numbers are based on all players from the prior year per Opta).
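A minimal sketch of applying those flat rates, assuming nothing beyond the two percentages quoted above (the dictionary and function names are made up for illustration):

```python
# Flat SoT-to-goal conversion rates quoted in the reply above.
LEAGUE_CONVERSION = {"forward": 0.35, "midfielder": 0.30}


def expected_goals(shots_on_target: float, position: str) -> float:
    """Credit a player with goals at the league-average rate for his position."""
    return shots_on_target * LEAGUE_CONVERSION[position]


# Ten shots on target are worth the same to every forward, regardless of
# what his personal conversion rate has been in previous seasons.
print(f"Forward:    {expected_goals(10, 'forward'):.1f}")
print(f"Midfielder: {expected_goals(10, 'midfielder'):.1f}")
```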

SuperGrover - haha, not sure what happened to ol' Luis. Not surprisingly his rates aren't great (23% and 17%) based on ESPN data, which does not split between long and short shots (and he takes too many of the former).

You're right I keep saying I believe that's skill but I should really substantiate that claim. The problem (and it's also a problem for the posted goal conversion % I should have mentioned) is that ESPN only gives total shots, not those split between inside and outside the box. I decided that wasn't a huge issue for goal% conversion as I'm really focusing on forwards who tend to shoot inside the box, but for on target % it's going to be an issue. I'll post what I have but it's going to be heavily caveated.

I agree on the Euro data. If I was going to use this data to be predictive I wouldn't include it.

Jimbo - you might be right but my current thinking is that while player ability tends to stay constant (that of course can change, but bear with me), the number of shots they receive can be massively influenced by their situation, and thus I wouldn't want to cloud Berbatov's forecast, for example, by giving him credit for his time at United. For the total shots a player is given in the forecast, I look at the % of his team's efforts he gets and then use the team forecast to generate the prediction, rather than the player's own average (which could be subject to strength of schedule issues). So if van Persie gets 33% of United's shots and United are forecast to get 15 this week, he gets forecast for 5.
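In code, the share-of-team approach might look something like the sketch below. The function name is hypothetical, and the 33-of-100 season totals are invented simply to reproduce the 33% share in the example; only the 33% and the team forecast of 15 come from the text above.

```python
def forecast_player_shots(player_shots: int, team_shots: int,
                          team_forecast_shots: float) -> float:
    """Forecast a player's shots as his share of the team's forecast shots."""
    share = player_shots / team_shots  # share of the team's shots taken so far
    return share * team_forecast_shots


# Invented totals giving a 33% share; team forecast of 15 shots this week.
forecast = forecast_player_shots(player_shots=33, team_shots=100,
                                 team_forecast_shots=15)

print(f"Forecast shots: {forecast:.2f}")  # 4.95, i.e. roughly 5
```

Because the team forecast already reflects the opponent, the player's number moves with the fixture without his own per-game average ever entering the calculation.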

JT - well said. Totally agree with all the points. There's definitely work to be done, and the main problem is that I don't have that in/out data for all three seasons. You've given me an idea though: we have full data for 2011-12 so we could develop a profile for each player of in/out box shots split and then apply to prior years. It won't be ideal but it's better than what I'm currently showing. Thanks man.

Gurka - I must admit I've been eyeing Gerrard myself, who continues to put up decent numbers. Liverpool have the third most shots inside the box this year and while they've not quite clicked yet you do get the feeling they're improving. Strangely they've disappointed in the last two home games but have impressed away from home, though that wouldn't be a big problem for Gerrard as you'd play him every week. The upcoming games look okay but Liverpool get crushed in the forecast because of their brutal shot conversion rate this year (still just 9%, and that is only for shots inside the box; some teams have 7% on shots OUTSIDE the box). I like him and there are reasons to support the pick, but however you slice it, it would be a calculated risk.

Gummi said...

Excellent yet again. Thanks for all your work on the subject.

Although it is in some ways a disappointment that all this analysis leads to us using the league average, we can still be comfortable using those numbers after your analysis.

@JT: The problem with doing selective adjustments is that it goes against the reason for using a model. Perhaps it is an easy call for RvP, but the selection process gets muddy quickly. Isn't it better to use the model "cleanly" and then make your judgement calls?