Wednesday, September 9, 2015

Which stats should we focus on?

With the proliferation of Opta data and multiple news sources starting to dip their proverbial toes into the world of statistical analysis, casual fans have access to a greater depth of data than at any point before. Converting that data into useful information is therefore coming more and more into focus, which is why we need to review any proposed “advanced” metrics to ensure they remain relevant and as accurate as possible (all while acknowledging we are a long way from even touching the kind of analytics that are prevalent in other sports).

Before we go on, it should also be noted that any reference to terms like “advanced” should be taken lightly. By “advanced” I mean slightly more useful than looking at the “goals scored” chart and assuming that the past explains the future. I am not a stats professor, nor even a student, and more complex models surely exist which might shave a point or two off the margin of error of the analysis in these electronic pages. However, I believe the outputs are good enough to raise warning flags and give clues to over- and undervalued assets, and that seems sufficient to warrant committing these words and numbers to my little corner of the Internet.

Which stats to look at
One of the challenges of the increased availability of data is deciding which particular stats to pay attention to. If we’re trying to forecast a player’s goals, do we want to know how many total shots he takes or just those on target? Do we care how often he plays? How about his involvement throughout the game, using total passes or touches as a proxy?

The simple way I approach this is to look at the correlation between each stat and goals scored at the individual player level. It isn’t perfect, of course, and one could look to exclude players who didn’t play enough or who are obviously not fantasy targets (holding midfielders), but for simplicity my sample is the entire population of players from the prior season. The obvious starting point is shots, based on the motivational-poster-inspired adage of players missing every shot they don’t take. Those results throw up some predictable but not irrelevant notes:

Total shots: 86% correlation to goals
Shots outside the box: 58%
Shots inside the box: 88%
Shots on target: 90%
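For anyone who wants to reproduce this kind of number at home, here is a minimal sketch of the calculation, assuming the percentages above are plain Pearson correlations between a player's season total in a stat and his goals. The player rows, the field names and the `correlation_with_goals` helper are all made up for illustration and are not the data behind the figures in this post.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical season totals for five invented players (NOT real data).
players = [
    {"name": "A", "shots_in_box": 60, "passes": 900, "goals": 14},
    {"name": "B", "shots_in_box": 45, "passes": 400, "goals": 10},
    {"name": "C", "shots_in_box": 30, "passes": 1500, "goals": 6},
    {"name": "D", "shots_in_box": 12, "passes": 700, "goals": 2},
    {"name": "E", "shots_in_box": 5, "passes": 1800, "goals": 0},
]


def correlation_with_goals(rows, stat):
    """Correlate one stat column with the goals column across all players."""
    return pearson([r[stat] for r in rows], [r["goals"] for r in rows])


for stat in ("shots_in_box", "passes"):
    print(f"{stat}: {correlation_with_goals(players, stat):.0%}")
```

Swapping in a full season of player rows and looping over more stat columns would give a table in the same shape as the one above.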

Unsurprisingly, if you take a lot of shots you have more chance of scoring, and those odds increase if your shots are (a) closer to goal and (b) on target. These points sound obvious when written down, but they bear repeating to avoid the potential pitfall of equating the value of a long-range effort with that of a short-range attempt. The final point to note here is that while shots on target (SoT) show the best correlation, that event also occurs the least and thus can take a little longer to stabilize. For example, if we calculate the SoT and shots inside the box (SiB) per 90 minutes for players through four gameweeks of last season and compare the results with the final stats in those categories, we see a 50% correlation for SoT but 65% for SiB. In other words, after four weeks we have a more solid grasp on how many shots inside the box a player will accrue than on his shots on target, and thus forecast models built on SiB may be a little more reliable. This is just one way of looking at the issue, of course, but it’s how I’ve cut the data up, and it is one of the reasons I tend to gravitate towards SiB when making quick decisions about players, particularly in these early stretches of the season.

After shots, what else might equate well with goals? More playing time? Increased involvement during games? Let’s take a look:

Minutes played: 43% correlation to goals
Total passes completed: 10%
Touches in the penalty box: 82%

The first number is a bit misleading as you obviously wouldn’t want to completely ignore playing time when selecting your team (Leo Messi is unlikely to rack up many Premier League fantasy points despite his ability as he will log exactly zero minutes this season). The reason I included that number is to highlight that just because a player is playing, doesn’t mean they will have success. We often see debate about a player being “locked” into a starting lineup or suggestions that being captain guarantees him a starting spot and thus success. These factors don’t necessarily forecast glory for the given player.

As for involvement, we can see that it’s both vital and essentially useless, depending on the type of action a player is involved with. As with most of this rudimentary analysis, I think we all know that the kind of touches the Michael Carricks and Claude Makélélés of the world produce are likely unlinked to goals (at least in terms of directly scoring them), yet we still often see the nice “action zone” maps highlighted on blogs or in the media as evidence that a player is somehow “running” his team and therefore likely to share in the spoils. It should go without saying: context is key.

Touches in the penalty box is a very nice stat as it combines a good overall correlation with goals (82%) with very quick stabilization. Indeed, if we compare the average number of touches in the penalty box players accumulated after just four weeks of last season with their final season totals, we see a very promising correlation of 73%. In other words, after just a handful of games we can already identify a stat that we can be relatively confident will not only lead to goals but will also continue to be generated at a reasonably consistent pace. For the record, a player’s actual goals scored after the same number of weeks correlate just 44% with his future goal haul.
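The stabilization check can be sketched in the same spirit. The version below is one plausible reading of it, assuming we convert both the four-gameweek numbers and the full-season numbers to per-90 rates and then correlate the two across players. The `per_90` and `stabilization` helpers, the field names and every number in `rows` are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def per_90(total, minutes):
    """Rate of an event per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes


def stabilization(rows, stat):
    """Correlate each player's early-season per-90 rate with his season rate."""
    early = [per_90(r[f"early_{stat}"], r["early_min"]) for r in rows]
    final = [per_90(r[f"season_{stat}"], r["season_min"]) for r in rows]
    return pearson(early, final)


# Hypothetical players: totals after four gameweeks and after the full
# season (NOT real data).
rows = [
    {"early_min": 360, "early_touches": 28, "early_goals": 1,
     "season_min": 3200, "season_touches": 240, "season_goals": 20},
    {"early_min": 300, "early_touches": 15, "early_goals": 3,
     "season_min": 2900, "season_touches": 150, "season_goals": 9},
    {"early_min": 360, "early_touches": 8, "early_goals": 0,
     "season_min": 3300, "season_touches": 70, "season_goals": 5},
    {"early_min": 200, "early_touches": 12, "early_goals": 0,
     "season_min": 2500, "season_touches": 140, "season_goals": 11},
    {"early_min": 340, "early_touches": 3, "early_goals": 1,
     "season_min": 3000, "season_touches": 30, "season_goals": 1},
]

for stat in ("touches", "goals"):
    print(f"{stat}: early vs season correlation {stabilization(rows, stat):.0%}")
```

Using rates rather than raw totals is a design choice here: it stops a player who simply logged more early minutes from looking like a more reliable source of touches or goals.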

I always tend to dwell on goals analysis for two reasons. First, goals are the dominant part of the game and still dictate much of our fantasy success (few non-defenders can be productive without at least some contribution in the goals column). The second reason is that the picture with assists is much clearer thanks to Opta’s extremely useful, if potentially subjective, chances created stat. Last year saw a 95% correlation between that number and eventual assists, so there’s little reason to push much further into total passes, crosses or any number of other potential stats. There could well be value in looking at what type of created chance leads to more goals, or whether having better players around you leads to a better conversion rate, but for now we can simply note that to find assists, you chase the players creating the chances (again, not rocket science).

I hope this served as something of a primer for the season and a warning for new readers (or reminder for old readers) that the main conclusion is that this is all just the ramblings of a crazy guy with a spreadsheet.


Milosh Stojkovic said...

great read => big respect

Tim Walpole said...

This is worth a read (if a bit heavy):

I am wondering if shots on target in the box (if you can get that from anywhere) might be the best indicator?
