Saturday, September 26, 2015

Players' share of team totals

It wasn't really my intention to roll out pieces of the model in various stages but I've been a bit slower than I hoped in finalising this year's version so wanted to at least present the different pieces as they're available. We first looked at the team +/- which gives an indication of how a team might perform in future weeks beyond a simple shots/game type metric which fails to adjust for strength of schedule. Next up is to look at the players' share of their team totals, which will help turn the forecast team data into something we can use for individuals.

This isn't a complex calculation, but a couple of points are worth noting:
  1. The calculation excludes any games the player misses and only uses team data from games they appear in. It isn't, therefore, the same as simply looking at a player's shots to date for the season divided by his team's total.
  2. I do not make an adjustment for minutes played, so players who make a lot of substitute appearances will suffer a clouded picture. If, for example, a player comes on for 10 minutes and registers his team's only shot inside the box but the team had four before he came on, his percentage for the day will be registered as 20% even though in reality it should be 100%. This is an unfortunate drawback of relying on aggregated data which doesn't include sufficient data tags to identify when individual events happened.
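As a rough illustration of the first point, here is a minimal sketch in Python. The game records are invented, and pooling the totals across games (rather than averaging each game's percentage) is my assumption for the example, not necessarily how the model does it.

```python
# Hypothetical per-game records: (gameweek, team shots, player shots).
# player shots is None when the player did not appear in that game.
games = [
    (1, 12, 3),
    (2, 10, None),  # player missed this game, so it is excluded entirely
    (3, 8, 2),
    (4, 15, 5),
]

def share_of_team_total(records):
    """Player's share of the team total, using only games he appeared in."""
    played = [(team, player) for _, team, player in records if player is not None]
    team_total = sum(team for team, _ in played)
    player_total = sum(player for _, player in played)
    return player_total / team_total if team_total else 0.0

def naive_share(records):
    """Season-to-date player total divided by the full team total."""
    team_total = sum(team for _, team, _ in records)
    player_total = sum(player or 0 for _, _, player in records)
    return player_total / team_total if team_total else 0.0

adjusted = share_of_team_total(games)  # 10 shots out of 35 team shots
naive = naive_share(games)             # 10 shots out of 45 team shots
```

The naive figure is dragged down by the game the player missed, which is exactly what the adjusted version avoids.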
With those caveats in mind, here is a quick visualisation of players' share of team totals to date.

Thursday, September 17, 2015

Fanning the Flames, Dousing the Fire Gameweek 5

I was away last week visiting beautiful Whistler (highly recommended to all) but we're now back on track with this week's Fanning the Flames piece. We're not too far removed from the Gameweek 3 piece so there are some familiar names and narratives here, but let's press on regardless.

Alexis Sanchez continues to produce shots at an alarming rate but simply cannot catch a break. The Chilean's 28 total shots and 21 SiB lead the league and come close to doubling the totals of his fellow midfielders. His production is a microcosm of Arsenal's "struggles" to date with the team leading the league in SiB by a distance (ARS 79, SOT 59, MCI 51) yet only having five goals to show for their efforts. Sanchez has had a large number of his efforts blocked which is somewhat tricky to explain, and a quick look at last year's stats tells a similar, if less dramatic, story (42/121). Last year also tells us that he was able to convert 13 of his 71 SiB into goals and so there's really no reason to expect his conversion rates to be particularly unusual. With seven SoT we'd expect him to have at least a couple of goals by now and we can be fairly comfortable that continued play like this will lead to such rewards in the future. Indeed with just 13% ownership, and a decent run of fixtures on deck after a trip to Stamford Bridge, he seems like a strong buy candidate.

Harry Kane has been somewhat maligned this year with some "one year wonder" whispers creeping into the darkest depths of internet comment sections (partly reflected in his early price drop). In truth his numbers are relatively good to date and the sticking point should be his price tag rather than his performances to date. If you liked him at 9.5m to start the year I haven't seen much to change that view, but I'd still personally find it hard to justify that investment over the likes of Lukaku, Benteke or Giroud.

It's a somewhat similar story with Wayne Rooney who's seen a lot of divestment already, although partly encouraged by his injury status of course. His data is fine and doesn't do much to alter one's opinion, but again, I didn't like him over the aforementioned players to start the year and the production doesn't sway me the other way either.

I don't think anyone is seriously fooled by Steven Naismith's haul this week so we'll pass on him but the next name is one that has piqued some interest. Dimitri Payet's ownership is up to 10% which is quite significant for a largely unknown asset with a mid-level price tag. If you see his shot line of 8 total, 4 SiB, 3 SoT and 3 goals and believe that warrants a 7.6m investment then I have a sizeable plot of Florida swamp land you might be interested in. Now, some of this investment may be based on his performances on the field (I haven't seen a full West Ham game yet this term) and he does bring a healthy assist threat to the party (16 created chances) so it's not that there isn't hope here. I'm just not sure he's all that much more than a promising lottery ticket, which could also be said of someone like Xherdan Shaqiri who's produced a similar shot line (7/4/1) in just over half as many minutes as Payet and comes at a lower cost and significantly lower ownership (3%). The standard line applies here: buy if you like the player but there's not enough here if you're just chasing the production.

There's a trio of surprise success stories with Bafetimbi Gomis, Riyad Mahrez and Callum Wilson enjoying explosive starts to the year. While all three appear on the list, the real story is not the gap between their xG and actual goals, but the fact that their xG numbers are terrific. At 3.0, 2.9 and 2.7 respectively, this trio have numbers which can go toe-to-toe with anyone in the league right now, and yet all come at a bargain price. Yes, everyone is on Mahrez already (43%) with Gomis (28%) and Wilson (26%) not too far behind but these ownership rates are not only justified but likely too low. These aren't just cheap players getting a bit lucky, they are performing at an elite level and the kind of flexibility you get by installing a ~6.0m player who can start every single week is huge.

I think we can safely conclude that Arsenal are due for some better luck with no fewer than five players making an appearance here, having collectively "earned" an expected 5.8 assists but having actually been rewarded with just the one. Other than that quintet not too much stands out here.

David Silva has played well this year and has widely been singled out as the lynchpin in this impressive City side, and while that may be true, from a fantasy perspective his totals aren't quite as convincing. He's never been one for huge goal threat numbers (8 shots, 4 SiB and 2 SoT) but his assist totals are supposed to compensate for any deficit here, yet 12 created chances place him alongside the likes of Nathan Redmond and Jose Jurado. Of course, we can reasonably suggest that Silva's created chances will likely be converted at a higher rate than that pair's thanks to the presence of the likes of Sergio Aguero but when you consider that Yaya Toure has the same CC to date and Navas is just behind with 10, it starts to look like Silva is a touch overvalued (it should be noted he has played a game less than Yaya, which given the small samples here is not insignificant). Silva could well have a huge season and he remains one of the best in the league, but these numbers should give some pause before writing him in for a historic assists haul.

People seem to have avoided the Wes Hoolahan bandwagon as yet but the Marc Albrighton express is getting somewhat crowded. The Leicester man has created 10 chances and earned 3 assists for a 30% A/CC rate. Even if you believe Leicester are for real and will continue to create chances at a healthy clip, the rest of the team except Albrighton are seeing their chances converted at a 19% rate, significantly lower than the former Villa man. Given his profile as a wide player throwing balls into the box, it's tough to think his chances are subject to a better conversion rate than a carefully crafted through ball.

Wednesday, September 9, 2015

Which stats should we focus on?

With the proliferation of Opta and multiple news sources starting to dip their proverbial toes into the world of statistical analysis, casual fans have access to a greater depth of data than at any point before. Converting that data into useful information therefore comes more and more into focus and that’s why we need to review any proposed “advanced” metrics to ensure they remain relevant and as accurate as possible (all while acknowledging we are a long way from even touching the kind of analytics that are prevalent in other sports).

Before we go on it should also be noted that any reference to terms like “advanced” should be taken lightly. By “advanced” I mean, slightly more useful than looking at the “goals scored” chart and assuming that the past explains the future. I am not a stats professor nor even a student and more complex models surely exist which might shave a point or two off the margin of error from the analysis in these electronic pages. However, I believe the outputs are good enough to raise warning flags and give clues of over and undervalued assets, and that seems sufficient to warrant committing these words and numbers to my little corner of the Internet.

Which stats to look at
One of the challenges of the increased availability of data is which particular stats do we pay attention to? If we’re trying to forecast individual goals do we want to know how many total shots they have or just those on target? Do we care how often they play? How about their involvement throughout the game, using total passes or touches as a proxy?

The simple way I approach this is to look at the correlation between the individual stats and the goals scored on an individual level. It isn’t perfect, of course, and one could look to exclude players who didn’t play enough or who are obviously not fantasy targets (holding midfielders) but for simplicity, my sample is the entire population of players from the prior season. The obvious starting point is shots, based on the motivational-poster-inspired adage of players missing every shot they don’t take. Those results throw up some predictable but not irrelevant notes:

Total shots 86% correlation to goals
Shots outside the box 58%
Shots inside the box 88%
Shots on target 90%
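For anyone wanting to reproduce this kind of number, here is a minimal sketch. It assumes "correlation" means the plain Pearson coefficient between a stat and goals across players (the figures above may equally be an R-squared from a line of best fit), and the player totals are invented purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up season totals for five players (not real data).
shots_in_box = [60, 35, 20, 8, 2]
goals = [14, 9, 3, 1, 0]

r = pearson(shots_in_box, goals)
print(f"SiB vs goals correlation: {r:.0%}")
```

Running the same function over each stat column in turn gives a table like the one above.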

Unsurprisingly, if you take a lot of shots you have more chance of scoring, and those odds increase if your shots are (a) closer to goal and (b) hit the target. These points sound obvious when written down but they bear repeating to avoid the potential pitfall of equating the value of a long range effort with those of a short range attempt. The final point to note here is that while shots on target show the best correlation, that event also occurs the least and thus can take a little longer to stabilize. For example, if we calculate the SoT and SiB per 90 minutes for players through four gameweeks of last season and compare the results with the final stats in those categories, we see a 50% correlation for SoT but a 65% for SiB. In other words, after four weeks we have a more solid grasp on how many shots inside a box a player will accrue versus his shots on target and thus forecast models may be a little more reliable. This is just one way of looking at the issue, of course, but it’s how I’ve cut the data up and represents one of the reasons I tend to gravitate towards SiB when making quick decisions about players, particularly in these early stretches of the season.
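The stabilisation check described above can be sketched as follows. The player rows are hypothetical, and I'm assuming the comparison is per-90 rate after four gameweeks against per-90 rate for the full season; comparing against raw season totals would be an equally reasonable reading.

```python
import math

def per_90(count, minutes):
    """Rate of an event per 90 minutes played; 0.0 if the player has no minutes."""
    return 90 * count / minutes if minutes else 0.0

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread

# Hypothetical players: (SiB after GW4, minutes after GW4, SiB full season, minutes full season)
players = [
    (9, 360, 70, 3000),
    (4, 270, 45, 2700),
    (2, 180, 30, 2500),
    (5, 360, 28, 3100),
    (1, 90, 12, 1500),
]

early = [per_90(sib, mins) for sib, mins, _, _ in players]
final = [per_90(sib, mins) for _, _, sib, mins in players]

# The closer this is to 1, the more the early-season rate tells us about the final one.
stability = correlation(early, final)
```

Swapping the SiB columns for SoT and rerunning gives the side-by-side comparison between the two stats.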

After shots, what else might equate well with goals? More playing time? Increased involvement during games? Let’s take a look:

Minutes played 43%
Total passes completed 10%
Touches in the penalty box 82%

The first number is a bit misleading as you obviously wouldn’t want to completely ignore playing time when selecting your team (Leo Messi is unlikely to rack up many Premier League fantasy points despite his ability as he will log exactly zero minutes this season). The reason I included that number is to highlight that just because a player is playing, doesn’t mean they will have success. We often see debate about a player being “locked” into a starting lineup or suggestions that being captain guarantees him a starting spot and thus success. These factors don’t necessarily forecast glory for the given player.

As for involvement we can see that it’s both vital and essentially useless, depending on the type of action a player is involved with. As with most of this rudimentary analysis, I think we all know that the kind of touches the Michael Carricks and Claude Makeleles of the world produce are likely unlinked to goals (at least in terms of directly scoring them) but yet we still often see the nice “action zone” maps highlighted on blogs or in the media as evidence that a player is somehow “running” his team and therefore likely to share in the spoils. It should go without saying: context is key.

Touches in the penalty box is a very nice stat as it combines a good overall correlation with goals (82%) while also stabilizing very quickly. Indeed, if we compare the average number of touches in the penalty box players accumulated after just four weeks of last season with their final season totals we see a very promising correlation of 73%. In other words, after just a handful of games we can already identify a stat in which we have relative confidence that it will not only lead to goals but will also continue to be generated at a reasonably consistent pace. For the record, a player’s actual goals scored after the same number of weeks correlates just 44% with his future goal haul.

I always tend to dwell on goals analysis for two reasons. First, goals are the dominant part of the game and still dictate much of our fantasy success (few non-defenders can be productive without at least some contribution in the goals column). The second reason is that the picture with assists is much clearer thanks to Opta’s extremely useful – if potentially subjective – chances created stat. Last year saw a 95% correlation between that number and eventual assists, so there’s little reason to really push too much further into total passes, crosses or any other number of potential stats. There could well be value in looking at what type of created chance leads to more goals, or whether having better players around you leads to a better conversion rate, but for now, we can just note that to find assists, just chase the players creating the chances (again, not rocket science).

I hope this served as something of a primer for the season and a warning for new readers (or reminder for old readers) that the main conclusion is that this is all just the ramblings of a crazy guy with a spreadsheet.

Thursday, September 3, 2015

Team Plus-Minus Page

If you direct your attention to the navigation bar you will note that there is now a new link to the team plus-minus page. I've added a brief summary below but it's a pretty simple principle. One word of caution: the data is based solely on 2015-16 data so the sample sizes are ludicrously small. That said, these numbers (particularly with penalty box touches and total shots) do stabilise relatively quickly so it's worth at least having a glance at these numbers in the coming weeks if you are worried that your initial assessment of a team is a bit off (are Chelsea really struggling? Are Leicester for real?).

Plus-minus (+/-) is a very simple metric which tries to put stats in the context of a team's opponents to date. For example, we might note that two teams have each registered five shots on goal per game and conclude they are of equal ability but if one team did it against Man City and Everton while the other did it against Norwich and Sunderland we would want to adjust our expectations accordingly. As with many "advanced" stats (and I'm using that term lightly here), the goal is simply to explain a concept we are all comfortable and familiar with but often ignore when looking at numbers, hence falling into traps of being overly influenced by stats which don't tell the whole story.

The calculation itself is very simple. Let's take Palace at home for this season to date (through GW4) as a quick example. More specifically, let's look at their Shots on Target plus-minus (hSoT +/-). Their two appearances at Selhurst Park can be summarised as below:

GW2 vs ARS: 4 SoT; Arsenal concede on average 2 SoT away from home, so SoT +/- = (4-2)/2 = 100%
GW3 vs AVL: 6 SoT; Villa concede on average 4 SoT away from home, so SoT +/- = (6-4)/4 = 50%
The team's total hSoT +/- is a simple average of the games to date, so 75%
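The Palace example can be sketched in a few lines of Python. The function and variable names are my own for illustration; the numbers are the ones from the two games above. The sketch doesn't handle an opponent who has conceded zero on average, which would need a rule of its own.

```python
def plus_minus(team_value, opponent_avg_conceded):
    """How far a team's output sits above or below what the opponent usually concedes."""
    return (team_value - opponent_avg_conceded) / opponent_avg_conceded

# Palace home games through GW4: (opponent, Palace SoT, opponent's average SoT conceded away)
palace_home = [
    ("ARS", 4, 2),
    ("AVL", 6, 4),
]

per_game = [plus_minus(sot, avg_conceded) for _, sot, avg_conceded in palace_home]

# The team figure is a simple (unweighted) average of the per-game values.
h_sot_pm = sum(per_game) / len(per_game)
print(f"hSoT +/-: {h_sot_pm:.0%}")  # hSoT +/-: 75%
```

The same function works unchanged for the other three stats; only the inputs differ.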

A quick word on the stats used. I've selected the four listed stats - penalty box touches, total shots, shots inside the box and shots on target - because they have shown the greatest correlation to goals scored over past seasons. Perhaps I'll do a separate post on that, though there isn't much to it. The correlation I speak of is very rudimentary and might make more learned statisticians cringe as it's closer to "draw a line of best fit in science class" than anything Nate Silver might put together but I think it does the trick to highlight the key stats.

One key point to keep in mind is that while the correlation strength over a season runs in the order SoT, SiB, Shots and Penalty Box Touches, that's also the reverse order of the frequency with which those events happen and that can lead to wilder swings in the early part of the season. I haven't tried to pin down exactly when those numbers stabilize but I have found in the past that something like SoT can take a bit longer so exercise some caution there.

I will add the same numbers for defensive stats soon.

Tuesday, August 25, 2015

Dousing the Fire, Fanning the Flames Gameweek 3

Three weeks into the new season and everything is pretty much back where we expect it. Man City and their rejuvenated cross-town rivals are flying high, Arsenal fans are panicking about their perceived lack of striking options and I've decided to re-vamp the projection model a bit. Chelsea's seven goals conceded and Leicester's flirtation with the big boys, however, show that there's still room in this league for variance (however fleeting) and it's in that spirit that I've made the latest changes.

I hope the tables are fairly self-explanatory, but as a quick background, the focus has been to (a) keep our search for potential flags as wide as possible and then narrow them down with narrative and (b) acknowledge that this entire process is fraught with uncertainty. On the former point, you'll see that I've included four different "projection" metrics to reflect the different identifiers we have for a player's success. For goals, we're now looking at a player's touches in the box, total shots, shots inside the box and shots on target separately to better illustrate why a player may be over or underachieving. For assists, we have the number of chances created by a player and the number of passes they complete in the opponent's final third. Those metrics will stabilise at different rates and will need tweaking as more research is done to see if, for example, players from top teams need fewer shots to score, but for now they give a decent set of warning flags or prompts for potential investment.

On the issue of uncertainty, I was influenced by a recent discussion in baseball where two pundits (presumably new to the stats world) were arguing that a given player was better than the other because he was worth 6.1 fWAR (Wins Above Replacement) versus the other's 5.9. That metric is extremely complex and even its creators would be quick to note the level of uncertainty regarding some of the inputs that go into it, so to argue superiority based on a rounding difference is likely misjudged. I realised I was guilty of that too with my wildly imprecise (and vastly inferior to fWAR) metrics so I've opted for the more visual-based approach to give a more approximate judgement rather than focusing too much on decimals.

Despite being limited to just a cameo role in the opener, Alexis Sanchez has already jumped to the top of the xG table alongside teammate Giroud, with particularly strong indicators in his touches in the penalty box and shots inside the box. The price is obviously high and Sanchez faces competition from the likes of David Silva and Eden Hazard in that elite midfielder category but right now, no-one is doing more for his team and it's just a matter of time before the points start to follow the underlying stats. Troy Deeney is someone I was largely unfamiliar with coming into the season but he's shown plenty of promise in the opening month and deserves attention in that third forward role (though you might want to wait until after this week's trip to the Etihad). Only Giroud has more shots than the Watford man and his eight SiB put him 5th among his peers. The downside is you're not going to get a price rise from Deeney with Premier League debutant Callum Wilson and old faithful Jermain Defoe making some waves in the budget market, so while the numbers say Deeney is a nice pickup, at this early stage he might be more obscure than you need to go. It's a similar story with Diafra Sakho, who at 6.5m has performed well but may be too cute of a pick right now.

The next group of names don't need much introduction with Memphis Depay, Aaron Ramsey, Harry Kane and Juan Mata all having plenty of pedigree and promise. All are very ownable, though I wouldn't let this "underachievement" sway you one way or another. If you were high on these players before the year then the data suggests there's no reason to panic but if you preferred the likes of Mesut Ozil or Giroud coming into the year then feel free to ride those players a bit longer.

The last player to highlight is Christian Eriksen, not because he's particularly undervalued but because he illustrates the potential value of the new data table. You can see that his shots on target number is large, suggesting he should have scored more goals than he has, yet the other metrics are not so optimistic. A look at his stat line shows six shots, three in the box and four on target in two games of action. The four shots on target are a nice haul and put him tied for 5th among midfielders but taking a step back we can now see that if he averages three shots a game with just half coming in the box, he's unlikely to keep racking up healthy shots on target totals and thus his expected goals will likely fall. Players do appear to have the ability to hit the target more often than others but a 50% rate for a player who takes plenty of efforts from outside the box is likely unsustainable. I'm hoping that these new tables will help illustrate these nuanced cases a bit better than the old format.

Riyad Mahrez is one of the rare cases where despite heading the "douse the fire" table, he still represents an excellent investment. The only reason for the perceived negativity is that he can't continue to be this good for the remainder of the season, but at his price range he doesn't need to deliver anywhere near his current returns to offer excellent value. Whether or not teams will figure Leicester out is another question for another post but for now, enjoy the ride and the incredible price rises coming his way. The same is true to a lesser degree of Everton's promising duo of Romelu Lukaku and Ross Barkley who have started very nicely, although a look down the road on the fixture list makes this pair more of a "hold" than a "buy" proposition.

The defenders here don't really need comment as they are obviously not going to sustain their goal scoring pace for the year, though it's worth noting that both Vincent Kompany and Russell Martin have posted quite healthy xG rates and can probably be relied upon to deliver some offensive value, making them both potentially ownable.

Santi Cazorla is no stranger to underperforming his underlying stats, having notched just a single non-penalty goal last season on an incredible 27 shots on target. This year it's the turn of his assist totals to suffer with nothing to show for his 12 created chances which are second among all midfielders. Despite this solid start and his rejuvenated form last year, it's hard to see a scenario in which the Spaniard outperforms both Ozil and Ramsey with any consistency, both of whom are available for the same price. An interesting name to keep in the back of your mind is old favourite Gylfi Sigurdsson who has nothing to show for the season to date but has already racked up solid numbers, appearing in both fan the flames tables. Andre Ayew's hot start along with the emergence of Jefferson Montero will likely overshadow the Icelander's return to form but he might be there to capitalise if some of the hot new things start to stutter.

Marc Albrighton has the dubious combination of a double-digit gameweek and a recognisable name on a promoted team that can often lead to widescale overreaction. His numbers are okay, perhaps even encouraging given his 5.1m price tag but he remains a promising bargain bin option rather than a potentially undervalued diamond. Yaya Toure plays at the other end of the spectrum and remains one of the league's better players but continues to confound the model, which generally finds him to be in a constant state of overachievement. This year it's his assist total which looks inflated, although the discount is probably a bit less than suggested as we can safely assume that Man City players (read: Sergio Aguero) convert created chances at a higher clip than the league average. The Ivorian is a tough man to project but with his recent price rise, the numbers would suggest he's tough to get overly excited about.