- On March 2, 2017
In sport, we like to compare our players against the best players. An underrated data-mining technique called Archetypal Analysis allows us to do this with ease. But how does it work? This blog post will explore this in detail.
Archetypal analysis: a colourful analogy
On the colour spectrum, you could think of red, green and blue as the ‘ideal’ or ‘purest’ colours. Every other colour lies somewhere in between those three colours. That is, in terms of the RGB colour space, somewhere between pure red (255, 0, 0), pure green (0, 255, 0) and pure blue (0, 0, 255). For instance, orange (255, 165, 0) is a mix of red and green.
In other words, red, blue and green are the ‘archetypal’ colours.
This type of thinking — that is, defining data points by relating them to the extremal data points — is a bit like how a particular data-mining technique called archetypal analysis works. The basic method is:
- Find the archetypal/extremal points. These are red, blue and green in the colour example.
- Relate all the observations to the archetypal points. E.g., in the RGB colour system, each colour can be represented as some mixture of the archetypal colours.
- For each observation, the archetypal coefficients must be non-negative and sum to 1 (i.e. 100%).
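As a minimal sketch of the three steps above (written in Python rather than the R used for the analysis in this post; the `mix` helper and the weights are made up for illustration), here is a colour expressed as a mixture of the archetypal colours, with the sum-to-one rule enforced:

```python
# Hypothetical illustration: a colour as a convex combination of archetypes.
ARCHETYPES = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}


def mix(weights):
    """Blend the archetypal colours; weights must be >= 0 and sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("coefficients must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    return tuple(
        round(sum(w * ARCHETYPES[name][channel] for name, w in weights.items()))
        for channel in range(3)
    )


# 60% red + 40% green gives a dark orange.
print(mix({"red": 0.6, "green": 0.4, "blue": 0.0}))  # (153, 102, 0)
```

The analogy is a loose one: full-brightness orange (255, 165, 0) would need coefficients that add up to more than 1, so the sketch uses a darker orange that respects the constraint.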
In essence, archetypal analysis reduces the dimensionality of your data. Four archetypes in your data? You can now express every observation with just four variables.
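For readers who like to see the maths, the usual formulation of archetypal analysis can be written as below. The notation is my own choice for this post: $X$ is the $n \times m$ data matrix (one row per observation), $Z$ is the $k \times m$ matrix of archetypes, $\alpha$ is the $n \times k$ matrix of archetypal coefficients and $\beta$ is the $k \times n$ matrix that builds each archetype as a mixture of observations.

```latex
\min_{\alpha,\,\beta}\; \lVert X - \alpha Z \rVert_F^2
\quad \text{with} \quad Z = \beta X,
\qquad
\alpha_{ij} \ge 0,\;\; \sum_{j=1}^{k} \alpha_{ij} = 1,
\qquad
\beta_{ji} \ge 0,\;\; \sum_{i=1}^{n} \beta_{ji} = 1 .
```

Row $i$ of $\alpha$ is the handful of numbers that replaces observation $i$: with four archetypes, it has four entries that add to 1.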
But archetypal analysis is more than just an exercise in data reduction. It is a great tool for thinking about sports data. I contend that we naturally and regularly compare players/performances against the best players/performances — just like how archetypal analysis compares each observation to the extreme observations.
NBA stars: Who are the archetypes?
Let’s take some up-to-date NBA data from the 2016/2017 season as an example. I’ve used the free R package ‘SportsAnalytics’ to scrape the data from dougstats.com. This analysis is inspired by, and updates, the analysis performed in the paper: ‘Performance Profiles based on Archetypal Athletes’.
We’ll start off with just two variables for now: Total Minutes Played and Blocks (i.e., the number of blocks in the season). By visualising this data on a scatterplot, we can intuitively decide how many archetypes exist. In this trivial example, we can see in Figure 1 that there are three archetypal points. That is, when:
- Total Minutes Played is high but Blocks are low;
- Total Minutes Played and Blocks are low; and
- Total Minutes Played and Blocks are high
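If you want a quick check on the eyeballing, one option is to compute the convex hull of the scatterplot, since the fitted archetypes generally sit on the outer boundary of the data. The sketch below uses Andrew's monotone chain algorithm in plain Python. The (minutes, blocks) pairs are invented for illustration and are not the real NBA figures, and the hull corners are only candidates, not the archetypes the algorithm would actually return.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise (or straight) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Made-up (total minutes, blocks) pairs, purely for illustration.
players = [(150, 2), (2800, 15), (2700, 210), (1200, 40),
           (2000, 60), (800, 10), (1800, 90)]
print(convex_hull(players))
```

For this toy data the hull has three corners: low minutes with low blocks, high minutes with low blocks, and high minutes with high blocks.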
Figure 1: Blocking efficiency with archetypes outlined
Figure 2: Normalised blocking efficiency for each archetype
Each archetype is a specific point on our scatter plot (Figure 1), so it has specific values for Blocks and Total Minutes Played. We can normalise these values against the maximum values; the result is shown in Figure 2.
Archetype 1 is so low on both stats that it doesn’t even show up on the graph. You might call this archetype the ‘Benchwarmer’. It is best represented by Danuel House from the Washington Wizards. Danuel illustrates an interesting feature of archetypal analysis: not all the archetypes represent ‘good’ players; some of them represent ‘bad’ players.
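The normalisation is just a division by the largest observed value of each stat. A minimal Python sketch, with invented archetype values and maxima (not the figures behind Figure 2):

```python
def normalise(profile, maxima):
    """Scale each stat to the 0-1 range by dividing by its maximum."""
    return {stat: value / maxima[stat] for stat, value in profile.items()}


# Hypothetical maxima across the whole dataset.
maxima = {"minutes": 2800, "blocks": 210}

# Hypothetical archetype values on the original scale.
archetype_profiles = {
    "Archetype 1": {"minutes": 150, "blocks": 2},
    "Archetype 2": {"minutes": 2800, "blocks": 15},
    "Archetype 3": {"minutes": 2700, "blocks": 210},
}

normalised = {
    name: normalise(profile, maxima)
    for name, profile in archetype_profiles.items()
}
for name, profile in normalised.items():
    print(name, {stat: round(v, 2) for stat, v in profile.items()})
```

Putting both stats on the same 0 to 1 scale is what lets minutes (in the thousands) and blocks (in the tens or hundreds) share one chart.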
Archetype 2 represents the player who gets a lot of game time but makes few blocks. You might call this the ‘Inefficient Blocker’ archetype. It is best represented by Harrison Barnes, small forward for the Dallas Mavericks. In reality, blocking is simply not one of Barnes’ roles on the team.
Finally, Archetype 3 is high on both stats; you might call this archetype the ‘Superstar Blocker’. In this dataset, it is best represented by Frenchman Rudy Gobert, shot-stopping extraordinaire and defensive anchor for the Utah Jazz.
Table 1 gives a numerical display of the archetypes. Note how each player’s archetypal profile must add to one (i.e. 100%). Also note how the archetypal points do not necessarily correspond to an exact player; e.g., Harrison Barnes has an Inefficient Blocker value of only 0.97 but is closer to that archetype than anyone else.
Table 1: Archetypes for Total Minutes Played vs. Blocks
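To give a feel for where numbers like these come from, here is a rough Python sketch of one way to compute a single player's coefficients once the archetypes are fixed: least squares with the coefficients constrained to be non-negative and to sum to 1, solved by projected gradient descent. This is not the code behind Table 1 (that analysis was done in R); the function names, the step size and the toy values are all assumptions, and both stats are assumed to be normalised to the 0 to 1 range.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum(a) = 1}."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        running += u_i
        t = (running - 1.0) / i
        if u_i - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def archetypal_coefficients(point, archetypes, step=0.1, iters=2000):
    """Minimise ||point - sum_k alpha_k * archetype_k||^2 over the simplex.

    The fixed step assumes stats scaled to 0-1 and a handful of archetypes.
    """
    k = len(archetypes)
    alpha = [1.0 / k] * k  # start from an even mixture
    for _ in range(iters):
        recon = [sum(a * z[d] for a, z in zip(alpha, archetypes))
                 for d in range(len(point))]
        resid = [r - x for r, x in zip(recon, point)]
        grad = [2.0 * sum(e * z_d for e, z_d in zip(resid, z))
                for z in archetypes]
        alpha = project_to_simplex([a - step * g for a, g in zip(alpha, grad)])
    return alpha


# Toy archetypes as (normalised minutes, normalised blocks):
# low/low, high minutes with low blocks, high/high.
toy_archetypes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
toy_player = (0.9, 0.15)  # made-up player
print([round(a, 3) for a in archetypal_coefficients(toy_player, toy_archetypes)])
```

For this toy player the exact answer can be worked out by hand as 10% low/low, 75% high minutes with low blocks and 15% high/high, which is what the sketch should converge to.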
So, in terms of Total Minutes Played vs Blocks, Barnes, House and Gobert are closest to the three archetypes. What about some other players? Let’s look at LeBron James’ profile.
James is highest in the Inefficient Blocker archetype. This is a little surprising at first, but although he is no slouch at blocking, his value lies in his all-round versatility.
A great characteristic of archetypal analysis is that we can think of LeBron James as being 84% like Harrison Barnes and 16% like Rudy Gobert, at least in terms of Total Minutes Played vs Blocks for the 2016/2017 season. I imagine that this type of thinking could translate well into the scouting arena. Say you are looking for a new player with these characteristics:
- 50% similar to your archetypal offensive player;
- 30% similar to your archetypal midfield-type player; and
- No more than 5% similar to your archetypal worst player.
Archetypal analysis can capture all these requirements in an intuitive, easy-to-explain manner.
Archetypal analysis could also inform the selection of a brand ambassador. For example, imagine a club whose fan base is particularly interested in attacking, skilful players. Archetypal analysis can help identify the player in your squad who best fits that profile.
While this trivial example helps to show how archetypal analysis works, the technique really shines when you put more than two variables into the model. Luckily, the dataset we’ve obtained contains high-dimensional NBA statistics!
Diving deeper into archetypal analysis
Let’s put all the variables shown in Table 2 into the archetypal algorithm and see what comes out…
Table 2: NBA 2016-2017 summary statistics
Table 3: Archetypes of full dataset
Table 3 and Figure 3 reveal the archetypes. Archetype 1 is another Benchwarmer: low on all stats. Danuel House of the Washington Wizards again represents the Benchwarmer archetype.
Archetype 2 is the ‘Offensive Superstar’ archetype: high on all stats except rebounds and blocks. As of February 28th, James Harden from the Houston Rockets best represents this archetype — but fellow superstars Russell Westbrook (Oklahoma City) and Isaiah Thomas (Boston) are trailing close behind.
Archetype 3 is the ‘Specialist Three-Point Shooter’ archetype: these players may not necessarily get much game time, but when they do, they are there to shoot threes. Joe Johnson of the Utah Jazz best fulfils this archetype right now.
Archetype 4 is the ‘Defensive Superstar’ archetype: high on all defensive stats and low on all offensive stats. Rudy Gobert appears again as the archetype here, exhibiting his all-round defensive dominance. Gobert seems to miss out on the accolades he deserves, but the stats suggest that he is one of the best defenders in the NBA.
Let’s have a look at LeBron James’ profile again, this time using the full dataset:
We can see LeBron is highest in the Offensive Superstar archetype, with a significant portion in the Defensive Superstar archetype, as you would expect with this all-round legend of the game.
Figure 3: Archetypes of full dataset
Pretending to be a scout
If I were a scout, I might use archetypal analysis to think about how a new signing could fit into my squad’s setup. Sure, a team full of Hardens or Westbrooks might be attractive at first glance, but even if you could somehow afford that, would it even be a good idea? In complex team sports, team cohesion matters. You need a good defensive player to complement your offensive superstar and teams tend to use specialist three-point shooters to spread the defence.
Archetypal analysis can help you understand where gaps may exist in your squad and how a new player might fit into your structure.
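One simple way to look for such gaps (my own suggestion, not part of the analysis in this post) is to average the archetypal coefficients across the squad and see which archetype is thinly covered. The Python sketch below uses the four archetype names from the full dataset, but the players and their coefficients are entirely made up.

```python
def squad_composition(squad):
    """Average each archetype's coefficient across all players in the squad."""
    archetype_names = list(next(iter(squad.values())))
    n = len(squad)
    return {
        name: sum(profile[name] for profile in squad.values()) / n
        for name in archetype_names
    }


# Hypothetical squad; each player's coefficients add to 1.
squad = {
    "Player A": {"Benchwarmer": 0.1, "Offensive Superstar": 0.7,
                 "Three-Point Specialist": 0.1, "Defensive Superstar": 0.1},
    "Player B": {"Benchwarmer": 0.2, "Offensive Superstar": 0.5,
                 "Three-Point Specialist": 0.3, "Defensive Superstar": 0.0},
    "Player C": {"Benchwarmer": 0.5, "Offensive Superstar": 0.1,
                 "Three-Point Specialist": 0.3, "Defensive Superstar": 0.1},
    "Player D": {"Benchwarmer": 0.3, "Offensive Superstar": 0.3,
                 "Three-Point Specialist": 0.4, "Defensive Superstar": 0.0},
}

composition = squad_composition(squad)
gap = min(composition, key=composition.get)
print(composition)
print("Least represented archetype:", gap)
```

In this invented squad, the Defensive Superstar archetype averages only 5%, which would point a scout towards a defensive signing. A plain average ignores minutes played, so weighting by game time may be a sensible refinement.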
Maybe you are looking to fill a position for a versatile defensive-type who also has some offensive qualities. In archetypal terms, we might look for players who exhibit a value higher than 0.6 in the ‘Defensive Superstar’ archetype and also higher than 0.3 in the ‘Offensive Superstar’ archetype. Performing this process in R yields two names: Anthony Davis of the New Orleans Pelicans and Karl-Anthony Towns of the Minnesota Timberwolves. These guys are defensive powerhouses known for their offensive abilities.
Likewise, we could look for players with values higher than 0.6 in the Offensive Superstar archetype and higher than 0.3 in the Defensive Superstar archetype. Doing that returns Kevin Durant and DeMarcus Cousins, once again two players aptly described as amazing offensive players who have some defensive capability.
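The searches themselves were run in R, but the idea is just a filter over each player's archetypal coefficients. Here is a small Python sketch of that kind of query. The `find_players` helper, the player labels and the coefficients are all hypothetical, and the optional `at_most` argument is included for "no more than" style criteria.

```python
def find_players(profiles, at_least=None, at_most=None):
    """Return players whose coefficients exceed every `at_least` threshold
    and stay at or below every `at_most` threshold."""
    at_least = at_least or {}
    at_most = at_most or {}
    return [
        name
        for name, coefs in profiles.items()
        if all(coefs[a] > t for a, t in at_least.items())
        and all(coefs[a] <= t for a, t in at_most.items())
    ]


# Made-up coefficients; each row adds to 1.
profiles = {
    "Player A": {"Benchwarmer": 0.03, "Offensive Superstar": 0.32,
                 "Three-Point Specialist": 0.00, "Defensive Superstar": 0.65},
    "Player B": {"Benchwarmer": 0.00, "Offensive Superstar": 0.62,
                 "Three-Point Specialist": 0.05, "Defensive Superstar": 0.33},
    "Player C": {"Benchwarmer": 0.05, "Offensive Superstar": 0.05,
                 "Three-Point Specialist": 0.00, "Defensive Superstar": 0.90},
}

# Versatile defender: > 0.6 defensive and > 0.3 offensive.
print(find_players(profiles, at_least={"Defensive Superstar": 0.6,
                                       "Offensive Superstar": 0.3}))
# Versatile scorer: > 0.6 offensive and > 0.3 defensive.
print(find_players(profiles, at_least={"Offensive Superstar": 0.6,
                                       "Defensive Superstar": 0.3}))
```

Because each player's coefficients add to 1, thresholds like 0.6 and 0.3 leave very little room for anything else, which is why searches like these return only a few names.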
Archetypal analysis is an underrated data-mining technique perfectly suited to sports performance data. It plays into our natural intuition to structure our thinking about sport in terms of the best performances. Archetypal analysis may not give you all the answers, but it will give you an interesting perspective. If you are interested in hearing more about archetypal analysis, do not hesitate to contact us at firstname.lastname@example.org.
- Eugster, M.J.A. (2012). Performance Profiles based on Archetypal Athletes. International Journal of Performance Analysis in Sport, 12(1), 166–187.
- Header image: Copyright 2015 NBAE (Photo by Bill Baptist/NBAE via Getty Images)