Create a StatsBomb-Inspired Template for Team Radar Comparison Using Free Data from FBRef

Mikhail Borodastov
Mar 7, 2024


1. The Evolution of Radars in Football

The final stage of any data analysis is visualizing the results. You can use the most effective metrics and uncover genuinely new insights. If the results are presented in a format that end users find hard to read, though, much of the value of the work may go unnoticed. In other words, how you present the results of an analysis is often just as important as the results themselves.

StatsBomb is one of the trendsetters in football metrics visualization. It calculates a range of football metrics from event data and packages them into formats that have become an industry standard. These formats make it easy to grasp the overall statistics of individual players or teams.

In 2014, Ted Knutson, the founder of StatsBomb, proposed an efficient way to visualize football statistics: Player Radars, built from a set of football metrics. He borrowed the idea and the chart design from Rami Moghadam, an art director at ESPN The Magazine, who had used such charts for a poster featuring the NBA stars of 2013.

Radars are now also used to compare entire teams. Two main types are common: the Attacking Radar, which focuses on offensive metrics, and the Defensive Radar, which focuses on defensive metrics.

These radars let you track how a team's overall metrics change from season to season or over the course of a season. They also let you compare teams with each other, just as you would compare individual players. An example of such a radar is provided below.

In this article, we will demonstrate:

  • A method for creating a similar template based on data from FBref using Python and several popular libraries.
  • How to plot current statistics for selected teams on the template.

At the end of the article, you will find a link to a Jupyter notebook. You can adapt its code to quickly generate such radars and compare teams' overall profiles before or after any match of the season.

2. Downloading Data

To build the radar template, you will need data for the last 5 seasons of the top 5 European leagues. To analyze team performance in the current season, you will also need up-to-date current-season data for the league of interest.

A notebook for downloading data from FBref.com can be found in my GitHub repo here.

Data for the last 5 seasons has already been uploaded to the folder Scraping_fbref_static_data/data/old_seasons in the football-data-analysis repository, so there is no need to download it again. You can skip Section 3.1 "Collect teams and players statistic for TOP5 Europe leagues for last 5 years" and its cells.

However, you will need to update the data for the current season. To do this, run the cells in Section 3.2. Uncomment only the leagues you plan to show on the radar. In the code below, only La Liga is uncommented.

If you do not need every available table, you can keep only those relevant to the later steps:

  • df_outfield – statistics for outfield players
  • df_keeper – statistics for goalkeepers
  • df_team – statistics for teams
  • df_team_vs – opponents' statistics against each team (FBref's "vs" tables)
%%time
from tqdm import tqdm

# Assumed to be defined in earlier cells of the scraping notebook:
# get_outfield_data, get_keeper_data, get_team_data, get_team_data_vs and dict_res

season = '2023-2024'

# (FBref competition id, URL suffix); uncomment the leagues you need
lst_leagues = [
    # (9,'Premier-League-Stats'),
    # (20,'Bundesliga-Stats'),
    (12,'La-Liga-Stats'),
    # (11,'Serie-A-Stats'),
    # (13,'Ligue-1-Stats'),
    # (57, 'Swiss-Super-League-Stats'),
    # (23,'Eredivisie-Stats'),
    # (72,'Scottish-Championship-Stats'),
    # (37,'Belgian-Pro-League-Stats'),
    # (54,'Serbian-SuperLiga-Stats'),
    # (32,'Primeira-Liga-Seasons-Stats'),
    # (40, 'Scottish-Premiership-Stats'),
    # (39,'Ukrainian-Premier-League-Stats')
]

lst_outfields = []
lst_keeper = []
lst_team = []
lst_team_vs = []

for league in tqdm(lst_leagues):

    left_url = f'https://fbref.com/en/comps/{league[0]}/'
    right_url = '/' + league[1]
    league_name = league[1].rsplit('-', 1)[0]  # e.g. 'La-Liga-Stats' -> 'La-Liga'

    df_outfield = get_outfield_data(left_url, right_url, dict_res)
    df_keeper = get_keeper_data(left_url, right_url, dict_res)
    df_team = get_team_data(left_url, right_url, dict_res)
    df_team_vs = get_team_data_vs(left_url, right_url, dict_res)

    # Tag each table with the season and the league it came from
    for df in (df_outfield, df_keeper, df_team, df_team_vs):
        df.insert(0, 'season', season)
        df.insert(1, 'league_name', league_name)

    lst_outfields.append(df_outfield)
    lst_keeper.append(df_keeper)
    lst_team.append(df_team)
    lst_team_vs.append(df_team_vs)

Special thanks to @parth1902, whose scraper I used as a basis.
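The loop only collects the per-league tables in lists. Section 3.2 (Reading Data) of this article reads them back from CSV files with names like top5_leagues_team_2023_2024.csv, so a saving step along the following lines is needed. This is a sketch. The function name save_current_season is hypothetical, and the file naming simply mirrors the names read in Section 3.2 rather than the notebook's own saving code.

```python
from pathlib import Path

import pandas as pd


def save_current_season(frames_by_kind, out_dir, season_tag='2023_2024'):
    """Concatenate per-league tables and save one CSV per table type.

    frames_by_kind maps a table type ('outfields', 'keeper', 'team', 'team_vs')
    to the list of per-league DataFrames collected in the scraping loop.
    Hypothetical helper: the naming mirrors the files read in section 3.2.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for kind, frames in frames_by_kind.items():
        df = pd.concat(frames, ignore_index=True)
        path = out_dir / f'top5_leagues_{kind}_{season_tag}.csv'
        # The index is written too, which matches index_col=0 when reading back
        df.to_csv(path)
        paths[kind] = path
    return paths


# Example call (out_dir is a hypothetical location):
# save_current_season({'outfields': lst_outfields, 'keeper': lst_keeper,
#                      'team': lst_team, 'team_vs': lst_team_vs},
#                     out_dir='data/current_season/2024-03-06')
```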

3. Data Preprocessing

The full process is described in detail in the linked notebook.

3.1. Library Import

import os
import pandas as pd
import gc
import sys
from pathlib import Path
import requests
import re
from bs4 import BeautifulSoup
from datetime import datetime
from IPython.display import Image


import warnings
warnings.filterwarnings("ignore")


# Get the absolute path to the 'utils' directory, which is assumed to be
# in the parent directory of the current Jupyter notebook's location.
utils_dir = str(Path.cwd().parent / "utils")

# Add the 'utils' directory to sys.path if it's not already included.
if utils_dir not in sys.path:
    sys.path.append(utils_dir)

# Import all helper functions from ../utils/utils_v2.py
from utils_v2 import *

In addition to the standard libraries, we import helper functions from the utils_v2.py module, which in turn imports the radar_plot.py and utils.py modules. I took these two modules from Anmol_Durgapal (@slothfulwave612) and his soccerplots library, which many of you who have tried building your own radars are likely familiar with.

In radar_plot.py, I made a couple of minor modifications to how numbers are displayed on the final radar.

3.2. Reading Data

Read the data obtained earlier:

date = '2024-03-06'

path1 = '../../Scraping_fbref_static_data/data/old_seasons'
path2 = '../../Scraping_fbref_static_data/data/current_season'

# Top 5 European leagues, last 5 seasons
df_players_old = pd.read_csv('/'.join((path1, 'top5_leagues_outfields_2018_2019__2022_2023.csv')), index_col=0).reset_index(drop=True)
df_keeper_old = pd.read_csv('/'.join((path1, 'top5_leagues_keeper_2018_2019__2022_2023.csv')), index_col=0).reset_index(drop=True)
df_team_old = pd.read_csv('/'.join((path1, 'top5_leagues_team_2018_2019__2022_2023.csv')), index_col=0).reset_index(drop=True)
df_team_vs_old = pd.read_csv('/'.join((path1, 'top5_leagues_team_vs_2018_2019__2022_2023.csv')), index_col=0).reset_index(drop=True)

# Top 5 European leagues, current season (up to {date})
df_players = pd.read_csv('/'.join((path2, date, 'top5_leagues_outfields_2023_2024.csv')), index_col=0).reset_index(drop=True)
df_keeper = pd.read_csv('/'.join((path2, date, 'top5_leagues_keeper_2023_2024.csv')), index_col=0).reset_index(drop=True)
df_team = pd.read_csv('/'.join((path2, date, 'top5_leagues_team_2023_2024.csv')), index_col=0).reset_index(drop=True)
df_team_vs = pd.read_csv('/'.join((path2, date, 'top5_leagues_team_vs_2023_2024.csv')), index_col=0).reset_index(drop=True)
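The eight read_csv calls follow a single naming pattern, so they can also be written as a loop. Below is a minimal sketch with a hypothetical helper, load_tables, that assumes the file names shown above.

```python
from pathlib import Path

import pandas as pd

TABLE_KINDS = ('outfields', 'keeper', 'team', 'team_vs')


def load_tables(folder, season_tag):
    """Read the four FBref tables for one season tag into a dict of DataFrames.

    Hypothetical helper; assumes files named top5_leagues_{kind}_{season_tag}.csv.
    """
    folder = Path(folder)
    return {
        kind: pd.read_csv(folder / f'top5_leagues_{kind}_{season_tag}.csv',
                          index_col=0).reset_index(drop=True)
        for kind in TABLE_KINDS
    }
```

For example, old = load_tables(path1, '2018_2019__2022_2023') and current = load_tables(f'{path2}/{date}', '2023_2024'), after which df_team = current['team'] and so on.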

3.3. Calculating Statistics

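First, we calculate the metrics that are not available on FBref directly. The same function is applied to the teams' own statistics (df_team, used for the attacking radar) and to the opponents' statistics (df_team_vs, used for the defensive radar).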

# Derived metrics that FBref does not provide directly
def process_team_dataframe(df):
    # Average goalkeeper pass length: yards -> metres
    df['gk_passes_length_avg_m'] = round(df['gk_passes_length_avg'] * 0.9144, 2)
    df['Shots on target%'] = round(df['shots_on_target_per90'] / df['shots_per90'] * 100, 2)
    df['Passes to final 3rd%'] = round(df['passes_into_final_third'] / df['passes_completed'] * 100, 2)
    df['Passes to PA'] = round(df['passes_into_penalty_area'] / df['games'], 2)
    df['box crosses%'] = round(df['crosses_into_penalty_area'] / df['passes_into_penalty_area'] * 100, 2)
    df['Carries to final 3rd%'] = round(df['carries_into_final_third'] / df['carries'] * 100, 2)
    df['Carries to PA'] = round(df['carries_into_penalty_area'] / df['games'], 2)
    df['Touches final 3rd%'] = round(df['touches_att_3rd'] / df['touches_live_ball'] * 100, 2)
    df['npxg/Shot on target'] = round(df['npxg_per90'] / df['shots_on_target_per90'], 2)
    df['xa_per90'] = round(df['pass_xa'] / df['games'], 2)
    df['touches in PA'] = round(df['touches_att_pen_area'] / df['games'], 2)
    # Defensive actions (DA), only for the team's own statistics:
    # squad names in the "vs" tables start with 'vs '
    if not df['squad'].iloc[0].startswith('vs '):
        df['DA'] = df['tackles'] + df['challenge_tackles'] + df['interceptions'] + df['fouls']
    # df['gk_psxg_per90'] = round(df['gk_psxg'] / df['games'], 2)

    return df

# Apply the calculations to the current and historical tables
df_team = process_team_dataframe(df_team)
df_team_old = process_team_dataframe(df_team_old)

df_team_vs = process_team_dataframe(df_team_vs)
df_team_vs_old = process_team_dataframe(df_team_vs_old)
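To sanity-check a few of these formulas, you can run them on a toy DataFrame. The teams and numbers below are made up; only the column names follow the FBref-style names used above. Note that the columns computed by dividing by games (such as Passes to PA) are per match, not per 90 minutes.

```python
import pandas as pd

YARDS_TO_METRES = 0.9144

# Two made-up teams with FBref-style column names
toy = pd.DataFrame({
    'squad': ['Team A', 'Team B'],
    'gk_passes_length_avg': [30.0, 40.0],   # yards
    'shots_per90': [15.0, 10.0],
    'shots_on_target_per90': [6.0, 3.0],
    'passes_into_penalty_area': [190, 95],  # season totals
    'games': [19, 19],
})

toy['gk_passes_length_avg_m'] = round(toy['gk_passes_length_avg'] * YARDS_TO_METRES, 2)
toy['Shots on target%'] = round(toy['shots_on_target_per90'] / toy['shots_per90'] * 100, 2)
toy['Passes to PA'] = round(toy['passes_into_penalty_area'] / toy['games'], 2)  # per match
print(toy[['squad', 'gk_passes_length_avg_m', 'Shots on target%', 'Passes to PA']])
```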

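Next, we calculate PPDA (passes allowed per defensive action) for the defensive radar. It is the opponents' passes from the "vs" table divided by the team's own defensive actions (DA).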

def calculate_PPDA(df_vs, df):
    """Add a PPDA column to the "vs" table: opponents' passes / team's defensive actions."""
    # Nothing to do if PPDA has already been calculated
    if 'PPDA' in df_vs.columns:
        return df_vs

    df_vs_ = df_vs.copy()
    df_ = df.copy()

    # Strip the 'vs ' prefix so that squad names match between the two tables
    df_vs_['squad'] = df_vs_['squad'].str.replace('vs ', '', regex=False)
    df_['squad'] = df_['squad'].str.replace('vs ', '', regex=False)

    cols_to_merge = ['squad', 'season', 'DA']
    df_merged = pd.merge(df_vs_, df_[cols_to_merge], on=['squad', 'season'])
    df_merged['PPDA'] = round(df_merged['passes'] / df_merged['DA'], 2)

    # Free the temporary copies
    del df_, df_vs_
    gc.collect()

    return df_merged

df_team_vs = calculate_PPDA(df_team_vs, df_team)
df_team_vs_old = calculate_PPDA(df_team_vs_old, df_team_old)
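Here is a toy example of the same calculation, with made-up numbers, showing why the 'vs ' prefix has to be stripped before merging. A lower PPDA means the team allows fewer passes per defensive action, that is, it presses more intensely. This is why PPDA has type 0 in the defensive metrics dictionary. PPDA is commonly defined using only the opponent's passes and defensive actions in the opponent's own defensive 60% of the pitch. The whole-pitch version computed here from FBref season totals is a simplification of that definition.

```python
import pandas as pd

# Made-up season totals: opponents' passes come from the "vs" table,
# defensive actions (DA) from the team's own table
df_vs = pd.DataFrame({'squad': ['vs Team A', 'vs Team B'],
                      'season': ['2023-2024'] * 2,
                      'passes': [9000, 12000]})
df_own = pd.DataFrame({'squad': ['Team A', 'Team B'],
                       'season': ['2023-2024'] * 2,
                       'DA': [900, 600]})

# Remove the 'vs ' prefix so the squad names match, then merge on squad + season
df_vs['squad'] = df_vs['squad'].str.replace('vs ', '', regex=False)
merged = df_vs.merge(df_own, on=['squad', 'season'])
merged['PPDA'] = round(merged['passes'] / merged['DA'], 2)
print(merged[['squad', 'PPDA']])
```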

Additionally, we move the PSxG-GA metric (gk_psxg_net_per90) from df_team to df_team_vs, replacing the value stored there. This keeps all defensive statistics in a single DataFrame.

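# Align the rows by squad and season rather than by row position
cols = ['squad', 'season', 'gk_psxg_net_per90']
df_team_vs = pd.merge(df_team_vs.drop(columns=['gk_psxg_net_per90'], errors='ignore'),
                      df_team[cols], on=['squad', 'season'])
df_team = df_team.drop(columns=['gk_psxg_net_per90'])

df_team_vs_old = pd.merge(df_team_vs_old.drop(columns=['gk_psxg_net_per90'], errors='ignore'),
                          df_team_old[cols], on=['squad', 'season'])
df_team_old = df_team_old.drop(columns=['gk_psxg_net_per90'])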

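We also calculate the sum of tackles and interceptions adjusted for possession. The total is divided by the estimated minutes the team spent without the ball and rescaled to a baseline of 50% possession (45 minutes without the ball per match).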
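Written out with the quantities from adjust_tackles_interceptions, the adjustment looks like this. Here TI is the season total of tackles and interceptions, N is the number of games, and p is the possession percentage:

```latex
\begin{aligned}
t_{\text{w/o ball}} &= \left(1 - \frac{p}{100}\right) \cdot 90N, \qquad t_{\text{exp}} = \frac{90N}{2},\\
TI_{\text{adj}} &= TI \cdot \frac{t_{\text{exp}}}{t_{\text{w/o ball}}} = \frac{TI}{2\,(1 - p/100)},\\
TI_{\text{adj, per match}} &= \frac{TI_{\text{adj}}}{N}.
\end{aligned}
```

For example, take a team with 60% possession, 20 games, and 300 tackles plus interceptions. It spends an estimated 720 minutes without the ball against a 900-minute baseline, so TI_adj = 300 × 900 / 720 = 375, or 18.75 per match. The same raw count at 40% possession gives 300 × 900 / 1080 = 250, or 12.5 per match. Although the resulting column is named tackles_interceptions_adj_per90, the code divides by games, so the value is per match.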

# Adjust tackles + interceptions for possession
def adjust_tackles_interceptions(df):
    # Estimated minutes without the ball over the season
    df['minutes_wo_ball'] = (100 - df['possession']) / 100 * df['games'] * 90
    # Baseline: 50% possession, i.e. 45 minutes without the ball per match
    df['minutes_exp'] = df['games'] * 90 / 2
    df['tackles_interceptions_adj'] = df['tackles_interceptions'] / df['minutes_wo_ball'] * df['minutes_exp']
    df['tackles_interceptions_adj_per90'] = round(df['tackles_interceptions_adj'] / df['games'], 2)
    return df

# Adjust tackles and interceptions for both current and old team data
df_team = adjust_tackles_interceptions(df_team)
df_team_old = adjust_tackles_interceptions(df_team_old)

# Add the adjusted metric to the "vs" tables used by the defensive radar
cols = ['tackles_interceptions_adj_per90', 'squad', 'season']
df_team_vs = pd.merge(df_team_vs, df_team[cols], on=['squad', 'season'])
df_team_vs_old = pd.merge(df_team_vs_old, df_team_old[cols], on=['squad', 'season'])

3.4. Creating a Colormap Dictionary (Optional)

This dictionary is optional. If you do not use it, each team on the radars is randomly colored red or blue.

color_dict = {'Manchester City': '#004D98',
              'Manchester Utd': '#DB0030',
              'Arsenal': '#DB0030',
              'Chelsea': '#004D98',
              'Liverpool': '#DB0030',
              'Atletico': '#DB0030',
              'Barcelona': '#004D98',
              'Real Madrid': '#004D98',
              'Atlético Madrid': '#004D98'}
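The lookup with a random fallback can be pictured as follows. This is a hypothetical sketch of the behavior described above, not the code in utils_v2.py. It also shows that keys must match the squad names in the FBref tables exactly, accents included, or the fallback is used.

```python
import random

# The red and blue used in color_dict; hypothetical fallback palette
DEFAULT_COLORS = ['#DB0030', '#004D98']


def team_color(team, color_dict, rng=random):
    """Return the team's color from color_dict, or a random red/blue fallback."""
    return color_dict.get(team, rng.choice(DEFAULT_COLORS))
```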

3.5. Selecting Metrics to Display on the Radar

First, create a dictionary of metrics that maps each metric's label on the radar to the corresponding column name. The type key specifies the order in which the metric values are displayed:

  • 0 - from smaller to larger
  • 1 - from larger to smaller

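In practice, type 1 is used for metrics where a higher value is better. Type 0 is used where a lower value is better, such as every conceded metric on the defensive radar. Next, select the metrics to display on the attacking and defensive radars. You can extend this set with other FBref columns to customize the template for your needs.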

metrics_dict_attack = {
    'xG':                    {'m': 'xg_per90', 'type': 1},
    'npxG':                  {'m': 'npxg_per90', 'type': 1},
    'Shots':                 {'m': 'shots_per90', 'type': 1},
    'npxG/Shot':             {'m': 'npxg_per_shot', 'type': 1},
    'G/Shot':                {'m': 'goals_per_shot', 'type': 1},
    'Shots on target%':      {'m': 'Shots on target%', 'type': 1},
    'Shots on target':       {'m': 'shots_on_target_per90', 'type': 1},
    'G/Shot on target':      {'m': 'goals_per_shot_on_target', 'type': 1},
    'Passes to final 3rd%':  {'m': 'Passes to final 3rd%', 'type': 1},
    'Passes to PA':          {'m': 'Passes to PA', 'type': 1},
    'box crosses%':          {'m': 'box crosses%', 'type': 0},
    'Carries to final 3rd%': {'m': 'Carries to final 3rd%', 'type': 1},
    'Carries to PA':         {'m': 'Carries to PA', 'type': 1},
    'Touches final 3rd%':    {'m': 'Touches final 3rd%', 'type': 1},
    'Gk passes length':      {'m': 'gk_passes_length_avg_m', 'type': 0},
    'npxg/Shot on target':   {'m': 'npxg/Shot on target', 'type': 1},
    'GCA':                   {'m': 'gca_per90', 'type': 1},
    'xA':                    {'m': 'xa_per90', 'type': 1},
    'touches in PA':         {'m': 'touches in PA', 'type': 1},
}

metrics_to_plot_attack = ['npxG',
                          'npxG/Shot',
                          'Shots',
                          'Shots on target',
                          # 'GCA',
                          'xA',
                          'Passes to PA',
                          'Carries to PA',
                          'touches in PA',
                          'Gk passes length',
                          'box crosses%']

metrics_dict_def = {
    'xG C.':                  {'m': 'xg_per90', 'type': 0},
    'npxG C.':                {'m': 'npxg_per90', 'type': 0},
    'Shots C.':               {'m': 'shots_per90', 'type': 0},
    'npxG/Shot C.':           {'m': 'npxg_per_shot', 'type': 0},
    'G/Shot C.':              {'m': 'goals_per_shot', 'type': 0},
    'Shots on target% C.':    {'m': 'Shots on target%', 'type': 0},
    'Shots on target C.':     {'m': 'shots_on_target_per90', 'type': 0},
    'G/Shot on target C.':    {'m': 'goals_per_shot_on_target', 'type': 0},
    'Passes to PA C.':        {'m': 'Passes to PA', 'type': 0},
    'Carries to PA C.':       {'m': 'Carries to PA', 'type': 0},
    # 'Touches final 3rd%':   {'m': 'Touches final 3rd%', 'type': 1},
    # 'Gk passes length':     {'m': 'gk_passes_length_avg_m', 'type': 0},
    'npxg/Shot on target C.': {'m': 'npxg/Shot on target', 'type': 0},
    'GCA C.':                 {'m': 'gca_per90', 'type': 0},
    'xA C.':                  {'m': 'xa_per90', 'type': 0},
    'touches in PA C.':       {'m': 'touches in PA', 'type': 0},
    'PPDA':                   {'m': 'PPDA', 'type': 0},
    'Tackles + Intercep adj': {'m': 'tackles_interceptions_adj_per90', 'type': 1},
    'PSxG - GA':              {'m': 'gk_psxg_net_per90', 'type': 1},
}

metrics_to_plot_def = ['npxG C.',
                       'npxG/Shot C.',
                       'Shots C.',
                       'Shots on target C.',
                       'xA C.',
                       'touches in PA C.',
                       'PPDA',
                       'Tackles + Intercep adj',
                       'PSxG - GA']

dict_m = {'Attacking Radar': {'metric_dict': metrics_dict_attack,
                              'metric_to_plot': metrics_to_plot_attack},
          'Defending Radar': {'metric_dict': metrics_dict_def,
                              'metric_to_plot': metrics_to_plot_def}}
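When you extend the template with your own metrics, a typo in a label or a column name usually surfaces only later, as an error inside the plotting helpers. The small, hypothetical checker below catches such mismatches early. It reports labels missing from the dictionary, columns missing from the DataFrame, and type values other than 0 or 1.

```python
import pandas as pd


def check_metrics(metric_dict, metric_to_plot, df):
    """Return a list of problems found in a metrics configuration (empty if none)."""
    problems = []
    for label in metric_to_plot:
        spec = metric_dict.get(label)
        if spec is None:
            problems.append(f"'{label}' is not in the metrics dictionary")
            continue
        if spec['m'] not in df.columns:
            problems.append(f"column '{spec['m']}' for '{label}' is not in the DataFrame")
        if spec.get('type') not in (0, 1):
            problems.append(f"'{label}' has type {spec.get('type')!r}, expected 0 or 1")
    return problems
```

For example, check_metrics(metrics_dict_attack, metrics_to_plot_attack, df_team) and check_metrics(metrics_dict_def, metrics_to_plot_def, df_team_vs) should both return empty lists.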

4. Building the Visualization

As an example, let’s go through the step-by-step process of constructing an attacking radar for Real Madrid and Barcelona. We’ll select the teams, the type of radar, and define two variables that will hold the metrics dictionary and the list of metrics to be plotted.

team1 = 'Barcelona'
team2 = 'Real Madrid'

rt = 'Attacking Radar'

metric_dict = dict_m[rt]['metric_dict']
metric_to_plot = dict_m[rt]['metric_to_plot']

4.1. Forming a DataFrame with Calculated Statistics

Run the get_df_metrics function. For the selected teams, it returns the calculated statistics, the radar boundaries (the 5th and 95th percentiles), and the teams' percentiles.

df_metrics = get_df_metrics(df_team, df_team_old, team1, team2, metric_dict, metric_to_plot)
df_metrics
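get_df_metrics comes from utils_v2.py, and its internals are not shown here. The idea behind one radar axis can be sketched as follows, assuming the historical (five-season) table provides the reference distribution. The name radar_axis is hypothetical. Swapping the bounds and counting "worse" values for type 0 metrics is an assumption about how lower-is-better metrics are oriented.

```python
import pandas as pd


def radar_axis(history, value, metric_type):
    """Return (low, high, percentile) for one radar axis.

    history: reference values of the metric (e.g. all teams, five past seasons).
    low/high: 5th and 95th percentiles of history, used as the axis bounds.
    percentile: share of historical values the team's value is better than.
    """
    history = pd.Series(history, dtype=float)
    low, high = history.quantile(0.05), history.quantile(0.95)
    if metric_type == 1:
        # Higher is better: count values below the team's value
        percentile = (history < value).mean() * 100
    else:
        # Lower is better: count values above it and flip the axis direction
        percentile = (history > value).mean() * 100
        low, high = high, low
    return round(low, 2), round(high, 2), round(percentile, 1)
```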

Next, transform this DataFrame into the format used for the visualization.

df_metrics_v2 = preprocessing_df_metric(df_metrics, df_team, rt)
df_metrics_v2

4.2. Applying a Color Layer

Use the metrics_to_image function, which invokes pandas.DataFrame.style to customize the previously obtained DataFrame.

image = metrics_to_image(df_metrics_v2, color_dict, rt, date)

The result is the final, styled version of the DataFrame, saved as a .jpeg file under ../img/{date}/stats_image/.
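metrics_to_image is also defined in utils_v2.py. To illustrate the color layer, here is a hypothetical function that maps a percentile to a cell background; it is not the author's exact color scheme. A function of this shape can be applied to the percentile columns with pandas' Styler.map(percentile_to_css, subset=...), or with Styler.applymap in pandas versions before 2.1. The two RGB endpoints are arbitrary choices.

```python
def percentile_to_css(pct, low=(248, 105, 107), high=(99, 190, 123)):
    """Blend linearly from the low color (0th pct) to the high color (100th pct).

    Returns a CSS rule of the form Styler expects, e.g. 'background-color: #rrggbb'.
    """
    t = min(max(pct, 0), 100) / 100  # clamp to [0, 100] and scale to [0, 1]
    r, g, b = (round(lo + (hi - lo) * t) for lo, hi in zip(low, high))
    return f'background-color: #{r:02x}{g:02x}{b:02x}'
```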

4.3. Drawing the Radar and Forming the Final Visualization

Invoke the plot_radar function, which uses the Radar class and corresponding methods defined in the radar_plot.py module.

plot_radar(df_metrics, df_metrics_v2, rt, color_dict, date, image)

As a result, you obtain the final image, which is saved at the path ../img/{date}/radar_image/.
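plot_radar relies on the Radar class from radar_plot.py, adapted from soccerplots. The minimal matplotlib sketch below shows the underlying idea without that dependency; it is not the soccerplots implementation. Each value is first placed between its axis bounds on a 0 to 1 scale. scale_to_unit also works with swapped bounds for lower-is-better metrics. The teams are then drawn as filled polygons on polar axes. Both function names are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # render to a file without a display
import matplotlib.pyplot as plt
import numpy as np


def scale_to_unit(value, low, high):
    """Position of value between the axis bounds, clipped to [0, 1].

    With swapped bounds (low > high), smaller values land further out.
    """
    return float(np.clip((value - low) / (high - low), 0, 1))


def draw_radar(labels, team_values, path):
    """Draw teams as filled polygons; team_values maps team -> values in [0, 1]."""
    angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)
    closed = np.append(angles, angles[0])  # repeat the first angle to close the shape
    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'})
    for team, values in team_values.items():
        vals = np.append(values, values[0])
        ax.plot(closed, vals, linewidth=2, label=team)
        ax.fill(closed, vals, alpha=0.25)
    ax.set_xticks(angles)
    ax.set_xticklabels(labels)
    ax.set_ylim(0, 1)
    ax.set_yticklabels([])
    ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
    fig.savefig(path, bbox_inches='tight')
    plt.close(fig)
    return path
```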

5. Drawing Multiple Radars

5.1. Drawing a Pair of Attacking and Defensive Radars

Putting everything above together gives a compact cell that draws both radars for a pair of teams:

team1 = 'Barcelona'
team2 = 'Real Madrid'

radar_order = []

for rt in ['Attacking Radar', 'Defending Radar']:

    metric_dict = dict_m[rt]['metric_dict']
    metric_to_plot = dict_m[rt]['metric_to_plot']

    if rt == 'Attacking Radar':
        # Attacking radar: the teams' own statistics
        df_metrics = get_df_metrics(df_team, df_team_old, team1, team2, metric_dict, metric_to_plot)
        df_metrics_v2 = preprocessing_df_metric(df_metrics, df_team, rt)
        image = metrics_to_image(df_metrics_v2, color_dict, rt, date)
        plot_radar(df_metrics, df_metrics_v2, rt, color_dict, date, image)
        # Remember the team order used on the attacking radar
        team1 = df_metrics_v2.columns[0]
        team2 = df_metrics_v2.columns[2]
        radar_order.append(team1)
        radar_order.append(team2)
    else:
        # Defensive radar: statistics conceded ("vs" tables), same team order
        df_metrics = get_df_metrics(df_team_vs, df_team_vs_old, team1, team2, metric_dict, metric_to_plot)
        df_metrics_v2 = preprocessing_df_metric(df_metrics, df_team, rt, radar_order=radar_order)
        image = metrics_to_image(df_metrics_v2, color_dict, rt, date)
        plot_radar(df_metrics, df_metrics_v2, rt, color_dict, date, image, radar_order=radar_order)

As a result, you get two radars. In addition to the attacking radar from the previous output, you obtain a defensive radar for the Barcelona vs Real Madrid pair.

5.2. Drawing Radars for All Matches of the Upcoming Round

The list of matches in the next round can also be taken from FBref, by scraping the tables on the league's Scores & Fixtures page:

https://fbref.com/en/comps/12/schedule/La-Liga-Scores-and-Fixtures

league = 'La-Liga'
dict_league = {'La-Liga': 12}

url = f'https://fbref.com/en/comps/{dict_league[league]}/schedule/{league}-Scores-and-Fixtures'

res = requests.get(url)
# Remove HTML comment markers so that tables hidden inside comments are parsed too
comm = re.compile("<!--|-->")
soup = BeautifulSoup(comm.sub("", res.text), 'lxml')

tables_ = soup.find_all('table')

# Map each table id to its caption and body
dict_res = {}

for t in tables_:
    table_id = t.attrs.get('id')
    caption = t.find('caption')
    if table_id and caption is not None:
        dict_res[table_id] = {'name': caption.text,
                              'value': t.find_all('tbody')[0]}
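The comm.sub step matters because FBref serves some of its tables inside HTML comments, and an HTML parser treats comment content as text, not as tags. The following standard-library sketch shows the effect on made-up markup. TableIdCollector and table_ids are illustrative names only.

```python
import re
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Collect the id attribute of every <table> tag the parser actually sees."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.ids.append(dict(attrs).get('id'))


def table_ids(html):
    parser = TableIdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


# Made-up page: the second table is wrapped in an HTML comment
page = ('<table id="sched_visible"></table>'
        '<!-- <table id="sched_hidden"></table> -->')

print(table_ids(page))                          # ['sched_visible']
print(table_ids(re.sub('<!--|-->', '', page)))  # ['sched_visible', 'sched_hidden']
```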

Scrape the table sched_2023-2024_12_1 (the table name can differ for other seasons or leagues).

table = dict_res['sched_2023-2024_12_1']['value']
features = get_column_names_from_table(table)

pre_df_squad = dict()
features_wanted_squad = features[1:]  # the first column, gameweek, is the <th> row header
rows_squad = table.find_all('tr')

for row in rows_squad:
    # Only rows with a row header contain match data
    if row.find('th', {"scope": "row"}) is not None:
        name = row.find('th', {"data-stat": "gameweek"}).text.strip()
        pre_df_squad.setdefault('gameweek', []).append(name)

        for f in features_wanted_squad:
            cell = row.find("td", {"data-stat": f})
            text = cell.text.strip()
            # Convert numeric values (e.g. "45,123") to float; keep other text as-is
            try:
                text = float(text.replace(',', ''))
            except ValueError:
                pass
            pre_df_squad.setdefault(f, []).append(text)

df_result = pd.DataFrame.from_dict(pre_df_squad)

This gives you every LaLiga match of the season. The filter below keeps only the matches of a specific round (matchday 28).

# Gameweek values are strings; set the round you are interested in
mask = df_result['gameweek'] == '28'
df_games = df_result[mask][['home_team', 'away_team', 'date']].reset_index(drop=True)
df_games
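The round number is hard-coded above. A hypothetical helper could instead pick the first gameweek that still has unplayed matches. It assumes that the schedule table has a score column (data-stat "score") that is empty for matches not yet played. A postponed match from an earlier round would make it return that earlier round.

```python
import pandas as pd


def next_gameweek(df_schedule, score_col='score'):
    """Return the first gameweek (as a string) that still has unplayed matches.

    Assumes unplayed matches have an empty score cell; returns None if all are played.
    """
    unplayed = df_schedule[df_schedule[score_col].astype(str).str.strip() == '']
    weeks = pd.to_numeric(unplayed['gameweek'], errors='coerce').dropna()
    return None if weeks.empty else str(int(weeks.min()))
```

Usage: mask = df_result['gameweek'] == next_gameweek(df_result).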

Now run the same script for every match of the upcoming round.

for n, row in df_games.iterrows():

    team1 = row['home_team']
    team2 = row['away_team']

    radar_order = []

    for rt in ['Attacking Radar', 'Defending Radar']:

        metric_dict = dict_m[rt]['metric_dict']
        metric_to_plot = dict_m[rt]['metric_to_plot']

        if rt == 'Attacking Radar':
            df_metrics = get_df_metrics(df_team, df_team_old, team1, team2, metric_dict, metric_to_plot)
            df_metrics_v2 = preprocessing_df_metric(df_metrics, df_team, rt)
            image = metrics_to_image(df_metrics_v2, color_dict, rt, date)
            plot_radar(df_metrics, df_metrics_v2, rt, color_dict, date, image)
            team1 = df_metrics_v2.columns[0]
            team2 = df_metrics_v2.columns[2]
            radar_order.append(team1)
            radar_order.append(team2)
        else:
            df_metrics = get_df_metrics(df_team_vs, df_team_vs_old, team1, team2, metric_dict, metric_to_plot)
            df_metrics_v2 = preprocessing_df_metric(df_metrics, df_team, rt, radar_order=radar_order)
            image = metrics_to_image(df_metrics_v2, color_dict, rt, date)
            plot_radar(df_metrics, df_metrics_v2, rt, color_dict, date, image, radar_order=radar_order)

As a result, you generate 20 radars (an attacking and a defensive radar for each of the round's 10 matches). They are saved to the same directories as in the earlier examples.

P.S.

My GitHub repository contains a Jupyter notebook for creating these radars. The README.md also includes a Roadmap of planned future features.

Written by Mikhail Borodastov

ML Product Manager 🚀 | ex- Data Scientist 📊 | Football Analytics Enthusiast ⚽
