Soccer Data Science

Date:

June 29 to July 3. Afternoon: 15:00 to 18:00.

Instructor

Virgilio Gómez-Rubio

Virgilio Gómez-Rubio is a full professor in the Department of Mathematics at the Universidad de Castilla-La Mancha (Albacete, Spain). His research interests include spatial statistics, Bayesian inference and computational methods. Recently, he has started to work on sports analytics. He has developed a number of R packages for spatial statistics and Bayesian inference. Among other publications, he is the author of the book "Bayesian Inference with INLA", which received the 2022 SEIO–BBVA Foundation Award in Data Science and Big Data.

https://becarioprecario.github.io

Language

English, Spanish

Description

The course will focus on statistical methods for the analysis of soccer (or football, as it is known in Europe) data. First, different types of data and data sources/providers will be introduced. Then, several case studies will be described, together with the statistical methods required to analyse them. Typical examples will include analysing league tables to train models that estimate the outcome of future matches, analysing the spatial positions of players on the field, finding players similar to a target player, and analysing injuries, to mention a few. Sessions will be organized as short lectures followed by hands-on practicals. All examples in the course materials will be developed in R, but participants will be free to use other similar software (e.g., Python).

Course goals

Introduce sports analytics for soccer/football. Learn about the different types of data available in soccer/football. Study the main statistical methods used to analyse soccer/football data. Implement these methods in the R programming language.

Course contents

  1. Introduction to soccer data
  2. Data visualization
  3. Spatial analysis of soccer data
  4. Methods for scouting in football
  5. Analysis of league tables
  6. Statistical modelling of soccer data

Prerequisites

Basic knowledge of the R programming language. Knowledge of basic statistical methods (such as those covered in many undergraduate statistics courses in science and engineering degrees).

Targeted at

Undergraduate students in their last year, M.Sc. students and graduate students. The course will also be of interest to members of the general public with an interest in sports analytics.

Evaluation

If needed, participants can be evaluated by asking them to conduct several data analyses and submit a short report at the end of the course. In addition, short multiple-choice questionnaires could be used to evaluate the theoretical part of the course (i.e., statistical methods and their applications to soccer data).

Teaching Methodology and Activities

Sessions will be split into blocks, each consisting of a short lecture (~45 minutes) followed by a hands-on practical (~45 minutes). All datasets and R code will be shared with the course participants.

Software requirements

The course will make extensive use of the R statistical software, and some specific packages will be required for data analysis.