Generalized Additive Models and beyond with P-splines

Date:

July 4 to July 8, afternoons from 15:00 to 18:00

Classroom:

Not defined yet

Instructor

Dae-Jin Lee

I am a researcher at BCAM - the Basque Center for Applied Mathematics - and research leader of the Applied Statistics Group (part of the Data Science & AI research area).

I obtained my PhD in Statistics at Universidad Carlos III de Madrid in June 2010. From February 2011 to March 2014, I was a Postdoctoral Researcher at the CSIRO Mathematics, Informatics and Statistics division (CMIS, now CSIRO-Data61) in Melbourne, Victoria, Australia.

My main areas of research are statistical modelling and computational statistics with non-parametric regression techniques. In particular, I work on smoothing techniques based on penalized spline regression models (penalized likelihood splines with a B-spline basis) and tensor product smooths in the mixed model framework (“Generalized Linear Mixed Models”, GLMMs). The main applications of my work are related to multidimensional smoothing, Generalized Linear Array Models, spatial and spatio-temporal modelling, computational statistics, disease mapping, mortality life tables, time series analysis and forecasting, mixed-effects models, environmental modelling, Functional Data Analysis, wireless sensor and sensor network data analysis, Health-Related Quality of Life outcome data analysis, and complex data visualization. Recently, I have started a new research line in Sports Analytics and statistical methods for injury prevention in football.

I am also the scientific coordinator of BCAM's Knowledge Transfer Unit in Data Science, which aims to develop mathematical solutions for scientific challenges based on real-life applications and to establish collaborations with industry.

Language

English

Description

Generalized Additive Models (GAMs) are statistical models that can be used to estimate trends and patterns in the data. A GAM is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, which allows for estimating non-linear effects, interactions and more complex structures. Among the existing smoothing methods for GAMs, P-splines (penalized splines with a B-spline basis and quadratic penalties), proposed by Eilers and Marx (1996), are a very popular approach. The existing libraries and code in the statistical software R make it extremely easy to show that P-splines are a powerful tool for data smoothing.
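
As a rough sketch of the definitions above, the model and the P-spline fit can be written as follows. The notation is an assumption made here for illustration and is not taken from the course materials: g is the link function, f_j are the smooth functions, B is the B-spline basis matrix with columns B_k, θ the basis coefficients, D_d the difference matrix of order d, and λ the smoothing parameter. The closed-form solution in the last line applies to a single smooth term with a Gaussian response.

```latex
% GAM: the link-transformed mean is a sum of smooth functions of the predictors
g\bigl(\mathbb{E}[y_i]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij})

% Each smooth function is expanded in a B-spline basis
f_j(x) = \sum_{k=1}^{K} B_k(x)\,\theta_{jk}

% P-spline fit for one smooth term: least squares plus a quadratic difference penalty
\hat{\theta} = \arg\min_{\theta}\; \|y - B\theta\|^2 + \lambda\,\|D_d\,\theta\|^2
             = \bigl(B^\top B + \lambda\, D_d^\top D_d\bigr)^{-1} B^\top y
```

Larger values of λ penalize differences between adjacent coefficients more heavily and so give a smoother fit; choosing λ is the "optimal smoothing" problem listed in the course contents.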

The course is a mix of theory, real data applications and hands-on tutorials for the students. I will discuss relevant aspects of P-splines, such as estimation of the optimal smoothing parameters, inference with GAMs, multidimensional smoothing, longitudinal data analysis and time series analysis, with reproducible examples and R code made available.

References:
- Eilers, P.H.C. and Marx, B.D. (1996). "Flexible Smoothing with B-splines and Penalties", Statistical Science, Vol. 11, No. 2, pp. 89-121.
- Eilers, P.H.C., Marx, B.D. and Durbán, M. (2015). "Twenty years of P-splines", SORT-Statistics and Operations Research Transactions, Vol. 39, No. 2, pp. 149-186.
- Eilers, P.H.C. and Marx, B.D. (2021). "Practical Smoothing: The Joys of P-splines", Cambridge University Press.

Course goals

  • Provide the students with the background and the benefits of the use of P-splines.
  • Illustrate applications of P-splines with real data examples.
  • Provide the students with the ability to use P-splines for their own research projects and data analyses.

Course contents

  1. Introduction to smoothing.
  2. Generalized Additive Models.
  3. P-splines.
    1. Basis and penalties.
    2. Optimal smoothing.
    3. P-splines for GLMs.
    4. Multidimensional smoothing.
  4. P-splines as mixed models.
    1. P-splines for GLMMs.
    2. Smoothing parameter estimation as variance components.
  5. Beyond the exponential family.
    1. Generalized Additive Models for Location, Scale and Shape.
  6. Examples.

Prerequisites

  • Basic knowledge of linear regression and inference.
  • Basic knowledge of R software.
  • Linear Algebra and Matrix Analysis.

Targeted at

Master's and PhD students, researchers, data scientists and industry professionals.

Evaluation

Short project with real data where the techniques and methods taught in the course are used.

Software requirements

R software, RStudio (desirable)