Introduction to Causal Inference for Data Science

Dates

June 26 to 30, afternoons from 15:00 to 18:00

Instructor

Aleix Ruiz de Villa holds a PhD in mathematics and an MSc in financial mathematics. He has been working as a data scientist for more than 10 years. He was Head of Data Science at La Vanguardia and at SCRM (Lidl), and Chief Data Scientist at Onna. For the last 4 years he has worked as a freelancer on causal inference projects in collaboration with Vall d'Hebron and Koa Health, while teaching regularly at many institutions. He is also an advisor at Nuclia, an NLP company.

Beyond his daily work, Aleix is the author of "Causal Inference for Data Science" (Manning Publications). He co-founded the Barcelona R Users Group (2011-2017) and founded the Barcelona Data Science and Machine Learning Meetup (running since 2014).

Language

English

Description

Data helps companies understand their customers and make better decisions. However, data can also be confusing and difficult to interpret. To make decisions, we need to know what causes what, while data alone can only tell us what is correlated with what. And, as you may know, correlation is not causation.
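To make the point about correlation concrete, here is a minimal Python sketch (Python being one of the two course languages). It assumes a hypothetical confounder `z` that drives both `x` and `y`, while `x` has no causal effect on `y` at all. In theory the correlation is 6/√50 ≈ 0.85 and the naive regression slope of `y` on `x` is 6/5 = 1.2, yet once `z` is included in the regression the slope of `x` drops to about 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder z (e.g. customer engagement) drives both x and y.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)  # x depends only on z
y = 3.0 * z + rng.normal(size=n)  # y depends only on z: x has NO causal effect on y

# The marginal correlation between x and y is strong even though x does not cause y.
corr_xy = np.corrcoef(x, y)[0, 1]

# Regressing y on x alone gives a large, misleading slope...
slope_naive = np.polyfit(x, y, 1)[0]

# ...while regressing y on both x and z gives a slope for x close to 0.
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
slope_adjusted = coef[1]

print(f"corr(x, y)          = {corr_xy:.2f}")
print(f"naive slope of x    = {slope_naive:.2f}")
print(f"adjusted slope of x = {slope_adjusted:.2f}")
```

The data-generating process is chosen so the true answer is known in advance. This is the usual way to check whether a method separates correlation from causation.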

A good solution is to run A/B tests. Unfortunately, they have many limitations: you cannot run many of them simultaneously, they can be expensive to run, they may harm customers, and sometimes they are not even feasible. That’s where causal inference comes into play.

Causal inference is a set of methods, quickly growing in popularity, for working with historical data that was not obtained through A/B tests. Causal inference helps us to:

- Know in which situations we can tell causation apart from correlation, in which situations we can do so only if we collect more variables, and in which situations correlation is all we should expect to get.
- Calculate causal impacts from the historical data using statistical tools.

Course goals

- Understand when causal inference is necessary, as well as its assumptions and limitations
- Learn tools to estimate causal effects
- Understand how machine learning and causal inference can help each other

Course contents

1. Introduction

- Correlation is not causation

- A/B tests and Randomized Controlled Trials

2. How to handle confounders

- Understanding the problem of confounders through Simpson's paradox

- How to remove the effect of confounding with the adjustment formula

- The adjustment formula in linear models

- Propensity Scores

3. Using machine learning to remove the effect of confounders

- The adjustment formula with machine learning

- Double Machine Learning

4. Selecting the variables to include in your analysis with Directed Acyclic Graphs (DAGs)

- d-separation

- Backdoor criterion

5. Avoiding confounders through Instrumental Variables

6. Measuring the impact of an event at a point in time

- Synthetic Controls

- Regression Discontinuity Design

- Difference-in-Differences

Prerequisites

- Basic knowledge of probability and linear regression
- Basic knowledge of machine learning: cross-validation, decision trees, ...
- Basic programming knowledge in R or Python

Targeted at

Data scientists, statisticians and econometricians who want to evaluate the impact of decisions or treatments.

Evaluation

Students are evaluated through an exercise in which they simulate synthetic data and then analyze it with causal inference tools.
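The exercise is not specified in more detail here. The following is only a minimal Python sketch of that kind of workflow, not the actual assignment. It assumes a single binary confounder `z`, a planted true treatment effect of 2, and hypothetical variable names. It simulates data, computes the naive difference in means, and then applies the standard adjustment formula for a discrete confounder, $\mathrm{ATE} = \sum_z \big(E[Y \mid T=1, Z=z] - E[Y \mid T=0, Z=z]\big)\,P(Z=z)$. With these settings the naive estimate should land near 2 + 5·(0.8 − 0.2) = 5, and the adjusted estimate near the true value of 2.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
true_effect = 2.0  # hypothetical ground-truth effect planted in the simulation

# 1. Simulate synthetic data with a binary confounder z.
z = rng.binomial(1, 0.5, size=n)                     # e.g. a hypothetical "loyal customer" flag
p_treat = np.where(z == 1, 0.8, 0.2)                 # z makes treatment more likely...
t = rng.binomial(1, p_treat)
y = true_effect * t + 5.0 * z + rng.normal(size=n)   # ...and also raises the outcome

# 2. Naive estimate: difference in means, biased because z drives both t and y.
naive = y[t == 1].mean() - y[t == 0].mean()

# 3. Adjustment formula: within-stratum effects weighted by P(Z = z).
adjusted = 0.0
for value in (0, 1):
    stratum = z == value
    effect_in_stratum = y[stratum & (t == 1)].mean() - y[stratum & (t == 0)].mean()
    adjusted += effect_in_stratum * stratum.mean()

print(f"naive difference in means: {naive:.2f}")
print(f"adjusted estimate:         {adjusted:.2f}  (true effect = {true_effect})")
```

Because the true effect is known by construction, simulated data of this kind is a convenient way to check whether a causal estimator recovers the right answer before applying it to real historical data.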

Software requirements

R or Python, with your favourite IDE