Applied Machine Learning to Solve Real-Life Problems

Date:

July 1 to 5. Afternoon: 15:00 to 18:00

Instructor

Jordi Moragas Vilarnau

Degree in Mathematics and PhD in Mathematics (with a specialization in Combinatorics, Graph Theory, and Additive Number Theory) from UPC

Lead Data Scientist (Core Banking) at N26 Bank in Berlin, Germany. Currently leading a team through the prototyping, development, and deployment of Machine Learning models in the areas of Credit Risk, Financial Crime prevention/detection, and Credit Card transaction authorization.

Formerly Head of Data & Analytics at Bluecap Management Consulting, a firm specializing in financial services and consultancy projects for top-tier European banks in regulatory, strategy, and risk management. His main responsibilities there, as an expert in quantitative risk analysis in banking, included Credit Risk modeling (PD/LGD), Stress Testing (solvency and liquidity), capital/provisions calculations, and pricing (RAROC approach).

Language

English

Description

This course offers a highly practical perspective on applying modeling techniques, from fundamental to cutting-edge methods, to real-world problems in quantitative business environments such as banking and finance.

Course goals

To understand the challenges associated with creating models in real-world scenarios, from data acquisition to selecting the optimal model and validation methods tailored to specific problem types. Additionally, to promote good practices and provide insights into common issues encountered in large-scale modeling projects.

Course contents

  1. Introduction to modeling
    1. Supervised vs. unsupervised learning
    2. Supervised:
      1. Classification
      2. Regression
    3. Metrics
      1. Classification (AUC, Gini, AR, Somers' D, Kolmogorov-Smirnov...)
      2. Regression (R^2, RMSE...)
    4. Best modeling practices (train / test partition, cross-validation, overfitting control)
  2. Feature Engineering
    1. Data processing in real life and Big Data. Typical errors in data management (e.g., leakage of future information)
    2. Handling missing values (mean/median/percentiles, control dummy, K-Nearest Neighbors...)
    3. Treatment and creation of new features (categorical variables, Weight of Evidence/continuous bucketization, dummies and one-hot encoding, alert counters...)
  3. Models and applicability cases
    1. Basic problem examples:
      1. Ratings in credit risk (classification)
      2. House prices (regression)
      3. Image recognition (classification)
    2. Linear models (OLS / Logistic) and regularization (Lasso, Ridge, Elastic Net)
    3. Decision trees and random forests
    4. Gradient Boosting (GBM, XGBoost, LGBM, CatBoost)
    5. Neural networks (1D & 2D CNN) for text and image processing
    6. Generative AI and Large Language Models (LLM)
      1. APE (Automatic Prompt Engineering)
      2. LLM prompting instead of rules
    7. Stacking techniques
    8. Parameter optimization (hyperparameter tuning)
    9. Feature interpretation in black-box models using SHAP (effect of removing/adding each feature)
    10. Using LLMs for explanations
  4. Calibration of classification models
    1. Need for calibration
    2. Confusion matrix and pay-off methodology
    3. Basic calibration techniques of the probability curve (Platt scaling/binomial model, isotonic regression, Bayesian central tendency correction)
    4. Advanced calibration techniques of the probability curve (Tasche: "The art of probability-of-default curve calibration")
  5. Model validation
    1. Stability
    2. Performance
  6. Model Development in a Real Case and Kaggle Evaluation Contest
    1. Building a classification model from scratch to predict the probability of default for corporations
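
As a small illustration of the classification metrics listed under "Metrics" in the contents, the sketch below computes AUC, Gini, and the Kolmogorov-Smirnov statistic in plain Python. It is not course material: the function names and the toy `labels`/`scores` data are assumptions made for this example. It relies on the standard identity Gini = 2·AUC − 1, which for a binary target also coincides with the Accuracy Ratio (AR) and Somers' D.

```python
def split_scores(labels, scores):
    """Separate scores of positives (label 1) from negatives (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return pos, neg


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is the O(n^2) pairwise definition,
    chosen for readability rather than speed.
    """
    pos, neg = split_scores(labels, scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient derived from AUC (equal to AR for a binary target)."""
    return 2 * auc(labels, scores) - 1


def ks_statistic(labels, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos, neg = split_scores(labels, scores)
    gaps = []
    for threshold in sorted(set(scores)):
        cdf_pos = sum(s <= threshold for s in pos) / len(pos)
        cdf_neg = sum(s <= threshold for s in neg) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)


# Toy data (hypothetical): 1 = default, 0 = non-default; higher score = riskier.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.4, 0.9]

print("AUC :", auc(labels, scores))
print("Gini:", gini(labels, scores))
print("KS  :", ks_statistic(labels, scores))
```

On this toy sample the AUC is 0.875, and Gini and KS both come out to 0.75; the two generally differ on real data. In practice a library routine would normally replace the pairwise loop, which is quadratic in the sample size.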

Prerequisites

Basic algebra and statistics. Programming experience (the course will be taught in Python).

Targeted at

Students interested in applying current Machine Learning techniques in a production environment. Graduates in mathematics, statistics, engineering and related quantitative fields.

Evaluation

An internal modeling contest will be held on Kaggle.

Software requirements

Python 3 will be the preferred tool for its ease of use and flexibility. Students may also use R if they feel more comfortable with it.