Applied Machine Learning to solve real-life problems

Date

July 3 to 7, afternoons (15:00 to 18:00)

Instructor

Jordi Moragas Vilarnau

Degree and PhD in Mathematics (Combinatorics, Graph Theory and Additive Number Theory) from the UPC, Department of Applied Mathematics IV.

Lead Data Scientist and Head of Financial Crime Prevention (FCP) at N26 (Berlin):

Driving the team and collaborating closely with Machine Learning & Data Engineers, FCP analysts, product managers, and anti-financial-crime experts to deliver high-quality Machine Learning services that impact millions of users. Experimenting with new Machine Learning technologies and setting the technical and research agenda to solve some of the most challenging problems in the financial industry.

Former Head of Data & Analytics at Bluecap Management Consulting:

A firm specialized in financial services and consultancy projects for top-tier European, US and LATAM banks in regulatory, strategy and risk management. As an expert in quantitative risk analysis, his main tasks included the development, implementation and validation of internal credit risk models (rating/scoring systems and calibration of PD and LGD under the Basel III and IFRS 9 frameworks), stress testing (solvency and liquidity), capital and impairments, pricing, and RAROC.

Extensive use of Machine Learning techniques, advanced programming tools (Python, R) and data management (SAS/Oracle/SQL, PySpark).

Language

English

Description

This course aims to provide a highly practical view of how modeling techniques, from the most basic to the most state-of-the-art ones, are applied to real problems in highly quantitative business environments (banking and finance).

Course goals

To learn the challenges involved in building a model in the real world, from obtaining the data to choosing the best model and the validation mechanisms appropriate to each type of problem. Good practices and a description of the usual problems encountered in a large modeling project.

Course contents

  1. Introduction to modeling
    1. Supervised vs. unsupervised learning
    2. Supervised:
      1. Classification
      2. Regression
    3. Metrics
      1. Classification (AUC, Gini, AR, Somers' D, Kolmogorov-Smirnov...)
      2. Regression (R^2, RMSE...)
    4. Best modeling practices (train/test partition, cross-validation, overfitting control); see the first sketch after this list
  2. Feature Engineering
    1. Data processing in real life and Big Data. Typical errors in data management (leakage of future information)
    2. Missing-value treatment (mean/median/percentiles, control dummy, K-Nearest Neighbors...)
    3. Treatment and creation of new factors (categorical variables, Weight of Evidence / continuous bucketization, dummies and one-hot encoding, alert counters...); a feature-engineering sketch follows this list
  3. Models and applicability cases
    1. Basic problem examples:
      1. Ratings and counterparty risk (classification)
      2. Pricing (regression)
      3. Sentiment analysis (classification)
    2. Linear models (OLS / Logistic) and regularization (LASSO, Ridge, ElasticNet)
    3. Decision trees and random forests
    4. Gradient Boosting (GBM, XGBoost, LightGBM, CatBoost); see the sketch after this list
    5. Neural networks (1D CNN) for text mining and natural language processing
    6. Stacking techniques
    7. Parameter optimization (hyperparameter tuning)
    8. Factor interpretation in black-box models
      1. SHAP (Shapley values: average effect of adding/removing each feature)
      2. LIME (local linear approximation around each prediction)
  4. Calibration of classification models
    1. Need for calibration; see the calibration sketch after this list
    2. Confusion matrix and pay-off methodology
    3. Basic calibration techniques of the probability curve (binomial model)
    4. Advanced calibration techniques of the probability curve (Tasche: "The art of probability-of-default curve calibration") 
  5. Model validation
    1. Stability (see the sketch after this list)
    2. Performance
  6. Model development in real cases and Kaggle competitions
    1. Rating
    2. House prices
    3. Sentiment analysis of news (including web scraping)
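
The following is a minimal sketch, using scikit-learn and synthetic data purely for illustration, of the best modeling practices and classification metrics listed under "Introduction to modeling" above: a train/test partition, cross-validation, and the AUC metric together with its Gini transformation.

    # Train/test partition, cross-validation and AUC/Gini on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

    # Hold out a test set that is never used during model selection.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )

    model = LogisticRegression(max_iter=1_000)

    # 5-fold cross-validation on the training set to estimate out-of-sample AUC
    # and keep overfitting under control during model selection.
    cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    print(f"CV AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")

    # Final fit and evaluation on the untouched test set; Gini = 2 * AUC - 1.
    model.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"Test AUC: {test_auc:.3f}, Gini: {2 * test_auc - 1:.3f}")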
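
Similarly, a minimal sketch of the Feature Engineering steps, assuming a small hypothetical data frame (the column names are made up): median imputation with a control dummy for missing values, and one-hot encoding of a categorical factor, combined in a scikit-learn ColumnTransformer.

    # Missing-value imputation with a control dummy, plus one-hot encoding,
    # on a small hypothetical data frame.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import OneHotEncoder

    df = pd.DataFrame({
        "income": [30_000, None, 52_000, 41_000],
        "segment": ["retail", "sme", "retail", "corporate"],
    })

    preprocess = ColumnTransformer([
        # Median imputation; add_indicator=True appends a dummy flagging
        # the observations that were originally missing.
        ("num", SimpleImputer(strategy="median", add_indicator=True), ["income"]),
        # One-hot encoding; categories unseen at training time are ignored.
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
    ])

    X = preprocess.fit_transform(df)
    print(X)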
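
A minimal sketch of the gradient boosting, hyperparameter tuning and black-box interpretation items. scikit-learn's GradientBoostingClassifier and GridSearchCV are used here for simplicity (the course also covers XGBoost, LightGBM and CatBoost), and the interpretation step assumes the third-party shap package is available.

    # Gradient boosting with cross-validated hyperparameter tuning, followed by
    # SHAP feature attribution (requires the third-party `shap` package).
    import numpy as np
    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    param_grid = {
        "n_estimators": [100, 300],
        "learning_rate": [0.05, 0.1],
        "max_depth": [2, 3],
    }

    # Grid search with 3-fold cross-validation, optimizing AUC.
    search = GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        param_grid,
        scoring="roc_auc",
        cv=3,
    )
    search.fit(X_train, y_train)
    print("Best parameters:", search.best_params_)

    test_auc = roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:, 1])
    print(f"Test AUC: {test_auc:.3f}")

    # Shapley-value attributions per observation; their mean absolute value per
    # feature is a simple global importance measure for the black-box model.
    explainer = shap.TreeExplainer(search.best_estimator_)
    shap_values = explainer.shap_values(X_test)
    print(np.abs(shap_values).mean(axis=0))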
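
For the calibration block, a generic sketch using scikit-learn's CalibratedClassifierCV (sigmoid/Platt scaling) and a calibration curve on synthetic, imbalanced data. This is only a stand-in: the binomial and Tasche probability-of-default curve techniques covered in the course are more specific than this illustration.

    # Re-mapping raw classifier scores to calibrated probabilities and checking
    # the result with a calibration curve (synthetic, imbalanced data).
    from sklearn.calibration import CalibratedClassifierCV, calibration_curve
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(
        n_samples=5_000, n_features=20, weights=[0.9, 0.1], random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0
    )

    # Sigmoid (Platt) calibration fitted with 5-fold cross-validation.
    calibrated = CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=200, random_state=0),
        method="sigmoid",
        cv=5,
    )
    calibrated.fit(X_train, y_train)

    # Compare mean predicted probabilities against observed event frequencies.
    prob_pred = calibrated.predict_proba(X_test)[:, 1]
    frac_pos, mean_pred = calibration_curve(y_test, prob_pred, n_bins=10)
    for p, f in zip(mean_pred, frac_pos):
        print(f"mean predicted: {p:.2f}  observed frequency: {f:.2f}")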
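
Finally, for the stability part of model validation, one common monitoring measure in credit-risk modeling is the Population Stability Index (PSI), which compares the score distribution at development time with the one observed on the current portfolio. A minimal sketch with synthetic scores:

    # Population Stability Index (PSI) on synthetic score distributions:
    # PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%).
    import numpy as np

    rng = np.random.default_rng(0)
    scores_dev = rng.beta(2, 5, size=10_000)    # scores at model development
    scores_now = rng.beta(2.3, 5, size=10_000)  # scores on the current portfolio

    # Decile bin edges fixed on the development sample.
    edges = np.quantile(scores_dev, np.linspace(0, 1, 11))
    edges[0] = min(scores_dev.min(), scores_now.min())
    edges[-1] = max(scores_dev.max(), scores_now.max())

    expected = np.histogram(scores_dev, bins=edges)[0] / len(scores_dev)
    actual = np.histogram(scores_now, bins=edges)[0] / len(scores_now)

    psi = np.sum((actual - expected) * np.log(actual / expected))
    # Common rule of thumb: below 0.10 stable, above 0.25 significant shift.
    print(f"PSI: {psi:.3f}")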

Prerequisites

Basic algebra and statistics. Programming.

Targeted at

Students who wish to see how current Machine Learning techniques are applied in a production environment. Graduates in mathematics, statistics and engineering.

Evaluation

An internal modeling contest will be held on Kaggle.

Software requirements

Python 3 will be the preferred tool for its ease of use and flexibility (the latest stable version supported by TensorFlow). Students may also use R or other languages, but all delivered code and examples will be in Python.