Applied Machine Learning to solve real-life problems

Date:

June 20 to 23. Afternoon sessions: 15:00 to 19:00 on June 20, 21 and 22, and 15:00 to 18:00 on June 23.

Classroom:

To be announced

Instructor

Jordi Moragas Vilarnau

Degree in Mathematics and PhD in Mathematics (Combinatorics, Graph Theory and Additive Number Theory) from the UPC, Department of Applied Mathematics IV.

Senior Manager at Bluecap Management Consulting, a firm specialized in financial services that runs consultancy projects for top-tier European banks on regulatory, strategy and risk management matters. As an expert in quantitative risk analysis, his main tasks are the development, implementation and validation of internal credit risk models (rating/scoring systems and calibration of PD and LGD under the Basel III and IFRS 9 frameworks), stress testing (solvency and liquidity), capital and impairments, pricing and RAROC.

He makes extensive use of Machine Learning techniques, advanced programming tools (Python, R) and data management systems (SAS/Oracle/SQL).

Language

English, Catalan

Description

This course aims to provide an eminently practical view of how modeling techniques, from the most basic to the most state-of-the-art, are applied to real problems in highly quantitative business environments (banking and finance).

Course goals

To learn the challenges involved in building a model in the real world, from obtaining the data to choosing the best model to use and the validation mechanisms appropriate to each type of problem. The course also covers good practices and the problems typically encountered in a large modeling project.

Course contents

  1. Introduction to modeling
    1. Supervised vs. unsupervised learning
    2. Supervised:
      1. Classification
      2. Regression
    3. Metrics (see sketch after this list)
      1. Classification (AUC, Gini, AR, Somers' D, Kolmogorov-Smirnov...)
      2. Regression (R^2, RMSE...)
    4. Best modeling practices (train / test partition, cross-validation, overfitting control; see sketch after this list)
  2. Feature Engineering (see sketch after this list)
    1. Data processing in real life and with Big Data. Typical errors in data management (e.g., leakage of future information)
    2. Missing-value treatment (mean / median / percentiles, control dummy, K-Nearest Neighbors...)
    3. Treatment and creation of new factors (categorical variables, Weight of Evidence / continuous bucketization, dummies and one-hot encoding, alert counters...)
  3. Models and applicability cases
    1. Basic problem examples:
      1. Ratings and counterparty risk (classification)
      2. Pricing (regression)
      3. Sentiment analysis (classification)
    2. Linear models (OLS / Logistic) and regularization (LASSO, Ridge, ElasticNet); see the model sketch after this list
    3. Decision trees and random forests
    4. Gradient Boosting (GBM, XGBoost, LightGBM, CatBoost)
    5. Neural networks (1D CNN) for text mining and natural language processing
    6. Stacking techniques
    7. Parameter optimization (hyperparameter tuning; see sketch after this list)
    8. Factor interpretation in black-box models
      1. SHAP (effect of removing/adding each feature; see sketch after this list)
      2. LIME (local linear behavior)
  4. Calibration of classification models (see sketch after this list)
    1. Need for calibration
    2. Confusion matrix and pay-off methodology
    3. Basic calibration techniques of the probability curve (binomial model)
    4. Advanced calibration techniques of the probability curve (Tasche: "The art of probability-of-default curve calibration") 
  5. Model validation
    1. Stability
    2. Performance
  6. Model development in real cases and Kaggle competitions
    1. Rating
    2. House prices
    3. Sentiment analysis of news (including web scraping)
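
The short Python sketches below illustrate some of the items above. They are orientation material only: they run on synthetic data, use standard open-source libraries that the course does not prescribe, and every dataset, variable and parameter choice in them is an assumption made for illustration.

This first sketch computes the classification and regression metrics of item 1.3, assuming scikit-learn; the Gini coefficient (Accuracy Ratio) is obtained from the AUC as Gini = 2 * AUC - 1.

    # Classification and regression metrics on synthetic data (illustrative only).
    from sklearn.datasets import make_classification, make_regression
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.metrics import mean_squared_error, r2_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    # Classification: AUC and Gini / Accuracy Ratio (Gini = 2 * AUC - 1).
    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"AUC = {auc:.3f}, Gini = {2 * auc - 1:.3f}")

    # Regression: RMSE and R^2.
    Xr, yr = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
    Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, test_size=0.3, random_state=0)
    reg = LinearRegression().fit(Xr_tr, yr_tr)
    pred = reg.predict(Xr_te)
    rmse = mean_squared_error(yr_te, pred) ** 0.5
    print(f"RMSE = {rmse:.3f}, R^2 = {r2_score(yr_te, pred):.3f}")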
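
This sketch illustrates the good practices of item 1.4: a held-out test partition plus k-fold cross-validation on the training data to keep overfitting under control (scikit-learn assumed; the model and its settings are arbitrary examples).

    # Train/test partition and 5-fold cross-validation for overfitting control.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score, train_test_split

    X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=1
    )

    model = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=1)

    # Cross-validate on the training partition only; the test partition stays
    # untouched until the final out-of-sample evaluation.
    cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    print("CV AUC per fold:", cv_auc.round(3))

    model.fit(X_train, y_train)
    print("Train accuracy:", round(model.score(X_train, y_train), 3))
    print("Test accuracy :", round(model.score(X_test, y_test), 3))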
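
This sketch covers the feature-engineering steps of items 2.2 and 2.3 on a toy pandas DataFrame. Every column name and value is made up for illustration, and WoE is computed with the usual convention WoE = ln(%good / %bad) per bucket.

    # Missing-value treatment, one-hot encoding and Weight of Evidence on toy data.
    import numpy as np
    import pandas as pd
    from sklearn.impute import KNNImputer

    df = pd.DataFrame({
        "income": [30_000, np.nan, 52_000, 41_000, np.nan, 75_000, 28_000, 61_000],
        "age": [25, 40, 33, 51, 38, 46, 29, 60],
        "segment": ["retail", "retail", "sme", "sme", "corporate", "retail", "sme", "retail"],
        "default": [1, 0, 0, 1, 0, 1, 1, 0],
    })

    # Median imputation plus a control dummy that flags the imputed rows.
    df["income_missing"] = df["income"].isna().astype(int)
    df["income_median"] = df["income"].fillna(df["income"].median())

    # Alternative: K-Nearest Neighbors imputation using the other numeric column.
    df["income_knn"] = KNNImputer(n_neighbors=2).fit_transform(df[["income", "age"]])[:, 0]

    # Categorical variable: dummies / one-hot encoding.
    df = pd.get_dummies(df, columns=["segment"], prefix="seg")

    # Weight of Evidence per bucket of a binned continuous variable.
    bucket = pd.qcut(df["income_median"], q=2, labels=["low", "high"])
    pct_good = (1 - df["default"]).groupby(bucket).sum() / (1 - df["default"]).sum()
    pct_bad = df["default"].groupby(bucket).sum() / df["default"].sum()
    print(np.log(pct_good / pct_bad))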
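
This sketch puts a regularized linear model (item 3.2) next to a gradient-boosting model (item 3.4) and compares them by out-of-sample AUC. It assumes scikit-learn and uses its GradientBoostingClassifier as a stand-in; XGBoost, LightGBM and CatBoost expose very similar interfaces.

    # ElasticNet-regularized logistic regression vs. gradient boosting.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=4000, n_features=25, n_informative=8, random_state=2)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

    # ElasticNet mixes LASSO and Ridge (l1_ratio=1 is pure LASSO, 0 is pure Ridge).
    linear = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, C=0.5,
                           max_iter=5000),
    ).fit(X_tr, y_tr)

    gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, max_depth=3,
                                     random_state=2).fit(X_tr, y_tr)

    for name, model in [("elasticnet logit", linear), ("gradient boosting", gbm)]:
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(f"{name}: AUC = {auc:.3f}")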
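
This sketch illustrates the hyperparameter tuning of item 3.7 with randomized search and cross-validation (scikit-learn and SciPy assumed; the search space is purely illustrative).

    # Randomized hyperparameter search with cross-validated AUC as the criterion.
    from scipy.stats import randint, uniform
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=3000, n_features=20, random_state=5)

    search = RandomizedSearchCV(
        GradientBoostingClassifier(random_state=5),
        param_distributions={
            "n_estimators": randint(100, 500),
            "learning_rate": uniform(0.01, 0.2),
            "max_depth": randint(2, 6),
        },
        n_iter=20, cv=3, scoring="roc_auc", random_state=5,
    )
    search.fit(X, y)
    print("Best CV AUC:", round(search.best_score_, 3))
    print("Best parameters:", search.best_params_)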
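
This sketch shows the SHAP-based interpretation of item 3.8 for a tree-based black-box model. It assumes the third-party shap package is installed; data and model are synthetic and only a global feature ranking is printed.

    # Interpreting a black-box model with SHAP values.
    import numpy as np
    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=2000, n_features=8, n_informative=4, random_state=3)
    model = GradientBoostingClassifier(random_state=3).fit(X, y)

    # TreeExplainer attributes each prediction to per-feature contributions.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # Global view: mean absolute SHAP value per feature (larger = more influential).
    importance = np.abs(shap_values).mean(axis=0)
    for i in np.argsort(importance)[::-1][:5]:
        print(f"feature_{i}: mean |SHAP| = {importance[i]:.3f}")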
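
This last sketch touches the calibration topic of section 4 with one standard library option, isotonic calibration via scikit-learn's CalibratedClassifierCV. It is not the binomial or Tasche methodology covered in the course, only a quick way to see the effect of turning scores into probabilities.

    # Calibrating classifier scores into probabilities on an imbalanced synthetic sample.
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=15, weights=[0.9, 0.1], random_state=4)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=4)

    raw = RandomForestClassifier(n_estimators=200, random_state=4).fit(X_tr, y_tr)
    calibrated = CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=200, random_state=4), method="isotonic", cv=5
    ).fit(X_tr, y_tr)

    # A calibrated model should bring the mean predicted PD close to the observed rate
    # and lower the Brier score.
    print("Observed default rate:", round(y_te.mean(), 3))
    for name, model in [("raw", raw), ("calibrated", calibrated)]:
        p = model.predict_proba(X_te)[:, 1]
        print(f"{name}: mean PD = {p.mean():.3f}, Brier score = {brier_score_loss(y_te, p):.4f}")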

Prerequisites

Basic algebra and statistics. Programming.

Targeted at

Students who wish to see how current Machine Learning techniques are applied in a production environment, in particular graduates in mathematics, statistics and engineering.

Evaluation

An internal modeling contest will be held on Kaggle.

Software requirements

Python 3 will be the preferred tool for its ease of use and flexibility (use the latest stable Python version supported by TensorFlow). Students may also use R if they feel more comfortable with it.
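
As a quick self-check, the snippet below prints the installed version of each library or flags it as missing. The package list is an assumption based on the tools named in the contents (scikit-learn, XGBoost, LightGBM, CatBoost, SHAP, TensorFlow), not an official requirements list.

    # Check that the assumed course libraries are importable and print their versions.
    import importlib

    for package in ["numpy", "pandas", "sklearn", "xgboost", "lightgbm",
                    "catboost", "shap", "tensorflow"]:
        try:
            module = importlib.import_module(package)
            print(f"{package:<12} {getattr(module, '__version__', 'unknown')}")
        except ImportError:
            print(f"{package:<12} MISSING")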