Python, more python, pandas and sklearn - June 28th to July 2nd

Date:

June 28th to July 2nd, mornings from 9:00 to 12:00

Classroom:

005

Modality:

Face-to-face or streaming

Instructor

Alexandre Perera Lluna

Alexandre Perera Lluna (1973) holds a degree in Physics (1996, UB), a degree in Electronic Engineering (2001, UB) and a PhD in Physics (2003, UB). He was a postdoctoral fellow at Texas A&M University (TX, USA, 2003-2004) and at EADS, the European Aeronautic Defence and Space Company (CRC Forschung, München, DE, 2005), and a Ramon y Cajal Fellow (2007). He is currently tenured at the Polytechnic University of Catalonia (2013) and is also affiliated as a researcher with the Institut de Recerca de Sant Joan de Déu.

He has participated in five national projects and three European projects (as principal investigator), as well as several industrial technology-transfer projects. He is the author of more than 60 papers in peer-reviewed journals, five patents and more than 60 contributions to national and international conferences. He is currently the coordinator of the research group B2Slab (http://b2slab.upc.edu), the Bioinformatics and Biomedical Signals Laboratory, head of the Biomedical Research Center and deputy head for research of the automatic control department at UPC.

His research covers artificial intelligence algorithms, multivariate statistics and machine learning applied to bioinformatics and bioengineering.

Language

English

Description

This is a crash course in scientific Python for data analysis. It is organised in three main stages:

    • Introduction to the Python language as a tool. Workflow, ipython, ipython notebook (jupyter), basic types, mutability and immutability, and object-oriented programming.
    • Short introduction to numerical Python and matplotlib for graphical visualization.
    • Introduction to scientific kits for data analysis with machine learning. Pandas. Principal components analysis, clustering and supervised analysis with multivariate data.
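As a flavour of the first stage, the following minimal sketch illustrates the difference between mutable and immutable types. It is not taken from the course material; the variable names are illustrative only.

```python
# Lists are mutable: two names bound to the same list see the same changes.
a = [1, 2, 3]
b = a            # assignment binds a second name; it does not copy
b.append(4)
shared = a is b  # True: one object, two names

# Tuples are immutable: "changing" one builds a new object instead.
t = (1, 2, 3)
u = t
u = u + (4,)     # creates a new tuple and rebinds u; t is untouched

# An explicit copy decouples the two lists.
c = list(a)
c.append(5)

print(a)       # [1, 2, 3, 4]
print(t, u)    # (1, 2, 3) (1, 2, 3, 4)
print(c)       # [1, 2, 3, 4, 5]
```

The point of the sketch is that assignment never copies: it only binds a name to an existing object.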

Course goals

    • To gain proficiency in coding in Python and understand its basic types
    • To learn how to build generators and cogenerators
    • To learn how to build and manage data-frame-based representations of data
    • To learn how to use the scikit-learn machine learning toolkit (sklearn)
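To make the generator goal concrete, here is a minimal sketch of a generator function. The function name `running_mean` and the sample values are hypothetical and chosen for illustration; they do not come from the course material.

```python
def running_mean(values):
    """Yield the mean of all values seen so far, one item at a time."""
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        yield total / count  # execution pauses here until the next item is requested


# Consuming the generator with list() drives it to completion.
means = list(running_mean([2, 4, 6, 8]))
print(means)  # [2.0, 3.0, 4.0, 5.0]
```

Because values are produced lazily, the same function would also work on an input too large to hold in memory.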

Course contents

1. Introduction

  • a. Why Python?
  • b. Python History
  • c. Installing Python
  • d. Python resources

2. Working with Python

  • a. Workflow
  • b. ipython vs. CLI
  • c. Text Editors
  • d. IDEs
  • e. Notebook

3. Getting started with Python

  • a. Introduction
  • b. Getting Help
  • c. Basic types
  • d. Mutable and immutable
  • e. Assignment operator
  • f. Controlling execution flow
  • g. Exception handling

4. Functions and Object Oriented Programming

  • a. Defining Functions
  • b. Input and Output
  • c. Standard Library
  • d. Object-oriented programming

5. Introduction to NumPy

  • a. Overview
  • b. Arrays
  • c. Operations on arrays
  • d. Advanced arrays (ndarrays)
  • e. Notes on Performance (%timeit in ipython)

6. Matplotlib

  • a. Introduction
  • b. Figures and Subplots
  • c. Axes and Further Control of Figures
  • d. Other Plot Types
  • e. Animations

7. Python scikits

  • a. Introduction
  • b. Pandas

8. scikit-learn

  • a. Datasets
  • b. Sample generators
  • c. Unsupervised Learning
  • d. Supervised Learning
    • i. Linear and Quadratic Discriminant Analysis
    • ii. Nearest Neighbors
    • iii. Support Vector Machines
  • e. Feature Selection

9. Practical Introduction to Scikit-learn

  • a. Solving an eigenfaces problem
    • i. Goals
    • ii. Data description
    • iii. Initial Classes
    • iv. Importing data
  • b. Unsupervised analysis
    • i. Descriptive Statistics
    • ii. Principal Component Analysis
    • iii. Clustering
  • c. Supervised Analysis
    • i. k-Nearest Neighbors
    • ii. Support Vector Classification
    • iii. Cross validation
  • d. Practical Challenge
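The kind of workflow covered in this last block (principal component analysis, k-nearest neighbors and cross validation) might look roughly like the sketch below. It is an assumption-laden illustration, not the course's own notebook: the small digits dataset bundled with scikit-learn stands in for the face images, and the choices of 20 components, 5 neighbours and 5 folds are arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Assumption: the bundled 8x8 digit images stand in for the face images.
X, y = load_digits(return_X_y=True)

# Project onto 20 principal components, then classify with 5 nearest neighbours.
model = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross validation: one accuracy value per held-out fold.
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()
print(f"mean accuracy over {len(scores)} folds: {mean_accuracy:.3f}")
```

Wrapping both steps in a pipeline means the PCA projection is refitted on the training part of each fold, so no information from the held-out fold leaks into the model.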

Prerequisites

Basic coding skills preferable

Targeted at

People aiming to learn Python from scratch, up to and including data management in Python.

Evaluation

Final short assignment

Computer class or student's laptop?

Student's laptop

Software requirements

Python 3, Jupyter Notebook, pandas and scikit-learn (sklearn). All are open source.