
Data Visualization

Date:

June 27 to July 1, mornings from 9:00 to 12:00

Classroom:

Not yet defined

Instructor

Pere-Pau Vázquez.

Pere-Pau Vázquez is an associate professor with the Computer Science department at UPC. He is a member of the Research Center for Visualization, Virtual Reality and Graphics Interaction (ViRVIG).

He works in the areas of Visualization and Computer Graphics. His current interests are mostly related to the visualization of molecular data and large data sets, perception, and interaction in Virtual Reality environments. After graduating in Computer Science (1999), he obtained a PhD in Software (2003) from Universitat Politècnica de Catalunya.

Language

English

Description

Adequate visualization of all sorts of data is essential both for communicating results and for building tools that enable the analysis of complex datasets.
Achieving this requires many important decisions, such as considering the audience, the message, and the data. More importantly, visualizations must be guided by the tasks users need to accomplish with them, be it understanding a certain phenomenon or assessing a situation. The result therefore depends both on the questions users ask of the data and on other boundary conditions, such as the medium in which the visualization will be published (e.g., paper, slides, or the web).
Thus, visualization authors need some background in tools (to evaluate which is the most suitable), data cleaning (to ensure the visualization represents the data faithfully), and visualization techniques (to select the most adequate technique), as well as practical skills with at least one visualization design tool.

Course goals

The course is designed to give students the initial skills needed to create visualizations that support the desired tasks. It also teaches how to create interactive visualizations using a Python library. By the end of the course, students should be able to obtain a dataset, clean it, propose a visualization design, and implement an interactive visualization technique. Some notions on how to evaluate the result will also be given.

To this end, the course addresses theoretical aspects such as the visualization pipeline, human perception, and visualization techniques. Students will also develop visualizations using a library, so the course is highly hands-on: part of every session is devoted to developing visualizations.

Course contents

To achieve these goals, I have planned a course that interleaves theory and practical work. Some concepts will be presented in a lecture format, but every day they will also be applied in practice. The course is organized into the following sections (T denotes theory and P denotes practical work):

  • (T) Introduction to visualization: Goals, methods, users, tasks, and data. 
  • (P) Data cleaning.
  • (T) Effective visualizations: Important aspects of designing an effective visualization, including perception, focus, and storytelling.
  • (T and P) Visualization techniques: The most common visualization techniques, their implementation using altair, and how they should be used. 
  • (T and P) Multiple views: Designing visualization applications with multiple views, cross-highlighting, data partitioning, etc., and their implementation using altair.
  • (T and P) Advanced visualization techniques: In this section I will cover techniques such as maps, cartograms, and text visualization.

Prerequisites

Participants should preferably be familiar with Python, but deep knowledge is not required.
The course will use a library named altair, which strikes a good balance between expressivity (what it can achieve) and the level of programming required to use it, which is relatively low.
It is an introductory course, so no previous visualization knowledge is expected.

Targeted at

Anyone who needs to present data as a result of research (although most of the knowledge can also be applied to other fields, such as business).

Evaluation

The evaluation will consist of the development of a visualization project using altair (a Python library).

Software requirements

To create the visualizations, we will use Google Colab (https://colab.research.google.com/), a free Python notebook environment that runs in any browser and only requires a Google account. For data wrangling, we will use OpenRefine (https://openrefine.org/), another free tool that does not require special permissions to install.