Introduction to Compositional Data Analysis
Date
July 1 to 4. Mornings: 9:00 to 12:00 (July 1, 2, and 3) and 9:00 to 13:00 (July 4).
Instructors
Maribel Ortego Martínez
Associate Professor, Applied Mathematics and Statistics Section, Department of Civil and Environmental Engineering, Universitat Politècnica de Catalunya-BarcelonaTECH. ma.isabel.ortego@upc.edu
Maribel Ortego holds a Master's Degree in Mathematics (Licenciatura) and a Degree in Statistics (Diplomatura) from the Universitat Autònoma de Barcelona (UAB) and a PhD from the Universitat Politècnica de Catalunya (UPC). She works on problems related to compositional data analysis. Other areas of interest are the modelling of extremal events and the modelling of dependence between variables by means of copula functions.
Berta Ferrer-Rosell
Serra Húnter Associate Professor of Marketing, Department of Economics and Business, Universitat de Lleida. berta.ferrer@udl.cat
Berta Ferrer-Rosell holds a Diploma in Tourism (UB), a Master's Degree in Tourism Management and Planning (UdG), and a PhD from the Universitat de Girona (UdG) and the Universitat de les Illes Balears (UIB). She applies compositional data analysis in her research on tourism marketing and digital marketing.
Language
English
Description
Compositional data are vectors that describe the relative importance of the parts of a whole. Typical examples are data expressed as percentages, parts per million (ppm), parts per billion (ppb), or similar units. Such data are common in many fields of science, including the geosciences, environmental sciences, biology, medicine, and the social sciences.
Applying classical statistical methods to this type of data raises several problems, including spurious correlation. To address these problems, J. Aitchison introduced the log-ratio approach in the 1980s. Since then, understanding of the geometry of the sample space, the Simplex of D parts, has advanced considerably [1, 2].
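The log-ratio idea can be sketched in a few lines of plain Python. This is an illustrative sketch only. It is not the implementation used in CoDaPack or in the R packages used in the course, and the function names `closure` and `clr` are our own. Closure rescales strictly positive parts to a fixed total. The centred log-ratio (clr) transformation takes the logarithm of each part divided by the geometric mean of all parts. Only ratios between parts enter the clr, so the same composition recorded in different units yields the same coordinates. This property is the principle of compositional equivalence.

```python
import math

def closure(x, total=1.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * xi / s for xi in x]

def clr(x):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(xi) for xi in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

# Made-up example: the same 3-part sample recorded in milligrams and in grams.
sample_mg = [120.0, 300.0, 580.0]
sample_g = [v / 1000 for v in sample_mg]

print(closure(sample_mg, 100))  # [12.0, 30.0, 58.0], i.e. percentages
print(clr(sample_mg))
print(clr(sample_g))            # same values as above, up to rounding
```

The clr coordinates of any composition sum to zero. This constraint is one reason the course also introduces ilr coordinates.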
[1] V. Pawlowsky-Glahn, J. J. Egozcue, and R. Tolosana-Delgado. Modeling and Analysis of Compositional Data. Wiley, 2015.
[2] M. I. Ortego. Compositional data. In Wiley StatsRef: Statistics Reference Online, pages 1–12. John Wiley & Sons, Ltd, London, 2022.
Course goals
The course aims to introduce attendees to the basic principles and methods of compositional data analysis, how to apply them with specific software (CoDaPack and R), and how to interpret the obtained results. The course combines theoretical classes with practical data analysis.
Course contents
- Day 1: What are compositional data? Principles; compositional equivalence; sample space; Aitchison geometry in the Simplex.
  Practice: representation of compositions in the ternary diagram (CoDaPack); spurious correlation; introduction to the R package compositions; data processing in the ternary diagram.
- Day 2: Compositional coordinates (clr, ilr, SBP). Elementary statistics (variability, center, total variance). Normal distribution in the Simplex. Irregular data: zeros, missing values, outliers.
  Practice: computation and interpretation of the variation matrix and of the clr and ilr/olr coordinates in R and CoDaPack.
- Day 3: Exploratory analysis (variation array, biplot).
  Practice: calculation and interpretation of the variation array and the biplot.
- Day 4: Design of coordinates. Regression with compositional response.
  Practice: computation and interpretation of the CoDa-dendrogram; regression with compositional response.
- Day 5: Hands-on session.
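The variation matrix and total variance from Days 2 and 3 can also be sketched in plain Python. The sketch assumes that rows are observations and columns are strictly positive parts. Zeros would need the special treatment listed under Day 2. Entry t_ij is the sample variance of log(x_i/x_j). The total variance is the sum of all entries divided by 2D, and it equals the sum of the variances of the clr coordinates. The data are made up for illustration. The function names are hypothetical and are not those of CoDaPack or the compositions package.

```python
import math
from statistics import variance

def variation_matrix(data):
    """Entry [i][j] is the sample variance of log(x_i / x_j) across observations."""
    D = len(data[0])
    return [
        [variance([math.log(row[i] / row[j]) for row in data]) for j in range(D)]
        for i in range(D)
    ]

def clr(x):
    """Centred log-ratio coordinates of one composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def total_variance(data):
    """Sum of all variation-matrix entries divided by 2D."""
    T = variation_matrix(data)
    D = len(T)
    return sum(sum(row) for row in T) / (2 * D)

# Made-up 3-part compositions, one observation per row, each closed to 1.
data = [
    [0.20, 0.30, 0.50],
    [0.25, 0.25, 0.50],
    [0.10, 0.40, 0.50],
    [0.30, 0.30, 0.40],
]

T = variation_matrix(data)
tv = total_variance(data)

# Cross-check: total variance equals the summed variances of the clr coordinates.
clr_rows = [clr(row) for row in data]
tv_clr = sum(variance([row[k] for row in clr_rows]) for k in range(len(data[0])))

for row in T:
    print([round(t, 4) for t in row])
print(round(tv, 6), round(tv_clr, 6))
```

The matrix has a zero diagonal and is symmetric, because log(x_j/x_i) = -log(x_i/x_j).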
Prerequisites
Recommended prerequisites:
- Univariate statistical analysis.
- Basic knowledge of multivariate statistics.
- Introductory courses in algebra and calculus.
- Experience with standard software: R, MS-Excel, SPSS, Minitab or similar.
Targeted at
- Students of MESIO UPC-UB
- Other UPC Students
- Researchers and professionals who work with compositional data in their applications
Evaluation
Students will solve practice exercises at the end of each session to consolidate what was covered. Assessment will be based on the deliverables from these exercises and on the hands-on session.
Software requirements
All the software used in the sessions is freely available:
- CoDaPack 2.03.06 or higher (developed by Marc Comas: Marc.Comas@udg.edu)
- compositions (in R) (developed by Raimon Tolosana: r.tolosana@hzdr.de and Gerald van den Boogaart: boogaart@math.tu-freiberg.de)
- robCompositions (in R) (developed by Matthias Templ: Matthias.Templ@gmail.com)
- zCompositions (in R) (developed by Josep Antoni Martín-Fernández: josepantoni.martin@udg.edu and Javier Palarea-Albaladejo: javier@bioss.sari.ac.uk)