Latent Structures-based Multivariate Statistical Process Control: a paradigm shift

Course title

Latent Structures-based Multivariate Statistical Process Control: a paradigm shift

Faculty

Alberto Ferrer. Multivariate Statistical Engineering Research Group. Department of Applied Statistics, Operations Research and Quality. Technical University of Valencia. Camino de Vera s/n Edificio 7A. 46022 Valencia, Spain. http://mseg.webs.upv.es/index.html.

aferrer@eio.upv.es

Alberto Ferrer is currently Professor of Statistics at the Department of Applied Statistics, Operations Research and Quality, and Head of the Multivariate Statistical Engineering Research Group (mseg.webs.upv.es/index.html) at the Universitat Politècnica de València (Spain). His main research interests focus on statistical techniques for process knowledge, quality and productivity improvement, especially those related to multivariate statistical projection methods for both continuous and batch processes. Prof. Ferrer served as Associate Editor of Technometrics (2008-2010). He is currently a member of the Editorial Board of Quality Engineering, and a member of the International Society for Business and Industrial Statistics (ISBIS) and the European Network for Business and Industrial Statistics (ENBIS). He is also active as an industrial consultant on Process Analytical Technology (PAT), Process Chemometrics, Quality Improvement & Innovation, and Six Sigma.

Course language

English

Course schedule

June 30 to July 1: 3:00pm to 8:30pm
July 2: 3:00pm to 7:00pm

Type of activity and class load

15-hour classroom course.

Description

This short course provides a platform for discussing ideas at the frontiers of statistics and quality research, focusing on a particular issue: statistical process control (SPC). The fundamentals of SPC were proposed by Walter Shewhart for the data-starved production environments typical of the 1920s and 1930s. In the 21st century, the traditional scarcity of data has given way to the data-rich environment typical of highly automated and computerized modern processes. These data often exhibit high correlation, rank deficiency, low signal-to-noise ratio, multistage and multi-way structure, and missing values. Conventional univariate and multivariate statistical process control techniques are not well suited to these environments. This short course discusses the paradigm shift to which those interested in the quality improvement field should pay keen attention, and advocates the use of latent structures-based multivariate statistical process control (LSbMSPC) methods, such as principal component analysis (PCA) and partial least squares (PLS), as efficient quality improvement tools in these data-rich contexts and as a strategic issue for industrial success in a tremendously competitive global market. All the methods will be illustrated through real case studies using specialized software.
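As a minimal sketch of what an LSbMSPC scheme looks like in practice, the code below builds a PCA model on simulated in-control reference data and monitors new observations with Hotelling's T² and SPE (Q) statistics. This is illustrative only: it assumes scikit-learn, the data and variable names are made up, and the percentile-based control limits are a rough stand-in for the formal limits covered in the course.

```python
# Minimal sketch of PCA-based MSPC (Hotelling's T^2 and SPE/Q monitoring).
# Assumes scikit-learn/NumPy; data, variable names, and the percentile-based
# limits are illustrative, not the formal limits derived in the course.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_noc = rng.normal(size=(200, 30))          # in-control (NOC) reference data
X_new = rng.normal(size=(50, 30)) + 0.5     # new observations to monitor

scaler = StandardScaler().fit(X_noc)
Z_noc = scaler.transform(X_noc)
pca = PCA(n_components=3).fit(Z_noc)        # latent-variable model of the NOC data

def t2_spe(Z, pca):
    """Hotelling's T^2 in the latent space and SPE (Q) in the residual space."""
    scores = pca.transform(Z)
    t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)
    residuals = Z - pca.inverse_transform(scores)
    spe = np.sum(residuals**2, axis=1)
    return t2, spe

t2_ref, spe_ref = t2_spe(Z_noc, pca)
t2_lim = np.percentile(t2_ref, 99)          # crude 99% empirical control limits
spe_lim = np.percentile(spe_ref, 99)

t2_new, spe_new = t2_spe(scaler.transform(X_new), pca)
out_of_control = (t2_new > t2_lim) | (spe_new > spe_lim)
print(f"{out_of_control.sum()} of {len(X_new)} new observations flagged")
```

Observations exceeding the T² limit deviate within the plane spanned by the latent variables, whereas those exceeding the SPE limit break the correlation structure captured by the model.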

Evaluation

During the second and third days, students will complete a quiz on the previous day's material (20 minutes). These quizzes account for 20% of the evaluation. Additionally, students will submit a report on a small project after the course (80%).

Classroom

PC1

Final comments on industrial process data and Big Data

Industrial process data has at least four of the V's of Big Data:


  1. Variety. Industrial process data includes:
     • real-time measurements, such as temperatures, pressures, and flows
     • periodic lab measurements, such as viscosity of fluids or counts of living cells
     • array data from spectral instruments, such as near-infrared or Raman spectrometers
  2. Velocity. Real-time monitoring of industrial process data implies a velocity that depends on the system dynamics. As manufacturing equipment becomes more highly instrumented and connected (the industrial “Internet of Things”), there will be more data streams to analyze.
  3. Veracity. In Big Data terms, veracity refers to problems with data accuracy and integrity. Industrial process data has measurement noise and missing values; data go missing because of connectivity issues, sensor malfunctions, or sporadic testing. But that is acceptable: we use multivariate analysis methods, which handle noise and missing data implicitly.
  4. Volume. In typical Big Data settings there are huge numbers of observations: phone calls being made, internet searches being run, or cars on the highway. Industrial process data can have many observations too, but it also has many variables: hundreds of process sensors, raw material data, and QA lab measurements. We have yet to encounter a data set that requires distributed computing, but from a traditional statistics perspective industrial data is big and messy. Traditional methods cannot handle (see the sketch after this list):
     • diverse data blocks that must be combined for analysis (process measurements, QA data, raw material properties)
     • huge numbers of highly correlated measurements
     • simultaneous prediction of multiple y-variables.
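As a minimal illustration of the last point, the sketch below fits a partial least squares (PLS) model that predicts several y-variables at once from a hundred highly correlated sensor variables. The data are simulated and all names are illustrative; scikit-learn's PLSRegression is assumed, which does not itself handle missing values (for that, NIPALS-type chemometrics implementations are typically used, as noted under Veracity).

```python
# Minimal sketch: PLS regression on many highly correlated x-variables,
# predicting three y-variables simultaneously. Simulated data for illustration.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, latent = 300, 2
T = rng.normal(size=(n, latent))                                           # hidden driving factors
X = T @ rng.normal(size=(latent, 100)) + 0.1 * rng.normal(size=(n, 100))   # 100 correlated sensors
Y = T @ rng.normal(size=(latent, 3)) + 0.1 * rng.normal(size=(n, 3))       # 3 quality variables

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
pls = PLSRegression(n_components=2).fit(X_tr, Y_tr)   # rank-deficient X is not a problem
print("R^2 on held-out data:", round(pls.score(X_te, Y_te), 3))
```

Because the model works in a low-dimensional latent space, the rank deficiency of X (only two underlying factors driving 100 sensors) does not hurt it, and all three quality variables are predicted from the same latent scores.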