NATIONAL CHIAO TUNG UNIVERSITY

INSTITUTE OF STATISTICS

 

MULTIVARIATE ANALYSIS

SPRING 2023


Instructor: Guan-Hua Huang, Ph.D.

    Office: A423 Joint Education Hall

    Phone: 03-513-1334

    Email: ghuang@nycu.edu.tw

Class meetings: Thursday 9:00-12:00 at A406 Joint Education Hall

Office hours: By appointment

Class website: http://ghuang.stat.nycu.edu.tw/course/multivariate23/

Credit: Three (3) credits

 

COURSE SUMMARY

 

The aims of this course are:

-   To illustrate extensions of univariate statistical methodology to multivariate data.

-   To introduce students to some of the distinctive statistical methodologies that arise only in multivariate data.

-   To introduce students to some of the computational techniques for multivariate analysis that are available in standard statistical packages.

 

Topics include multivariate techniques and analyses, multivariate analysis of variance, principal component analysis and factor analysis, canonical correlation analysis, cluster analysis, discrimination and classification, and machine learning.

 

The course uses R for statistical computing. Students are expected to be familiar with R before the course begins.

 

HANDOUTS AND TEXTBOOKS

 

Handouts corresponding to each lecture will be available on the class website before each class. The required textbook for this course is:

 

Johnson, R.A. and Wichern, D.W., 2007. Applied Multivariate Statistical Analysis (6th Edition). Prentice Hall, Upper Saddle River, NJ, USA.

 

The following book is recommended for further reading:

 

Hastie, T., Tibshirani, R. and Friedman, J., 2009. The Elements of Statistical Learning (2nd Edition). Springer, New York, NY, USA.

 

Reading assignments will be drawn primarily from these two books.

 

PREREQUISITES

 

Students are expected to have a background in undergraduate linear algebra, probability, mathematical statistics, and linear regression. Programming knowledge of R and/or C/C++ is required.

 

METHOD OF STUDENT EVALUATION

 

The course grade will be based on 5 homework assignments (50%), 1 midterm exam (20%), and 1 final exam (30%).

 

COURSE OUTLINE

 

Readings refer to:

AMSA: Johnson, R.A. and Wichern, D.W., 2007. Applied Multivariate Statistical Analysis (6th Edition).

ESL: Hastie, T., Tibshirani, R. and Friedman, J., 2009. The Elements of Statistical Learning (2nd Edition).

 

Module 1: Aspects of multivariate analysis

-   introduction
-   review of linear algebra and matrices

Reading (pages): AMSA 1-30, 49-110

Module 2: Random vectors and random sampling

-   random vectors/matrices
-   distance
-   the sample
-   random sampling of the sample mean vector and covariance matrix
-   generalized variance
-   matrix operations of sample values

Reading (pages): AMSA 30-37, 60-78, 111-148

Module 3: Multivariate normal distribution

-   density and properties
-   sampling from multivariate normal and MLE
-   sampling distribution and large sample behavior of X̄ and S
-   assessing the assumption of normality
-   transformation to near normality

Reading (pages): AMSA 149-200

Module 4: Inferences about a mean vector

-   inference for a normal population mean
-   Hotelling's T² and likelihood ratio test
-   confidence regions and simultaneous comparisons of component means
-   large sample inferences about a population mean vector

Reading (pages): AMSA 210-238

Module 5: Comparisons of several multivariate means

-   paired comparisons and repeated measures design
-   comparing mean vectors from two populations
-   comparing several multivariate population means (one-way MANOVA)

Reading (pages): AMSA 273-312

Module 6: Principal components

-   introduction
-   population principal components
-   summarizing sample variation by principal components
-   large sample inferences

Reading (pages): AMSA 430-459

Module 7: Factor analysis

-   introduction
-   orthogonal factor model
-   methods of estimation
-   factor rotation
-   factor scores

Reading (pages): AMSA 481-526

Module 8: Canonical correlation analysis

-   introduction
-   population and sample canonical variates and canonical correlations
-   sample descriptive measures of goodness

Reading (pages): AMSA 539-563

Module 9: Clustering

-   introduction
-   similarity measures
-   hierarchical clustering methods
-   k-means clustering methods
-   multidimensional scaling

Reading (pages): AMSA 671-715

Module 10: Discrimination and classification

-   introduction
-   separation and classification for two populations
-   classification with two multivariate normal populations
-   evaluating classification functions
-   Fisher's discriminant function
-   classification with several populations

Reading (pages): AMSA 575-644

Module 11: Machine learning

-   classification and regression trees
-   neural networks
-   support vector machines
-   ensemble learning

Reading (pages): ESL

-   305-317, 587-603
-   389-409
-   129-135, 417-438
-   Section 8.7, Chapter 10