NATIONAL CHIAO TUNG UNIVERSITY

INSTITUTE OF STATISTICS

ANALYSIS OF HIGH-THROUGHPUT GENOMIC DATA: EXPRESSION AND SNP

SPRING 2008

Instructor:

Guan-Hua Huang, Ph.D.

Office: 423 Joint Education Hall

Phone: 03-513-1334

Email: ghuang@stat.nctu.edu.tw

Class meetings:

Thursday 9:00 - 12:00 at 407 Joint Education Hall

Office hours:

By appointment

Class website:

http://www.stat.nctu.edu.tw/subhtml/source/teachers/ghuang/course/expsnp08/

Credit:

Three (3) credits

COURSE SUMMARY

Novel statistical methodology can enhance our understanding of how multiple genes and environmental factors interact in complex diseases. The massive volume of high-throughput genomic data poses a major challenge: developing advanced statistical and computational data-mining tools. In this course, we will cover effective statistical methods for analyzing such high-throughput data, focusing especially on two types: gene expression microarrays and single nucleotide polymorphism (SNP) markers.

Topics include:

•          Gene expression:

-         Technology and measurement

-         Quality assessment

-         Preprocessing Affymetrix GeneChip data: background adjustment, normalization and summarization

-         Differential expression

-         Clustering and prediction

-         Gene set enrichment analysis

•          SNP markers:

-         Preliminary analyses: Hardy-Weinberg equilibrium, haplotype and genotype data, measures of linkage disequilibrium, estimates of recombination rates, SNP tagging

-         Population-based association studies: case-control and family-based designs

-         Candidate-gene and genome-wide association studies

-         Population stratification

-         Tests of association: single and multiple SNPs

-         Epistatic effects and gene-environment interactions

-         Multiple testing

HANDOUTS AND TEXTBOOKS

Handouts for each lecture will be available on the class website before class. There is no required textbook for this course. The following books are recommended for further reading:

Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S (Editors) (2005). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer.

Thomas DC (2004). Statistical Methods in Genetic Epidemiology. Oxford University Press.

Collins AR (Editor) (2007). Linkage Disequilibrium and Association Mapping: Analysis and Applications. Humana Press.

PREREQUISITES

Students are expected to be familiar with the R programming language and the Bioconductor software project. A background in probability and mathematical statistics is required.

METHOD OF STUDENT EVALUATION

The course grade will be based on homework assignments and a final project.