ANALYSIS OF
HIGH-THROUGHPUT GENOMIC DATA
FALL 2020
Instructor: Guan-Hua Huang, Ph.D.
Office: 423 Joint Education Hall
Phone: 03-513-1334
Email: ghuang@stat.nctu.edu.tw
Class meetings: Monday 9:00 - 12:00 at 406 Joint Education Hall
Office hours: By appointment
Class website:
Credit: Three (3) credits
Novel statistical methodology can enhance our understanding of how multiple genes and environmental factors interact in a complex disease. The massive amount of high-throughput genomic data poses a great challenge for developing advanced statistical and computational data mining tools. In this course, we will go through some effective statistical methods for analyzing these high-throughput data. The course focuses on three types of high-throughput data: gene expression microarrays, single nucleotide polymorphism (SNP) markers, and next-generation sequencing (NGS) reads.
Topics include:
Gene expression:
- Technology and measurement
- Quality assessment
- Preprocessing Affymetrix GeneChip: background adjustment, normalization and summarization
- Differential expression
- Clustering and prediction
- Gene set enrichment analysis
SNP markers:
- Preliminary analyses: Hardy-Weinberg equilibrium, haplotype and genotype data, measures of linkage disequilibrium, estimates of recombination rates, SNP tagging
- Population-based association study: case-control and family study
- Candidate-gene and genome-wide association studies
- Population stratification
- Tests of association: single and multiple SNPs
- Epistatic effects and gene-environment interactions
- Multiple testing
NGS reads:
- DNA sequencing
- Next-generation sequencing platforms
- 1000 Genomes project
- Genotype and SNP calling from NGS
- Tests of association for common and rare SNPs
- Structural variation in the human genome
- Best practice in analyzing next-generation sequencing data
Handouts corresponding to each lecture will be available on the class website before each class. There is no required textbook for this course.
The following books are recommended for further reading:
Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S (Editors) (2005). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer.
Draghici S (2012). Statistics and Data Analysis for Microarrays Using R and Bioconductor, 2nd Edition. Chapman & Hall/CRC Press.
Datta S, Nettleton D (Editors) (2014). Statistical Analysis of Next Generation Sequencing Data. Springer.
Thomas DC (2004). Statistical Methods in Genetic Epidemiology. Oxford University Press.
Students are expected to be familiar with the R language and the Bioconductor packages.
A background in probability and mathematical statistics is required.
The course grade will be based on three homework assignments, attendance,
participation and a final project.