Next Generation Sequencing Workshop
March 10 - April 21, 2010
Week 2 (March 17)
Alignment to reference genomes and variant detection. Emphasis will
be on Illumina technology. Several alternative tools will be
presented, and their strengths and weaknesses will be discussed.
There is an online discussion forum set up for this workshop. Workshop
announcements will be posted there, and it is the best place to ask
any workshop-related questions; all teachers and organizers will be
monitoring the forum closely. Workshop participants will need to
register on the forum website to obtain a forum ID before posting.
The speakers will be available to answer questions about this session on
Friday, March 19, from 3:00 to 4:00 PM in 102 Weill (small conference room).
Week 2 data files:
Public directories:
ftp://cbsuftp.tc.cornell.edu/ngw2010/session2
Protected directories:
ftp://usr@cbsuftp.tc.cornell.edu/ngw2010p/session2
NOTE: The user ID and password for the protected files have been
e-mailed to registered participants. Replace "usr" in the link above
with the user ID you received.
Lecture 1.
Speaker: Qi Sun (Computational Biology Service Unit)
Lecture 1 slides.
- Overview of the differences between alignment and assembly.
- Sequence alignment.
- Introduction to hash- and BWT-based alignment methods.
- Things to consider when performing alignment:
  - balance between performance and sensitivity
  - gapped vs. ungapped alignment
  - global vs. local alignment
  - whether alignment scores incorporate base quality scores
  - hash size and repetitive regions
  - paired-end alignment
  - reporting of ambiguous hits
- Comparison of some commonly used short-read alignment tools.
- SAM/BAM: a generic file format for storing large nucleotide sequence alignments.
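The hash-based approach and the "hash size and repetitive regions" point above can be illustrated with a toy seed-and-extend sketch. It is not the algorithm of any particular aligner discussed in the lecture: the function names, the use of non-overlapping seeds, and the ungapped extension step are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash table mapping each k-mer of the reference to its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def seed_votes(read: str, index: dict[str, list[int]], k: int) -> Counter:
    """Look up non-overlapping k-mer seeds; each hit votes for a candidate read start."""
    votes = Counter()
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset  # shift the seed hit back to the read's first base
            if start >= 0:
                votes[start] += 1
    return votes


def mismatches(read: str, reference: str, start: int) -> Optional[int]:
    """Ungapped extension: mismatch count at `start`, or None if the read overruns the end."""
    if start + len(read) > len(reference):
        return None
    return sum(a != b for a, b in zip(read, reference[start:start + len(read)]))


# Toy reference in which the 4-mer ACGT occurs four times (a "repetitive region").
reference = "ACGTACGTTTGCAACGTACGT"
read = "ACGTTTGC"
index = build_kmer_index(reference, k=4)
for start, n_seeds in seed_votes(read, index, k=4).most_common():
    print(start, n_seeds, mismatches(read, reference, start))
```

Here the read's true origin, position 4, is supported by both seeds and extends with zero mismatches. The repeated ACGT seed also proposes positions 0, 13 and 17, which is the kind of extra verification work (and potential ambiguity) that repetitive regions and short hash keys create.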
Lecture 2.
Speaker: Charles Danko (Siepel Lab)
Lecture 2 slides.
- Input file formats.
- SAM/BAM formats.
- Aligner-specific formats: MAQ and SOAP have their own formats, as may
  other tools covered in Lecture 1 (brief introduction).
- Pileup format.
- Different representations for viewing SNPs.
- Brief mention of Tablet, a next-generation sequence viewer (thanks, Tristan).
- Utilities and software for variant detection.
- Output file formats.
- Identifying heterozygotes.
- The problem of limited coverage, illustrated with a figure of read depth
  vs. accuracy.
- How to calculate the sequencing depth necessary for X% coverage of
  heterozygotes.
- Strategies for increasing the power to identify heterozygotes:
  - Enrichment of target region(s) of interest, e.g. using array capture.
  - Additional sequencing.
- Ambiguous mapping and SNP calling.
- Methods for detecting structural variants.
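The depth calculation listed above can be sketched with a simple binomial/Poisson model. Everything here is an illustrative assumption, not necessarily the method used in the lecture: reads sample the two alleles with equal probability, sequencing errors and mapping bias are ignored, per-site depth is Poisson around the mean, and a heterozygote counts as detected when each allele appears in at least `min_reads` reads (a hypothetical calling rule). At depth n this gives P(detect | n) = sum over j from m to n-m of C(n, j) / 2^n, with m = `min_reads`.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def p_het_detected(depth: int, min_reads: int) -> float:
    """P(both alleles seen >= min_reads times) at one het site with `depth` reads.

    Assumes each read carries either allele with probability 0.5 and no errors.
    """
    hits = sum(math.comb(depth, j) for j in range(min_reads, depth - min_reads + 1))
    return hits / 2 ** depth


def fraction_hets_detected(mean_depth: float, min_reads: int, max_depth: int = 300) -> float:
    """Average p_het_detected over Poisson(mean_depth) per-site depth (an idealization)."""
    total = 0.0
    pmf = math.exp(-mean_depth)  # Poisson P(depth = 0)
    for n in range(max_depth + 1):
        total += pmf * p_het_detected(n, min_reads)
        pmf *= mean_depth / (n + 1)  # recurrence gives P(depth = n + 1)
    return total


def depth_needed(target: float, min_reads: int, step: float = 0.5) -> float:
    """Smallest mean depth on a grid of `step` whose detected fraction reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    depth = step
    while fraction_hets_detected(depth, min_reads) < target:
        depth += step
    return depth


for target in (0.90, 0.95, 0.99):
    print(f"{target:.0%} of hets with >= 2 reads per allele: ~{depth_needed(target, 2)}x mean depth")
```

Real coverage is usually more variable than Poisson, so a model like this tends to underestimate the depth actually required; it is mainly useful for seeing how the answer scales with the target fraction and the calling threshold.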
Exercise: Mapping reads to the hg18 reference genome.
Download exercise instructions here.
- Align a FASTQ file to one chromosome of the human genome.
- Visualize the BAM file with Tablet (SCRI).
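Before aligning, it can help to inspect the FASTQ input. The reader below is a minimal sketch that assumes the common four-line record layout without line wrapping. Its `offset` parameter covers the Phred+33 (Sanger-style) versus Phred+64 (Illumina pipeline versions 1.3 to 1.7) quality encodings; which encoding the exercise file uses should be checked against the exercise instructions. The function and class names are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]  # Phred quality scores


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream.

    offset=33 for Sanger-style FASTQ; Illumina pipelines 1.3 to 1.7 used offset=64.
    Pre-1.3 Solexa-scaled qualities are not Phred scores and are not handled.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


def error_probability(phred: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P_error)."""
    return 10 ** (-phred / 10)


example = "@read1\nACGTN\n+\nIIII#\n"
for record in read_fastq(io.StringIO(example)):
    print(record.name, record.sequence, record.qualities)
```

Decoding qualities this way is also the first step an aligner needs if its alignment scores incorporate base quality, one of the considerations listed for Lecture 1.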