Workshops

Next Generation Sequencing Workshop
March 10 - April 21, 2010

Week 2 (March 17)
Alignment to reference genomes and variant detection. Emphasis will be on Illumina technology. Several alternative tools will be presented, and their strengths and weaknesses will be discussed.

An online discussion forum has been set up for this workshop. Workshop announcements will be posted there, and it is the best place to ask any workshop-related questions; all teachers and organizers will be monitoring the forum closely. Participants will need to register on the forum website to obtain a forum ID before posting.

The speakers will be available to answer questions about this session on Friday, March 19, from 3:00 to 4:00 PM in 102 Weill (small conference room).

Week 2 data files:
Public directories: ftp://cbsuftp.tc.cornell.edu/ngw2010/session2
Protected directories: ftp://usr@cbsuftp.tc.cornell.edu/ngw2010p/session2
NOTE: The user ID and password for the protected files have been e-mailed to registered participants. Replace "usr" in the link above with the user ID you received.

Lecture 1.
Speaker: Qi Sun (Computational Biology Service Unit).

Lecture 1 slides.

Overview of the differences between alignment and assembly.
Sequence Alignment
1. Introduction to hash- and BWT-based alignment methods
2. Things to consider when doing alignment
  balance between performance and sensitivity
  gapped vs. ungapped alignment
  global vs. local alignment
  alignment scores with or without base quality scores
  hash size and repetitive regions
  paired-end alignment
  reporting of ambiguous hits
Comparison of some commonly used short-read alignment tools
SAM/BAM: A generic file format for storing large nucleotide sequence alignments.
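As a rough illustration of the BWT-based approach in item 1 above, the sketch below builds the Burrows-Wheeler transform of a toy reference by sorting rotations and then counts exact occurrences of a read with FM-index backward search. It is a teaching sketch under simplifying assumptions, not necessarily how the lecture presents it and not how production aligners work: real BWT-based tools build the index from a suffix array of a whole genome, sample their occurrence tables, and tolerate mismatches and gaps. The function names are hypothetical. A hash-based aligner would instead index fixed-length k-mers in a hash table and extend the seed hits.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (toy sizes only)."""
    if not text.endswith("$"):
        text += "$"  # unique sentinel; '$' sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def fm_index(bwt_str: str):
    """Build the C table and occurrence counts used by backward search."""
    alphabet = sorted(set(bwt_str))
    # C[c] = number of characters in the text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)
    # occ[c][i] = number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def count_exact(read: str, bwt_str: str) -> int:
    """Count exact occurrences of read in the indexed text by backward search."""
    C, occ = fm_index(bwt_str)
    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for c in reversed(read):  # extend the match one base to the left at a time
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0  # interval is empty: read does not occur
    return hi - lo
```

For example, `count_exact("ACG", bwt("ACGTACGTTACG"))` returns 3, one hit for each of the three copies of ACG in the toy reference. The search touches only the read's length, not the reference's, which is the property that makes BWT indexes attractive for aligning millions of short reads.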

Lecture 2.

Speaker: Charles Danko (Siepel Lab)

Lecture 2 slides.

Input file formats.
- SAM/BAM formats.
- MAQ and SOAP have their own formats; these, and any others discussed in Lecture 1, will be introduced briefly.
- Pileup format.
- Cover different representations for looking at SNPs.
- Briefly mention Tablet, a next-generation sequence viewer (thanks Tristan).
Utilities and software for variant detection.
- Cover MAQ, GigaBayes, and PyroBayes in some detail.
- Touch on others.
Output file formats.
Identifying heterozygotes.
- The problem of limited coverage, illustrated by a figure of read depth vs. accuracy.
- How to calculate the sequencing depth needed to detect X% of heterozygous sites.
- Strategies for improving the power to identify heterozygotes:
- Enrichment of target region(s) of interest, e.g. using array capture.
- Additional sequencing.
Ambiguous mapping and SNP calling.
- To date, most studies remove ambiguously mapped reads.
- Using UCSC tracks to filter out repetitive positions.
Methods for detecting structural variants
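The depth calculation in the heterozygote items above can be sketched with a simple model. The assumptions are ours and not necessarily those used in the lecture: per-site read depth is Poisson with mean c, each read at a heterozygous site carries either allele with probability 0.5 (no sequencing error, no mapping bias), and a heterozygote counts as detected when each allele appears in at least m reads. The script then searches a grid for the smallest mean depth that detects a target fraction X of heterozygous sites. All function names are hypothetical.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(n: int, lam: float) -> float:
    """Poisson probability of depth n given mean lam (> 0), computed in log space."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def detect_prob_at_depth(n: int, m: int, p: float = 0.5) -> float:
    """P(each allele seen in >= m of n reads) at a heterozygous site.

    Assumes each read independently carries allele A with probability p
    (0.5 = no allelic bias) and ignores sequencing errors.
    """
    # The allele-A count k must satisfy m <= k <= n - m.
    return sum(binom_pmf(k, n, p) for k in range(m, n - m + 1))


def detect_fraction(mean_depth: float, m: int) -> float:
    """Expected fraction of heterozygotes detected when depth ~ Poisson(mean_depth)."""
    upper = int(3 * mean_depth + 50)  # Poisson mass beyond this is negligible here
    return sum(
        poisson_pmf(n, mean_depth) * detect_prob_at_depth(n, m)
        for n in range(2 * m, upper + 1)  # fewer than 2m reads can never qualify
    )


def min_mean_depth(target: float, m: int, step: float = 0.5, limit: float = 200.0) -> float:
    """Smallest mean depth on a grid of `step` whose detected fraction reaches `target`."""
    for i in range(1, int(limit / step) + 1):
        c = i * step
        if detect_fraction(c, m) >= target:
            return c
    raise ValueError("target fraction not reached within the depth limit")


if __name__ == "__main__":
    # Example: require each allele in at least 2 reads (m = 2).
    for target in (0.90, 0.95, 0.99):
        print(f"{target:.0%} of heterozygotes: mean depth ~{min_mean_depth(target, m=2)}x")
```

Raising m makes calls more robust to sequencing errors but pushes the required depth up; adding an error rate or allelic bias (p other than 0.5) to `detect_prob_at_depth` would make the estimate less optimistic.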

Exercise: Mapping reads to the hg18 reference genome.

Download exercise instructions here.

Align a FASTQ file to one chromosome of the human genome
Visualization of the BAM file with Tablet (SCRI)
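Viewers such as Tablet and pileup-style tools summarize the BAM file from the exercise as per-position stacks of aligned bases. The sketch below illustrates the idea on plain-text SAM records (for example, the output of `samtools view`) by walking each mapped read's CIGAR string and counting read depth per reference position. It is a simplified assumption-laden sketch: it skips only unmapped reads (FLAG bit 0x4), counts only M/=/X bases (not deletions), and applies none of the base- or mapping-quality filters real tools use. The function name and sample records are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference
ALIGNED = set("M=X")          # operations where a read base sits on a reference base


def depth_from_sam(lines: Iterable[str]) -> Counter:
    """Count depth of aligned bases per (chrom, 1-based position) from SAM text lines."""
    depth: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # read is unmapped
        ref_pos = pos  # SAM POS is the 1-based leftmost aligned reference position
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in ALIGNED:
                for offset in range(n):
                    depth[(rname, ref_pos + offset)] += 1
            if op in REF_CONSUMING:
                ref_pos += n  # insertions, soft clips and hard clips do not advance
    return depth


sam = [
    "@SQ\tSN:chr1\tLN:100",
    "r1\t0\tchr1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tchr1\t12\t60\t2M1D3M\t*\t0\t0\tGTACG\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r4\t0\tchr1\t11\t60\t1S3M1I1M\t*\t0\t0\tNCGTAA\tIIIIII",
]
for (chrom, position), d in sorted(depth_from_sam(sam).items()):
    print(chrom, position, d)
```

With these toy records, position 12 on chr1 is covered by r1, r2 and r4 (depth 3), while the deletion in r2 leaves position 14 covered only by r1 and r4.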
