Next Generation Sequencing Workshop
March 10 - April 21 2010

Week 2 (March 17)
Alignment to reference genomes and variant detection. The emphasis will be on Illumina technology. Several alternative tools will be presented, and their strengths and weaknesses will be discussed.

An online discussion forum has been set up for this workshop. Workshop announcements will be posted there, and it is the best place to ask any workshop-related questions; all teachers and organizers will be monitoring the forum closely. Workshop participants will need to register on the forum website to obtain a forum ID before posting.

The speakers will be available to answer questions for this session on Friday March 19 from 3:00 to 4:00 PM in 102 Weill (small conference room).

Week 2 data files:
            public directories ftp://cbsuftp.tc.cornell.edu/ngw2010/session2
            protected directories ftp://usr@cbsuftp.tc.cornell.edu/ngw2010p/session2
NOTE: User id and password to access protected files have been e-mailed to registered participants. Replace "usr" in the link above with the user id you received.

Lecture 1. 
Speaker:  Qi Sun (Computational Biology Service Unit).

 

Lecture 1 slides.

 

  1. Overview of the difference between alignment and assembly.

  2. Sequence Alignment

    1. introduction to hash-based and BWT-based alignment methods

    2. things to consider when doing alignment            
      balance between performance and sensitivity            
      gapped vs. ungapped alignment            
      global vs. local alignment            
      whether alignment scores incorporate base quality scores
      hash size and repetitive regions            
      paired-end alignment            
      reporting of ambiguous hits

  3. Comparison of some commonly used short-read alignment tools

  4. SAM/BAM: A generic file format for storing large nucleotide sequence alignments.

 

Lecture 2.  

Speaker: Charles Danko (Siepel Lab)

 

Lecture 2 slides.

  1. Input file formats.             

    • SAM/BAM formats.            

    • MAQ and SOAP have their own formats; these, and any others covered in Lecture 1, will be introduced briefly.

    • Pileup format.            

    •  Cover different representations for looking at SNPs.            

    • Briefly mention Tablet, a next-generation sequence viewer (thanks Tristan).

  2. Utilities and software for variant detection.              

    • Cover MAQ, GigaBayes, Pyrobayes in some detail.              

    • Touch on other tools.

  3. Output file formats.            

  4. Identifying heterozygotes.            

    • The problem of limited coverage.  Illustrate by a figure on read depth vs. accuracy.            

    • How to calculate the sequencing depth needed to cover X% of heterozygous sites.

    • Strategies for dealing with power to identify heterozygotes ...            

    • Enrichment of target region(s) of interest, e.g. using array capture.            

    • Additional sequencing.           

  5. Ambiguous mapping & SNP Calling.            

    • To date, most studies remove ambiguously mapped reads.

    • Using UCSC tracks to filter out repetitive positions.  

  6. Methods for detecting structural variants            
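
As a rough illustration of the depth calculation mentioned under item 4 of the Lecture 2 outline, the sketch below estimates how many reads must cover a site before both alleles of a heterozygote are likely to be observed. It is not taken from the lecture slides. It assumes that each read samples either allele with probability 0.5, that reads are independent, and that sequencing error and mapping bias can be ignored. The function names and the min_reads threshold are hypothetical choices made for this example.

```python
from math import comb


def het_detection_prob(depth, min_reads=2):
    """Probability that both alleles of a heterozygous site are each seen
    in at least `min_reads` reads, given `depth` reads at that site.

    Assumption: the number of reads carrying one allele is
    Binomial(depth, 0.5); sequencing error is ignored.
    """
    if depth < 2 * min_reads:
        return 0.0
    # Count outcomes where the allele count x satisfies
    # min_reads <= x <= depth - min_reads.
    favourable = sum(comb(depth, x)
                     for x in range(min_reads, depth - min_reads + 1))
    return favourable / 2 ** depth


def min_depth_for(target, min_reads=2, max_depth=1000):
    """Smallest per-site depth whose detection probability reaches `target`.

    Returns None if no depth up to `max_depth` is sufficient.
    """
    for depth in range(2 * min_reads, max_depth + 1):
        if het_detection_prob(depth, min_reads) >= target:
            return depth
    return None


if __name__ == "__main__":
    for target in (0.90, 0.95, 0.99):
        print(target, min_depth_for(target, min_reads=2))
```

Note that this treats depth as fixed at a given site. In a real experiment the depth varies from site to site, so the average coverage needed to reach the same detection rate across the genome will be higher than the per-site figure.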

Exercise: Mapping reads to the hg18 reference genome.

Download exercise instructions here.  

  1. Align a FASTQ file to one chromosome of the human genome

  2. Visualize the BAM file with Tablet (SCRI)

 