Next Generation Sequencing
Workshop
March 10 - April 21 2010
Week 1 (March 10)
Overview of next-generation sequencing platforms and
applications, including related technologies (seq-cap, multiplexing),
error rates and quality control, and the available software and
computing resources.
An online discussion forum has been set up for this workshop. Workshop
announcements will be posted there, and it is the best place to ask
workshop-related questions: all teachers and organizers will be monitoring
the forum closely. Workshop participants will need to register on the forum
website to obtain a forum id before posting.
The speakers will be available to answer questions for this session on
Friday March 12 from 3:00 to 4:00 PM in 102 Weill (small conference room).
Week 1 data files:
public directories ftp://cbsuftp.tc.cornell.edu/ngw2010/session1
protected directories
ftp://usr@cbsuftp.tc.cornell.edu/ngw2010p/session1
NOTE: User id and password to access protected files have
been e-mailed to registered participants. Replace "usr" in the link above with
the user id you received.
Lecture 1.
The instruments, the runs, the QC metrics, and the output
Speaker: Peter Schweitzer (DNA Sequencing and Genotyping Core Facility).
Lecture 1 slides.
Roche/454 GS-FLX
Illumina/Solexa
- Types of runs available
- Evaluating run results
- Summary stats page, error graphs, images
- Errors produced
- Data files produced and distributed
Both platforms
Lecture 2.
Working with the data files.
Speaker: Qi Sun
(Computational Biology Service Unit).
Lecture 2 slides.
- What these data files represent: an in-depth look into the FASTQ and SFF files
- Error distribution and quality scores in both platforms
- Using FASTX toolkit for QC, trimming and de-multiplexing of Solexa data
- Introduction to the Roche 454 Software package
- Computational resources at Cornell.
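As a rough illustration of the FASTQ quality scores mentioned above, the sketch below reads four-line FASTQ records and decodes each quality character into a Phred score and an error probability. It is not taken from the lecture slides. The function names (`read_fastq`, `phred_scores`, `error_probability`) are hypothetical, and the ASCII offset of 33 (Sanger encoding) is an assumption: Illumina pipeline versions 1.3 to 1.7 wrote qualities with an offset of 64, so check which encoding your files use.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual  # drop the leading '@'


def phred_scores(quality_string, offset=33):
    """Decode quality characters to Phred scores (offset 33 is an assumption)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Made-up record for illustration only
example = io.StringIO("@read1\nACGT\n+\nII5#\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, phred_scores(qual))
```

A Phred score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.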
Exercise:
Download exercise instructions here.
- Use FASTX toolkit to generate a report of data quality, and do quality trimming
- Use 454 GSMapper to align reads to the E. coli genome, and inspect the flowgram intensity and base-calling errors.
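To make the idea of quality trimming concrete, here is a minimal sketch of 3'-end trimming in Python. It is not the FASTX toolkit's implementation and not part of the exercise instructions. The names `trim_3prime` and `keep_read`, the threshold of 20, the minimum length of 20 and the quality offset of 33 are all assumptions chosen for illustration.

```python
def trim_3prime(seq, qual, threshold=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below threshold."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


def keep_read(seq, min_length=20):
    """Decide whether a trimmed read is still long enough to keep."""
    return len(seq) >= min_length


# Made-up read: four high-quality bases followed by two low-quality ones
trimmed_seq, trimmed_qual = trim_3prime("ACGTAC", "IIII##")
print(trimmed_seq, trimmed_qual, keep_read(trimmed_seq, min_length=4))
```

Trimming only from the 3' end reflects the tendency of base quality to drop toward the end of a read. Reads that become too short after trimming are usually discarded rather than kept.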