Next Generation Sequencing Workshop
March 10 - April 21 2010
Week 5 (April 14)
De-novo assembly. Assembly of genomes and transcriptomes from
short-read data. Both paired-end Illumina reads and 454 reads will be
covered, and various assembly tools will be discussed.
There is an
online discussion forum set up for this workshop. Workshop
announcements will be posted there, and it is the best place to ask
any workshop-related questions. All teachers and organizers will be
monitoring this forum closely. Workshop participants will need to
register on the forum website to obtain a forum ID before posting.
Week 5 data files:
public directories
ftp://cbsuftp.tc.cornell.edu/ngw2010/session5
protected directories
ftp://usr@cbsuftp.tc.cornell.edu/ngw2010p/session5
NOTE: User id and password to access protected
files have been e-mailed to registered participants. Replace "usr" in
the link above with the user id you received.
Lecture 1.
De-novo genome assembly.
Speaker:
Tristan Lefebure
(Stanhope Lab).
Lecture 1 slides.
-
(A very brief tour of) shotgun sequencing de novo
assembly methods and concepts (5') Go quickly over the concept of
assemblies and the vocabulary used in de novo assemblies
(particularly in velvet):
-
Overlap-layout-consensus and the Hamiltonian
path
-
Eulerian path
-
Scaffolding with paired-end reads
-
Some vocabulary: coverage, k-mer
coverage, N50
-
The short read conundrum (5')
-
Programs available, pros and cons (2')
-
454 data: newbler, CABOG
-
Illumina: Velvet, ALLPATHS2, EULER
-
Example one: 454 50/50 PE assembly with newbler
(13')
-
The data: simulation of reads from a
bacterial genome or find a complete genome with 454 data, file
formats and content, ...
-
Running newbler
-
Outputs description
-
Mapping back on the genome: where did
the assembler fail?
-
Genome finishing
-
Example two: Illumina 70bp mate-pair reads
assembly with Velvet (21')
-
The data: simulation of reads from a
bacterial genome or find a complete genome with 454 data, file
formats and content, ...
-
Running Velvet: kmer length, expected
coverage, insert length...
-
Output description and choice of an
assembly
-
Visualizing the assembly graph
-
Mapping back on the genome: where did
the assembler fail?
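Two of the vocabulary terms in the outline above, N50 and k-mer coverage, can be illustrated with a short calculation. The following is a minimal Python sketch, not material from the lecture itself. It assumes the usual definition of N50 (the contig length at which the longest contigs together reach half of the total assembled bases) and the relation between nucleotide coverage C and k-mer coverage Ck given in the Velvet manual, Ck = C * (L - k + 1) / L, where L is the read length. The function names, the contig lengths, the coverage of 50 and the k-mer length of 31 are hypothetical values chosen for illustration; only the 70 bp read length comes from the outline.

```python
def n50(contig_lengths):
    """Return the N50 of a set of contigs.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def kmer_coverage(base_coverage, read_length, k):
    """Convert nucleotide coverage C into k-mer coverage Ck.

    Uses Ck = C * (L - k + 1) / L, where L is the read length.
    """
    if k < 1 or k > read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


# Hypothetical contig lengths in bp, for illustration only.
contigs = [100, 200, 300, 400, 500]
print("N50:", n50(contigs))

# 70 bp reads as in Example two; coverage 50 and k = 31 are assumed values.
print("k-mer coverage:", round(kmer_coverage(50, 70, 31), 2))
```

Because each read of length L contains only L - k + 1 k-mers, the k-mer coverage is always lower than the nucleotide coverage, and the gap widens as k approaches the read length. This is worth keeping in mind when choosing a k-mer length and an expected coverage for Velvet.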
Exercise 1:
Download exercise instructions
here.
Lecture 2.
Transcriptome assembly (45').
Speaker:
Zhangjun Fei (Boyce
Thompson Institute).
Lecture 2 slides.
-
Raw sequence cleaning (~5 min)
-
Remove low-quality regions, adaptor and
primer sequences, and all other possible contaminants (e.g.,
rRNA).
-
Tools: lucy, seqclean
-
Assemble 454 transcriptome sequences (~15 min)
-
Newbler will not be covered here as
it's not suitable for transcriptome assembly
-
Problems of the most commonly used
assembly programs in the published papers: CAP3 and MIRA.
-
Focus on an assembly pipeline developed
by my group called iAssembler (iterative assembler for 454
and/or Sanger EST sequences).
-
Sequencing transcriptome of orphan organisms
using Illumina. (~5 min)
-
Challenge for de novo assembly (velvet,
Oases (http://www.ebi.ac.uk/~zerbino/oases/))
-
Alternative strategy (generate 454
sequence first, then align Illumina reads to the 454 assembled
sequences).
Exercise 2:
Download exercise instructions
here.
-
Processing (lucy,
seqclean) and assembly (iAssembler, CAP3, MIRA) of a small 454
dataset (~10,000-20,000 reads).