Next Generation Sequencing Workshop
March 10 - April 21 2010

Week 5 (April 14)
De-novo assembly. Assembly of genomes and transcriptomes from short-read data. Will include coverage of both paired-end Illumina reads and 454 reads. Various tools will be discussed.

An online discussion forum has been set up for this workshop. Workshop announcements will be posted there, and it is the best place to ask workshop-related questions; all teachers and organizers will be monitoring the forum closely. Participants need to register on the forum website to obtain a forum ID before posting.

Week 5 data files:
public directories ftp://cbsuftp.tc.cornell.edu/ngw2010/session5
protected directories ftp://usr@cbsuftp.tc.cornell.edu/ngw2010p/session5
NOTE: User id and password to access protected files have been e-mailed to registered participants. Replace "usr" in the link above with the user id you received.

Lecture 1. De-novo genome assembly.
Speaker: Tristan Lefebure (Stanhope Lab).

Lecture 1 slides.

(A very brief tour of) shotgun sequencing de novo assembly methods and concepts (5')
A quick pass over the concepts of assembly and the vocabulary used in de novo assembly (particularly in Velvet):
- Overlap-layout-consensus and the Hamiltonian path
- Eulerian path
- Scaffolding with paired-end reads
- Some vocabulary: coverage, k-mer coverage, N50
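Two of the vocabulary items above reduce to short calculations. The following is a minimal sketch, not part of the lecture material: N50 is computed as the contig length at which the running sum of contigs, sorted from longest to shortest, first reaches half of the total assembled bases, and k-mer coverage uses the relation given in the Velvet manual, Ck = C * (L - k + 1) / L, where C is the base coverage, L the read length and k the k-mer length. The function names and the example numbers are illustrative assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length of the contig at which the running total of
    contigs, sorted from longest to shortest, first reaches half of
    the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


def kmer_coverage(base_coverage, read_length, k):
    """Convert base coverage C to k-mer coverage Ck = C * (L - k + 1) / L."""
    if k > read_length:
        raise ValueError("k cannot exceed the read length")
    return base_coverage * (read_length - k + 1) / read_length


# Hypothetical assembly of seven contigs (lengths in bp).
print(n50([80, 70, 50, 40, 30, 20, 10]))
# Hypothetical run: 35x base coverage, 70 bp reads, k = 31.
print(kmer_coverage(35, 70, 31))
```

Because every read of length L contributes only L - k + 1 k-mers, the k-mer coverage is always lower than the base coverage, and the gap widens as k approaches the read length.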
The short read conundrum (5')
- Shorter reads and higher error rate
- Need higher coverage - large data set - lots of memory
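To make the link between read length, coverage and data volume concrete, here is a small back-of-the-envelope sketch. It assumes the usual definition of average coverage, C = N * L / G (N reads of length L over a genome of size G); the genome size, read length and target coverage below are made-up illustration values, not figures from the workshop.

```python
def reads_needed(genome_size, read_length, target_coverage):
    """Number of reads N such that N * L / G >= target coverage."""
    total_bases = genome_size * target_coverage
    # Integer ceiling division, so the target coverage is met or exceeded.
    return (total_bases + read_length - 1) // read_length


# Hypothetical 2 Mb bacterial genome sequenced to 50x.
genome = 2_000_000
for read_length in (400, 70, 36):
    n = reads_needed(genome, read_length, 50)
    print(f"{read_length} bp reads: {n} reads needed")
```

For a fixed coverage the total number of bases is the same, but shorter reads mean many more reads, and therefore more k-mers and graph nodes for the assembler to hold in memory.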
Programs available, pros and cons (2')
- 454 data: newbler, CABOG
- Illumina: Velvet, ALLPATHS2, EULER
Example one: 454 50/50 PE assembly with newbler (13')
- The data: simulation of reads from a bacterial genome or find a complete genome with 454 data, file formats and content, ...
- Running newbler
- Outputs description
- Mapping back on the genome: where did the assembler fail?
- Genome finishing
Example two: Illumina 70bp mate-pair reads assembly with Velvet (21')
- The data: simulation of reads from a bacterial genome or find a complete genome with 454 data, file formats and content, ...
- Running Velvet: kmer length, expected coverage, insert length...
- Output description and choice of an assembly
- Visualizing the assembly graph
- Mapping back on the genome: where did the assembler fail?
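As a rough illustration of how the parameters listed above (k-mer length, expected coverage, insert length) map onto a Velvet run, the sketch below builds the two command lines without executing them. It assumes paired short reads stored in a single interleaved FASTQ file; the directory name, file name and parameter values are hypothetical, and the lecture slides remain the reference for the settings actually used in the workshop.

```python
def velvet_commands(out_dir, reads_file, k, exp_cov, ins_length):
    """Build velveth and velvetg argument lists for paired short reads.

    The commands are only constructed here; pass them to
    subprocess.run() on a machine where Velvet is installed.
    """
    if k % 2 == 0:
        # The Velvet manual asks for an odd k-mer (hash) length.
        raise ValueError("k-mer length should be odd")
    # velveth: hash the reads into k-mers and write them to out_dir.
    velveth = ["velveth", out_dir, str(k),
               "-fastq", "-shortPaired", reads_file]
    # velvetg: build and simplify the graph, then use the pairing information.
    velvetg = ["velvetg", out_dir,
               "-exp_cov", str(exp_cov),
               "-ins_length", str(ins_length)]
    return velveth, velvetg


# Hypothetical values: k = 31, expected k-mer coverage 20, 300 bp inserts.
for command in velvet_commands("assembly_k31", "reads.fastq", 31, 20, 300):
    print(" ".join(command))
```

Note that -exp_cov is expressed as k-mer coverage, not base coverage, so it changes with the chosen k. Running the same pair of commands over several values of k and comparing the resulting contigs (for example by N50) is one common way to choose an assembly.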

Exercise 1:

Download exercise instructions here.

Lecture 2. Transcriptome assembly (45').

Speaker: Zhangjun Fei (Boyce Thompson Institute).

Lecture 2 slides.

Raw sequence cleaning (~5 min)
- Remove low-quality regions, adaptor and primer sequences, and all other possible contaminants (e.g., rRNA).
- Tools: lucy, seqclean
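The idea behind the cleaning step can be sketched in a few lines. This is a deliberately simplified illustration and not the algorithm used by lucy or seqclean: it removes an exact adaptor match at the start of the read and then trims bases from the 3' end while their quality score is below a threshold. The function name, the threshold of 20 and the example read are assumptions made for the sketch.

```python
def trim_read(seq, quals, min_qual=20, adaptor=""):
    """Strip a leading adaptor, then trim low-quality bases from the 3' end.

    seq   -- read sequence as a string
    quals -- list of per-base quality scores, same length as seq
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and qualities differ in length")
    # Remove the adaptor only if it matches exactly at the 5' end.
    if adaptor and seq.startswith(adaptor):
        seq = seq[len(adaptor):]
        quals = quals[len(adaptor):]
    # Walk back from the 3' end while the quality is below the threshold.
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    return seq[:end], quals[:end]


# Hypothetical read whose last two bases are of low quality.
print(trim_read("ACGTACGT", [30, 30, 30, 30, 30, 30, 12, 5]))
```

Dedicated tools go further than this, for example by tolerating mismatches in the adaptor and by screening reads against contaminant databases.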
Assemble 454 transcriptome sequences (~15 min)
- Newbler will not be covered here as it's not suitable for transcriptome assembly
- Problems of the most commonly used assembly programs in the published papers: CAP3 and MIRA.
- Focus on an assembly pipeline developed by my group called iAssembler (iterative assembler for 454 and/or Sanger EST sequences).
Sequencing transcriptome of orphan organisms using Illumina. (~5 min)
- Challenge for de novo assembly (velvet, Oases (http://www.ebi.ac.uk/~zerbino/oases/))
- Alternative strategy (generate 454 sequence first, then align Illumina reads to the 454 assembled sequences).

Exercise 2:

Download exercise instructions here.

Processing (lucy, seqclean) and assembly (iAssembler, CAP3, MIRA) of a small 454 dataset (~10,000-20,000 reads).
