NOTE: This workshop is full. Registration for the second edition on December 12 is now open; please register here.
This one-session workshop will deal with processing of large bioinformatics data files using Linux tools.
When working with genomics or transcriptomics data, we often need to process large text data files that are too big to open in a program such as Excel. In this one-session workshop, we will demonstrate how to use Linux tools such as awk, sed, cut, paste, sort, and uniq to filter, transform, and analyze such files. We will also introduce bedtools, a software package designed for efficient processing of large genomics interval files. Some commonly used bioinformatics data file formats, including GFF, BED, and FASTQ, will be covered in the examples. We will also illustrate how to process multiple files simultaneously using multiple CPU cores.
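As a rough illustration of the kind of pipeline these tools allow (a minimal sketch, not taken from the workshop material), the commands below filter a small BED-like file by interval length and count the surviving intervals per chromosome. The file name example.bed and its contents are made up for this example.

```shell
# Build a tiny BED-like file (tab-separated: chromosome, start, end, name).
# The file name and its contents are hypothetical, for illustration only.
printf 'chr1\t100\t200\tgeneA\nchr2\t50\t400\tgeneB\nchr1\t300\t350\tgeneC\nchr1\t500\t900\tgeneD\n' > example.bed

# awk keeps intervals at least 100 bp long, cut extracts the chromosome column,
# sort groups identical names together, and uniq -c counts each group.
# The final awk swaps the columns so the output reads "chromosome<TAB>count".
counts=$(awk -F'\t' '($3 - $2) >= 100' example.bed | cut -f1 | sort | uniq -c | awk '{print $2 "\t" $1}')
printf '%s\n' "$counts"
```

Because each tool reads its input as a stream, the same pipeline should work on files far too large to open in a spreadsheet.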
The presented material will be illustrated by hands-on exercises hosted on dedicated workstations of the BioHPC Lab. No programming skills are required; however, all participants should have basic knowledge of the Linux command-line environment, for example, as introduced in our two previous workshops: "Introduction to BioHPC Lab" and "Linux for Biologists" (lecture slides are available on the workshop web pages).
The BioHPC Lab workstations used for the workshop will be accessed remotely using the Secure Shell (ssh) protocol. To participate in the exercises, please bring your own laptop with an ssh client installed. Mac and Linux laptops come with native ssh clients, so no extra installation is needed. For Windows, the recommended ssh client is PuTTY; please install it prior to the workshop. To run Linux programs with a graphical interface displayed on your laptop, you should also install RealVNC viewer. To transfer files between your laptop and a Linux machine, you will need an sftp client, such as FileZilla on Windows (Mac and Linux laptops come with native sftp clients, so no extra installation is needed). However, neither RealVNC viewer nor FileZilla is essential for the workshop. For links to the client software mentioned above, instructions, and more information on access to BioHPC machines, please refer to the following document: http://cbsu.tc.cornell.edu/lab/doc/Remote_access.pdf, especially points 1 and 2.2-2.4.
Access to BioHPC Lab workstations requires a Lab account. If you do not yet have an account on the BioHPC Lab system, we will create one for you after you register for the workshop. We will also assign a machine for you to work on during and after the workshop. Please do not make any machine reservations for the workshop!
The machine allocations will be posted here.