 

BioHPC Lab: User Guide

 


BioHPC Lab Software

There are 391 software titles installed in BioHPC Lab. The software is available on all machines (unless stated otherwise in the notes). The complete list of programs is below; please click on a title to see details and instructions. A tabular list of software is available here.

Please read the details and instructions before running any program; they may contain important information on how to use the software properly in BioHPC Lab.

454 gsAssembler or gsMapper, a5, ABruijn, ABySS, AdapterRemoval, Admixtools, Admixture, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, analysis, ANGSD, Annovar, apollo, Atlas-Link, ATLAS_GapFill, ATSAS, Augustus, bamtools, Basset, BayeScan, BBmap, BCFtools, bcl2fastq, Beagle, Beagle4, Beast2, bedops, BEDtools, bfc, bgc, biobambam, Bioconductor, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, blast2go, BLAT, bmtagger, Boost, Bowtie, Bowtie2, breseq, BSseeker2, BUSCO, BWA, canu, CAP3, CBSU RNAseq, cd-hit, CEGMA, CellRanger, CheckM, Circos, Circuitscape, CLUMPP, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, cortex_var, CrossMap, CRT, cuda, Cufflinks, cutadapt, dadi, dadi-1.6.3_modif, dDocent, DeconSeq, deepTools, delly, destruct, DETONATE, diamond, Discovar, Discovar de novo, distruct, Docker, dREG, Drop-seq, dropSeqPipe, dsk, ea-utils, ecopcr, EDGE, EIGENSOFT, EMBOSS, entropy, ermineJ, exabayes, exonerate, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastML, fastq_species_detector, FastQC, fastStructure, FastTree, FASTX, fineSTRUCTURE, flash, Flexible Adapter Remover, FMAP, freebayes, FunGene Pipeline, GATK, GBRS, GCTA, GEM library, GEMMA, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomicConsensus, gensim, germline, GMAP/GSNAP, GNU Compilers, GNU parallel, Grinder, GROMACS, Gubbins, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HiC-Pro, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, HUMAnN2, HyPhy, iAssembler, IBDLD, IDBA-UD, IGV, IMa2, IMa2p, IMAGE, impute2, infernal, InStruct, InteMAP, InterProScan, iRep, java, jbrowse, jellyfish, JoinMap, julia, jupyter, kallisto, Kent source utilities, khmer, LACHESIS, lcMLkin, LDAK, leeHom, LINKS, LocusZoom, longranger, LUCY, LUCY2, LUMPY, MACS, MaCS simulator, MACS2, MAFFT, Magic-BLAST, MAKER, MAQ, MASH, MaSuRCA, Mauve, mccortex, megahit, MEGAN, MEME Suite, MERLIN, MetaBAT, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, MetaVelvet, 
MetaVelvet-SL, Migrate-n, mira, miRDeep2, MISO (misopy), MixMapper, MKTest, MMSEQ, mothur, MrBayes, mrsFAST, msld, MSMC, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, MUMmer, muscle, muTect, ncftp, Nemo, Netbeans, NEURON, new_fugue, NextGenMap, NGSadmix, ngsDist, ngsF, ngsTools, NGSUtils, Novoalign, NovoalignCS, Oases, OBITools, Orthomcl, PAGIT, PAML, pandas, pandaseq, Panseq, PASA, PASTEC, pbalign, pbh5tools, PBJelly, PBSuite, PeakSplitter, PEAR, PennCNV, ph5tools, Phage_Finder, PHAST, PHYLIP, PhyloCSF, phylophlan, PhyML, Picard, Pindel, piPipes, PIQ, Platypus, plink, Plotly, popbam, prinseq, prodigal, progressiveCactus, prokka, pyRAD, PySnpTools, PyVCF, QIIME, QIIME2 q2cli, Quake, QuantiSNP2, QUAST, QUMA, R, RACA, RADIS, RAPTR-SV, RAxML, Ray, Rcorrector, REAPR, RepeatMasker, RepeatModeler, RFMix, RNAMMER, rnaQUAST, Roary, RSEM, RSeQC, RStudio, sabre, SaguaroGW, samblaster, Samtools, Satsuma, scikit-learn, scythe, Sentieon, SeqPrep, sgrep, SHAPEIT, shore, SHOREmap, shortBRED, SHRiMP, sickle, SignalP, simuPOP, skewer, smcpp, SMRT Analysis, snakemake, snap, SNAPP, SNPhylo, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, SPAdes, SRA Toolkit, srst2, stacks, stampy, STAR, statmodels, Strelka, StringTie, STRUCTURE, supernova, SURPI, sutta, SVDetect, svtools, SweepFinder, sweepsims, tabix, Tandem Repeats Finder (TRF), TASSEL 3, TASSEL 4, TASSEL 5, tcoffee, TensorFlow, TEToolkit, TMHMM, TopHat, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, transrate, TRAP, treeCl, treemix, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMI-tools, usearch, Variant Effect Predictor, VarScan, vcf2diploid, vcfCooker, vcflib, vcftools, Velvet, VESPA, ViennaRNA, VIP, VirusFinder 2, VizBin, vsearch, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for Docker

Name: Docker
Version: 1.12.5
OS: Linux
About: Executes applications in containers that are isolated from the main operating system (OS-level virtualization)
Added: 2/14/2017 3:44:40 PM
Updated:
Link: https://www.docker.com/
Notes:

This link points to our Docker Quick Start Guide, an example-based quick introduction to our Docker implementation. For more details, read below.

Docker allows users to run applications isolated from the host operating system. This prevents compatibility issues and makes it possible to run applications that would otherwise need custom installations, such as native Ubuntu programs on CentOS. It also allows users to install and run applications as administrators ("root"). Unfortunately, standard Docker does not sufficiently isolate users running applications as administrators from the host machine, so we deployed a modified version of Docker that is safe and secure in the BioHPC Lab environment while still giving users great freedom to run, install and modify applications.

IMPORTANT: The original Docker command is "docker". In BioHPC Lab it has been replaced by the "docker1" command. Whenever you read a Docker book or website, replace "docker" with "docker1" when you want to run the command on BioHPC Lab machines. Most options and syntax are the same; the differences are discussed below. The syntax of any command can be displayed with "docker1 commandname --help", and "docker1 --help" displays all available commands.
If you run "docker" instead of "docker1", you will get the error "Cannot connect to the Docker daemon. Is the docker daemon running on this host?".

The text below is a quick crash course to help you get started with Docker. Please refer to online tutorials or our Docker workshop (to be announced soon) for a more in-depth introduction.

Docker images.

A Docker image is a template that Docker uses to create instances of running programs, which are called containers. Before running any Dockerized application, you need to know how to access its Docker image. There are two ways:
 

  • Images are stored in Docker registries (or hubs); their names and addresses are described in the respective software documentation. You can import images from a repository with the command "docker1 pull imagename". A number of customized images for BioHPC users are in the "biohpc" repository (see below).
     
  • Images can be imported from a file ("docker1 import filename"). We provide a number of custom images in the directory /programs/docker/images (they can also be pulled from the biohpc repository). You can export your own modified container to a file for later use; the typical workflow is to pull a basic image, run a container, install software in it, and then export it for later use.

Any imported image is stored as a local copy. If it is imported from a repository, the image name is the same as in the repository ("reponame/imagename"). If it is imported from a file, the image name will be "biohpc_labid/imagename", where labid is your Lab ID. If your image is in a file, you have to import it before you can run it. If you use an image from a repository, you can run it directly with the "run" command; it will be pulled automatically. Example of an import from a file:

docker1 import /programs/docker/images/cowsay.tar

This is an example of an import from a repository:

docker1 pull biohpc/cowsay

You can list local images with "docker1 images". Here is what I got after the above import from a file:

[jarekp@cbsum1c2b011 ~]$ docker1 images
REPOSITORY                            TAG                 IMAGE ID            CREATED             SIZE
biohpc_jarekp/cowsay                  latest              eac8cfea6661        4 seconds ago       319.7 MB

After importing from the repository, the result is slightly different: the image contains the same software but is listed under the repository name:

[jarekp@cbsum1c2b011 ~]$ docker1 pull biohpc/cowsay
Using default tag: latest
Trying to pull repository dtr.cucloud.net/biohpc/cowsay ...
latest: Pulling from dtr.cucloud.net/biohpc/cowsay

08d48e6f1cff: Pull complete
a1aa994f5ff7: Pull complete
Digest: sha256:b4ec86cdbb2d564d7ea94c9b49196f6b82e3c635a6581ee4eae02687e8ba91b8
Status: Downloaded newer image for dtr.cucloud.net/biohpc/cowsay:latest
[jarekp@cbsum1c2b011 ~]$ docker1 images
REPOSITORY                      TAG                 IMAGE ID            CREATED             SIZE
dtr.cucloud.net/biohpc/cowsay   latest              195f168235c9        2 weeks ago         337.1 MB
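The file-import naming rule can be summarized in a small shell function. This is a minimal sketch, not a docker1 feature: file_image_name is a hypothetical helper, and it assumes that the image name is the file name without its directory and .tar extension (as in the cowsay.tar example above) and that your Linux user name is your Lab ID. Images pulled from a repository need no helper, since they keep the repository name.

```shell
# Hypothetical helper (not part of docker1): predict the local name that
# "docker1 import FILE" gives an image, i.e. biohpc_<labid>/<imagename>.
# Assumption: <imagename> is the file name minus its directory and ".tar",
# as in /programs/docker/images/cowsay.tar -> biohpc_jarekp/cowsay.
file_image_name() {
    local labid="$1" file="$2" base
    base=${file##*/}    # strip the directory part
    base=${base%.tar}   # strip a trailing .tar extension
    printf 'biohpc_%s/%s\n' "$labid" "$base"
}

# Example (assumes $USER is your Lab ID):
#   docker1 import /programs/docker/images/cowsay.tar
#   docker1 run "$(file_image_name "$USER" /programs/docker/images/cowsay.tar)" cowsay 'Hi there!'
```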

Running Docker applications.

The command to run a Docker container is "docker1 run [OPTIONS] IMAGE [COMMAND] [ARG...]". This command has many options, but the basics are very simple. First, a simple command to check that Docker works is "docker1 run hello-world". In principle, there are three main ways to run a Docker container:
 

  • Single command. An image is run with "docker1 run image cmd"; after the command completes, the container stops. It cannot be rerun, but its output and structure can still be examined, and the container can be saved.

    [jarekp@cbsum1c2b011 ~]$ docker1 run biohpc/cowsay cowsay 'Hi there!'
     ___________
    < Hi there! >
     -----------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||
    [jarekp@cbsum1c2b011 ~]$

     
  • Interactive mode. An image is run with "docker1 run -it image cmd"; the container input is now linked to the keyboard and its output to the screen, and it runs interactively as long as "cmd" is active. Typically "cmd" is a shell such as "/bin/bash":

    [jarekp@cbsum1c2b011 ~]$ docker1 run -it biohpc_jarekp/cowsay /bin/bash
    [root@a605b04a7ca5 workdir]#

    The container is now available to run commands until we exit the shell.
     
  • Background mode. The container can be started in the background (with the "-d" option); users can then execute commands inside the container using the "docker1 exec" command.

    [jarekp@cbsum1c2b011 ~]$ docker1 run -d -t biohpc/cowsay /bin/bash
    5ab4520a337fbb01b2b3f45c14688e095446237930657d7293fa7238c91c8864
    [jarekp@cbsum1c2b011 ~]$ docker1 ps -a
    CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS                      PORTS               NAMES
    5ab4520a337f        biohpc/cowsay       "/bin/bash"            6 seconds ago       Up 4 seconds                                    jarekp__biohpc_5

    [jarekp@cbsum1c2b011 ~]$ docker1 exec 5ab4520a337f /bin/bash -c "fortune | cowsay"
     _________________________________________
    / If a person (a) is poorly, (b) receives \
    | treatment intended to make him better,  |
    | and (c) gets better, then no power of   |
    | reasoning known to medical science can  |
    | convince him that it may not have been  |
    | the treatment that restored his health. |
    |                                         |
    | -- Sir Peter Medawar, "The Art of the   |
    \ Soluble"                                /
     -----------------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||
    [jarekp@cbsum1c2b011 ~]$
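Background mode is convenient in scripts, because "docker1 run -d" prints the full container ID (as in the example above), and that ID can be captured instead of being copied from "docker1 ps". A minimal sketch follows; run_in_background is a hypothetical function name, and it uses only the docker1 run and exec options shown above.

```shell
# Hypothetical helper: start a background container from IMAGE, run CMD in it
# with "docker1 exec", and report the container ID on stderr so that later
# "docker1 exec" calls can reuse it. The container keeps running afterwards.
run_in_background() {
    local image="$1" cmd="$2" id
    # "docker1 run -d" prints the new container's full ID on stdout
    id=$(docker1 run -d -t "$image" /bin/bash) || return 1
    echo "container: $id" >&2
    # Run the command through a shell so pipes like "fortune | cowsay" work
    docker1 exec "$id" /bin/bash -c "$cmd"
}

# Example:
#   run_in_background biohpc/cowsay 'fortune | cowsay'
```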

All current containers can be listed with the "docker1 ps -a" command; without "-a", only running containers are shown.

[jarekp@cbsum1c2b011 ~]$ docker1 ps -a
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS                      PORTS               NAMES
5ab4520a337f        biohpc/cowsay       "/bin/bash"            4 minutes ago       Up 4 minutes                                    jarekp__biohpc_5
7e25f8af4981        biohpc/cowsay       "/bin/bash"            7 minutes ago       Exited (0) 7 minutes ago                        jarekp__biohpc_4
b30bf95712c7        biohpc/cowsay       "cowsay 'Hi there!'"   20 minutes ago      Exited (0) 20 minutes ago                       jarekp__biohpc_3
6b073ad025c7        biohpc/cowsay       "echo 'Hi there!'"     21 minutes ago      Exited (0) 21 minutes ago                       jarekp__biohpc_2
6111cccdbf71        biohpc/cowsay       "ls -a -l /"           22 minutes ago      Exited (0) 22 minutes ago                       jarekp__biohpc_1
[jarekp@cbsum1c2b011 ~]$ docker1 ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
5ab4520a337f        biohpc/cowsay       "/bin/bash"         4 minutes ago       Up 4 minutes                            jarekp__biohpc_5
[jarekp@cbsum1c2b011 ~]$

 

Containers have their own root directory and system directories; after all, they are isolated. In BioHPC Lab each container has direct access to the /workdir/labid directory (where labid is your Lab ID), which is mounted inside the container as /workdir. You can use this directory to copy files to or from the container, and you can also copy all the data you need there to use inside the container. Because you run as "root" inside the container, new files created in /workdir/labid will be owned by root and may therefore be difficult to handle outside the container. You can use the "docker1 claim" command (a custom BioHPC command) to change the ownership of all files in /workdir/labid to your labid.

[jarekp@cbsum1c2b011 ~]$ docker1 run biohpc/cowsay df -h
Filesystem                                                                                           Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:2-128408749-19d0dcbcbed14933b4bd8db3d769b34820cb37457317eb3247e69e85cd7c7790   10G  365M  9.7G   4% /
tmpfs                                                                                                7.8G     0  7.8G   0% /dev
tmpfs                                                                                                7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/rhel-local                                                                               813G  121G  692G  15% /workdir
shm                                                                                                   64M     0   64M   0% /dev/shm
[jarekp@cbsum1c2b011 ~]$
[jarekp@cbsum1c2b011 ~]$ docker1 run biohpc/cowsay /bin/bash -c "echo test > /workdir/testfile"
[jarekp@cbsum1c2b011 ~]$ ls -al /workdir/jarekp/
total 4
drwxr-xr-x  2 jarekp root 21 Feb 15 16:02 .
drwxrwxrwx. 4 root   root 30 Feb  8 16:51 ..
-rw-r--r--  1 root   root  5 Feb 15 16:02 testfile
[jarekp@cbsum1c2b011 ~]$ docker1 claim
[jarekp@cbsum1c2b011 ~]$ ls -al /workdir/jarekp/
total 4
drwxr-xr-x  2 jarekp root 21 Feb 15 16:02 .
drwxrwxrwx. 4 root   root 30 Feb  8 16:51 ..
-rw-r--r--  1 jarekp root  5 Feb 15 16:02 testfile
[jarekp@cbsum1c2b011 ~]$
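Because /workdir/labid on the host and /workdir inside the container are the same directory, a path can be translated between the two views by swapping the prefix. A minimal sketch: host_to_container and container_to_host are hypothetical helpers that only rewrite path strings; they do not check that the files exist.

```shell
# Hypothetical helpers: translate paths between the host view
# (/workdir/<labid>/...) and the container view (/workdir/...).
host_to_container() {
    local labid="$1" path="$2"
    case $path in
        "/workdir/$labid")   printf '/workdir\n' ;;
        "/workdir/$labid"/*) printf '/workdir/%s\n' "${path#"/workdir/$labid/"}" ;;
        # Anything else is not visible inside the container
        *) echo "not under /workdir/$labid: $path" >&2; return 1 ;;
    esac
}

container_to_host() {
    local labid="$1" path="$2"
    case $path in
        /workdir)   printf '/workdir/%s\n' "$labid" ;;
        /workdir/*) printf '/workdir/%s/%s\n' "$labid" "${path#/workdir/}" ;;
        *) echo "not under /workdir: $path" >&2; return 1 ;;
    esac
}

# Example (assumes $USER is your Lab ID):
#   docker1 run biohpc/cowsay cat "$(host_to_container "$USER" "/workdir/$USER/testfile")"
```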

You can pull and run any image from public repositories. We have found, however, that most of them are very "light", i.e. they do not include development tools or libraries. Therefore we provide several development images to use as starting points for installing your applications. We use the Cornell Docker repository at dtr.cucloud.net; the full path is dtr.cucloud.net/biohpc/imagename, but dtr.cucloud.net is added to the Docker repository search path on BioHPC Lab machines, so using biohpc/imagename is fine.

  • CentOS 7 cowsay
    Basic image for testing with two extra commands installed: fortune and cowsay.
    Repository image: biohpc/cowsay; file: /programs/docker/images/cowsay.tar
  • CentOS 7 development
    CentOS 7 image with development tools and libraries installed (compilers, Java, Perl, Python, etc.).
    Repository image: biohpc/centos7dev; file: /programs/docker/images/centos7dev.tar
  • Ubuntu development
    Ubuntu image with development tools and libraries installed (compilers, Java, Perl, Python, etc.).
    Repository image: biohpc/ubuntudev; file: /programs/docker/images/ubuntudev.tar
  • CentOS 7 development with GUI and sshd
    CentOS 7 image with a standard set of development tools and libraries, built on the centos7dev image. Includes X11 and GUI tools and libraries. Automatically starts sshd, so it must be run in the background and connected to with ssh (see "Running GUI ..." below).
    Repository image: biohpc/centos7devgui; file: /programs/docker/images/centos7devgui.tar

Of course, you can run any public image; for example, "docker1 run -it ubuntu /bin/bash" will start an Ubuntu image for interactive use, downloaded from the official Ubuntu repository. Also, many programs or pipelines can be installed in basic images; development images are needed when building from source is required.

All containers you create are named "labid__biohpc_##" (for example jarekp__biohpc_5 above). Over time, many stopped containers will accumulate. They can be deleted using the "docker1 rm containername_or_id" command, but it can only handle one container at a time. We provide the "docker1 clean [options]" command to help deal with groups of containers:

docker1 clean --help
Usage:  docker1 clean [OPTIONS]

Remove containers from local machine

        remove all my non-running containers (default)
  all   remove all my containers (running or not)
  nores remove all containers from users not having current reservation
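"docker1 clean" works on all of your stopped containers at once. To remove only some of them, the "docker1 ps -a" listing can be filtered and fed to "docker1 rm" one container at a time. A minimal sketch: exited_ids is a hypothetical helper, and it assumes the default column layout shown above (container ID in the first column, image name in the second, "Exited (...)" in the STATUS column).

```shell
# Hypothetical filter: read "docker1 ps -a" output on stdin and print the IDs
# of exited containers, optionally only those created from image $1.
exited_ids() {
    awk -v img="${1:-}" '
        # The header line never contains "Exited (", so it is skipped too
        / Exited \(/ && (img == "" || $2 == img) { print $1 }
    '
}

# Example: remove only the exited biohpc/cowsay containers,
# one "docker1 rm" call per container.
#   docker1 ps -a | exited_ids biohpc/cowsay | while read -r id; do
#       docker1 rm "$id"
#   done
```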

Saving containers for future use.

A container can be exported to a file using the command "docker1 export -o filename containername_or_id", e.g.

docker1 export -o /home/jarekp/mycowsay.tar 97651589ec95

The resulting file can be imported into Docker with the "docker1 import" command.

Building images with Dockerfile.

Docker images can be built ("docker1 build path") using a file with a set of instructions called a Dockerfile; the path argument of the docker1 build command should point to a directory containing the Dockerfile. Please refer to online books or tutorials for more information about building images with a Dockerfile. BioHPC Lab restricts the Dockerfile build path to /workdir/labid: a build path of /workdir/labid/myimagedir is fine, but /home/labid/myimagedir will be denied (as will any system directory).
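Before calling "docker1 build", a script can check the build path itself instead of waiting for the denial. A minimal sketch: build_path_allowed is a hypothetical helper that mirrors the rule above with a purely textual check; it does not resolve symlinks, and it rejects relative paths and ".." components rather than trying to normalize them.

```shell
# Hypothetical pre-check: succeed only if the build path is /workdir/<labid>
# or a directory below it.
build_path_allowed() {
    local labid="$1" path="${2%/}"   # ignore a single trailing slash
    case $path in
        /*) ;;                       # absolute paths only
        *) return 1 ;;
    esac
    case $path in
        */../*|*/..) return 1 ;;     # do not guess where ".." leads
    esac
    case $path in
        "/workdir/$labid"|"/workdir/$labid"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (assumes $USER is your Lab ID):
#   dir=/workdir/$USER/myimagedir
#   build_path_allowed "$USER" "$dir" && docker1 build "$dir"
```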

Running GUI/graphical/X-Windows/X11 applications in Docker container.

In order to run graphical applications in a Docker container, the image must have X11 components installed and must be able to start the sshd program on launch. We provide one such image: centos7devgui. First, pull or import the image:

docker1 pull biohpc/centos7devgui

Then start the container in the background, with the container's ssh port (22) mapped to port 5000 on the host's loopback interface (127.0.0.1):

docker1 run -d -p 127.0.0.1:5000:22 -P -t biohpc/centos7devgui /usr/sbin/sshd -D

Once the container is running, you can connect to it from the machine where you ran the docker1 command, using ssh to tunnel X11 graphics to your normal display:

ssh -X root@localhost -p 5000

It will ask for a password, which is 'biohpc' (without quotation marks). Remember that you need to run a program on your local machine that can accept and render the graphics (an X server). Consult our online User Guide or "Linux for Biologists" workshops for more details on using GUI applications.

Limiting CPUs and memory available to the container

The CPU cycles and memory available to the container can be limited using options to the docker1 run command. For example, 

docker1 run -it --memory="4g" --cpu-period="100000" --cpu-quota="400000" biohpc/centos7dev /bin/bash

will create an interactive CentOS 7 container with memory limited to 4 GB and able to consume CPU cycles equivalent to 4 CPU cores (cpu-quota/cpu-period = 400000/100000 = 4). The imposed limits are not immediately obvious to a user working in the container: the top and free commands will still show the full memory of the host rather than the limited amount, and cat /proc/cpuinfo will still show all CPU cores of the host. However, the limits do affect programs running within the container.
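The quota arithmetic can be wrapped in a small helper, and the limits can be verified from inside the container through the cgroup files instead of top or free. A minimal sketch: limit_opts and show_cgroup_limits are hypothetical helpers; limit_opts handles whole cores only, and show_cgroup_limits assumes the cgroup v1 file layout (memory.limit_in_bytes, cpu.cfs_quota_us, cpu.cfs_period_us), which may differ on other systems.

```shell
# Hypothetical helper: print docker1 run options for N whole CPU cores and
# M GB of memory, using cpu-quota = cores * cpu-period.
limit_opts() {
    local cores="$1" mem_gb="$2" period=100000
    printf '%s\n' "--memory=${mem_gb}g --cpu-period=$period --cpu-quota=$((cores * period))"
}

# Run inside the container: print the limits the kernel actually enforces.
# Assumes cgroup v1 paths; prints "not available" for missing files.
show_cgroup_limits() {
    local f
    for f in /sys/fs/cgroup/memory/memory.limit_in_bytes \
             /sys/fs/cgroup/cpu/cpu.cfs_quota_us \
             /sys/fs/cgroup/cpu/cpu.cfs_period_us; do
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$f")"
        else
            printf '%s: not available\n' "$f"
        fi
    done
}

# Example (the unquoted $(...) is split into separate options on purpose):
#   docker1 run -it $(limit_opts 4 4) biohpc/centos7dev /bin/bash
```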

Deleting images and containers.

Unused containers and images take up space, so they need to be pruned periodically. Every night a script removes unused containers and images according to the following rules:
 

  • Any non-running container older than 1 week is deleted, regardless of reservations.
  • Any container belonging to a user who does not have a reservation is deleted.
  • Unused images, i.e. images that are not linked to any container, are deleted. Images imported by users who do have a reservation are not deleted.
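The two container rules combine as follows. This is a hypothetical illustration of the rules as stated, not the actual nightly script: container_fate and its arguments are invented for this sketch, and "older than 1 week" is read as more than 7 days.

```shell
# Hypothetical illustration of the container rules (not the real script).
#   $1  "running" or "stopped"
#   $2  container age in days
#   $3  "yes" if the owner currently has a reservation, otherwise "no"
# Prints "delete" or "keep".
container_fate() {
    local state="$1" age_days="$2" reserved="$3"
    if [ "$reserved" != yes ]; then
        echo delete        # no reservation: deleted even if running
    elif [ "$state" != running ] && [ "$age_days" -gt 7 ]; then
        echo delete        # stopped and older than one week
    else
        echo keep          # running, or stopped for a week or less
    fi
}
```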

If you have custom images, please make sure they are named properly (biohpc_labid/name); you can use the docker1 tag command to change image names. You only need to provide the name part in the docker1 tag command; biohpc_labid will be prepended automatically, i.e. 'docker1 tag f49eec89601e myimage' will name image f49eec89601e 'biohpc_labid/myimage'. When you import an image from a file, it is named properly automatically.
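To spot custom images that do not yet follow the biohpc_labid/name convention, the "docker1 images" listing can be filtered by its REPOSITORY column. A minimal sketch: images_outside_prefix is a hypothetical helper, and it assumes the column layout shown earlier (header first, REPOSITORY, TAG and IMAGE ID as the first three columns). Images pulled from a repository are expected to appear in its output; only your own custom images need renaming.

```shell
# Hypothetical check: read "docker1 images" output on stdin and list images
# whose REPOSITORY does not start with biohpc_<labid>/.
images_outside_prefix() {
    awk -v prefix="biohpc_$1/" '
        NR > 1 && index($1, prefix) != 1 { print $1 ":" $2, $3 }
    '
}

# Example (assumes $USER is your Lab ID):
#   docker1 images | images_outside_prefix "$USER"
# then rename a custom image by its ID, e.g.:
#   docker1 tag f49eec89601e myimage
```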

You can manually delete your own containers and images.

Summary of custom BioHPC Docker commands.

docker1 claim

Lets you take ownership of all files under /workdir/labid on the local machine.

docker1 clean [options]

Removes a set of Docker containers. Supports three options:

  • docker1 clean   
    remove all my non-running containers (default)
  • docker1 clean all
    remove all my containers (running or not)
  • docker1 clean nores
    remove all containers from users not having current reservation

docker1 run

Various options relating to Docker volumes are disabled.

docker1 build

BioHPC Lab restricts the Dockerfile build path to /workdir/labid: a build path of /workdir/labid/myimagedir is fine, but /home/labid/myimagedir will be denied.

  



 
