
BioHPC Lab:
User Guide


BioHPC Lab Software

There are 449 software titles installed in BioHPC Lab. The software is available on all machines (unless stated otherwise in the notes). The complete list of programs is below; please click on a title to see details and instructions. A tabular list of the software is also available.

Please read the details and instructions before running any program; they may contain important information on how to properly use the software in BioHPC Lab.

454 gsAssembler or gsMapper, a5, ABruijn, ABySS, AdapterRemoval, Admixtools, Admixture, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, analysis, ANGSD, Annovar, antiSMASH, apollo, Arlequin, Atlas-Link, ATLAS_GapFill, ATSAS, Augustus, AWS command line interface, bamtools, Basset, BayeScan, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beagle4, Beast2, bedops, BEDtools, bfc, bgc, bigWig, biobambam, Bioconductor, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, blast2go, BLAT, bmtagger, Boost, Bowtie, Bowtie2, BPGA, breseq, BSseeker2, BUSCO, BWA, bwa-meth, canu, CAP3, cBar, CBSU RNAseq, cd-hit, CEGMA, CellRanger, centrifuge, CFSAN SNP pipeline, CheckM, Circos, Circuitscape, CLUMPP, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, cortex_var, CrossMap, CRT, cuda, Cufflinks, cutadapt, dadi, dadi-1.6.3_modif, dDocent, DeconSeq, deepTools, delly, destruct, DETONATE, diamond, Discovar, Discovar de novo, distruct, Docker, dREG, dREG.HD, Drop-seq, dropSeqPipe, dsk, ea-utils, ecopcr, EDGE, EIGENSOFT, EMBOSS, entropy, ephem, ermineJ, ete3, exabayes, exonerate, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, fastcluster, FastML, fastp, fastq_species_detector, FastQC, fastStructure, FastTree, FASTX, fineSTRUCTURE, flash, Flexible Adapter Remover, FMAP, FragGeneScan, freebayes, FunGene Pipeline, GAEMR, GATK, gatk4, GBRS, GCTA, GEM library, GEMMA, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomicConsensus, gensim, germline, GMAP/GSNAP, GNU Compilers, GNU parallel, gradle-4.4, Grinder, GROMACS, GSEA, Gubbins, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HiC-Pro, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, HUMAnN2, hyperopt, HyPhy, iAssembler, IBDLD, IDBA-UD, IgBLAST, IGV, IMa2, IMa2p, IMAGE, impute2, INDELseek, infernal, InStruct, InteMAP, InterProScan, iRep, java, jbrowse, jellyfish, JoinMap, julia, jupyter, kallisto, Kent Utilities, khmer, kSNP, LACHESIS, lcMLkin, LDAK, leeHom, Lep-MAP3, LINKS,
LocusZoom, longranger, LUCY, LUCY2, LUMPY, lyve-SET, MACS, MaCS simulator, MACS2, MAFFT, Magic-BLAST, MAKER, MAQ, MASH, MaSuRCA, Mauve, MaxBin, mccortex, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, MetaBAT, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, MetaVelvet, MetaVelvet-SL, Migrate-n, mira, miRDeep2, MISO (misopy), MixMapper, MKTest, MMAP, MMSEQ, mothur, MrBayes, mrsFAST, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, MUMmer, muscle, MUSIC, muTect, ncftp, Nemo, Netbeans, NEURON, new_fugue, NextGenMap, NGSadmix, ngsDist, ngsF, ngsTools, NGSUtils, Novoalign, NovoalignCS, Oases, OBITools, Orthomcl, PAGIT, PAML, pandas, pandaseq, Panseq, PASA, PASTEC, pbalign, pbh5tools, PBJelly, PBSuite, PeakRanger, PeakSplitter, PEAR, PennCNV, PfamScan, PGDSpider, ph5tools, Phage_Finder, PHAST, PHYLIP, PhyloCSF, phylophlan, PhyML, Picard, Pindel, piPipes, PIQ, Platypus, plink, Plotly, popbam, prinseq, prodigal, progressiveCactus, prokka, pyani, PyMC, pyRAD, Pyro4, PySnpTools, PyTorch, PyVCF, QIIME, QIIME2 q2cli, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, QUMA, R, RACA, RADIS, RAPTR-SV, RAxML, Ray, Rcorrector, RDP Classifier, REAPR, RepeatMasker, RepeatModeler, RFMix, RNAMMER, rnaQUAST, Roary, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, rtfbs_db, sabre, SaguaroGW, samblaster, Samtools, Satsuma, Satsuma2, scikit-learn, scythe, selscan, Sentieon, SeqPrep, sgrep, sgrep sorted_grep, SHAPEIT, shore, SHOREmap, shortBRED, SHRiMP, sickle, SignalP, simuPOP, skewer, SLiM, smcpp, SMRT Analysis, SMRT LINK, snakemake, snap, SNAPP, SNeP, SNPhylo, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, SPAdes, SparCC, SRA Toolkit, srst2, stacks, Stacks 2, stampy, STAR, statmodels, STITCH, Strelka, StringTie, STRUCTURE, supernova, SURPI, sutta, SVDetect, svtools, SweepFinder, sweepsims, tabix, Tandem Repeats Finder (TRF), TASSEL 3, TASSEL 4, TASSEL 5, tcoffee, TensorFlow, TEToolkit, TMHMM, TopHat, traitRate, Trans-Proteomic Pipeline (TPP), 
TransComb, TransDecoder, transrate, TRAP, treeCl, treemix, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMI-tools, usearch, Variant Effect Predictor, VarScan, vcf2diploid, vcfCooker, vcflib, vcftools, Velvet, VESPA, ViennaRNA, VIP, VirSorter, VirusDetect, VirusFinder 2, VizBin, vsearch, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for Roary

About: Roary is a high-speed, stand-alone pan-genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome.
Added: 7/20/2016 12:57:16 PM

This program is configured within a Virtual Machine (VM). It runs only on workstations marked "VM Supported" (i.e., on most BioHPC Lab machines) and requires a graphical connection through VNC. To log in through VNC, follow these steps:

1. Reserve a computer and turn on VNC. (Log in to the BioHPC Lab website; on the "my reservations" page, click "Connect VNC" next to your reserved computer.)

2. Connect to your reserved computer by VNC. (Click "Access" and read the section "Access with VNC".) You might see a window titled "Authentication is required ..."; ignore it, or drag it to the edge of the screen.

3. Open a terminal window in VNC (right-click on the desktop and select "open terminal"). Make sure that the directory /workdir/myid (where myid is replaced by your actual BioHPC Lab user ID) exists; create it if needed: mkdir /workdir/myid.
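The directory check in step 3 can be sketched as a small shell function. This is only an illustration: the function name ensure_workdir and its optional second argument are hypothetical and not part of BioHPC Lab; the only real path is /workdir/myid from the step above.

```shell
# Create the per-user scratch directory if it does not exist yet,
# then print its path. mkdir -p is a no-op when the directory exists.
# Usage: ensure_workdir <biohpc_id> [root]   (root defaults to /workdir)
ensure_workdir() {
    mkdir -p "${2:-/workdir}/$1" && printf '%s\n' "${2:-/workdir}/$1"
}

# Example (replace myid with your actual BioHPC Lab user ID):
#   ensure_workdir myid
```

Because mkdir -p succeeds whether or not the directory already exists, the function is safe to call at the start of every session.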

Once your VNC session is established, you are ready to start working with Roary. This involves starting the VirtualBox Manager, configuring the Roary VM in it, starting the Roary VM, transferring input files to the VM, running Roary within the VM, copying the output files out of the VM, shutting down the VM, and (optionally) saving your own copy of the modified VM snapshot for later use. These steps are described in more detail below:

Configuring the Roary VM for the first time

Start VirtualBox Manager (at the command prompt, type VirtualBox). This will open the program's window within your VNC session.

Go to File->Preferences->Default Machine Folder, select Other, and specify /workdir/myid/RoaryVM. Click Open, then OK to confirm the selection.

Go to File->Import Appliance, select the directory /programs/Roary, then select the file pathogen-vm.latest.ova. Click Open, then Next.

In the VM configuration window that shows up within the VirtualBox Manager, you can increase the RAM (memory) and CPU (number of processors) values, although the defaults should be sufficient in most cases (these parameters may also be adjusted later, after the VM is configured, before each VM start). When ready, click Import. This will start the import process, which may take a few minutes. As a result, a directory /workdir/myid/RoaryVM will be created, containing the relevant VM-related files, and the virtual machine called Bio-Linux-8.0.7 will show up in the VirtualBox Manager's left panel.
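The same first-time configuration can probably also be done without the GUI dialogs. The following is a minimal sketch, not part of the original instructions. It assumes that the VBoxManage command-line tool that ships with VirtualBox is on your PATH on the BioHPC Lab machine. The helper names run and import_roary_vm and the DRY_RUN variable are hypothetical.

```shell
# Hypothetical helper: print each command, and execute it only
# when DRY_RUN is unset or empty.
run() {
    printf '+ %s\n' "$*"
    if [ -z "${DRY_RUN:-}" ]; then "$@"; fi
}

# Set the default machine folder, then import the Roary appliance.
# Usage: import_roary_vm <biohpc_id>
import_roary_vm() {
    run VBoxManage setproperty machinefolder "/workdir/$1/RoaryVM" &&
    run VBoxManage import /programs/Roary/pathogen-vm.latest.ova
}

# Example (prints the two commands without running them):
#   DRY_RUN=1 import_roary_vm myid
```

Running it first with DRY_RUN=1 lets you check the paths before anything is changed.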

After you have finished working with Roary and shut down the VM, you may want to save the VM in its current state (i.e., with all files and changes you made within the VM while working in it) to your home directory, so that it can be restarted later, possibly on a different BioHPC Lab machine. Saving the VM may be accomplished with the following commands:

cd /workdir/myid && rsync -av --relative RoaryVM /home/myid

This way, a directory /home/myid/RoaryVM will be created, containing the current snapshot of the Roary VM.


Configuring a saved Roary VM (e.g., on another machine)

Assuming that you have saved your previous Roary VM session in your home directory, /home/myid/RoaryVM, copy it to /workdir/myid:

cd && rsync -av --relative RoaryVM /workdir/myid

When you start VirtualBox Manager (by typing VirtualBox at the VNC terminal command prompt), you should see the Bio-Linux-8.0.7 virtual machine listed in the left panel.
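Saving and restoring the VM use the same rsync command with source and destination swapped. A minimal sketch of one function covering both directions follows. The name copy_vm is hypothetical, and the check for a missing source directory is an addition not found in the commands above.

```shell
# Copy the RoaryVM directory from one parent directory to another.
# Refuses to run if the source parent has no RoaryVM directory.
# Usage: copy_vm <from_parent> <to_parent>
copy_vm() {
    if [ ! -d "$1/RoaryVM" ]; then
        echo "no RoaryVM directory under $1" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    (cd "$1" && rsync -av --relative RoaryVM "$2")
}

# Save to the home directory:     copy_vm /workdir/myid /home/myid
# Restore to the scratch space:   copy_vm /home/myid /workdir/myid
```

The VM should be shut down before either copy, so that the disk image is not changing while rsync reads it.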

Starting a configured Roary VM (already in /workdir/myid/RoaryVM)

If not yet done, start VirtualBox Manager. 

Optional: To adjust the settings (mostly RAM and number of CPUs) of a configured VM, select the VM (with a single click), then click Settings->System. On the Motherboard tab, adjust the RAM amount using the slider at the top. On the Processor tab, adjust the number of CPUs using the Processor(s) slider. After making the adjustments, click OK.

Start the virtual machine by double-clicking on the Bio-Linux-8.0.7 icon (in the left panel of VirtualBox Manager). A window will open showing the boot screen of the VM. You can maximize this window. After a few seconds the boot-up process will start (you can prompt it by placing the mouse pointer in the window and hitting Enter). When the VM boots up, you will see a full graphical Ubuntu desktop with an image of a building (Sanger Institute?) as the background. You can dismiss the messages (about keyboard and mouse) from the VirtualBox Manager that appear on top by clicking on the "X" icon to the right of each message.
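The optional resource adjustment and the VM start can likely also be done from the VNC terminal. The sketch below is not part of the original instructions and assumes VBoxManage is on your PATH. The helper names vm_cmd and start_roary_vm and the DRY_RUN variable are hypothetical, and the memory value is given in megabytes.

```shell
# Hypothetical wrapper: print the VBoxManage call, and execute it
# only when DRY_RUN is unset or empty.
vm_cmd() {
    printf '+ VBoxManage %s\n' "$*"
    [ -n "${DRY_RUN:-}" ] || VBoxManage "$@"
}

# Set RAM (in MB) and CPU count, then start the VM.
# modifyvm only works while the VM is powered off.
# Usage: start_roary_vm <ram_mb> <cpus>
start_roary_vm() {
    vm_cmd modifyvm "Bio-Linux-8.0.7" --memory "$1" --cpus "$2" &&
    vm_cmd startvm "Bio-Linux-8.0.7"
}

# Example (prints the commands only):
#   DRY_RUN=1 start_roary_vm 8192 4
```

The VM window still opens inside the VNC session, so the graphical connection is needed either way.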

Using a running Roary VM

Once your Roary VM has booted and its graphical desktop is running within your VNC session, click on the Terminal icon (black rectangle with the ">_" symbol on it) in the left panel to open an Ubuntu Linux terminal window within the Roary VM (you can launch more of these, if you want). You are logged in to the VM as user manager, with home directory /home/manager. You can create subdirectories within this home directory and organize your files there. You have all the usual Linux commands and tools at your disposal. For example, you can use text editors, like vim or nano, or ssh, scp, or sftp to other machines. Once your Roary input files are within your home directory on the VM, follow the Roary manual to run the program.

While a Roary calculation is running, you can disconnect from the BioHPC Lab machine by clicking "X" in the upper-right corner of the VNC window - this will leave the VNC session (and the Roary VM in it) running as long as your reservation is active. You can reconnect to the VNC session later to continue your work. Make sure not to hit "X" in the Roary VM window, which is running within the VNC window, as this will kill the VM and interrupt the Roary run.


Getting files in and out of a running Roary VM

This is best accomplished using the scp or sftp commands. For example,

scp -r myid@cbsulogin:/home/myid/my_roary_data /home/manager

will copy the whole directory my_roary_data (with all files and subdirectories, if any) from your BioHPC home directory to the virtual machine directory /home/manager.

Likewise, assuming that the Roary output files are in the directory /home/manager/roary_output in the VM, they can be saved to your BioHPC home directory with the command

scp -r /home/manager/roary_output myid@cbsulogin:/home/myid

(note that the user IDs on VM and on BioHPC Lab differ, so in the command above you need to explicitly specify your BioHPC ID myid when contacting cbsulogin). 
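The two transfers can be wrapped in a pair of small functions, meant to be run in the VM terminal, so that the BioHPC ID is never forgotten. This is only a sketch: the names run_scp, pull_input and push_output and the DRY_RUN variable are hypothetical, while the host cbsulogin and the paths are taken from the examples above.

```shell
# Hypothetical wrapper: print the scp call, and execute it only
# when DRY_RUN is unset or empty.
run_scp() {
    printf '+ scp %s\n' "$*"
    [ -n "${DRY_RUN:-}" ] || scp "$@"
}

# Pull a directory from the BioHPC home directory into the VM.
# Usage: pull_input <biohpc_id> <directory_name>
pull_input() {
    run_scp -r "$1@cbsulogin:/home/$1/$2" /home/manager
}

# Push a directory from the VM back to the BioHPC home directory.
# Usage: push_output <biohpc_id> <directory_name>
push_output() {
    run_scp -r "/home/manager/$2" "$1@cbsulogin:/home/$1"
}

# Examples (print the commands only):
#   DRY_RUN=1 pull_input myid my_roary_data
#   DRY_RUN=1 push_output myid roary_output
```

Both functions pass -r, since input and output are directories rather than single files.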

Shutting down a Roary VM

After your Roary run is finished and the results have been saved to your BioHPC Lab home directory, you can shut down the Roary VM. To do this, click on the gear icon in the upper-right corner of the Roary VM window (not of the VNC window!), select Shutdown, and confirm with the large button that appears. Exit the VirtualBox Manager, if it is running. You can now save the Roary VM image (which has been modified in /workdir/myid while you were working on it) to your home directory, as described earlier.
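Before copying the VM image, it may be worth confirming that the VM is really powered off. A minimal sketch follows, assuming VBoxManage is on your PATH on the BioHPC Lab machine. The function name vm_listed is hypothetical; it only inspects the listing it reads from standard input.

```shell
# Succeeds if the named VM appears in a "VBoxManage list ..." listing
# read from standard input. Each listing line has the form:
#   "name" {uuid}
# grep -F matches the name literally, so dots are not wildcards.
vm_listed() {
    grep -qF "\"$1\" {"
}

# Example, run on the BioHPC Lab machine (not inside the VM):
#   if VBoxManage list runningvms | vm_listed "Bio-Linux-8.0.7"; then
#       echo "VM is still running; shut it down before saving" >&2
#   fi
```

The same function works with VBoxManage list vms to check that the VM is registered at all.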

