BioHPC Lab:
User Guide

 


BioHPC Lab Software

There are 423 software titles installed in BioHPC Lab. The software is available on all machines (unless stated otherwise in the notes). The complete list of programs is below; click on a title to see details and instructions. A tabular list of the software is also available.

Please read the details and instructions before running any program; they may contain important information on how to use the software properly in BioHPC Lab.

454 gsAssembler or gsMapper, a5, ABruijn, ABySS, AdapterRemoval, Admixtools, Admixture, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, analysis, ANGSD, Annovar, apollo, Arlequin, Atlas-Link, ATLAS_GapFill, ATSAS, Augustus, bamtools, Basset, BayeScan, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beagle4, Beast2, bedops, BEDtools, bfc, bgc, biobambam, Bioconductor, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, blast2go, BLAT, bmtagger, Boost, Bowtie, Bowtie2, BPGA, breseq, BSseeker2, BUSCO, BWA, canu, CAP3, cBar, CBSU RNAseq, cd-hit, CEGMA, CellRanger, CheckM, Circos, Circuitscape, CLUMPP, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, cortex_var, CrossMap, CRT, cuda, Cufflinks, cutadapt, dadi, dadi-1.6.3_modif, dDocent, DeconSeq, deepTools, delly, destruct, DETONATE, diamond, Discovar, Discovar de novo, distruct, Docker, dREG, Drop-seq, dropSeqPipe, dsk, ea-utils, ecopcr, EDGE, EIGENSOFT, EMBOSS, entropy, ermineJ, ete3, exabayes, exonerate, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, fastcluster, FastML, fastq_species_detector, FastQC, fastStructure, FastTree, FASTX, fineSTRUCTURE, flash, Flexible Adapter Remover, FMAP, FragGeneScan, freebayes, FunGene Pipeline, GAEMR, GATK, GBRS, GCTA, GEM library, GEMMA, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomicConsensus, gensim, germline, GMAP/GSNAP, GNU Compilers, GNU parallel, Grinder, GROMACS, Gubbins, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HiC-Pro, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, HUMAnN2, hyperopt, HyPhy, iAssembler, IBDLD, IDBA-UD, IgBLAST, IGV, IMa2, IMa2p, IMAGE, impute2, INDELseek, infernal, InStruct, InteMAP, InterProScan, iRep, java, jbrowse, jellyfish, JoinMap, julia, jupyter, kallisto, Kent Utilities, khmer, LACHESIS, lcMLkin, LDAK, leeHom, LINKS, LocusZoom, longranger, LUCY, LUCY2, LUMPY, MACS, MaCS simulator, MACS2, MAFFT, Magic-BLAST, MAKER, MAQ, MASH, MaSuRCA, Mauve, MaxBin, mccortex, megahit,
MEGAN, MEME Suite, MERLIN, MetaBAT, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, MetaVelvet, MetaVelvet-SL, Migrate-n, mira, miRDeep2, MISO (misopy), MixMapper, MKTest, MMSEQ, mothur, MrBayes, mrsFAST, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, MUMmer, muscle, MUSIC, muTect, ncftp, Nemo, Netbeans, NEURON, new_fugue, NextGenMap, NGSadmix, ngsDist, ngsF, ngsTools, NGSUtils, Novoalign, NovoalignCS, Oases, OBITools, Orthomcl, PAGIT, PAML, pandas, pandaseq, Panseq, PASA, PASTEC, pbalign, pbh5tools, PBJelly, PBSuite, PeakRanger, PeakSplitter, PEAR, PennCNV, PGDSpider, ph5tools, Phage_Finder, PHAST, PHYLIP, PhyloCSF, phylophlan, PhyML, Picard, Pindel, piPipes, PIQ, Platypus, plink, Plotly, popbam, prinseq, prodigal, progressiveCactus, prokka, pyRAD, Pyro4, PySnpTools, PyTorch, PyVCF, QIIME, QIIME2 q2cli, QTCAT, Quake, QuantiSNP2, QUAST, QUMA, R, RACA, RADIS, RAPTR-SV, RAxML, Ray, Rcorrector, RDP Classifier, REAPR, RepeatMasker, RepeatModeler, RFMix, RNAMMER, rnaQUAST, Roary, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, sabre, SaguaroGW, samblaster, Samtools, Satsuma, Satsuma2, scikit-learn, scythe, selscan, Sentieon, SeqPrep, sgrep, sgrep sorted_grep, SHAPEIT, shore, SHOREmap, shortBRED, SHRiMP, sickle, SignalP, simuPOP, skewer, SLiM, smcpp, SMRT Analysis, snakemake, snap, SNAPP, SNeP, SNPhylo, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, SPAdes, SRA Toolkit, srst2, stacks, stampy, STAR, statmodels, STITCH, Strelka, StringTie, STRUCTURE, supernova, SURPI, sutta, SVDetect, svtools, SweepFinder, sweepsims, tabix, Tandem Repeats Finder (TRF), TASSEL 3, TASSEL 4, TASSEL 5, tcoffee, TensorFlow, TEToolkit, TMHMM, TopHat, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, transrate, TRAP, treeCl, treemix, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMI-tools, usearch, Variant Effect Predictor, VarScan, vcf2diploid, vcfCooker, vcflib, vcftools, Velvet, VESPA, ViennaRNA, VIP, 
VirSorter, VirusDetect, VirusFinder 2, VizBin, vsearch, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for blast2go

Name: blast2go
Version: DB: Mar. 2016; Software: v1.2.1
OS: Linux
About: Gene Ontology annotation and function enrichment analysis.
Added: 4/15/2013 5:20:07 PM
Updated: 4/25/2016 12:13:57 PM
Link: https://www.blast2go.com/
Manual: https://www.blast2go.com/images/b2g_pdfs/blast2go_cli_manual.pdf
Download: https://www.blast2go.com/blast2go-pro/b2g-register-basic
Notes:

################################################

####   Run BLAST on any BioHPC computer   ######
################################################
#You can run BLAST on any of the BioHPC computers. Adjust -num_threads to the machine you are using: general machine: 8; medium memory: 24; large memory: 64.
#You can use swissprot, refseq or nr as the BLAST database. In most cases swissprot is fast and good enough. However, if a large percentage of your genes have no BLAST hits in swissprot, you can try refseq. The nr database is very large, so a BLAST run against it takes a very long time.
#Replace test.fa with your own FASTA file. Make sure you are using the right BLAST program (blastx for nucleotide queries, blastp for protein queries). To save time, it is preferable to use blastp on protein queries. We recommend using TransDecoder to identify protein-coding sequences in cDNA sequences.
#Replace swissprot with nr if you want to BLAST against the nr database.
#Adjust the BLAST parameters in the blastp command as needed.
#BLAST might take hours to finish. With nr, it might take days.
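
The thread counts listed above can be kept in a small helper so that the value is not typed by hand each time. This is only a sketch: the function name blast_threads and the class labels general, medium and large are illustrative and are not part of the BioHPC setup.

```shell
# Sketch: map a BioHPC machine class to the -num_threads value suggested in the notes.
# The function name and the class labels are hypothetical, chosen for this example.
blast_threads() {
    case "$1" in
        general) echo 8 ;;
        medium)  echo 24 ;;
        large)   echo 64 ;;
        *) echo "unknown machine class: $1" >&2; return 1 ;;
    esac
}
```

For example, on a medium memory machine the blastp command could take -num_threads "$(blast_threads medium)" instead of a hard-coded number.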

#Commands (using swissprot as an example; to use refseq, replace swissprot with refseq_protein):

cd /workdir/myUserName
cp /shared_data/genome_db/BLAST_NCBI/swissprot* ./

blastp -num_threads 24 -query test.fa -db swissprot -out blastresults.xml  -max_target_seqs 20 -evalue 1e-5 -outfmt 5  -culling_limit 10 >& blastlogfile &

When the job finishes, the BLAST result file blastresults.xml will have been created. Copy this file to your home directory.
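
Once the BLAST job has finished, a quick count of queries with and without hits can help decide whether swissprot was sufficient or whether refseq is worth trying. The sketch below assumes the usual layout of BLAST XML (-outfmt 5), in which each query is one <Iteration> element and each hit one <Hit> element, with tags on separate lines. The function name is illustrative.

```shell
# Sketch: print "<number of queries> <number of queries with at least one hit>"
# for a BLAST XML (-outfmt 5) file. Assumes one <Iteration> per query and
# one <Hit> per hit; blast_hit_summary is a hypothetical helper name.
blast_hit_summary() {
    awk '
        /<Iteration>/ { total++; seen = 0 }                       # a new query starts
        /<Hit>/       { if (!seen) { with_hits++; seen = 1 } }    # first hit of this query
        END           { printf "%d %d\n", total, with_hits }
    ' "$1"
}
```

Running blast_hit_summary blastresults.xml prints the two counts; if the second number is much smaller than the first, many genes have no hit in the chosen database.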

 

################################################
####   Optional: Run InterProScan on any BioHPC computer   ######
################################################
#You can run InterProScan on any of the BioHPC computers.

Follow the instructions for running InterProScan on a BioHPC Lab computer: https://cbsu.tc.cornell.edu/lab/userguide.aspx?a=software&i=87#c

Protein sequences are recommended as InterProScan input, but you can use nucleotide sequences if you do not have protein sequences.

The output format must be XML.

################################################
####    Do the following steps on cbsumm10    ######
################################################

#####step 1: prepare additional files needed for blast2go ###############
cd /workdir/myUserName
cp /shared_data/blast2go/* ./

## After this step, you will see two files in /workdir/myUserName: annotation.prop and go.obo.

# Copy the BLAST result file created in the previous step into the working directory:

cp /home/myUserName/blastresults.xml ./

#Optional: if you have InterProScan results, copy the result XML file here.

cp /home/myUserName/ipsout.xml ./
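
Before starting the annotation, it can be worth confirming that the files copied in step 1 are actually present and not empty. The following is a minimal sketch; the function name is illustrative, and the file names are the ones used in these notes.

```shell
# Sketch: list the step 2 input files that are missing or empty in a directory.
# Prints one file name per line and returns non-zero if anything is missing.
# missing_b2g_inputs is a hypothetical helper name.
missing_b2g_inputs() {
    dir=${1:-.}
    status=0
    for f in annotation.prop go.obo blastresults.xml; do
        if [ ! -s "$dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}
```

Calling missing_b2g_inputs /workdir/myUserName prints nothing when all three files are in place.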


####step 2: run annotation ###############
#Replace "myresult" in the commands below with the name for your output files.
#If necessary, you can adjust the BLAST result filtering parameters in the annotation.prop file, under ImportBlastResultsAlgoParameters.
#After this step, you will get three files: 1. myresult.annot; 2. myresult.b2g; 3. myresult.pdf.
#myresult.annot: a text file with the GO annotation.
#myresult.b2g: a project file that you can open in the free version of the BLAST2GO GUI software, as described below. You will need this file to run a function enrichment test with the blast2go GUI.
#myresult.pdf: a report file with statistics for your data set.

#Command if you do not have InterProScan results:

/usr/local/blast2go/blast2go_cli.run -properties annotation.prop -useobo go.obo -loadblast blastresults.xml -mapping -annotation  -annex -statistics all  -saveb2g myresult -saveannot myresult -savereport myresult -tempfolder ./ >& annotatelogfile &

#Command if you have InterProScan results (make sure you first edit the annotation.prop file and change the value next to InterProScanImportParameters.inputFormat):

/usr/local/blast2go/blast2go_cli.run -properties annotation.prop -useobo go.obo -loadblast blastresults.xml -loadips50 ipsout.xml  -mapping -annotation  -annex -statistics all  -saveb2g myresult -saveannot myresult -savereport myresult -tempfolder ./ >& annotatelogfile &
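
The two commands above differ only in the -loadips50 option. As a sketch, a helper can print the appropriate command depending on whether an InterProScan XML file exists. It only prints the command and does not run it. The function name and its arguments are illustrative; all options are the ones shown in the commands above.

```shell
# Sketch: print the annotation command, adding -loadips50 only when a non-empty
# InterProScan XML file is present. Nothing is executed here.
# b2g_command is a hypothetical helper: b2g_command <output_prefix> [ips_xml]
b2g_command() {
    prefix=$1
    ips=${2:-ipsout.xml}
    cmd="/usr/local/blast2go/blast2go_cli.run -properties annotation.prop -useobo go.obo -loadblast blastresults.xml"
    if [ -s "$ips" ]; then
        cmd="$cmd -loadips50 $ips"
    fi
    echo "$cmd -mapping -annotation -annex -statistics all -saveb2g $prefix -saveannot $prefix -savereport $prefix -tempfolder ./"
}
```

Run b2g_command myresult in the working directory, review the printed line, and then run it with the same log redirection as above. The annotation.prop edit for InterProScanImportParameters.inputFormat is still required when InterProScan results are loaded.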

 

#######################################################################
####    Do the following steps on cbsumm10 or your own Windows/Mac computer   ######
#######################################################################
#You do not need to use the command line version for further analysis.

#A free version of BLAST2GO with a graphical user interface (GUI) can be used for further analysis. The software can be downloaded at https://www.blast2go.com/blast2go-pro/download-b2g and you will then need to register at this web site to get an activation code: https://www.blast2go.com/blast2go-pro/b2g-register-basic

#User manual for the GUI version: https://www.blast2go.com/images/b2g_pdfs/b2g_user_manual.pdf

#If you want to continue using the command line, see the command line instructions: https://www.blast2go.com/blast2go-command-line

Step 3. Function enrichment analysis

#######################################################################
####    Do the following steps on any BioHPC computer   ######
#######################################################################

This R script uses the Bioconductor package topGO. If the script crashes or returns nothing, try increasing the p-value cutoff parameter.

Rscript /shared_data/RNAseq/exercise3/topGO.r  go.annot refset testset 0.1 BP myBP

go.annot: the annotation file with two columns: gene ID and GO ID.
refset: a text file listing the reference set of genes, one gene per line (normally all genes that have non-zero expression in your experiments).
testset: a text file listing the genes to test, one gene per line, e.g. differentially expressed genes.
0.1: the p-value cutoff for enriched categories.
BP: test GO terms in the biological process ontology. You can also test MF (molecular function) and CC (cellular component).
myBP: the output file.

The output is a text file of enriched categories and a GO enrichment chart.
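
The input files for the script can be prepared and checked with standard tools. The sketch below rests on two assumptions that these notes do not confirm: that go.annot is derived from a tab-separated annotation file such as myresult.annot, with the gene ID in column 1 and the GO ID in column 2, and that genes in testset are expected to appear in refset as well. Both function names are illustrative.

```shell
# Sketch: keep only the gene ID and GO ID columns of a tab-separated annotation file.
# Assumes column 1 is the gene ID and column 2 the GO ID; rows whose second
# column is not a GO ID (e.g. EC numbers) are skipped. Hypothetical helper.
make_go_annot() {
    awk -F '\t' '$2 ~ /^GO:/ { print $1 "\t" $2 }' "$1"
}

# Sketch: print test-set genes that are absent from the reference set, one per line.
# No output means every test gene is also in the reference set.
# usage: genes_not_in_refset refset testset
genes_not_in_refset() {
    awk 'FILENAME == ARGV[1] { ref[$1] = 1; next }
         NF && !($1 in ref)  { print $1 }' "$1" "$2"
}
```

For example, make_go_annot myresult.annot > go.annot writes the two-column file, and genes_not_in_refset refset testset lists any test genes that would fall outside the reference set.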

