
BioHPC Lab:
User Guide


BioHPC Lab Software

There are 391 software titles installed in BioHPC Lab. The software is available on all machines (unless stated otherwise in the notes). The complete list of programs is below; please click on a title to see details and instructions. Tabular list of software is available here

Please read the details and instructions before running any program; they may contain important information on how to properly use the software in BioHPC Lab.

454 gsAssembler or gsMapper, a5, ABruijn, ABySS, AdapterRemoval, Admixtools, Admixture, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, analysis, ANGSD, Annovar, apollo, Atlas-Link, ATLAS_GapFill, ATSAS, Augustus, bamtools, Basset, BayeScan, BBmap, BCFtools, bcl2fastq, Beagle, Beagle4, Beast2, bedops, BEDtools, bfc, bgc, biobambam, Bioconductor, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, blast2go, BLAT, bmtagger, Boost, Bowtie, Bowtie2, breseq, BSseeker2, BUSCO, BWA, canu, CAP3, CBSU RNAseq, cd-hit, CEGMA, CellRanger, CheckM, Circos, Circuitscape, CLUMPP, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, cortex_var, CrossMap, CRT, cuda, Cufflinks, cutadapt, dadi, dadi-1.6.3_modif, dDocent, DeconSeq, deepTools, delly, destruct, DETONATE, diamond, Discovar, Discovar de novo, distruct, Docker, dREG, Drop-seq, dropSeqPipe, dsk, ea-utils, ecopcr, EDGE, EIGENSOFT, EMBOSS, entropy, ermineJ, exabayes, exonerate, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastML, fastq_species_detector, FastQC, fastStructure, FastTree, FASTX, fineSTRUCTURE, flash, Flexible Adapter Remover, FMAP, freebayes, FunGene Pipeline, GATK, GBRS, GCTA, GEM library, GEMMA, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomicConsensus, gensim, germline, GMAP/GSNAP, GNU Compilers, GNU parallel, Grinder, GROMACS, Gubbins, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HiC-Pro, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, HUMAnN2, HyPhy, iAssembler, IBDLD, IDBA-UD, IGV, IMa2, IMa2p, IMAGE, impute2, infernal, InStruct, InteMAP, InterProScan, iRep, java, jbrowse, jellyfish, JoinMap, julia, jupyter, kallisto, Kent source utilities, khmer, LACHESIS, lcMLkin, LDAK, leeHom, LINKS, LocusZoom, longranger, LUCY, LUCY2, LUMPY, MACS, MaCS simulator, MACS2, MAFFT, Magic-BLAST, MAKER, MAQ, MASH, MaSuRCA, Mauve, mccortex, megahit, MEGAN, MEME Suite, MERLIN, MetaBAT, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, MetaVelvet, 
MetaVelvet-SL, Migrate-n, mira, miRDeep2, MISO (misopy), MixMapper, MKTest, MMSEQ, mothur, MrBayes, mrsFAST, msld, MSMC, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, MUMmer, muscle, muTect, ncftp, Nemo, Netbeans, NEURON, new_fugue, NextGenMap, NGSadmix, ngsDist, ngsF, ngsTools, NGSUtils, Novoalign, NovoalignCS, Oases, OBITools, Orthomcl, PAGIT, PAML, pandas, pandaseq, Panseq, PASA, PASTEC, pbalign, pbh5tools, PBJelly, PBSuite, PeakSplitter, PEAR, PennCNV, ph5tools, Phage_Finder, PHAST, PHYLIP, PhyloCSF, phylophlan, PhyML, Picard, Pindel, piPipes, PIQ, Platypus, plink, Plotly, popbam, prinseq, prodigal, progressiveCactus, prokka, pyRAD, PySnpTools, PyVCF, QIIME, QIIME2 q2cli, Quake, QuantiSNP2, QUAST, QUMA, R, RACA, RADIS, RAPTR-SV, RAxML, Ray, Rcorrector, REAPR, RepeatMasker, RepeatModeler, RFMix, RNAMMER, rnaQUAST, Roary, RSEM, RSeQC, RStudio, sabre, SaguaroGW, samblaster, Samtools, Satsuma, scikit-learn, scythe, Sentieon, SeqPrep, sgrep, SHAPEIT, shore, SHOREmap, shortBRED, SHRiMP, sickle, SignalP, simuPOP, skewer, smcpp, SMRT Analysis, snakemake, snap, SNAPP, SNPhylo, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, SPAdes, SRA Toolkit, srst2, stacks, stampy, STAR, statmodels, Strelka, StringTie, STRUCTURE, supernova, SURPI, sutta, SVDetect, svtools, SweepFinder, sweepsims, tabix, Tandem Repeats Finder (TRF), TASSEL 3, TASSEL 4, TASSEL 5, tcoffee, TensorFlow, TEToolkit, TMHMM, TopHat, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, transrate, TRAP, treeCl, treemix, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMI-tools, usearch, Variant Effect Predictor, VarScan, vcf2diploid, vcfCooker, vcflib, vcftools, Velvet, VESPA, ViennaRNA, VIP, VirusFinder 2, VizBin, vsearch, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for PASA

About: a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data
Added: 4/17/2017 3:59:00 PM

The PASA pipeline is implemented in Docker.

Follow these steps to run PASA.

1. Prepare the working directory and sample data:

mkdir /workdir/xxxxx         # replace xxxxx with your user id

cp -r /programs/pasa/sample_data /workdir/xxxxx/

2. Start the PASA Docker instance:

docker1 import /programs/pasa/pasa2.1.0.tar

docker1 images

docker1 run -d -t -e PASAHOME=/usr/local/PASApipeline-2.1.0 biohpc_xxxxx/pasa2.1.0 /bin/bash    #replace xxxxx with your user id.

docker1 ps  # this command shows the container ID that is needed for running commands in the Docker instance. A container ID looks like: "27fc60656acd"
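
If you would rather not copy the container ID by hand, it can be picked out of the listing with a small filter. The following is a minimal sketch, assuming that docker1 ps prints the same columns as the standard docker ps (container ID first, image name second). The function name pasa_container_id is made up for illustration and is not part of BioHPC or PASA.

```shell
# Hypothetical helper: read a "docker ps"-style listing on stdin and print
# the ID of the first container whose IMAGE column contains the given text.
# Assumes column 1 is the container ID and column 2 is the image name.
pasa_container_id() {
    awk -v img="$1" 'NR > 1 && index($2, img) { print $1; exit }'
}

# Example use on a BioHPC machine (not run here):
#   CID=$(docker1 ps | pasa_container_id pasa2.1.0)
#   echo "$CID"
```

The header row is skipped, so only running containers are considered. If more than one PASA container is running, only the first one listed is returned.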

3. Start MySQL in the Docker instance:

docker1 exec xxxxxxxxxxx service mysql start  # replace xxxxxxxxxxx with the container ID

4. The following steps run PASA on the sample data (located in /workdir/xxxxx/sample_data). To work on real data, create a new directory such as /workdir/xxxxx/mydata and follow the same procedure. You need to copy the *.config files into your data directory and modify them.

The directory /workdir inside the Docker instance is shared with /workdir/xxxxx on the host machine (xxxxx is your user ID), so the directory /workdir/xxxxx/sample_data on the host is visible as /workdir/sample_data inside the Docker instance.
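
The path mapping described above can be sketched as a small shell function. This is only an illustration under the assumption that the mapping is exactly /workdir/xxxxx on the host to /workdir in the Docker instance; the function name host_to_container_path is hypothetical.

```shell
# Hypothetical helper: translate a host path under /workdir/<user id>
# into the path the same file has inside the Docker instance (/workdir/...).
host_to_container_path() {
    user_id="$1"      # your BioHPC user id (the xxxxx in the text)
    host_path="$2"    # e.g. /workdir/xxxxx/sample_data
    prefix="/workdir/$user_id"
    case "$host_path" in
        "$prefix"|"$prefix"/*)
            # Drop the user-specific prefix and re-root the rest at /workdir
            rest=${host_path#"$prefix"}
            printf '/workdir%s\n' "$rest"
            ;;
        *)
            echo "not under $prefix: $host_path" >&2
            return 1
            ;;
    esac
}

# Example (abc123 is a placeholder user id):
#   host_to_container_path abc123 /workdir/abc123/sample_data
```

Paths outside /workdir/xxxxx are rejected because they are not shared with the Docker instance.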

## run seqclean to remove adapters (optional)

cd /workdir/xxxxx/sample_data

/programs/seqclean-x86_64/seqclean all_transcripts.fasta -v /workdir/xxxxx/sample_data/UniVec

## pasa

## to get the PASA help menu (601a0a784b52 is an example container ID; replace it with your own)

docker1 exec 601a0a784b52 /bin/bash -c "\$PASAHOME/scripts/"

## to run pasa on sample data

docker1 exec 601a0a784b52 /bin/bash -c "cd /workdir/sample_data; \$PASAHOME/scripts/ -c alignAssembly.config -C -R -g genome_sample.fasta  -t all_transcripts.fasta.clean -T -u all_transcripts.fasta -f FL_accs.txt --ALIGNERS blat,gmap --CPU 8 >& log  "  &

## to run PASA on real data, you will need to put alignAssembly.config and the transcript and genome files in your real data directory.
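
Setting up a directory for real data can be sketched as follows. This is a minimal example, assuming the sample data directory holds the *.config files you want to start from; the function name prepare_pasa_dir is hypothetical, and your own transcript and genome files still need to be copied in separately.

```shell
# Hypothetical helper: create a data directory and copy the *.config files
# from the sample data into it, ready to be edited for your own project.
prepare_pasa_dir() {
    sample_dir="$1"   # e.g. /workdir/xxxxx/sample_data
    data_dir="$2"     # e.g. /workdir/xxxxx/mydata
    mkdir -p "$data_dir" || return 1
    cp "$sample_dir"/*.config "$data_dir"/ || return 1
    # List what is now in the data directory
    ls "$data_dir"
}

# Example (replace xxxxx with your user id):
#   prepare_pasa_dir /workdir/xxxxx/sample_data /workdir/xxxxx/mydata
```

After copying, edit the config files in the new directory before launching PASA.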

You can monitor the progress of the run with the command docker1 exec 601a0a784b52 ps -u root (replace 601a0a784b52 with your container ID), or by checking the log file /workdir/xxxxx/sample_data/log.
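
Checking the log file can be wrapped in a small function. The sketch below only uses standard tools (wc, tail); the function name log_status is made up for illustration, and it makes no assumption about what PASA writes to the log.

```shell
# Hypothetical helper: report how many lines the log has and show its last lines.
log_status() {
    log_file="$1"       # e.g. /workdir/xxxxx/sample_data/log
    lines="${2:-5}"     # number of trailing lines to show (default 5)
    if [ ! -f "$log_file" ]; then
        echo "no log yet: $log_file"
        return 1
    fi
    echo "$(wc -l < "$log_file" | tr -d ' ') lines in $log_file"
    tail -n "$lines" "$log_file"
}

# Example (replace xxxxx with your user id):
#   log_status /workdir/xxxxx/sample_data/log 10
```

Running it repeatedly shows whether the log is still growing, which is a rough sign that the pipeline is still working.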

