 

BioHPC Cloud: User Guide

 


BioHPC Cloud Software

There are 700 software titles installed in BioHPC Cloud. The software is available on all machines unless stated otherwise in the notes. The complete list of programs is below; click on a title to see details and instructions. A tabular list of the software is also available.

Please read the details and instructions before running any program; they may contain important information on how to use the software properly in BioHPC Cloud.

454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, agrep, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, amplicon.py, analysis, ANGSD, Annovar, antiSMASH, apollo, Arlequin, aspera, assembly-stats, atac-seq-pipeline, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, ATSAS, Augustus, AWS command line interface, axe, BactSNP, bam2fastx, bamtools, bamUtil, Basset, BayeScan, Bayescenv, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beast2, bedops, BEDtools, bfc, bgc, bgen, bigQF, bigWig, bioawk, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, blast2go, BLAT, BLUPF90, BMGE, bmtagger, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BreedingSchemeLanguage, breseq, brocc, BSseeker2, BUSCO, BWA, bwa-meth, cactus, canu, CAP3, CarveMe, cBar, CBSU RNAseq, CCTpack, cd-hit, cdbfasta, CEGMA, CellRanger, cellranger-atac, centrifuge, CFM-ID, CFSAN SNP pipeline, CheckM, chimera, chromosomer, Circlator, Circos, Circuitscape, CLUMPP, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, compat, CONCOCT, Conda, copyNumberDiff, cortex_var, CRISPRCasFinder, CRISPResso, CrossMap, CRT, cuda, Cufflinks, cutadapt, dadi, dadi-1.6.3_modif, danpos, dDocent, DeconSeq, deepTools, defusion, delly, DESMAN, destruct, DETONATE, diamond, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, Docker, dREG, dREG.HD, drep, Drop-seq, dropEst, dropSeqPipe, dsk, Dsuite, ea-utils, ecopcr, ecoPrimers, ectyper, EDGE, edirect, eems, EGAD, EIGENSOFT, EMBOSS, entropy, ephem, epic2, ermineJ, ete3, exabayes, exonerate, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastANI, fastcluster, FastME, FastML, fastp, FastQ Screen, fastq_pair, fastq_species_detector, FastQC, fastsimcoal26, fastStructure, FastTree, FASTX, feh, fineRADstructure, fineSTRUCTURE, FIt-SNE, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, freebayes, FunGene Pipeline, 
G-PhoCS, GAEMR, Galaxy, GATK, gatk4, gatk4amplicon.py, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomeThreader, GenomicConsensus, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, GffCompare, gffread, giggle, glactools, GMAP/GSNAP, GNU Compilers, GNU parallel, go-perl, GoShifter, gradle-4.4, graftM, graphviz, GRiD, Grinder, GROMACS, GSEA, GTDB-Tk, GTFtools, Gubbins, GUPPY, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HarvestTools, HiC-Pro, HiCExplorer, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, HUMAnN2, hyperopt, HyPhy, iAssembler, IBDLD, IDBA-UD, IDP-denovo, idr, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, IMSA-A, INDELseek, infernal, Infomap, InStruct, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, jags, java, jbrowse, JCVI, jellyfish, JoinMap, juicer, julia, jupyter, kallisto, Kent Utilities, keras, khmer, KmerFinder, kraken, kSNP, kWIP, LACHESIS, lammps, LAST, lcMLkin, LDAK, leeHom, Lep-MAP3, lftp, Lighter, LINKS, LocARNA, LocusZoom, lofreq, longranger, LS-GKM, LUCY, LUCY2, LUMPY, lyve-SET, MACE, MACS, MaCS simulator, MACS2, MAFFT, mafTools, Magic-BLAST, magick, MAKER, MAQ, MASH, mashtree, Mashtree, MaSuRCA, Mauve, MaxBin, mccortex, mcl, MCscan, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, MetaBAT, MetaCRAST, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, MetaVelvet, MetaVelvet-SL, MGmapper, Migrate-n, mikado, Minimac3, Minimac4, minimap2, mira, miRDeep2, MISO (misopy), MITObim, MiXCR, MixMapper, MKTest, mlst, MMAP, MMSEQ, MMseqs2, mono, monocle3, mosdepth, mothur, MrBayes, mrsFAST, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMmer, muscle, MUSIC, muTect, Nanopolish, ncftp, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, nf-core/rnaseq, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NgsRelate, ngsTools, NGSUtils, NLR-Annotator, 
NLR-Parser, Novoalign, NovoalignCS, NRSA, nvidia-docker, Oases, OBITools, OMA, OrthoFinder, Orthomcl, pacbio, PacBioTestData, PAGIT, paleomix, PAML, pandas, pandaseq, PanPhlAn, Panseq, Parsnp, PASA, PASTEC, PAUP*, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pbmm2, PBSuite, PCAngsd, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan, PhyloPhlAn2, PhyML, Picard, pigz, Pilon, Pindel, piPipes, PIQ, PlasFlow, Platypus, plink, plink2, Plotly, popbam, PopCOGenT, Porechop, portcullis, pplacer, PRANK, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, PSASS, psutil, pyani, PyCogent, pyfaidx, pyGenomeTracks, PyMC, pyopencl, pypy, pyRAD, Pyro4, PySnpTools, python, PyTorch, PyVCF, QIIME, QIIME2 q2cli, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, QUMA, R, RACA, racon, RADIS, RadSex, RAPTR-SV, RAxML, Ray, rclone, Rcorrector, RDP Classifier, REAPR, Relate, RelocaTE2, RepeatMasker, RepeatModeler, RFMix, rgdal, RGI, Rgtsvm, ripgrep, rJava, RNAMMER, rnaQUAST, Rnightlights, Roary, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, rtfbs_db, ruby, sabre, SaguaroGW, salmon, Sambamba, samblaster, sample, SampleTracker, Samtools, Satsuma, Satsuma2, scikit-learn, Scoary, scythe, seaborn, SecretomeP, selscan, Sentieon, SeqPrep, seqtk, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, shasta, shore, SHOREmap, shortBRED, SHRiMP, sickle, SignalP, SimPhy, simuPOP, singularity, sistr_cmd, SKESA, skewer, SLiM, SLURM, smcpp, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, SNAPP, snATAC, SNeP, snippy, snp-sites, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, SPAdes, SPALN, SparCC, SPARTA, SRA Toolkit, srst2, stacks, Stacks 2, stairway-plot, stampy, STAR, Starcode, statmodels, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, supernova, SURPI, 
sutta, SVDetect, svtools, SWAMP, SweepFinder, sweepsims, tabix, Tandem Repeats Finder (TRF), TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tcoffee, TensorFlow, TEToolkit, texlive, tfTarget, ThermoRawFileParser, TMHMM, tmux, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, treeCl, treemix, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMAP, UMI-tools, Unicycler, UniRep, unrar, usearch, Variant Effect Predictor, VarScan, VCF-kit, vcf2diploid, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, VizBin, vmatch, vsearch, vt, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for pgap

Name: pgap
Version: 2019-08-22.build3958
OS: Linux
About: The NCBI Prokaryotic Genome Annotation Pipeline.
Added: 10/9/2019 9:31:35 PM
Updated:
Link: https://github.com/ncbi/pgap
Manual: https://github.com/ncbi/pgap/wiki
Notes:

# Prepare input files, following instructions on this site: https://github.com/ncbi/pgap/wiki/Input-Files

Several example genomes (version 2019-08-22.build3958) are under /programs/pgap/test_genomes. The MG37 genome is used as the example below.

# Reserve a computer. You need a medium memory gen1 or higher-level BioHPC computer to run the annotation. The pipeline can use all available CPU cores on the computer.

# Create a work directory and copy over the script /programs/pgap/pgap.py. (This script has been modified from the original file to work on BioHPC. For details, read the paragraph at the end of this note.)

mkdir -p /workdir/$USER/pgap
cp /programs/pgap/pgap.py /workdir/$USER/pgap/

# Download the pgap docker image and supplemental data. This step might take up to one hour, so run it in "screen".

cd /workdir/$USER/pgap

./pgap.py -D docker1 --update

# Here is the command to annotate the test genome provided by the developer. The MG37 test genome takes 37 minutes on a BioHPC medium memory gen1 machine. Run it in "screen". To annotate your own genome, replace the output directory "mg37_results" and the input yaml file "test_genomes/MG37/input.yaml" in the command. Your genome fasta and yaml files must be in a directory under /workdir/$USER/pgap. By default, pgap uses all CPU cores available on the computer. The software's FAQ page (https://github.com/ncbi/pgap/wiki/FAQ) recommends allowing 8 GB of RAM per CPU core, so on a BioHPC medium memory gen1 computer (24 cores, 128 GB RAM) you should not exceed 128 / 8 = 16 cores.

cd /workdir/$USER/pgap

./pgap.py --cpus 16 -D docker1 -r -o mg37_results test_genomes/MG37/input.yaml
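
The value passed to --cpus follows from the 8 GB per core guideline. A minimal sketch of that arithmetic is below; the function name max_pgap_cpus is a hypothetical helper for illustration and is not part of pgap.py.

```shell
# Sketch only: choose a --cpus value so that each core gets at least 8 GB of RAM.
# max_pgap_cpus is a hypothetical helper, not something shipped with pgap.
# Arguments: total RAM in GB, number of CPU cores, optional GB per core (default 8).
max_pgap_cpus() {
    mem_gb=$1
    cores=$2
    gb_per_core=${3:-8}
    by_mem=$(( $mem_gb / $gb_per_core ))
    # Use at least 1 core and never more cores than the machine has.
    if [ "$by_mem" -lt 1 ]; then by_mem=1; fi
    if [ "$by_mem" -gt "$cores" ]; then by_mem=$cores; fi
    echo "$by_mem"
}

# Medium memory gen1 machine from the text: 128 GB RAM, 24 cores.
max_pgap_cpus 128 24    # prints 16
```

On a machine with more memory per core, the result is simply capped at the core count. For example, "max_pgap_cpus 512 24" prints 24.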

*** How to modify the pgap.py file:  

Download the script with this command: wget https://github.com/ncbi/pgap/raw/prod/scripts/pgap.py

The script needs to be modified to work on BioHPC. Add this line right before "if (self.params.args.cpus):"

self.cmd.extend (["-w",  "/pgap"])
This step is necessary because the default path for a docker1 container on BioHPC is /workdir, but pgap requires the default to be /pgap.
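
The manual edit described above could also be scripted. The following is a rough sketch, assuming the downloaded script contains exactly one line that starts with "if (self.params.args.cpus):" after its indentation; the function name patch_pgap_workdir and the file name pgap.py.orig are hypothetical. Check the result by eye before using it, since the upstream script may have changed.

```shell
# Sketch only: print a copy of the given pgap.py with the "-w /pgap" line
# inserted right before "if (self.params.args.cpus):", using the same indentation.
# patch_pgap_workdir is a hypothetical helper, not part of pgap.
patch_pgap_workdir() {
    awk '
    /^[ \t]*if \(self\.params\.args\.cpus\):/ {
        indent = $0
        sub(/if .*/, "", indent)    # keep only the leading whitespace
        print indent "self.cmd.extend([\"-w\", \"/pgap\"])"
    }
    { print }
    ' "$1"
}

# Example use on a fresh download saved as pgap.py.orig (hypothetical file name):
#   patch_pgap_workdir pgap.py.orig > pgap.py && chmod +x pgap.py
```

Run it only once on an unmodified download; running it on an already patched file would insert the line a second time.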

