
BioHPC Cloud: User Guide

 


BioHPC Cloud Software

There are 770 software titles installed in BioHPC Cloud. The software is available on all machines unless stated otherwise in the notes. The complete list of programs is below; click on a title to see details and instructions. A tabular list of the software is also available.

Please read the details and instructions before running any program; they may contain important information on how to use the software properly in BioHPC Cloud.

454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, agrep, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, amplicon.py, AMRFinder, analysis, ANGSD, Annovar, antiSMASH, anvio, apollo, arcs, Arlequin, aspera, assembly-stats, atac-seq-pipeline, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, ATSAS, Augustus, AWS command line interface, axe, BactSNP, bam2fastx, bamtools, bamUtil, BarNone, Basset, BayeScan, Bayescenv, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beast2, bedops, BEDtools, bfc, bgc, bgen, bigQF, bigWig, bioawk, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, blast2go, BLAT, BLUPF90, BMGE, bmtagger, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BreedingSchemeLanguage, breseq, brocc, BSseeker2, BUSCO, BWA, bwa-meth, cactus, CAFE, canu, CAP3, CarveMe, cBar, CBSU RNAseq, CCTpack, cd-hit, cdbfasta, CEGMA, CellRanger, cellranger-atac, cellranger-dna, centrifuge, centroFlye, CFM-ID, CFSAN SNP pipeline, CheckM, chimera, chromosomer, Circlator, Circos, Circuitscape, CITE-seq-Count, CLUMPP, clust, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, compat, CONCOCT, Conda, copyNumberDiff, cortex_var, CRISPRCasFinder, CRISPResso, CrossMap, CRT, cuda, Cufflinks, cutadapt, dadi, dadi-1.6.3_modif, danpos, dDocent, DeconSeq, Deepbinner, DeepTE, deepTools, defusion, delly, DESMAN, destruct, DETONATE, diamond, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, Docker, dREG, dREG.HD, drep, Drop-seq, dropEst, dropSeqPipe, dsk, Dsuite, dTOX, duphold, dynare, ea-utils, ecopcr, ecoPrimers, ectyper, EDGE, edirect, eems, EgaCryptor, EGAD, EIGENSOFT, EMBOSS, Empress, entropy, ephem, epic2, ermineJ, ete3, exabayes, exonerate, ExpansionHunterDenovo-v0.8.0, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastANI, fastcluster, FastME, FastML, fastp, FastQ Screen, fastq_pair, fastq_species_detector, FastQC, fastsimcoal26, fastStructure, 
FastTree, FASTX, feh, FFmpeg, fineRADstructure, fineSTRUCTURE, FIt-SNE, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, freebayes, FunGene Pipeline, G-PhoCS, GAEMR, Galaxy, GATK, gatk4, gatk4amplicon.py, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomeThreader, genometools, GenomicConsensus, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, GffCompare, gffread, giggle, glactools, GlimmerHMM, GMAP/GSNAP, GNU Compilers, GNU parallel, go-perl, GO2MSIG, GoShifter, gradle-4.4, graftM, GraPhlAn, graphviz, GRiD, Grinder, GROMACS, GSEA, gsort, GTDB-Tk, GTFtools, Gubbins, GUPPY, hail, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HarvestTools, hdf5, hh-suite, HiC-Pro, HiCExplorer, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, htslib, HUMAnN2, hyperopt, HyPhy, iAssembler, IBDLD, IDBA-UD, IDP-denovo, idr, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, IMSA-A, INDELseek, infernal, Infomap, InStruct, Intel MKL, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, jags, Jane, java, jbrowse, JCVI, jellyfish, JoinMap, juicer, julia, jupyter, kallisto, Kent Utilities, keras, khmer, kinfin, king, KmerFinder, kraken, kSNP, kWIP, LACHESIS, lammps, LAST, lcMLkin, LDAK, leeHom, Lep-MAP3, lftp, Lighter, LinkedSV, LINKS, LocARNA, LocusZoom, lofreq, longranger, LS-GKM, LTR_retriever, LUCY, LUCY2, LUMPY, lyve-SET, MACE, MACS, MaCS simulator, MACS2, MAFFT, mafTools, Magic-BLAST, magick, MAKER, MAQ, MASH, mashtree, Mashtree, MaSuRCA, Mauve, MaxBin, McClintock, mccortex, mcl, MCscan, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, MetaBAT, MetaCRAST, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, MetaVelvet, MetaVelvet-SL, MGmapper, Migrate-n, mikado, MinCED, Minimac3, Minimac4, minimap2, mira, miRDeep2, MISO (misopy), MITObim, MiXCR, MixMapper, MKTest, mlst, MMAP, MMSEQ, MMseqs2, moments, mono, 
monocle3, mosdepth, mothur, MrBayes, mrsFAST, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMmer, muscle, MUSIC, muTect, nag-compiler, nanofilt, Nanopolish, ncftp, NECAT, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, nf-core/rnaseq, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NgsRelate, ngsTools, NGSUtils, NINJA, NLR-Annotator, NLR-Parser, Novoalign, NovoalignCS, NRSA, nvidia-docker, Oases, OBITools, Octave, OMA, openmpi, OrthoFinder, orthologr, Orthomcl, pacbio, PacBioTestData, PAGIT, paleomix, PAML, panaroo, pandas, pandaseq, PanPhlAn, Panseq, Parsnp, PASA, PASTEC, PAUP*, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pbmm2, PBSuite, PCAngsd, pcre, pcre2, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, peppro, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan, PhyloPhlAn2, phylophlan3, PhyML, Picard, pigz, Pilon, Pindel, piPipes, PIQ, PlasFlow, Platypus, plink, plink2, Plotly, popbam, PopCOGenT, Porechop, portcullis, pplacer, PRANK, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, Proseq2, PSASS, psutil, pyani, PyCogent, pycoQC, pyfaidx, pyGenomeTracks, PyMC, pyopencl, pypy, pyRAD, Pyro4, PySnpTools, python, PyTorch, PyVCF, QIIME, QIIME2, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, QUMA, R, RACA, racon, RADIS, RadSex, RAPTR-SV, RAxML, raxml-ng, Ray, rclone, Rcorrector, RDP Classifier, REAGO, REAPR, ReferenceSeeker, Relate, RelocaTE2, RepeatMasker, RepeatModeler, RERconverge, RFMix, rgdal, RGI, Rgtsvm, ripgrep, rJava, RNAMMER, rnaQUAST, Rnightlights, Roary, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, rtfbs_db, ruby, sabre, SaguaroGW, salmon, Sambamba, samblaster, sample, SampleTracker, samtabix, Samtools, Satsuma, Satsuma2, SCALE, scanorama, scikit-learn, Scoary, scythe, seaborn, SecretomeP, selscan, Sentieon, SeqPrep, seqtk, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, SHAPEIT4, shasta, Shiny, shore, SHOREmap, 
shortBRED, SHRiMP, sickle, SignalP, SimPhy, simuPOP, singularity, sinto, sistr_cmd, SKESA, skewer, SLiM, SLURM, smcpp, smoove, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, SNAPP, snATAC, SNeP, snippy, snp-sites, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, spaceranger, SPAdes, SPALN, SparCC, SPARTA, sqlite, SRA Toolkit, srst2, stacks, Stacks 2, stairway-plot, stampy, STAR, Starcode, statmodels, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, supernova, SURPI, sutta, SV-plaudit, SVDetect, SVseq2, svtools, svtyper, SWAMP, SweepFinder, sweepsims, tabix, Taiji, Tandem Repeats Finder (TRF), tardis, TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tbl2asn, tcoffee, TensorFlow, TEToolkit, texlive, tfTarget, ThermoRawFileParser, TMHMM, tmux, Tomahawk, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, treeCl, treemix, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMAP, UMI-tools, Unicycler, UniRep, unrar, usearch, Variant Effect Predictor, VarScan, VCF-kit, vcf2diploid, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, VizBin, vmatch, vsearch, vt, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for blast2go

Name: blast2go
Version: DB: March.2020; Software: v1.4.4
OS: Linux
About: Gene Ontology annotation and function enrichment analysis.
Added: 4/15/2013 5:20:07 PM
Updated: 5/4/2020 8:45:47 AM
Link: https://www.blast2go.com/
Manual: http://cli.docs.blast2go.com/attachments/1cf400c152e4794dc9618e878b52becb.pdf
Notes:

########################################################
##  Run Diamond on a BioHPC medium memory gen2 or large memory gen2 computer##
########################################################

#Run Diamond on a BioHPC gen2 computer (either medium or large memory), using the UniRef90 database. By default, Diamond uses all CPUs on the computer. To restrict CPU usage, use the "-p" option followed by the number of threads. The run takes less than 30 minutes for 40,000 proteins.

mkdir -p /workdir/$USER
cd /workdir/$USER
cp /shared_data/genome_db/uniref90.dmnd ./
/programs/diamond/diamond blastp --db uniref90 --query mygenome.protein.fasta --outfmt 5 --max-target-seqs 100 --max-hsps 1 --evalue 1e-10  -t ./ --block-size 10 --index-chunks 1 -o blastresults.xml 
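To cap the CPU usage as described above, the same command can be given an explicit thread count. The sketch below only prints the command so that it can be reviewed first. The helper name build_diamond_cmd and the value 16 are illustrative assumptions, not BioHPC recommendations; all other options are copied from the command above.

```shell
# build_diamond_cmd is a hypothetical helper: it prints the Diamond command
# from this guide with "-p <threads>" added to limit CPU usage.
build_diamond_cmd() {
    threads="$1"
    echo "/programs/diamond/diamond blastp --db uniref90" \
         "--query mygenome.protein.fasta --outfmt 5" \
         "--max-target-seqs 100 --max-hsps 1 --evalue 1e-10 -t ./" \
         "--block-size 10 --index-chunks 1 -p ${threads} -o blastresults.xml"
}

# Print the command with 16 threads (example value) and review it.
build_diamond_cmd 16
```

Once the printed command looks right, paste it into the shell to run it.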

########################################################
##  Optional: Run InterProScan on any BioHPC computer   ##
########################################################
#This step is optional, but if your species is only distantly related to well-studied species, InterProScan can help. In our tests, InterProScan takes about 1 hour per 10,000 proteins on a medium memory gen2 machine. Use protein sequences to run InterProScan.

Follow the instructions to run InterProScan on a BioHPC lab computer: https://cbsu.tc.cornell.edu/lab/userguide.aspx?a=software&i=87#c

The output format must be XML.

./interproscan.sh -b ipsout -f XML -i mygenome.protein.fasta --goterms --pathways --iprlookup -t p -T ./ 
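The figure of about 1 hour per 10,000 proteins quoted above can be turned into a rough runtime estimate before the job is started. This is a minimal sketch: the function names are hypothetical, and the result is only a planning number for a medium memory gen2 machine, not a guarantee.

```shell
# Count FASTA records: every record starts with a ">" header line.
count_proteins() {
    grep -c '^>' "$1"
}

# Rough InterProScan runtime in hours, assuming ~1 hour per 10,000
# proteins (the figure quoted in this guide), rounded up.
estimate_ips_hours() {
    n=$(count_proteins "$1")
    echo $(( (n + 9999) / 10000 ))
}

# Report the estimate only if the example input file is present.
if [ -f mygenome.protein.fasta ]; then
    echo "proteins: $(count_proteins mygenome.protein.fasta)"
    echo "estimated hours: $(estimate_ips_hours mygenome.protein.fasta)"
fi
```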

#####################################
##  Do the following steps on cbsumm10  ##
#####################################

After you log in to cbsumm10 and before you proceed to the next step, run this command to verify that the blast2go database server is running on this machine.

echo "show dbs" | mongo

You should see output like this:

MongoDB server version: 3.6.12
admin    0.000GB
config   0.000GB
go_db   69.040GB
local    0.000GB
bye

If you get an error message that no database is connected, please contact brc_bioinformatics@cornell.edu and let us know that "mongodb on cbsumm10 is not running"; we will start the server for you.
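The check above can also be scripted so that it does not rely on reading the output by eye. In this sketch, has_go_db is a hypothetical helper that looks for the go_db line shown in the sample output; it assumes the database name appears at the start of a line, as it does in that sample.

```shell
# has_go_db is a hypothetical helper: it reads the output of "show dbs"
# on standard input and succeeds only if a database named go_db is listed.
has_go_db() {
    grep -q '^go_db[[:space:]]'
}

# Run the check only where the mongo client is installed.
if command -v mongo >/dev/null 2>&1; then
    if echo "show dbs" | mongo 2>/dev/null | has_go_db; then
        echo "go_db found: the blast2go database server is running"
    else
        echo "go_db not found: contact brc_bioinformatics@cornell.edu"
    fi
fi
```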

##step 1: Create a working directory on cbsumm10, and copy over the GO database and configuration files needed for blast2go

mkdir -p /workdir/$USER
cd /workdir/$USER
cp /shared_data/blast2go/annotation.prop ./
cp /shared_data/blast2go/go.obo ./

 

# Copy the Diamond and InterProScan result files created in the previous steps into /workdir/$USER (the commands below assume you saved them in your home directory)

cp $HOME/blastresults.xml /workdir/$USER

#optional: if you have InterProScan results, copy the result XML file here.
cp $HOME/ipsout.xml /workdir/$USER

##step 2: run annotation 

  • In the last step, you copied the "annotation.prop" file from the directory "/shared_data/blast2go/". This file contains the parameters for BLAST2GO. You might want to review the default settings under "ImportBlastResultsAlgoParameters" and "AnnotationAlgoParameters", especially the parameters listed below. For some query sequences, the blast hits might not be present in the BLAST2GO mapping database, so these query sequences fail to get a GO assignment. In that case, setting ImportBlastResultsAlgoParameters.numberOfHits to a larger number might help, but using too many hits can cause the sequences to be "over-annotated".
    • ImportBlastResultsAlgoParameters.numberOfHits=100
    • ImportBlastResultsAlgoParameters.blastMinHSPLength=33
    • AnnotationAlgoParameters.eValueHitFilter=1.0E-10
    • AnnotationAlgoParameters.hspHitCoverageCutoff=0
  • After this step, you will get three files:
  1. myresult.annot: a text file with the GO annotation; this is the file you need for enrichment analysis.
  2. myresult.b2g: a binary file that you can open in the BLAST2GO GUI software (BioHPC does not have a license for the BLAST2GO GUI; BioBam offers a free trial version if you want to try it). You do not need this file if you do not plan to use the GUI software.
  3. myresult.pdf: a report file with statistics of your data set.
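Changing one of the parameters listed above can be done in a text editor or from the command line. The sketch below assumes each parameter sits on its own line in the form key=value, as in the list above. The helper set_prop and the value 200 are illustrative assumptions, not recommended settings.

```shell
# set_prop is a hypothetical helper: it rewrites the line "key=..." in a
# properties file to "key=value". Lines with other keys are left unchanged.
# If the key is not present, the file content stays the same.
set_prop() {
    key="$1"; value="$2"; file="$3"
    awk -F= -v k="$key" -v v="$value" \
        '$1 == k { print k "=" v; next } { print }' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Keep a backup, then raise numberOfHits (200 is only an example value).
if [ -f annotation.prop ]; then
    cp annotation.prop annotation.prop.bak
    set_prop ImportBlastResultsAlgoParameters.numberOfHits 200 annotation.prop
    grep '^ImportBlastResultsAlgoParameters.numberOfHits=' annotation.prop
fi
```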

##Use this command if you DO NOT have InterProScan results:

/usr/local/blast2go/blast2go_cli.run -properties annotation.prop -useobo go.obo -loadblast blastresults.xml -mapping -annotation -annex -statistics all -saveb2g myresult -saveannot myresult -savereport myresult -tempfolder ./ >& annotatelogfile 

##Use this command if you have InterProScan results (make sure you edit the annotation.prop file and change the value next to InterProScanImportParameters.inputFormat):

/usr/local/blast2go/blast2go_cli.run -properties annotation.prop -useobo go.obo -loadblast blastresults.xml -loadips50 ipsout.xml -mapping -annotation  -annex -statistics all -saveb2g myresult -saveannot myresult -savereport myresult -tempfolder ./ >& annotatelogfile &
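After the annotation run has finished, it is worth confirming that the three result files described in this step were written. This is a minimal sketch: check_outputs is a hypothetical helper, and it assumes the "myresult" prefix used in the commands above.

```shell
# check_outputs is a hypothetical helper: it verifies that the .annot, .b2g
# and .pdf files for a given prefix exist and are not empty.
# It returns 0 if all three are present and 1 otherwise.
check_outputs() {
    prefix="$1"
    missing=0
    for ext in annot b2g pdf; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}"
            missing=1
        fi
    done
    return "$missing"
}

# If the job was started in the background from this shell, run "wait" first.
check_outputs myresult || echo "check annotatelogfile for errors"
```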

#####################################################
####    Do function enrichment analysis   ######
#####################################################

Follow our workshop instructions to do function enrichment analysis

https://biohpc.cornell.edu/doc/annotation_2019_exercises2.html

 

####################################################
####   About BLAST2GO GUI ######
####################################################

The software provided here is the command line version, which can only be used to generate the GO annotation file. BioHPC does not have a license for the BLAST2GO GUI software. There is a free version, BLAST2GO Basic (GUI), at https://www.blast2go.com/b2g-register-basic, which has limited visualization functions.

 

