 

BioHPC Cloud: User Guide

 


BioHPC Cloud Software

There are 754 software titles installed in BioHPC Cloud. The software is available on all machines unless stated otherwise in the notes. The complete list of programs is below; click on a title to see details and instructions. A tabular list of software is also available.

Please read the details and instructions before running any program; they may contain important information on how to use the software properly in BioHPC Cloud.

454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, agrep, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, amplicon.py, AMRFinder, analysis, ANGSD, Annovar, antiSMASH, apollo, Arlequin, aspera, assembly-stats, atac-seq-pipeline, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, ATSAS, Augustus, AWS command line interface, axe, BactSNP, bam2fastx, bamtools, bamUtil, BarNone, Basset, BayeScan, Bayescenv, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beast2, bedops, BEDtools, bfc, bgc, bgen, bigQF, bigWig, bioawk, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, blast2go, BLAT, BLUPF90, BMGE, bmtagger, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BreedingSchemeLanguage, breseq, brocc, BSseeker2, BUSCO, BWA, bwa-meth, cactus, CAFE, canu, CAP3, CarveMe, cBar, CBSU RNAseq, CCTpack, cd-hit, cdbfasta, CEGMA, CellRanger, cellranger-atac, cellranger-dna, centrifuge, CFM-ID, CFSAN SNP pipeline, CheckM, chimera, chromosomer, Circlator, Circos, Circuitscape, CITE-seq-Count, CLUMPP, clust, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, compat, CONCOCT, Conda, copyNumberDiff, cortex_var, CRISPRCasFinder, CRISPResso, CrossMap, CRT, cuda, Cufflinks, cutadapt, dadi, dadi-1.6.3_modif, danpos, dDocent, DeconSeq, Deepbinner, DeepTE, deepTools, defusion, delly, DESMAN, destruct, DETONATE, diamond, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, Docker, dREG, dREG.HD, drep, Drop-seq, dropEst, dropSeqPipe, dsk, Dsuite, dTOX, duphold, dynare, ea-utils, ecopcr, ecoPrimers, ectyper, EDGE, edirect, eems, EgaCryptor, EGAD, EIGENSOFT, EMBOSS, Empress, entropy, ephem, epic2, ermineJ, ete3, exabayes, exonerate, ExpansionHunterDenovo-v0.8.0, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastANI, fastcluster, FastME, FastML, fastp, FastQ Screen, fastq_pair, fastq_species_detector, FastQC, fastsimcoal26, fastStructure, FastTree, FASTX, feh, 
FFmpeg, fineRADstructure, fineSTRUCTURE, FIt-SNE, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, freebayes, FunGene Pipeline, G-PhoCS, GAEMR, Galaxy, GATK, gatk4, gatk4amplicon.py, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomeThreader, genometools, GenomicConsensus, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, GffCompare, gffread, giggle, glactools, GlimmerHMM, GMAP/GSNAP, GNU Compilers, GNU parallel, go-perl, GO2MSIG, GoShifter, gradle-4.4, graftM, GraPhlAn, graphviz, GRiD, Grinder, GROMACS, GSEA, gsort, GTDB-Tk, GTFtools, Gubbins, GUPPY, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HarvestTools, hdf5, hh-suite, HiC-Pro, HiCExplorer, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, htslib, HUMAnN2, hyperopt, HyPhy, iAssembler, IBDLD, IDBA-UD, IDP-denovo, idr, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, IMSA-A, INDELseek, infernal, Infomap, InStruct, Intel MKL, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, jags, Jane, java, jbrowse, JCVI, jellyfish, JoinMap, juicer, julia, jupyter, kallisto, Kent Utilities, keras, khmer, kinfin, KmerFinder, kraken, kSNP, kWIP, LACHESIS, lammps, LAST, lcMLkin, LDAK, leeHom, Lep-MAP3, lftp, Lighter, LinkedSV, LINKS, LocARNA, LocusZoom, lofreq, longranger, LS-GKM, LTR_retriever, LUCY, LUCY2, LUMPY, lyve-SET, MACE, MACS, MaCS simulator, MACS2, MAFFT, mafTools, Magic-BLAST, magick, MAKER, MAQ, MASH, mashtree, Mashtree, MaSuRCA, Mauve, MaxBin, mccortex, mcl, MCscan, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, MetaBAT, MetaCRAST, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, MetaVelvet, MetaVelvet-SL, MGmapper, Migrate-n, mikado, MinCED, Minimac3, Minimac4, minimap2, mira, miRDeep2, MISO (misopy), MITObim, MiXCR, MixMapper, MKTest, mlst, MMAP, MMSEQ, MMseqs2, mono, monocle3, mosdepth, mothur, MrBayes, mrsFAST, msld, 
MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMmer, muscle, MUSIC, muTect, nag-compiler, nanofilt, Nanopolish, ncftp, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, nf-core/rnaseq, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NgsRelate, ngsTools, NGSUtils, NINJA, NLR-Annotator, NLR-Parser, Novoalign, NovoalignCS, NRSA, nvidia-docker, Oases, OBITools, Octave, OMA, openmpi, OrthoFinder, Orthomcl, pacbio, PacBioTestData, PAGIT, paleomix, PAML, pandas, pandaseq, PanPhlAn, Panseq, Parsnp, PASA, PASTEC, PAUP*, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pbmm2, PBSuite, PCAngsd, pcre, pcre2, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, peppro, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan, PhyloPhlAn2, PhyML, Picard, pigz, Pilon, Pindel, piPipes, PIQ, PlasFlow, Platypus, plink, plink2, Plotly, popbam, PopCOGenT, Porechop, portcullis, pplacer, PRANK, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, Proseq2, PSASS, psutil, pyani, PyCogent, pycoQC, pyfaidx, pyGenomeTracks, PyMC, pyopencl, pypy, pyRAD, Pyro4, PySnpTools, python, PyTorch, PyVCF, QIIME, QIIME2, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, QUMA, R, RACA, racon, RADIS, RadSex, RAPTR-SV, RAxML, raxml-ng, Ray, rclone, Rcorrector, RDP Classifier, REAGO, REAPR, ReferenceSeeker, Relate, RelocaTE2, RepeatMasker, RepeatModeler, RERconverge, RFMix, rgdal, RGI, Rgtsvm, ripgrep, rJava, RNAMMER, rnaQUAST, Rnightlights, Roary, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, rtfbs_db, ruby, sabre, SaguaroGW, salmon, Sambamba, samblaster, sample, SampleTracker, samtabix, Samtools, Satsuma, Satsuma2, scanorama, scikit-learn, Scoary, scythe, seaborn, SecretomeP, selscan, Sentieon, SeqPrep, seqtk, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, SHAPEIT4, shasta, shore, SHOREmap, shortBRED, SHRiMP, sickle, SignalP, SimPhy, simuPOP, singularity, sinto, sistr_cmd, SKESA, skewer, SLiM, 
SLURM, smcpp, smoove, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, SNAPP, snATAC, SNeP, snippy, snp-sites, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, spaceranger, SPAdes, SPALN, SparCC, SPARTA, sqlite, SRA Toolkit, srst2, stacks, Stacks 2, stairway-plot, stampy, STAR, Starcode, statmodels, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, supernova, SURPI, sutta, SVDetect, SVseq2, svtools, svtyper, SWAMP, SweepFinder, sweepsims, tabix, Tandem Repeats Finder (TRF), tardis, TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tbl2asn, tcoffee, TensorFlow, TEToolkit, texlive, tfTarget, ThermoRawFileParser, TMHMM, tmux, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, treeCl, treemix, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMAP, UMI-tools, Unicycler, UniRep, unrar, usearch, Variant Effect Predictor, VarScan, VCF-kit, vcf2diploid, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, VizBin, vmatch, vsearch, vt, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for blast2go

Name:blast2go
Version:DB: March.2020: Software: v1.4.4
OS:Linux
About:Gene Ontology annotation and function enrichment analysis.
Added:4/15/2013 5:20:07 PM
Updated:5/4/2020 8:45:47 AM
Link:https://www.blast2go.com/
Manual:http://cli.docs.blast2go.com/attachments/1cf400c152e4794dc9618e878b52becb.pdf
Notes:

The manual for Blast2GO command line can be found at http://cli.docs.blast2go.com/attachments/1cf400c152e4794dc9618e878b52becb.pdf

1. Make a Diamond database from NCBI protein sequences. Run Diamond on a BioHPC large memory (gen2) or medium memory (gen2) computer. Download the fasta files for the taxonomic group of your species from this site: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/ . This example uses the NCBI invertebrate refseq data.

wget ftp://ftp.ncbi.nlm.nih.gov/refseq/release/invertebrate/invertebrate.*.protein.faa.gz
gunzip *faa.gz
cat *faa > refseq_protein.fa

#make diamond database
/programs/diamond/diamond makedb --in refseq_protein.fa -d refseq_protein -parse_seqids
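
Before building the database, it can help to confirm that the concatenated file actually contains sequences. The following is a minimal sketch, assuming a standard FASTA layout in which every record starts with a ">" header line. The function name count_fasta_records is illustrative and is not part of Diamond.

```shell
#!/usr/bin/env bash
# Count FASTA records by counting header lines (lines that start with ">").
count_fasta_records() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps that from looking like a failure.
    grep -c '^>' "$1" || true
}

# Report the count only if the file from the step above is present.
if [ -f refseq_protein.fa ]; then
    echo "refseq_protein.fa contains $(count_fasta_records refseq_protein.fa) sequences"
fi
```

A count of 0 usually means the download or the gunzip step did not complete.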

2. Run Diamond. By default, Diamond uses all available CPU cores on your computer. To restrict the number of CPU cores, add the --threads parameter.

/programs/diamond/diamond blastp --db refseq_protein --query test.fa --outfmt 5 --top 10 --evalue 1e-5 --block-size 20 --index-chunks 1 -t /workdir/qisun -o blastresults.xml
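
If you want to restrict the CPU cores, one option is to cap the requested thread count at what the machine reports. This is a sketch only: the helper name cap_threads is hypothetical, and it assumes that nproc (or getconf as a fallback) is available on the machine.

```shell
#!/usr/bin/env bash
# Print the number of threads to use: the requested value, capped at the
# number of cores the machine reports.
cap_threads() {
    local requested="$1" available
    # nproc is part of GNU coreutils; getconf is a fallback for systems without it.
    available=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}
```

For example, set THREADS=$(cap_threads 8), where 8 is an arbitrary value, and then append --threads "$THREADS" to the diamond blastp command above.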

After this step, the blast result file blastresults.xml will be created. Copy this file to your home directory, or directly scp this file to the cbsumm10 computer if you have reserved cbsumm10.
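
Before copying the file, a quick check can catch an empty or truncated result. The sketch below assumes that the --outfmt 5 output follows the BLAST XML layout, with one <Iteration> element per query and a closing </BlastOutput> tag at the end of the file. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Number of query records, counted as lines containing an <Iteration> tag.
count_xml_queries() {
    grep -c '<Iteration>' "$1" || true
}

# Succeeds when the closing root tag appears in the last few lines,
# i.e. the output was probably not cut off mid-run.
xml_is_complete() {
    tail -n 5 "$1" | grep -q '</BlastOutput>'
}

# Run the check only if the result file from the step above is present.
if [ -f blastresults.xml ]; then
    if xml_is_complete blastresults.xml; then
        echo "blastresults.xml: $(count_xml_queries blastresults.xml) queries"
    else
        echo "blastresults.xml looks truncated" >&2
    fi
fi
```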

#Alternatively, you can run BLAST instead of Diamond. BLAST is much slower, but it was previously more commonly used than Diamond. Here are the commands if you want to run BLAST. Adjust -num_threads based on the machine you are using.

cd /workdir/myUserName
cp /shared_data/genome_db/BLAST_NCBI/swissprot* .

blastp -num_threads 24 -query test.fa -db swissprot -out blastresults.xml -max_target_seqs 20 -evalue 1e-5 -outfmt 5 -culling_limit 10 >& blastlogfile & 

#A few notes about the Diamond/BLAST database: You can use one of the NCBI BLAST databases, e.g. swissprot, refseq or nr. Swissprot is small, nr is big, and refseq is in between. Bigger databases might give you better GO annotation, especially if you are working on a species that is not well studied. Because the refseq and nr databases are big, Diamond is recommended for them; BLAST could take days or even weeks to run.

########################################################
##  Optional: Run Interproscan on any BioHPC computer   ##
########################################################
#This step is optional and very slow. In general, it is not needed for most species, unless your species is only distantly related to the species represented in the NCBI databases.

If you run InterProScan, make sure to use only protein sequences as input. If you run InterProScan on nucleotide sequences, the results will be rejected by blast2go.

Follow these instructions to run InterProScan on a BioHPC Lab computer: https://cbsu.tc.cornell.edu/lab/userguide.aspx?a=software&i=87#c

Output format needs to be xml.
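
Because results computed from nucleotide input are rejected by blast2go, a rough check of the input file may save a long run. The sketch below guesses the sequence type from letter composition. The function name, the 90% cutoff, and the reuse of test.fa as the input file name are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when at least 90% of the sequence letters in a FASTA
# file are nucleotide codes, which suggests the file is not protein.
# The 90% cutoff is an arbitrary choice for this sketch.
looks_like_nucleotide() {
    local fasta="$1" total nuc
    # Sequence letters only: skip header lines, strip whitespace and "*".
    total=$(( $(grep -v '^>' "$fasta" | tr -d ' \t\r\n*' | wc -c) ))
    # Of those, keep only the letters that are valid nucleotide codes.
    nuc=$(( $(grep -v '^>' "$fasta" | tr -d ' \t\r\n*' | tr -cd 'ACGTUNacgtun' | wc -c) ))
    [ "$total" -gt 0 ] && [ $(( nuc * 100 / total )) -ge 90 ]
}

# Run the check only if the example input file is present.
if [ -f test.fa ]; then
    if looks_like_nucleotide test.fa; then
        echo "test.fa looks like nucleotide sequence; use protein sequence for InterProScan"
    else
        echo "test.fa looks like protein sequence"
    fi
fi
```

This is a heuristic only; a short peptide made up mostly of A, C, G and T residues would be misclassified.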

#####################################
##  Do following steps on cbsumm10  ##
#####################################

##step 1: Create a working directory on cbsumm10, and copy over GO database and configuration files needed for blast2go 

mkdir -p /workdir/$USER
cd /workdir/$USER
cp /shared_data/blast2go/* ./

# Copy the blast result file created in the previous step into the working directory:

cp $HOME/blastresults.xml ./

#optional: if you have InterProScan results, copy the result xml file here.
cp $HOME/ipsout.xml ./
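
Before starting the annotation, it may be worth confirming that the files used by the commands below are present in the working directory. A minimal sketch follows; the helper name missing_files is hypothetical, while the file names are the ones used in this guide.

```shell
#!/usr/bin/env bash
# Print the name of every argument that is not an existing, non-empty file.
# Prints nothing when all files are in place.
missing_files() {
    local f
    for f in "$@"; do
        # -s is true only for a file that exists and has a size above zero.
        [ -s "$f" ] || echo "$f"
    done
}
```

For example, running missing_files annotation.prop go.obo blastresults.xml in /workdir/$USER prints nothing when all three files are in place, and lists the ones that are missing or empty otherwise.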

##step 2: run annotation 

  • Replace "myresult" in the commands below with the name for your output files
  • If necessary, you can adjust the BLAST result filtering parameters in the annotation.prop file, under ImportBlastResultsAlgoParameters
  • After this step, you will get three files: 1. myresult.annot; 2. myresult.b2g; 3. myresult.pdf.
  1. myresult.annot: A text file with the GO annotation;
  2. myresult.b2g: A project file that you can open in the free version of the BLAST2GO GUI software as described in the next step. You will need this file to run the function enrichment test with the blast2go GUI;
  3. myresult.pdf: The report file written by -savereport, with statistics of your data set.
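
To find the filtering parameters mentioned above, you can list the matching lines of annotation.prop together with their line numbers. This sketch assumes that the file uses key=value lines whose keys begin with the section name; the helper name show_props is illustrative.

```shell
#!/usr/bin/env bash
# Print every line of a properties file whose key starts with the given
# prefix, with line numbers, so the values are easy to find in an editor.
show_props() {
    local prefix="$1" file="$2"
    # "|| true" keeps an empty result from being reported as a failure.
    grep -n "^[[:space:]]*${prefix}" "$file" || true
}

# List the BLAST import settings only if the configuration file is present.
if [ -f annotation.prop ]; then
    show_props ImportBlastResultsAlgoParameters annotation.prop
fi
```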

##Use this command if you DO NOT have InterProScan result:

/usr/local/blast2go/blast2go_cli.run -properties annotation.prop -useobo go.obo -loadblast blastresults.xml -mapping -annotation -annex -statistics all -saveb2g myresult -saveannot myresult -savereport myresult -tempfolder ./ >& annotatelogfile 

##Use this command if you have InterProScan result (make sure you edit the annotation.prop file, change the value next to InterProScanImportParameters.inputFormat):

/usr/local/blast2go/blast2go_cli.run -properties annotation.prop -useobo go.obo -loadblast blastresults.xml -loadips50 ipsout.xml -mapping -annotation  -annex -statistics all -saveb2g myresult -saveannot myresult -savereport myresult -tempfolder ./ >& annotatelogfile &
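
Once the run has finished, a short summary of the .annot file shows whether the annotation produced anything. The sketch below assumes that the file is tab-delimited, with the sequence name in column 1 and the GO identifier in column 2. Check this against your own output, because the layout is an assumption here and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Summarize an .annot file: number of distinct annotated sequences and
# number of distinct GO terms (column 2 entries that start with "GO:").
annot_summary() {
    local annot="$1" seqs terms
    seqs=$(cut -f1 "$annot" | sort -u | grep -c . || true)
    terms=$(cut -f2 "$annot" | grep '^GO:' | sort -u | grep -c . || true)
    echo "sequences=$seqs go_terms=$terms"
}

# Summarize the output only if it exists ("myresult" is the placeholder
# output name used in the commands above).
if [ -f myresult.annot ]; then
    annot_summary myresult.annot
fi
```

If the sequence count is 0, check annotatelogfile for errors before moving on.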

#####################################################
####    Do function enrichment analysis   ######
#####################################################

Follow our workshop instructions to do function enrichment analysis

https://biohpc.cornell.edu/doc/annotation_2019_exercises2.html

 

####################################################
####   About BLAST2GO GUI ######
####################################################

The software provided here is the command line version, which can only be used for generating the GO annotation file. BioHPC does not have a license for the BLAST2GO GUI software. There is a free version, BLAST2GO Basic (GUI), at https://www.blast2go.com/b2g-register-basic, which has limited visualization functions.

 



 
