 

BioHPC Cloud: User Guide


 


BioHPC Cloud Software

There are 843 software titles installed in BioHPC Cloud. The software is available on all machines (unless stated otherwise in the notes). The complete list of programs is below; click on a title to see details and instructions. A tabular list of software is also available.

Please read the details and instructions before running any program; they may contain important information on how to properly use the software in BioHPC Cloud.

3d-dna, 454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, agrep, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, amplicon.py, AMRFinder, analysis, ANGSD, Annovar, antiSMASH, anvio, apollo, arcs, ARGweaver, Arlequin, ART, aspera, assembly-stats, ASTRAL, atac-seq-pipeline, ataqv, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, atom, ATSAS, Augustus, AWS command line interface, AWS v2 Command Line Interface, axe, axel, BactSNP, bakta, bam2fastx, bamtools, bamUtil, BarNone, Basset, BayeScan, Bayescenv, baypass, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beast2, bedops, BEDtools, bfc, bgc, bgen, bigQF, bigWig, bioawk, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, BLAST_to_BED, blast2go, BLAT, BLUPF90, BMGE, bmtagger, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BreedingSchemeLanguage, breseq, brocc, BSseeker2, BUSCO, BWA, bwa-meth, cactus, CAFE, canu, CAP3, caper, CarveMe, cBar, CBSU RNAseq, CCTpack, cd-hit, cdbfasta, CEGMA, CellRanger, cellranger-arc, cellranger-atac, cellranger-dna, centrifuge, centroFlye, CFM-ID, CFSAN SNP pipeline, CheckM, chimera, chromosomer, Circlator, Circos, Circuitscape, CITE-seq-Count, clues, CLUMPP, clust, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, compat, CONCOCT, Conda, Cooler, copyNumberDiff, cortex_var, CRISPRCasFinder, CRISPResso, CrossMap, CRT, cuda, Cufflinks, cutadapt, cuteSV, dadi, dadi-1.6.3_modif, danpos, dDocent, DeconSeq, Deepbinner, DeepTE, deepTools, defusion, delly, DESMAN, destruct, DETONATE, diamond, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, Docker, dREG, dREG.HD, drep, drive, Drop-seq, dropEst, dropSeqPipe, dsk, dssat, Dsuite, dTOX, duphold, dynare, ea-utils, ecopcr, ecoPrimers, ectyper, EDGE, edirect, EDTA, eems, EgaCryptor, EGAD, EIGENSOFT, EMBOSS, Empress, entropy, epa-ng, ephem, epic2, ermineJ, ete3, EVM, exabayes, exonerate, 
ExpansionHunterDenovo-v0.8.0, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastANI, fastcluster, FastME, FastML, fastp, FastQ Screen, fastq_pair, fastq_species_detector, FastQC, fastqsplitter, fastsimcoal26, fastStructure, FastTree, FASTX, feh, FFmpeg, fineRADstructure, fineSTRUCTURE, FIt-SNE, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, freebayes, FSA, FunGene Pipeline, G-PhoCS, GADMA, GAEMR, Galaxy in Docker, Galaxy Server, GATK, gatk4, gatk4amplicon.py, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, GENECONV, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomeThreader, genometools, GenomicConsensus, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, GffCompare, gffread, giggle, glactools, GlimmerHMM, GMAP/GSNAP, GNU Compilers, GNU parallel, go-perl, GO2MSIG, GoShifter, gradle-4.4, graftM, GraPhlAn, graphviz, GRiD, Grinder, GROMACS, GSEA, gsort, GTDB-Tk, GTFtools, Gubbins, GUPPY, hail, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HarvestTools, haslr, hdf5, hget, hh-suite, HiC-Pro, HiCExplorer, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, htslib, HUMAnN2, hyperopt, HyPhy, iAssembler, IBDLD, idba, IDBA-UD, IDP-denovo, idr, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, IMSA-A, INDELseek, infernal, Infomap, InStruct, Intel MKL, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, jags, Jane, java, jbrowse, JCVI, jellyfish, JoinMap, juicer, julia, jupyter, kallisto, Kent Utilities, keras, khmer, kinfin, king, KmerFinder, KmerGenie, kraken, kSNP, kWIP, LACHESIS, lammps, LAST, lcMLkin, LDAK, leeHom, lep-anchor, Lep-MAP3, lftp, Lighter, LinkedSV, LINKS, LocARNA, LocusZoom, lofreq, longranger, LS-GKM, LTR_retriever, LUCY, LUCY2, LUMPY, lyve-SET, MACE, MACS, MaCS simulator, MACS2, MAFFT, mafTools, Magic-BLAST, magick, MAKER, mapDamage, MAQ, MARS, MASH, mashtree, Mashtree, MaSuRCA, MATLAB, Mauve, MaxBin, McClintock, 
mccortex, mcl, MCscan, MCScanX, medusa, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, MetaBAT, MetaCRAST, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, metaron, MetaVelvet, MetaVelvet-SL, MGmapper, Migrate-n, mikado, MinCED, Minimac3, Minimac4, minimap2, mira, miRDeep2, MISO (misopy), MITObim, MiXCR, MixMapper, MKTest, mlst, MMAP, MMSEQ, MMseqs2, MMTK, modeltest, MODIStsp-2.0.5, module, moments, mono, monocle3, mosdepth, mothur, MrBayes, mrsFAST, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMandCo, MUMmer, muscle, MUSIC, Mutation-Simulator, muTect, MZmine, nag-compiler, nanofilt, Nanopolish, ncftp, NECAT, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, nf-core/rnaseq, ngmlr, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NgsRelate, ngsTools, NGSUtils, NINJA, NLR-Annotator, NLR-Parser, Novoalign, NovoalignCS, nQuire, NRSA, nvidia-docker, Oases, OBITools, Octave, OMA, openmpi, OrthoFinder, orthologr, Orthomcl, pacbio, PacBioTestData, PAGIT, paleomix, PAML, panaroo, pandas, pandaseq, pandoc, PanPhlAn, Panseq, Parsnp, PASA, PASTEC, PAUP*, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pblat, pbmm2, PBSuite, PCAngsd, pcre, pcre2, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, peppro, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan, PhyloPhlAn2, phylophlan3, PhyML, Picard, pigz, Pilon, Pindel, piPipes, PIQ, PlasFlow, platanus, Platypus, plink, plink2, Plotly, Point Cloud Library, popbam, PopCOGenT, PopLDdecay, Porechop, poretools, portcullis, pplacer, PRANK, preseq, primalscheme, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, Proseq2, ProtExcluder, protolite, PSASS, psmc, psutil, purge_dups, pyani, PyCogent, pycoQC, pyfaidx, pyGenomeTracks, PyMC, pyopencl, pypy, pyRAD, Pyro4, PySnpTools, python, PyTorch, PyVCF, QIIME, QIIME2, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, quickmerge, QUMA, R, RACA, racon, 
RADIS, RadSex, rapt, RAPTR-SV, RATT, RAxML, raxml-ng, Ray, rclone, Rcorrector, RDP Classifier, REAGO, REAPR, Red, ReferenceSeeker, regenie, Relate, RelocaTE2, Repbase, RepeatMasker, RepeatModeler, RERconverge, RFMix, RGAAT, rgdal, RGI, Rgtsvm, ripgrep, rJava, RNAMMER, rnaQUAST, Rnightlights, Roary, Rockhopper, rphast, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, rtfbs_db, ruby, sabre, SaguaroGW, salmon, Sambamba, samblaster, sample, SampleTracker, samplot, samtabix, Samtools, Satsuma, Satsuma2, SCALE, scanorama, scikit-learn, Scoary, scythe, seaborn, SecretomeP, selscan, Sentieon, seqkit, SeqPrep, seqtk, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, SHAPEIT4, shasta, Shiny, shore, SHOREmap, shortBRED, SHRiMP, sickle, sift4g, SignalP, SimPhy, simuPOP, singularity, sinto, sistr_cmd, SKESA, skewer, SLiM, SLURM, smcpp, smoove, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, SNAPP, snATAC, SNeP, Sniffles, snippy, snp-sites, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, spaceranger, SPAdes, SPALN, SparCC, SPARTA, sqlite, SRA Toolkit, srst2, stacks, Stacks 2, stairway-plot, stampy, STAR, Starcode, statmodels, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, supernova, SURPI, sutta, SV-plaudit, SVDetect, SVseq2, svtools, svtyper, SWAMP, SweepFinder, SweepFinder2, sweepsims, tabix, Taiji, Tandem Repeats Finder (TRF), tardis, TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tbl2asn, tcoffee, TensorFlow, TEToolkit, TEtranscripts, texlive, tfTarget, ThermoRawFileParser, TMHMM, tmux, Tomahawk, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, treeCl, treemix, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMAP, UMI-tools, Unicycler, UniRep, unrar, usearch, Variant Effect Predictor, VarScan, VCF-kit, vcf2diploid, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, 
VESPA, vg, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, VizBin, vmatch, vsearch, vt, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for BRAKER

Name: BRAKER
Version: 2.1.6
OS: Linux
About: Uses genomic and RNA-Seq data to automatically generate full gene structure annotations in novel genomes
Added: 9/1/2019 8:49:03 PM
Updated: 5/20/2021 11:38:45 AM
Link: https://github.com/Gaius-Augustus/BRAKER
Notes:

To run v2.1.6

#Register and download GeneMark software and license key

  • Go to the GeneMark web site.
  • Check "GeneMark-ES/ET/EP ver 4.65_lic" and the "LINUX 64" option next to it.
  • Fill out the form and click "I agree to the terms".
  • Download the two files "gmes_linux_64.tar.gz" and "gm_key_64.gz" and put them under /workdir/$USER/
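Browser downloads are occasionally truncated, so it can help to confirm that both files are intact before unpacking them. A minimal sketch using gzip's built-in integrity test (gzip -t); check_gz_intact is a hypothetical helper name, not part of GeneMark or BRAKER:

```shell
# Hypothetical helper: succeed only if every named file exists and passes
# gzip's integrity test (gzip -t reads the file without extracting it).
check_gz_intact() {
    status=0
    for f in "$@"; do
        if [ -f "$f" ] && gzip -t "$f" 2>/dev/null; then
            echo "ok:      $f"
        else
            echo "problem: $f (missing or not a valid gzip file)"
            status=1
        fi
    done
    return "$status"
}
```

For example, run "cd /workdir/$USER" and then "check_gz_intact gmes_linux_64.tar.gz gm_key_64.gz"; if either file is reported as a problem, download it again.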

#Prepare Braker2 software

cd /workdir/$USER/

zcat gm_key_64.gz > $HOME/.gm_key

tar xvfz gmes_linux_64.tar.gz 

cp -r /programs/braker2-2.1.6/* ./

#this step fixes ProtHint in the current release of gmes_linux_64 by replacing it with the copy from /programs/braker2-2.1.6
rm -fr gmes_linux_64/ProtHint
mv ProtHint gmes_linux_64/

#run command example

cd /workdir/$USER

./braker2 --help

#run in a "screen" session
#the "braker2" command is an executable singularity image file which can be run like "braker.pl"
#make sure gmes_linux_64, config, and the input files are present in the directory where the command is executed
#change "--cores NUMBER" in the command based on CPU core availability on your server

./braker2 --genome=myGenomeAssembly.fa --bam=myRNAseqAlignment.bam --softmasking --workingdir=outPutDirectory --cores 8  &
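One way to pick the "--cores" value is to derive it from the number of CPUs the machine reports. This is only a sketch: pick_cores and the cap of 8 are illustrative choices, and nproc reports the CPUs usable by your process, not the CPUs that are currently idle, so on a shared server you may still want a lower number.

```shell
# Hypothetical helper: use the CPU count reported by the system, capped at
# a user-chosen maximum so a shared server is not saturated.
pick_cores() {
    max="${1:-8}"
    # nproc is part of GNU coreutils; getconf is used as a fallback.
    avail=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    if [ "$avail" -lt "$max" ]; then
        echo "$avail"
    else
        echo "$max"
    fi
}

CORES=$(pick_cores 8)
# Print the value for review; pass it to the real command as: --cores $CORES
echo "cores to use: $CORES"
```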

 

************************************************* END OF INSTRUCTIONS**************************************************

If you want to install braker2 yourself, follow the instructions below:

  • The Docker image is provided by BioContainers (quay.io/biocontainers/braker2);
  • Check the image's tag list to get the latest version tag;

Prepare software

1. Download and build Singularity image (replace "2.1.6--hdfd78af_4" with latest version tag) 

cd /workdir/$USER

singularity pull braker-2.1.6.sif docker://quay.io/biocontainers/braker2:2.1.6--hdfd78af_4
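The version tag appears twice in the pull command: in full in the image URL and, shortened, in the .sif file name. A small sketch that derives both from a single variable; the variable names TAG, VERSION and SIF are illustrative. The command is printed rather than executed so it can be reviewed first.

```shell
# Full BioContainers version tag; replace with the latest tag.
TAG="2.1.6--hdfd78af_4"

# Strip the build suffix (everything from "--" on) to get the plain version.
VERSION="${TAG%%--*}"
SIF="braker-${VERSION}.sif"

# Print the resulting pull command.
echo "singularity pull ${SIF} docker://quay.io/biocontainers/braker2:${TAG}"
```

If you use a newer tag, the .sif file name changes accordingly, so substitute it for braker-2.1.6.sif in the commands that follow.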

2. Prepare Genemark software (Genemark is free for academic users, but it requires you to register.)

cd /workdir/$USER

# Copy the software file gmes_linux_64.tar.gz here;
# Copy the key file gm_key_64.gz here;

tar xvfz gmes_linux_64.tar.gz

zcat gm_key_64.gz > $HOME/.gm_key

# Fix the shebang line of PERL scripts
cd gmes_linux_64
./change_path_in_perl_scripts.pl "/usr/bin/env perl"

# (Optional) Verify that Genemark is properly installed
./check_install.bash

# The ProtHint distributed with GeneMark on 3/11/2021 does not work with the container.
# Replace its scripts with the latest ProtHint from GitHub
cd /workdir/$USER
git clone https://github.com/gatech-genemark/ProtHint.git
cp ProtHint/bin/* gmes_linux_64/ProtHint/bin/

3. Copy the Augustus config directory from inside the container to the host, so that it is writable by you.

./braker-2.1.6.sif cp -r /usr/local/config /workdir/$USER

4. (Optional) Testing your installation

  • 4.1 Download the braker2 testing data set and script
git clone https://github.com/Gaius-Augustus/BRAKER.git

cd BRAKER/example
wget http://topaz.gatech.edu/GeneMark/Braker/RNAseq.bam
  • 4.2 Run testing data
#singularity must be launched from the parent directory of the genemark (gmes_linux_64) and augustus config directories.
cd /workdir/$USER

#start singularity braker2 shell
./braker-2.1.6.sif

#set environment variables based on paths on the host machine
export PROTHINT_PATH=/workdir/$USER/gmes_linux_64/ProtHint/bin
export GENEMARK_PATH=/workdir/$USER/gmes_linux_64
export AUGUSTUS_CONFIG_PATH=/workdir/$USER/config

#these two environment variables must be set to /usr/local/bin (a path internal to the container)
export AUGUSTUS_BIN_PATH=/usr/local/bin
export AUGUSTUS_SCRIPTS_PATH=/usr/local/bin

cd /workdir/$USER/BRAKER/example/tests

#testing RNA-seq input as training data
./test1.sh 

#testing protein sequence input as training data
./test2.sh 

#once done, exit singularity shell
exit

Your test run results are in example/tests/test1 and example/tests/test2. Compare them with the results provided by the developer (example/results).
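A quick first comparison is to count the predicted genes in your output and in the developer's output. A minimal sketch, assuming the files being compared are tab-separated GTF/GFF files in which each gene has its own row with "gene" in the third column; count_genes is a hypothetical helper and the file paths are whatever your run and example/results actually contain.

```shell
# Hypothetical helper: count rows whose third (feature) column is "gene"
# in a tab-separated GTF/GFF file. Prints 0 if there are none.
count_genes() {
    awk -F '\t' '$3 == "gene" { n++ } END { print n + 0 }' "$1"
}
```

Call it once on a file from your test directory and once on the matching file under example/results. Small differences between runs are possible, so treat the counts as a sanity check rather than an exact test.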

5. Now you are ready to run Braker to annotate your genome. Run it in a persistent "screen" session.

cd /workdir/$USER
mkdir /workdir/$USER/myRun1

export PROTHINT_PATH=/workdir/$USER/gmes_linux_64/ProtHint/bin
export GENEMARK_PATH=/workdir/$USER/gmes_linux_64
export AUGUSTUS_CONFIG_PATH=/workdir/$USER/config
export AUGUSTUS_BIN_PATH=/usr/local/bin
export AUGUSTUS_SCRIPTS_PATH=/usr/local/bin

./braker-2.1.6.sif braker.pl --genome=myGenome.fa --bam=myRNAseq.bam --softmasking --workingdir=/workdir/$USER/myRun1 --cores 48

After the job is done, the result files are in /workdir/$USER/myRun1 .
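The five exports in steps 4 and 5 are identical, so they can be written once to a small file and loaded before each run. A sketch under that assumption; write_braker_env and the file name braker_env.sh are hypothetical, not part of BRAKER.

```shell
# Hypothetical helper: write the five exports from steps 4 and 5 to
# <base>/braker_env.sh so they can be loaded with one "." command.
write_braker_env() {
    base="${1:-/workdir/$USER}"
    cat > "$base/braker_env.sh" <<EOF
# Paths on the host machine
export PROTHINT_PATH="$base/gmes_linux_64/ProtHint/bin"
export GENEMARK_PATH="$base/gmes_linux_64"
export AUGUSTUS_CONFIG_PATH="$base/config"
# Paths internal to the container
export AUGUSTUS_BIN_PATH=/usr/local/bin
export AUGUSTUS_SCRIPTS_PATH=/usr/local/bin
EOF
    echo "wrote $base/braker_env.sh"
}
```

Run "write_braker_env /workdir/$USER" once, then start each later session with ". /workdir/$USER/braker_env.sh" before launching braker.pl.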

If you plan to run this version of Braker again later, here is a list of the software files/directories under /workdir/$USER that you need to keep.

  • braker-2.1.6.sif
  • gmes_linux_64
  • config

########################End of instructions######################################################

To run the previous version v2.1.5 (using the docker image built by the BioHPC team)

The braker software is implemented as a docker image. One of its components, GeneMark ET/ES/EP, requires a license. You will need to register and download the software and license file.

Current version:

  • Ubuntu 18.04
  • Braker: 2.1.5
  • Augustus: between 3.3.3 and 3.3.4 (github master branch on 10/16/2020)

#get docker image

docker1 pull biohpc/braker2

#prepare input

Create a data directory under /workdir/$USER/ (for example /workdir/$USER/mydata, as used in the run command below). Put the following items in the directory:

  • Genome assembly in fasta;
  • RNA-seq bam;
  • genemark software directory: gmes_linux_64
  • genemark license: .gm_key  (file name starts with ".")
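Because the license file is hidden (its name starts with "."), it is easy to overlook in a plain "ls". The sketch below checks the four items listed above before the container is started; check_braker_data_dir is a hypothetical helper, and the genome and BAM file names are passed in because they are your own.

```shell
# Hypothetical helper: verify that the data directory holds the genome,
# the RNA-seq BAM, the hidden license file and the GeneMark directory.
# Usage: check_braker_data_dir <data_dir> <genome_fasta> <rnaseq_bam>
check_braker_data_dir() {
    dir="$1"
    status=0
    for f in "$2" "$3" .gm_key; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f"
            status=1
        fi
    done
    if [ ! -d "$dir/gmes_linux_64" ]; then
        echo "missing directory: $dir/gmes_linux_64"
        status=1
    fi
    return "$status"
}
```

For example: check_braker_data_dir /workdir/$USER/mydata mygenome.fa.masked RNA.sorted.bam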

# run command

docker1 run --rm -v /workdir/$USER/mydata:/data  biohpc/braker2 sh -c ". /root/source.sh; braker.pl --species=mysp --genome=mygenome.fa.masked --bam=RNA.sorted.bam --softmasking --cores=24"

 


 

 

