
BioHPC Cloud: User Guide


BioHPC Cloud Software

There are 853 software titles installed in BioHPC Cloud. The software is available on all machines unless stated otherwise in the notes. The complete list of programs is below; click a title to see its details and instructions. A tabular list of software is available here.

Please read the details and instructions before running any program; they may contain important information on how to use the software properly in BioHPC Cloud.

3d-dna, 454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, agrep, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, Alphafold, AMOS, AMPHORA, amplicon.py, AMRFinder, analysis, ANGSD, Annovar, antiSMASH, anvio, apollo, arcs, ARGweaver, Arlequin, ART, aspera, assembly-stats, ASTRAL, atac-seq-pipeline, ataqv, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, atom, ATSAS, Augustus, AWS command line interface, AWS v2 Command Line Interface, axe, axel, BactSNP, bakta, bam2fastx, bamtools, bamUtil, BarNone, Basset, BayeScan, Bayescenv, baypass, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beast2, bedops, BEDtools, bfc, bgc, bgen, bigQF, bigWig, bioawk, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, BLAST_to_BED, blast2go, BLAT, BLUPF90, BMGE, bmtagger, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BreedingSchemeLanguage, breseq, brocc, BSseeker2, BUSCO, BWA, bwa-meth, cactus, CAFE, canu, CAP3, caper, CarveMe, catch, cBar, CBSU RNAseq, CCTpack, cd-hit, cdbfasta, CEGMA, CellRanger, cellranger-arc, cellranger-atac, cellranger-dna, centrifuge, centroFlye, CFM-ID, CFSAN SNP pipeline, CheckM, chimera, chromosomer, Circlator, Circos, Circuitscape, CITE-seq-Count, clues, CLUMPP, clust, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, compat, CONCOCT, Conda, Cooler, copyNumberDiff, cortex_var, CRISPRCasFinder, CRISPResso, CrossMap, CRT, cuda, Cufflinks, cutadapt, cuteSV, dadi, dadi-1.6.3_modif, danpos, dDocent, DeconSeq, Deepbinner, DeepTE, deepTools, defusion, delly, DESMAN, destruct, DETONATE, diamond, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, Docker, dREG, dREG.HD, drep, drive, Drop-seq, dropEst, dropSeqPipe, dsk, dssat, Dsuite, dTOX, duphold, dynare, ea-utils, ecopcr, ecoPrimers, ectyper, EDGE, edirect, EDTA, eems, EgaCryptor, EGAD, EIGENSOFT, EMBOSS, Empress, entropy, epa-ng, ephem, epic2, ermineJ, ete3, EVM, exabayes, exonerate, 
ExpansionHunterDenovo-v0.8.0, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastANI, fastcluster, FastME, FastML, fastp, FastQ Screen, fastq_pair, fastq_species_detector, FastQC, fastqsplitter, fastsimcoal26, fastStructure, FastTree, FASTX, feh, FFmpeg, fineRADstructure, fineSTRUCTURE, FIt-SNE, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, freebayes, FSA, FunGene Pipeline, G-PhoCS, GADMA, GAEMR, Galaxy in Docker, Galaxy Server, GATK, gatk4, gatk4amplicon.py, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, GENECONV, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomeThreader, genometools, GenomicConsensus, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, GffCompare, gffread, giggle, glactools, GlimmerHMM, GMAP/GSNAP, GNU Compilers, GNU parallel, go-perl, GO2MSIG, GoShifter, gradle-4.4, graftM, GraPhlAn, graphviz, GRiD, Grinder, GROMACS, GSEA, gsort, GTDB-Tk, GTFtools, Gubbins, GUPPY, hail, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HarvestTools, haslr, hdf5, hget, hh-suite, HiC-Pro, HiCExplorer, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, htslib, humann, HUMAnN2, hyperopt, HyPhy, hyphy-analyses, iAssembler, IBDLD, idba, IDBA-UD, IDP-denovo, idr, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, IMSA-A, INDELseek, infernal, Infomap, InStruct, Intel MKL, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, jags, Jane, java, jbrowse, JCVI, jellyfish, JoinMap, juicer, julia, jupyter, kallisto, Kent Utilities, keras, khmer, kinfin, king, KmerFinder, KmerGenie, kraken, kSNP, kWIP, LACHESIS, lammps, LAST, lcMLkin, LDAK, leeHom, lep-anchor, Lep-MAP3, lftp, Liftoff, Lighter, LinkedSV, LINKS, LocARNA, LocusZoom, lofreq, longranger, LS-GKM, LTR_retriever, LUCY, LUCY2, LUMPY, lyve-SET, MACE, MACS, MaCS simulator, MACS2, MAFFT, mafTools, Magic-BLAST, magick, MAKER, mapDamage, MAQ, MARS, MASH, mashtree, Mashtree, MaSuRCA, MATLAB, 
Mauve, MaxBin, McClintock, mccortex, mcl, MCscan, MCScanX, medusa, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, MetaBAT, MetaCRAST, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, metaron, MetaVelvet, MetaVelvet-SL, MGmapper, Migrate-n, mikado, MinCED, Minimac3, Minimac4, minimap2, mira, miRDeep2, MISO (misopy), MITObim, MiXCR, MixMapper, MKTest, mlift, mlst, MMAP, MMSEQ, MMseqs2, MMTK, modeltest, MODIStsp-2.0.5, module, moments, mono, monocle3, mosdepth, mothur, MrBayes, mrsFAST, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMandCo, MUMmer, muscle, MUSIC, Mutation-Simulator, muTect, MZmine, nag-compiler, nanofilt, Nanopolish, ncftp, NECAT, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, nf-core/rnaseq, ngmlr, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NgsRelate, ngsTools, NGSUtils, NINJA, NLR-Annotator, NLR-Parser, Novoalign, NovoalignCS, nQuire, NRSA, NuDup, nvidia-docker, nvtop, Oases, OBITools, Octave, OMA, openmpi, OrthoFinder, orthologr, Orthomcl, pacbio, PacBioTestData, PAGIT, paleomix, PAML, panaroo, pandas, pandaseq, pandoc, PanPhlAn, Panseq, Parsnp, PASA, PASTEC, PAUP*, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pblat, pbmm2, PBSuite, PCAngsd, pcre, pcre2, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, peppro, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan, PhyloPhlAn2, phylophlan3, PhyML, Picard, pigz, Pilon, Pindel, piPipes, PIQ, PlasFlow, platanus, Platypus, plink, plink2, Plotly, Point Cloud Library, popbam, PopCOGenT, PopLDdecay, Porechop, poretools, portcullis, pplacer, PRANK, preseq, primalscheme, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, Proseq2, ProtExcluder, protolite, PSASS, psmc, psutil, purge_dups, pyani, PyCogent, pycoQC, pyfaidx, pyGenomeTracks, PyMC, pyopencl, pypy, pyRAD, Pyro4, PySnpTools, python, PyTorch, PyVCF, QIIME, QIIME2, QTCAT, Quake, Qualimap, 
QuantiSNP2, QUAST, quickmerge, QUMA, R, RACA, racon, RADIS, RadSex, RagTag, rapt, RAPTR-SV, RATT, RAxML, raxml-ng, Ray, rclone, Rcorrector, RDP Classifier, REAGO, REAPR, Red, ReferenceSeeker, regenie, Relate, RelocaTE2, Repbase, RepeatMasker, RepeatModeler, RERconverge, RFMix, RGAAT, rgdal, RGI, Rgtsvm, ripgrep, rJava, RNAMMER, rnaQUAST, Rnightlights, Roary, Rockhopper, rphast, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, rtfbs_db, ruby, sabre, SaguaroGW, salmon, Sambamba, samblaster, sample, SampleTracker, samplot, samtabix, Samtools, Satsuma, Satsuma2, SCALE, scanorama, scikit-learn, Scoary, scythe, seaborn, SecretomeP, selscan, Sentieon, seqkit, SeqPrep, seqtk, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, SHAPEIT4, shasta, Shiny, shore, SHOREmap, shortBRED, SHRiMP, sickle, sift4g, SignalP, SimPhy, simuPOP, singularity, sinto, sistr_cmd, SKESA, skewer, SLiM, SLURM, smcpp, smoove, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, SNAPP, snATAC, SNeP, Sniffles, snippy, snp-sites, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, spaceranger, SPAdes, SPALN, SparCC, SPARTA, sqlite, SRA Toolkit, srst2, stacks, Stacks 2, stairway-plot, stampy, STAR, Starcode, statmodels, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, subread, supernova, SURPI, sutta, SV-plaudit, SVDetect, SVseq2, svtools, svtyper, SWAMP, SweepFinder, SweepFinder2, sweepsims, tabix, Taiji, Tandem Repeats Finder (TRF), tardis, TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tbl2asn, tcoffee, TensorFlow, TEToolkit, TEtranscripts, texlive, tfTarget, ThermoRawFileParser, TMHMM, tmux, Tomahawk, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, treeCl, treemix, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMAP, UMI-tools, Unicycler, UniRep, unrar, usearch, Variant Effect Predictor, VarScan, 
VCF-kit, vcf2diploid, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, VizBin, vmatch, vsearch, vt, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for Repbase

Name: Repbase
Version: 26.04
OS: Linux
About: a database of representative repetitive sequences from eukaryotic species
Added: 5/17/2021 12:55:25 PM
Updated:
Link: https://www.girinst.org/
Notes:

Cornell University Library holds a license for Repbase. Please DO NOT share the database files with groups outside Cornell.

Download the database files from a computer on campus. From outside the Cornell campus (including over VPN), you can access the website through the library proxy server (a valid Cornell NetID is required).

Repbase provides two sets of data files; both can be used with RepeatMasker:

1. Repbase RepeatMasker Edition (latest version 10/26/2018)

  • Includes modifications by the RepeatMasker team, with TE classifications formatted for RepeatMasker/RepeatClassifier;
  • Not updated since 2018;

2. New Repbase releases in EMBL or FASTA format (latest version as of 5/18/2021: v26.04, 04/22/2021)

  • More species included;
  • Up to date;

How to use the Repbase RepeatMasker Edition with RepeatMasker

Scenario 1. Your species is covered by Repbase, so there is no need to create a de novo custom database.

a) Set up the RepeatMasker library from Repbase (you only need to do this once)

# Download the file RepBaseRepeatMaskerEdition-20181026.tar.gz from the Repbase website

# Download the tetools Docker image and convert it to a Singularity image file, tetools.sif
cd /workdir/$USER
singularity pull tetools.sif docker://dfam/tetools:latest

# Make a copy of RepeatMasker's Libraries directory here
singularity exec tetools.sif cp -r /opt/RepeatMasker/Libraries ./

# Extract Repbase (replace "/work/path/to" with the path to the file)
tar -x -f /work/path/to/RepBaseRepeatMaskerEdition-20181026.tar.gz

# Run the addRepBase.pl script (part of the RepeatMasker package) to merge the databases.
# After this step, you should see a new RepeatMaskerLib.h5 that combines Dfam and Repbase.
singularity exec tetools.sif addRepBase.pl -libdir Libraries/

b) The next time you run RepeatMasker, set the environment variable LIBDIR to this Libraries directory, and RepeatMasker will use the combined Dfam/Repbase library.

# Run RepeatMasker with the LIBDIR environment variable set, using Insecta as an example
export LIBDIR=/workdir/$USER/Libraries

singularity exec /work/path/to/tetools.sif RepeatMasker -species Insecta [add other RepeatMasker options] genome.fa
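Before starting a long RepeatMasker run, a quick pre-flight check can catch a wrong or unset LIBDIR. The sketch below defines a hypothetical shell function, check_rm_libdir, which is not part of RepeatMasker or BioHPC. It only confirms that the directory contains a non-empty RepeatMaskerLib.h5. It does not verify that Repbase was actually merged into that file.

```shell
# Hypothetical helper (not part of RepeatMasker or BioHPC): check that a
# Libraries directory holds a non-empty RepeatMaskerLib.h5. Uses the first
# argument if given, otherwise the LIBDIR environment variable.
# NOTE: this checks presence only, not whether Repbase was merged.
check_rm_libdir() {
  local libdir="${1:-$LIBDIR}"
  if [ -z "$libdir" ]; then
    echo "ERROR: LIBDIR is not set and no directory was given" >&2
    return 1
  fi
  if [ ! -s "$libdir/RepeatMaskerLib.h5" ]; then
    echo "ERROR: $libdir/RepeatMaskerLib.h5 is missing or empty" >&2
    return 1
  fi
  echo "OK: $libdir/RepeatMaskerLib.h5"
}
```

For example, run `check_rm_libdir` right after `export LIBDIR=/workdir/$USER/Libraries`, and launch RepeatMasker only if it prints "OK".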

Scenario 2. Your species is not well covered by Repbase, so you need to create a de novo custom repeat library from the genome assembly.

It is recommended that you create the de novo library using RepeatModeler. You may then want to combine the RepeatModeler library with the Repbase library. Two approaches are commonly used to do this:

  • Concatenate the two repeat libraries into one file, optionally removing redundancy (probably not needed), then run RepeatMasker with the "-lib myCombined.lib" option.
  • Alternatively, run RepeatMasker sequentially: first use the "-species mySpecies" option against Repbase to create a masked genome, then run RepeatMasker on the masked genome with the "-lib myCustom.lib" option.
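For the first approach, plain concatenation is usually enough. If you do want to drop redundancy, the hypothetical function combine_repeat_libs below is a minimal awk-based sketch. It removes only exact duplicate sequences, compared case-insensitively and ignoring line breaks. It does not collapse near-identical consensus sequences; a sequence-clustering tool such as cd-hit (listed above) is better suited for that.

```shell
# Hypothetical helper (not part of RepeatMasker): concatenate FASTA repeat
# libraries and skip any record whose sequence exactly matches one already
# written (case-insensitive, line breaks ignored). Headers and the line
# wrapping of kept records are preserved; input order decides which header wins.
combine_repeat_libs() {
  awk '
    # Write the buffered record unless its sequence was seen before.
    function emit_record() {
      if (header != "" && !(seq in seen)) {
        seen[seq] = 1
        print header
        if (body != "") print body
      }
    }
    /^>/ { emit_record(); header = $0; seq = ""; body = ""; next }
    {
      body = (body == "") ? $0 : (body "\n" $0)
      seq = seq toupper($0)
    }
    END { emit_record() }
  ' "$@"
}
```

For example: `combine_repeat_libs repbase.lib myCustom.lib > myCombined.lib`, where repbase.lib is a placeholder for your Repbase FASTA library. When a sequence occurs in both files, the header from the first file is kept, so list first the library whose TE classification you prefer.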

Here is how to combine the two libraries into one file.

# Create a Repbase FASTA file from the RepeatMaskerLib.h5 built in Scenario 1, using Insecta as an example
# (run this in the Libraries directory, or give the full path to RepeatMaskerLib.h5)
singularity exec /work/path/to/tetools.sif famdb.py -i RepeatMaskerLib.h5 families --format fasta_name --include-class-in-name --ancestors --descendants 'Insecta' > insect.lib

# Then combine it with the custom library
cat insect.lib myCustom.lib > myCombined.lib
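After concatenating, counting the FASTA headers is a quick way to confirm that no records were lost. The helper functions below are hypothetical and assume that each record starts with a line beginning with ">".

```shell
# Hypothetical helper: count FASTA records (lines starting with ">").
count_fasta_records() {
  grep -c '^>' "$1" || true   # grep -c prints 0 but exits 1 when nothing matches
}

# Hypothetical check: a concatenated library should hold as many records
# as all of its input libraries together.
verify_concatenation() {
  local combined="$1"; shift
  local expected=0 actual f
  for f in "$combined" "$@"; do
    [ -r "$f" ] || { echo "ERROR: cannot read $f" >&2; return 1; }
  done
  for f in "$@"; do
    expected=$(( expected + $(count_fasta_records "$f") ))
  done
  actual=$(count_fasta_records "$combined")
  if [ "$actual" -ne "$expected" ]; then
    echo "MISMATCH: $combined has $actual records, expected $expected" >&2
    return 1
  fi
  echo "$combined: $actual records (as expected)"
}
```

For example: `verify_concatenation myCombined.lib insect.lib myCustom.lib`. This applies to a plain `cat`. If you removed duplicates, the combined count will be lower by design.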

 

How to use the latest Repbase release

  • The full release file is RepBaseXX.XX.embl.tar.gz, which includes separate files for each taxonomic clade (here is the documentation for the file names);
  • You can also download the data file for a single clade (on the Repbase website, first click the directory link RepBaseXX.XX.embl to access these files);
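If you downloaded the full release tarball but need only one clade, you can extract just that file. This sketch makes three assumptions. First, the archive is a gzip-compressed tar file with one file per clade, possibly under a top-level directory. Second, the function name extract_repbase_file is hypothetical. Third, the clade file name must match exactly; check the Repbase file-name documentation or list the archive with `tar -tzf` first.

```shell
# Hypothetical helper: extract from a Repbase release tarball only the
# member(s) whose file name (the part after the last "/") equals $2.
# Exact member paths are passed to tar, so no tar wildcard rules are involved.
extract_repbase_file() {
  local tarball="$1" name="$2" dest="${3:-.}"
  local members
  members=$(tar -tzf "$tarball" |
    awk -v n="$name" '{ b = $0; sub(/.*\//, "", b); if (b == n) print }')
  if [ -z "$members" ]; then
    echo "ERROR: $name not found in $tarball" >&2
    return 1
  fi
  mkdir -p "$dest"
  # Read the member list from stdin (-T -) and extract into $dest.
  printf '%s\n' "$members" | tar -xzf "$tarball" -C "$dest" -T -
}
```

For example: `extract_repbase_file RepBaseXX.XX.embl.tar.gz invrep.ref repbase_embl`, with XX.XX replaced by the actual release number. The file keeps its path inside the archive under repbase_embl.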

Convert the Repbase EMBL file to a RepeatMasker library file (a FASTA file whose sequence titles include the class/subclass of each TE) with embl2rm.pl, which is in the PATH on BioHPC computers. It can also be downloaded from https://bitbucket.org/cornell_bioinformatics/embl2rm. Do not include the Repbase simple.ref file when building the RepeatMasker library.

embl2rm.pl invrep.ref invrep.fasta
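To convert several clade files at once, a small loop around embl2rm.pl is enough. This is a sketch under assumptions. The function name convert_repbase_dir is hypothetical, and the Repbase files are assumed to end in .ref. The EMBL2RM variable, also hypothetical, lets you point to a downloaded copy of embl2rm.pl instead of the one in PATH. The loop skips simple.ref, as advised above.

```shell
# Hypothetical batch wrapper around embl2rm.pl: convert every *.ref file in
# $1 to $2/<name>.fasta, skipping simple.ref.
# EMBL2RM may be set to another path for embl2rm.pl (default: found via PATH).
convert_repbase_dir() {
  local srcdir="$1" outdir="$2"
  local converter="${EMBL2RM:-embl2rm.pl}"
  local ref base
  mkdir -p "$outdir"
  for ref in "$srcdir"/*.ref; do
    [ -e "$ref" ] || continue      # no .ref files: the glob stayed literal
    base=$(basename "$ref" .ref)
    if [ "$base" = "simple" ]; then
      echo "skipping $ref" >&2
      continue
    fi
    "$converter" "$ref" "$outdir/$base.fasta" || return 1
  done
  return 0
}
```

For example: `convert_repbase_dir repbase_embl repbase_fasta`, then `cat repbase_fasta/*.fasta > repbase_clades.lib` and pass that file to RepeatMasker with "-lib".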

If you have a custom library built with RepeatModeler, you can optionally combine it with the Repbase library file. When running RepeatMasker, use the "-lib" parameter to point to the resulting library file.

