 

BioHPC Cloud: User Guide


 


BioHPC Cloud Software

There are 624 software titles installed in BioHPC Cloud. The software is available on all machines (unless stated otherwise in the notes). The complete list of programs is below; click on a title to see details and instructions. A tabular list of the software is also available.

Please read the details and instructions before running any program. They may contain important information on how to use the software properly in BioHPC Cloud.

, 454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, agrep, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, analysis, ANGSD, Annovar, antiSMASH, apollo, Arlequin, aspera, assembly-stats, atac-seq-pipeline, athena_meta, Atlas-Link, ATLAS_GapFill, ATSAS, Augustus, AWS command line interface, axe, bamtools, bamUtil, Basset, BayeScan, Bayescenv, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beast2, bedops, BEDtools, bfc, bgc, bgen, bigQF, bigWig, bioawk, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, blast2go, BLAT, BMGE, bmtagger, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAT-NextGen, BreedingSchemeLanguage, breseq, brocc, BSseeker2, BUSCO, BWA, bwa-meth, cactus, canu, CAP3, cBar, CBSU RNAseq, CCTpack, cd-hit, CEGMA, CellRanger, cellranger-atac, centrifuge, CFSAN SNP pipeline, CheckM, chimera, chromosomer, Circlator, Circos, Circuitscape, CLUMPP, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, compat, CONCOCT, Conda, copyNumberDiff, cortex_var, CRISPRCasFinder, CRISPResso, CrossMap, CRT, cuda, Cufflinks, cutadapt, dadi, dadi-1.6.3_modif, dDocent, DeconSeq, deepTools, defusion, delly, destruct, DETONATE, diamond, diploSHIC, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, Docker, dREG, dREG.HD, Drop-seq, dropEst, dropSeqPipe, dsk, ea-utils, ecopcr, ecoPrimers, ectyper, EDGE, edirect, eems, EIGENSOFT, EMBOSS, entropy, ephem, epic2, ermineJ, ete3, exabayes, exonerate, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastANI, fastcluster, FastME, FastML, fastp, fastq_pair, fastq_species_detector, FastQC, fastsimcoal26, fastStructure, FastTree, FASTX, feh, fineRADstructure, fineSTRUCTURE, FIt-SNE, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, freebayes, FunGene Pipeline, GAEMR, Galaxy, GATK, gatk4, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, 
GenomeStudio (Illumina), GenomicConsensus, gensim, GEOS, germline, gffread, giggle, GMAP/GSNAP, GNU Compilers, GNU parallel, go-perl, gradle-4.4, graftM, graphviz, Grinder, GROMACS, GSEA, GTDB-Tk, GTFtools, Gubbins, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HarvestTools, HiC-Pro, HiCExplorer, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, HUMAnN2, hyperopt, HyPhy, iAssembler, IBDLD, IDBA-UD, IDP-denovo, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, IMSA-A, INDELseek, infernal, InStruct, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, jags, java, jbrowse, jellyfish, JoinMap, julia, jupyter, kallisto, Kent Utilities, keras, khmer, KmerFinder, kraken, kSNP, kWIP, LACHESIS, lammps, LAST, lcMLkin, LDAK, leeHom, Lep-MAP3, Lighter, LINKS, LocusZoom, longranger, LUCY, LUCY2, LUMPY, lyve-SET, MACE, MACS, MaCS simulator, MACS2, MAFFT, mafTools, Magic-BLAST, magick, MAKER, MAQ, MASH, mashtree, Mashtree, MaSuRCA, Mauve, MaxBin, mccortex, mcl, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, MetaBAT, MetaCRAST, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, MetaVelvet, MetaVelvet-SL, MGmapper, Migrate-n, mikado, Minimac4, minimap2, mira, miRDeep2, MISO (misopy), MITObim, MixMapper, MKTest, mlst, MMAP, MMSEQ, monocle3, mosdepth, mothur, MrBayes, mrsFAST, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMmer, muscle, MUSIC, muTect, Nanopolish, ncftp, Nemo, Netbeans, NEURON, new_fugue, NextGenMap, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsTools, NGSUtils, NLR-Parser, Novoalign, NovoalignCS, nvidia-docker, Oases, OBITools, OMA, OrthoFinder, Orthomcl, PacBioTestData, PAGIT, paleomix, PAML, pandas, pandaseq, PanPhlAn, Panseq, Parsnp, PASA, PASTEC, PAUP*, pbalign, pbh5tools, PBJelly, PBSuite, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, PfamScan, PGDSpider, ph5tools, Phage_Finder, PHAST, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan, PhyML, Picard, pigz, Pilon, 
Pindel, piPipes, PIQ, PlasFlow, Platypus, plink, plink2, Plotly, popbam, Porechop, portcullis, pplacer, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, psutil, pyani, PyCogent, pyfaidx, pyGenomeTracks, PyMC, pyopencl, pypy, pyRAD, Pyro4, PySnpTools, python, PyTorch, PyVCF, QIIME, QIIME2 q2cli, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, QUMA, R, RACA, racon, RADIS, RadSex, RAPTR-SV, RAxML, Ray, Rcorrector, RDP Classifier, REAPR, Relate, RelocaTE2, RepeatMasker, RepeatModeler, RFMix, rgdal, RGI, Rgtsvm, ripgrep, RNAMMER, rnaQUAST, Roary, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, rtfbs_db, sabre, SaguaroGW, salmon, Sambamba, samblaster, SampleTracker, Samtools, Satsuma, Satsuma2, scikit-learn, Scoary, scythe, seaborn, SecretomeP, selscan, Sentieon, SeqPrep, seqtk, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, shore, SHOREmap, shortBRED, SHRiMP, sickle, SignalP, simuPOP, singularity, sistr_cmd, SKESA, skewer, SLiM, smcpp, SMRT Analysis, SMRT LINK, snakemake, snap, SNAPP, snATAC, SNeP, snippy, snp-sites, SnpEff, SNPgenie, SNPhylo, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, SPAdes, SparCC, SPARTA, SRA Toolkit, srst2, stacks, Stacks 2, stairway-plot, stampy, STAR, Starcode, statmodels, STITCH, STPGA, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, supernova, SURPI, sutta, SVDetect, svtools, SweepFinder, sweepsims, tabix, Tandem Repeats Finder (TRF), TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tcoffee, TensorFlow, TEToolkit, tfTarget, TMHMM, tmux, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, transrate, TRAP, treeCl, treemix, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMAP, UMI-tools, Unicycler, UniRep, unrar, usearch, Variant Effect Predictor, VarScan, vcf2diploid, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, VizBin, vmatch, vsearch, vt, WASP, 
wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for Orthomcl

Name: Orthomcl
Version: V1.4
OS: Linux
About: OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences.
Added: 11/5/2013 11:27:18 AM
Updated: 10/30/2014 8:16:53 PM
Link: http://orthomcl.org/common/downloads/software/unsupported/v1.4/
Notes:

Note: OrthoMCL is old software. Newer ortholog identification tools are now available, e.g. OrthoFinder (use diamond for alignment), which is much faster and much easier to run. OrthoFinder is available on BioHPC (https://biohpc.cornell.edu/lab/userguide.aspx?a=software&i=629#c).

To run OrthoMCL 2.0.9, use Docker. A Docker image file is located in /programs/orthomcl-2.0, with a brief readme file in the same directory. If you need help, schedule an office hour.

The following instructions are for running v1.4, an old version suitable for small-scale work.

Command line:
For help, use the command: orthomcl.pl
Run orthomcl on prerun BLAST results (-m 8 output):
orthomcl.pl --mode 3 --blast_file AtCeHs_blast.out --gg_file AtCeHs.gg --inflation 5

 

The documentation for v1.4 is at:
/programs/ORTHOMCLV1.4/README

Briefly:
step 1. merge the FASTA files of genes for each species (preferably protein sequence files) into a single file and run BLAST

#command for merging

cat sp1.fasta sp2.fasta sp3.fasta > merged.faa
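
Before running BLAST on the merged file, it may be worth checking that no sequence ID occurs more than once, since an ID shared between species would make the BLAST hits and the all.gg entries ambiguous. The following is a minimal sketch, not part of OrthoMCL; the function name duplicate_ids is hypothetical, and it assumes the ID is the first whitespace-delimited word of each ">" header line.

```shell
# List sequence IDs that occur more than once in a FASTA file.
# Assumption: the ID is the first word after ">" on each header line.
duplicate_ids() {
    grep '^>' "$1" | awk '{ print substr($1, 2) }' | sort | uniq -d
}

# Example (merged.faa as produced by the cat command above):
#   duplicate_ids merged.faa
# No output means every ID is unique.
```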

#for protein sequences, run these 2 commands
makeblastdb -in merged.faa -dbtype prot

## change number of threads based on number of cores on your computer

blastp -num_threads 24  -db merged.faa -query merged.faa -outfmt 6 -evalue 1e-5 -max_hsps 1 -out myblastresults -max_target_seqs 1000

 

step 2. prepare an all.gg file

example of all.gg
Ath: At1g01190 At1g01280 At1g04160 ...
Hsa: Hs10834998 Hs10835119 Hs10835271 ...
Sce: YAL029c YAR009c YAR010c YHR023w ...
Each line represents one genome. It starts with the genome name, followed by a colon ":", followed by all the gene IDs separated by spaces " ".
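
The all.gg format described above can be generated from the per-species FASTA files rather than typed by hand. The sketch below is one possible way to do it. The function name make_gg is hypothetical, and it assumes that each species has its own FASTA file, that the file name without its extension is an acceptable genome name, and that the gene ID is the first whitespace-delimited word of each ">" header line.

```shell
# Print one all.gg line per FASTA file: "<genome name>: id1 id2 ..."
# Assumptions: genome name = file name without extension;
# gene ID = first word after ">" on each header line.
make_gg() {
    local fasta label
    for fasta in "$@"; do
        label=$(basename "$fasta")
        label=${label%.*}
        awk -v label="$label" '
            BEGIN { printf "%s:", label }
            /^>/  { printf " %s", substr($1, 2) }
            END   { printf "\n" }
        ' "$fasta"
    done
}

# Example (same hypothetical file names as in step 1):
#   make_gg sp1.fasta sp2.fasta sp3.fasta > all.gg
```

With these assumptions, sp1.fasta would produce a line starting with "sp1:". Rename the files or edit the output if you prefer short genome names such as Ath or Hsa.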

step 3. run orthomcl.pl on the BLAST results and the all.gg file
orthomcl.pl --mode 3 --blast_file myblastresults --gg_file all.gg --inflation 5 --pv_cutoff 1e-5 --pi_cutoff 80
pv_cutoff: E-value cutoff
pi_cutoff: percent identity cutoff
inflation: 2 to 5, with 5 giving tighter clusters
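
Before launching step 3 on a large data set, it can help to confirm that every sequence ID in the BLAST results is also listed in the .gg file. The following is a sketch under stated assumptions, not part of OrthoMCL. The function name missing_ids is hypothetical. It relies on the tabular BLAST format (-outfmt 6), in which columns 1 and 2 hold the query and subject IDs, and on the .gg layout shown in step 2.

```shell
# Print IDs from columns 1-2 of a tabular BLAST file (-outfmt 6)
# that are not listed in the .gg file.
# Usage: missing_ids <blast_file> <gg_file>
missing_ids() {
    local blast_file=$1 gg_file=$2
    awk '
        # First file (.gg): field 1 is the genome name, the rest are gene IDs.
        NR == FNR { for (i = 2; i <= NF; i++) known[$i] = 1; next }
        # Second file (BLAST): remember query/subject IDs that are unknown.
        {
            if (!($1 in known)) missing[$1] = 1
            if (!($2 in known)) missing[$2] = 1
        }
        END { for (id in missing) print id }
    ' "$gg_file" "$blast_file" | sort
}

# Example:
#   missing_ids myblastresults all.gg
# No output means all IDs in the BLAST results appear in all.gg.
```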



 
