BioHPC Cloud: User Guide

 


BioHPC Cloud Software

There are 792 software titles installed in BioHPC Cloud. The software is available on all machines (unless stated otherwise in the notes). The complete list of programs is below; click on a title to see details and instructions. A tabular list of the software is also available.

Please read the details and instructions before running any program; they may contain important information on how to properly use the software in BioHPC Cloud.

454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, agrep, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, amplicon.py, AMRFinder, analysis, ANGSD, Annovar, antiSMASH, anvio, apollo, arcs, Arlequin, aspera, assembly-stats, atac-seq-pipeline, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, atom, ATSAS, Augustus, AWS command line interface, axe, BactSNP, bam2fastx, bamtools, bamUtil, BarNone, Basset, BayeScan, Bayescenv, baypass, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beast2, bedops, BEDtools, bfc, bgc, bgen, bigQF, bigWig, bioawk, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, BLAST_to_BED, blast2go, BLAT, BLUPF90, BMGE, bmtagger, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BreedingSchemeLanguage, breseq, brocc, BSseeker2, BUSCO, BWA, bwa-meth, cactus, CAFE, canu, CAP3, CarveMe, cBar, CBSU RNAseq, CCTpack, cd-hit, cdbfasta, CEGMA, CellRanger, cellranger-atac, cellranger-dna, centrifuge, centroFlye, CFM-ID, CFSAN SNP pipeline, CheckM, chimera, chromosomer, Circlator, Circos, Circuitscape, CITE-seq-Count, CLUMPP, clust, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, compat, CONCOCT, Conda, copyNumberDiff, cortex_var, CRISPRCasFinder, CRISPResso, CrossMap, CRT, cuda, Cufflinks, cutadapt, dadi, dadi-1.6.3_modif, danpos, dDocent, DeconSeq, Deepbinner, DeepTE, deepTools, defusion, delly, DESMAN, destruct, DETONATE, diamond, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, Docker, dREG, dREG.HD, drep, drive, Drop-seq, dropEst, dropSeqPipe, dsk, Dsuite, dTOX, duphold, dynare, ea-utils, ecopcr, ecoPrimers, ectyper, EDGE, edirect, eems, EgaCryptor, EGAD, EIGENSOFT, EMBOSS, Empress, entropy, epa-ng, ephem, epic2, ermineJ, ete3, EVM, exabayes, exonerate, ExpansionHunterDenovo-v0.8.0, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastANI, fastcluster, FastME, FastML, fastp, FastQ Screen, fastq_pair, 
fastq_species_detector, FastQC, fastsimcoal26, fastStructure, FastTree, FASTX, feh, FFmpeg, fineRADstructure, fineSTRUCTURE, FIt-SNE, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, freebayes, FSA, FunGene Pipeline, G-PhoCS, GAEMR, Galaxy in Docker, Galaxy Server, GATK, gatk4, gatk4amplicon.py, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, GENECONV, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomeThreader, genometools, GenomicConsensus, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, GffCompare, gffread, giggle, glactools, GlimmerHMM, GMAP/GSNAP, GNU Compilers, GNU parallel, go-perl, GO2MSIG, GoShifter, gradle-4.4, graftM, GraPhlAn, graphviz, GRiD, Grinder, GROMACS, GSEA, gsort, GTDB-Tk, GTFtools, Gubbins, GUPPY, hail, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HarvestTools, haslr, hdf5, hh-suite, HiC-Pro, HiCExplorer, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, htslib, HUMAnN2, hyperopt, HyPhy, iAssembler, IBDLD, IDBA-UD, IDP-denovo, idr, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, IMSA-A, INDELseek, infernal, Infomap, InStruct, Intel MKL, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, jags, Jane, java, jbrowse, JCVI, jellyfish, JoinMap, juicer, julia, jupyter, kallisto, Kent Utilities, keras, khmer, kinfin, king, KmerFinder, kraken, kSNP, kWIP, LACHESIS, lammps, LAST, lcMLkin, LDAK, leeHom, lep-anchor, Lep-MAP3, lftp, Lighter, LinkedSV, LINKS, LocARNA, LocusZoom, lofreq, longranger, LS-GKM, LTR_retriever, LUCY, LUCY2, LUMPY, lyve-SET, MACE, MACS, MaCS simulator, MACS2, MAFFT, mafTools, Magic-BLAST, magick, MAKER, MAQ, MARS, MASH, mashtree, Mashtree, MaSuRCA, MATLAB, Mauve, MaxBin, McClintock, mccortex, mcl, MCscan, MCScanX, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, MetaBAT, MetaCRAST, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, MetaVelvet, MetaVelvet-SL, MGmapper, Migrate-n, mikado, 
MinCED, Minimac3, Minimac4, minimap2, mira, miRDeep2, MISO (misopy), MITObim, MiXCR, MixMapper, MKTest, mlst, MMAP, MMSEQ, MMseqs2, MMTK, modeltest, moments, mono, monocle3, mosdepth, mothur, MrBayes, mrsFAST, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMmer, muscle, MUSIC, muTect, MZmine, nag-compiler, nanofilt, Nanopolish, ncftp, NECAT, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, nf-core/rnaseq, ngmlr, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NgsRelate, ngsTools, NGSUtils, NINJA, NLR-Annotator, NLR-Parser, Novoalign, NovoalignCS, NRSA, nvidia-docker, Oases, OBITools, Octave, OMA, openmpi, OrthoFinder, orthologr, Orthomcl, pacbio, PacBioTestData, PAGIT, paleomix, PAML, panaroo, pandas, pandaseq, PanPhlAn, Panseq, Parsnp, PASA, PASTEC, PAUP*, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pbmm2, PBSuite, PCAngsd, pcre, pcre2, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, peppro, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan, PhyloPhlAn2, phylophlan3, PhyML, Picard, pigz, Pilon, Pindel, piPipes, PIQ, PlasFlow, platanus, Platypus, plink, plink2, Plotly, popbam, PopCOGenT, Porechop, portcullis, pplacer, PRANK, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, Proseq2, protolite, PSASS, psutil, pyani, PyCogent, pycoQC, pyfaidx, pyGenomeTracks, PyMC, pyopencl, pypy, pyRAD, Pyro4, PySnpTools, python, PyTorch, PyVCF, QIIME, QIIME2, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, QUMA, R, RACA, racon, RADIS, RadSex, RAPTR-SV, RAxML, raxml-ng, Ray, rclone, Rcorrector, RDP Classifier, REAGO, REAPR, ReferenceSeeker, Relate, RelocaTE2, RepeatMasker, RepeatModeler, RERconverge, RFMix, rgdal, RGI, Rgtsvm, ripgrep, rJava, RNAMMER, rnaQUAST, Rnightlights, Roary, Rockhopper, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, rtfbs_db, ruby, sabre, SaguaroGW, salmon, Sambamba, samblaster, sample, SampleTracker, samtabix, Samtools, 
Satsuma, Satsuma2, SCALE, scanorama, scikit-learn, Scoary, scythe, seaborn, SecretomeP, selscan, Sentieon, SeqPrep, seqtk, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, SHAPEIT4, shasta, Shiny, shore, SHOREmap, shortBRED, SHRiMP, sickle, SignalP, SimPhy, simuPOP, singularity, sinto, sistr_cmd, SKESA, skewer, SLiM, SLURM, smcpp, smoove, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, SNAPP, snATAC, SNeP, Sniffles, snippy, snp-sites, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, spaceranger, SPAdes, SPALN, SparCC, SPARTA, sqlite, SRA Toolkit, srst2, stacks, Stacks 2, stairway-plot, stampy, STAR, Starcode, statmodels, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, supernova, SURPI, sutta, SV-plaudit, SVDetect, SVseq2, svtools, svtyper, SWAMP, SweepFinder, sweepsims, tabix, Taiji, Tandem Repeats Finder (TRF), tardis, TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tbl2asn, tcoffee, TensorFlow, TEToolkit, texlive, tfTarget, ThermoRawFileParser, TMHMM, tmux, Tomahawk, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, treeCl, treemix, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMAP, UMI-tools, Unicycler, UniRep, unrar, usearch, Variant Effect Predictor, VarScan, VCF-kit, vcf2diploid, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, VizBin, vmatch, vsearch, vt, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for RStudio

Name: RStudio
Version: 1.2.5042
OS: Linux
About: RStudio is an integrated development environment (IDE) for R
Added: 3/1/2016 7:40:28 PM
Updated: 5/8/2020 3:31:46 PM
Link: https://www.rstudio.com/
Manual: https://support.rstudio.com/hc/en-us/categories/200035113-Documentation
Download: https://www.rstudio.com/products/rstudio/download/
Notes:

There are multiple ways to use RStudio at BioHPC. We first describe the pros and cons of each to help you decide, then give specific instructions for getting started with each option.

  1. RStudio Server is the most common choice. It gives you an RStudio session that you connect to from a web browser. It is persistent: the session keeps running even if you lose your connection to the server, and you can reconnect to it later. However, it has two important limitations:

    • Only one “RStudio Server” can run on each BioHPC machine. The server can serve many users, so once it is started, anyone with access to the machine can log in and have their own RStudio session. The main limitation is that the server runs only a single version of R, so multiple users cannot use RStudio Server on the same machine with different versions of R.

    • Each user can have only one active RStudio session at a time on a particular machine. If you have access to multiple machines at BioHPC, you can have a different session on each machine, but only if you move your session info out of your home directory (see the specific instructions in the RStudio Server setup section below).

  2. RStudio Desktop is an X Window (graphical) application that each user can run from the command line. It is similar to RStudio Server, but it does not operate through a web browser and is not persistent (unless you run it over VNC, which is not always reliable). Most users find that graphics are slower with this option, but there are no limitations on the number of instances per machine or on versions of R.

  3. You can run RStudio Server in a Docker container. In this way, you can have multiple independent instances of RStudio Server running on the same machine, each running a different version of R if desired. The downside is that your R packages need to be installed in the Docker container, so you cannot automatically use the packages installed on the BioHPC machines or in your home directory. However, once you have your R environment configured within the container, you can save the container as an image, which can be reused on any Linux machine or shared with others who want to reproduce your research environment.

 

Setting up RStudio Server (without docker)

***Start-up instructions (but please read Notes below)***

  1. Log in over SSH (through PuTTY or the Mac Terminal) to the BioHPC server on which you want to run RStudio.

  2. Run the command "/programs/rstudio_server/mv_dir" if you want to keep the very big session data files under /workdir (see Note 2 below)

  3. Run the command "/programs/rstudio_server/rstudio_start" to start RStudio Server. You may get a message that RStudio is already running, which may mean that another user on the same machine has already started it. In this case, proceed to step 4.

  4. From a browser on your laptop/desktop computer, go to "http://cbsuxxxxxx.biohpc.cornell.edu:8015" (replacing "cbsuxxxxxx" with the actual machine name). You might sometimes need to refresh the page once. Log in with your BioHPC username and password.

  5. If you want to stop RStudio Server, use the command "/programs/rstudio_server/rstudio_stop". Be aware that this will stop all RStudio sessions on the machine (including other users' sessions, if it is a hosted machine or you have a shared reservation).
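Steps 2 to 4 above can be sketched as a small shell helper. This is only an illustrative sketch: the function names rstudio_url and start_rstudio_server are hypothetical, and it assumes that "hostname -s" returns the machine name used in the URL. The /programs/rstudio_server commands and port 8015 are the ones given in the steps.

```shell
# Sketch only: rstudio_url and start_rstudio_server are hypothetical helper names.
# The /programs/rstudio_server/* commands and port 8015 come from the steps above.

# Build the browser URL for a given machine name (default port 8015).
rstudio_url() {
    local machine="$1" port="${2:-8015}"
    printf 'http://%s.biohpc.cornell.edu:%s\n' "$machine" "$port"
}

# Run steps 2 and 3 on the BioHPC machine, then print the URL for step 4.
# Note: mv_dir deletes existing session data in ~/.rstudio (see Note 2),
# so drop that line if you want to keep your current session.
start_rstudio_server() {
    /programs/rstudio_server/mv_dir
    /programs/rstudio_server/rstudio_start   # may report that RStudio is already running
    # Assumption: the short hostname matches the machine name in the URL.
    rstudio_url "$(hostname -s)"
}
```

After sourcing these functions on the BioHPC machine, running start_rstudio_server should print the address to open in your browser.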

*** Notes ***

  1. RStudio Server runs over http (not https), but because BioHPC is accessible only through the Cornell local network, this is reasonably safe.
  2. About your Rstudio session data (~/.rstudio):
    • Rstudio stores its “session info” in your home directory, at /home/$USER/.rstudio. You may want to move this to local storage, for two reasons:
      1. If you are working with big data files, the session info can get quite large, and your home directory is paid storage (or subject to a quota).
      2. Your home directory is shared across BioHPC machines, so if session data is stored there, you can have only one session across all machines. By moving it to local storage, you can have different RStudio sessions on different BioHPC machines.
    • To move your session data to local storage, use the command:
      /programs/rstudio_server/mv_dir
      *before* starting Rstudio. This command will delete existing rstudio session data in your home directory (~/.rstudio), create a new directory for session data (/workdir/$USER/rstudio), and create a symbolic link (~/.rstudio) to it.
    • If you are switching between versions of R, you should clear your RStudio session data before starting RStudio Server, with the command:
      rm -fr /home/$USER/.rstudio/*
  3. Alternative versions of R: By default, RStudio Server uses the default version of R. If you want to use another version, e.g. R v3.6, start RStudio Server with "/programs/rstudio_server/rstudio_start 3.6". The supported versions are 3.5, 3.5s, 3.5.2, 3.6, 3.6.3, and 4.0.0. When you switch between versions, make sure you delete the session data and restart RStudio Server, i.e.:
    /programs/rstudio_server/rstudio_stop
    rm -fr /home/$USER/.rstudio
    rm -fr /workdir/$USER/rstudio
    /programs/rstudio_server/mv_dir
    /programs/rstudio_server/rstudio_start 3.6.3
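Notes 2 and 3 above can be sketched as a few shell helpers. This is a minimal sketch, not an official BioHPC tool: the function names are hypothetical, and print_switch_commands only prints the command sequence from Note 3 rather than running it, so that you can review it first. The paths and the list of supported versions are copied from the notes.

```shell
# Sketch only: all function names here are hypothetical.
# Paths, commands, and the version list are taken from Notes 2 and 3 above.

SUPPORTED_R_VERSIONS="3.5 3.5s 3.5.2 3.6 3.6.3 4.0.0"

# Report whether the session directory is a symlink (already moved by mv_dir),
# a regular directory (still in the home directory), or missing.
session_dir_kind() {
    local d="${1:-$HOME/.rstudio}"
    if [ -L "$d" ]; then
        echo "symlink"
    elif [ -d "$d" ]; then
        echo "directory"
    else
        echo "missing"
    fi
}

# Succeed only if the requested version is in the supported list.
is_supported_r_version() {
    local v
    for v in $SUPPORTED_R_VERSIONS; do
        [ "$v" = "$1" ] && return 0
    done
    return 1
}

# Print (do not run) the command sequence for switching to version $1.
print_switch_commands() {
    is_supported_r_version "$1" || { echo "unsupported R version: $1" >&2; return 1; }
    cat <<EOF
/programs/rstudio_server/rstudio_stop
rm -fr /home/\$USER/.rstudio
rm -fr /workdir/\$USER/rstudio
/programs/rstudio_server/mv_dir
/programs/rstudio_server/rstudio_start $1
EOF
}
```

For example, print_switch_commands 3.6.3 prints the five commands listed in Note 3, while an unlisted version such as 3.4 is rejected before anything is printed.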

 

Running RStudio Desktop

RStudio Desktop is an X Window application. If you do not know how to run X Window software on BioHPC, read this page: https://cbsu.tc.cornell.edu/lab/userguide.aspx?a=access#b (see the section "How to run graphical applications").

To start the software, use this command: 

/programs/rstudio/bin/rstudio
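
Because RStudio Desktop needs a working X display, it can help to check for one before launching. The following is a minimal sketch: the function names are hypothetical, and it assumes that a non-empty DISPLAY variable indicates a usable X session (for example, one set up by SSH X11 forwarding or VNC).

```shell
# Sketch only: has_x_display and launch_rstudio_desktop are hypothetical names.
# The binary path is the one given above.

# Succeed if the DISPLAY environment variable is set and non-empty.
has_x_display() {
    [ -n "${DISPLAY:-}" ]
}

# Start RStudio Desktop only when an X display appears to be available.
launch_rstudio_desktop() {
    if ! has_x_display; then
        echo "DISPLAY is not set; set up X11 forwarding or VNC first." >&2
        return 1
    fi
    /programs/rstudio/bin/rstudio
}
```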

 

Running RStudio Server in Docker

To start a Docker container, use the commands:

docker1 pull rocker/rstudio:3.5.2
docker1 run -d -p 8009:8787 -e ROOT=TRUE -e PASSWORD=yourPassword rocker/rstudio:3.5.2

Then, if you are on the Cornell network (or connected to VPN), you should be able to go to http://cbsuxxxx.biohpc.cornell.edu:8009 and log in with the username rstudio and the password yourPassword (replace cbsuxxxx with the actual machine name). If the docker1 run command reports that the port is already in use, try another available port: 8009-8019 are open, although 8015 is used by the non-Docker RStudio Server. RStudio runs on port 8787 inside the Docker container, but this is mapped to port 8009 on the BioHPC server. If you want to have multiple RStudio sessions, repeat the "docker1 run" command with different ports (keep 8787 the same, but change 8009 to another port in the range 8009-8019).
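
The port choice described above can be sketched as follows. This is an illustrative sketch only: pick_rstudio_port and docker_run_command are hypothetical helpers, and the list of ports already in use is something you supply yourself (for example, ports that docker1 run has already rejected). The port range, the reserved port 8015, and the container port 8787 come from the text above.

```shell
# Sketch only: pick_rstudio_port and docker_run_command are hypothetical helpers.
# Range 8009-8019, reserved port 8015, and container port 8787 are from the text.

# Print the first port in 8009-8019 that is not 8015 and is not in the
# space-separated list of already-used ports passed as $1.
pick_rstudio_port() {
    local used=" $1 " p
    for p in 8009 8010 8011 8012 8013 8014 8016 8017 8018 8019; do
        case "$used" in
            *" $p "*) ;;                 # already taken, keep looking
            *) echo "$p"; return 0 ;;
        esac
    done
    return 1                             # no free port in the range
}

# Print the docker1 run command for a host port and an image tag (default 3.5.2).
docker_run_command() {
    local port="$1" tag="${2:-3.5.2}"
    echo "docker1 run -d -p ${port}:8787 -e ROOT=TRUE -e PASSWORD=yourPassword rocker/rstudio:${tag}"
}
```

For instance, docker_run_command "$(pick_rstudio_port "8009 8010")" prints the run command with host port 8011 mapped to container port 8787.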

All your files in /workdir/$USER/ will be mounted at /workdir/ within the container.

The example above will run R version 3.5.2; most other versions of R are also available (see https://hub.docker.com/r/rocker/rstudio/).

You can install your R packages within the Docker container (for example, inside your RStudio session using the install.packages() function). You may also need to open a bash terminal within the container to install any system libraries required by your R packages. We can guide you through this if you need help. Once you have all your packages installed, you can save the container as an image and use it on any Linux machine, or share it with others to reproduce your research.
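
The workflow above might look roughly like the following sketch, which only prints commands so that you can review them before running anything. It rests on an assumption that is not confirmed by this page: that docker1 accepts the standard Docker subcommands exec, commit, and save. The function name, container ID, image name, and output path are all placeholders.

```shell
# Sketch only: print_save_image_commands is a hypothetical helper.
# Assumption: docker1 accepts the standard Docker subcommands exec, commit, save.
# $1 = container ID or name, $2 = new image name, $3 = output tar file.

print_save_image_commands() {
    local container="$1" image="$2" tarball="$3"
    # Inside the bash session opened by the first command, system libraries
    # would be installed with the image's package manager (apt-get on the
    # Debian-based rocker images).
    cat <<EOF
docker1 exec -it $container bash
docker1 commit $container $image
docker1 save -o $tarball $image
EOF
}
```

If the assumption does not hold on your machine, the BioHPC Docker page linked below is the place to check which docker1 subcommands are supported.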

More information about Docker at BioHPC is here: https://biohpc.cornell.edu/lab/userguide.aspx?a=software&i=340#c

 

