 

BioHPC Cloud: User Guide

 


BioHPC Cloud Software

There are 831 software titles installed in BioHPC Cloud. The software is available on all machines (unless stated otherwise in the notes). The complete list of programs is below; please click on a title to see details and instructions. A tabular list of software is available here.

Please read the details and instructions before running any program; they may contain important information on how to properly use the software in BioHPC Cloud.

3d-dna, 454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, agrep, albacore, Alder, AlleleSeq, ALLMAPS, ALLPATHS-LG, AMOS, AMPHORA, amplicon.py, AMRFinder, analysis, ANGSD, Annovar, antiSMASH, anvio, apollo, arcs, ARGweaver, Arlequin, ART, aspera, assembly-stats, ASTRAL, atac-seq-pipeline, ataqv, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, atom, ATSAS, Augustus, AWS command line interface, AWS v2 Command Line Interface, axe, BactSNP, bakta, bam2fastx, bamtools, bamUtil, BarNone, Basset, BayeScan, Bayescenv, baypass, BBmap, BCFtools, bcl2fastq, BCP, Beagle, Beast2, bedops, BEDtools, bfc, bgc, bgen, bigQF, bigWig, bioawk, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, Bismark, blasr, BLAST, BLAST_to_BED, blast2go, BLAT, BLUPF90, BMGE, bmtagger, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BreedingSchemeLanguage, breseq, brocc, BSseeker2, BUSCO, BWA, bwa-meth, cactus, CAFE, canu, CAP3, caper, CarveMe, cBar, CBSU RNAseq, CCTpack, cd-hit, cdbfasta, CEGMA, CellRanger, cellranger-arc, cellranger-atac, cellranger-dna, centrifuge, centroFlye, CFM-ID, CFSAN SNP pipeline, CheckM, chimera, chromosomer, Circlator, Circos, Circuitscape, CITE-seq-Count, clues, CLUMPP, clust, Clustal Omega, CLUSTALW, Cluster, cmake, CNVnator, compat, CONCOCT, Conda, copyNumberDiff, cortex_var, CRISPRCasFinder, CRISPResso, CrossMap, CRT, cuda, Cufflinks, cutadapt, cuteSV, dadi, dadi-1.6.3_modif, danpos, dDocent, DeconSeq, Deepbinner, DeepTE, deepTools, defusion, delly, DESMAN, destruct, DETONATE, diamond, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, Docker, dREG, dREG.HD, drep, drive, Drop-seq, dropEst, dropSeqPipe, dsk, dssat, Dsuite, dTOX, duphold, dynare, ea-utils, ecopcr, ecoPrimers, ectyper, EDGE, edirect, eems, EgaCryptor, EGAD, EIGENSOFT, EMBOSS, Empress, entropy, epa-ng, ephem, epic2, ermineJ, ete3, EVM, exabayes, exonerate, ExpansionHunterDenovo-v0.8.0, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastANI, fastcluster, FastME, FastML, fastp, FastQ Screen, fastq_pair, fastq_species_detector, FastQC, fastqsplitter, fastsimcoal26, fastStructure, FastTree, FASTX, feh, FFmpeg, fineRADstructure, fineSTRUCTURE, FIt-SNE, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, freebayes, FSA, FunGene Pipeline, G-PhoCS, GADMA, GAEMR, Galaxy in Docker, Galaxy Server, GATK, gatk4, gatk4amplicon.py, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, GENECONV, geneid, GeneMark, GeneMarker, Genome STRiP, GenomeMapper, GenomeStudio (Illumina), GenomeThreader, genometools, GenomicConsensus, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, GffCompare, gffread, giggle, glactools, GlimmerHMM, GMAP/GSNAP, GNU Compilers, GNU parallel, go-perl, GO2MSIG, GoShifter, gradle-4.4, graftM, GraPhlAn, graphviz, GRiD, Grinder, GROMACS, GSEA, gsort, GTDB-Tk, GTFtools, Gubbins, GUPPY, hail, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, HapSeq2, HarvestTools, haslr, hdf5, hh-suite, HiC-Pro, HiCExplorer, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, htslib, HUMAnN2, hyperopt, HyPhy, iAssembler, IBDLD, idba, IDBA-UD, IDP-denovo, idr, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, IMSA-A, INDELseek, infernal, Infomap, InStruct, Intel MKL, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, jags, Jane, java, jbrowse, JCVI, jellyfish, JoinMap, juicer, julia, jupyter, kallisto, Kent Utilities, keras, khmer, kinfin, king, KmerFinder, 
KmerGenie, kraken, kSNP, kWIP, LACHESIS, lammps, LAST, lcMLkin, LDAK, leeHom, lep-anchor, Lep-MAP3, lftp, Lighter, LinkedSV, LINKS, LocARNA, LocusZoom, lofreq, longranger, LS-GKM, LTR_retriever, LUCY, LUCY2, LUMPY, lyve-SET, MACE, MACS, MaCS simulator, MACS2, MAFFT, mafTools, Magic-BLAST, magick, MAKER, mapDamage, MAQ, MARS, MASH, mashtree, Mashtree, MaSuRCA, MATLAB, Mauve, MaxBin, McClintock, mccortex, mcl, MCscan, MCScanX, medusa, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, MetaBAT, MetaCRAST, metaCRISPR, MetAMOS, MetaPathways, MetaPhlAn, metaron, MetaVelvet, MetaVelvet-SL, MGmapper, Migrate-n, mikado, MinCED, Minimac3, Minimac4, minimap2, mira, miRDeep2, MISO (misopy), MITObim, MiXCR, MixMapper, MKTest, mlst, MMAP, MMSEQ, MMseqs2, MMTK, modeltest, moments, mono, monocle3, mosdepth, mothur, MrBayes, mrsFAST, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMandCo, MUMmer, muscle, MUSIC, Mutation-Simulator, muTect, MZmine, nag-compiler, nanofilt, Nanopolish, ncftp, NECAT, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, nf-core/rnaseq, ngmlr, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NgsRelate, ngsTools, NGSUtils, NINJA, NLR-Annotator, NLR-Parser, Novoalign, NovoalignCS, nQuire, NRSA, nvidia-docker, Oases, OBITools, Octave, OMA, openmpi, OrthoFinder, orthologr, Orthomcl, pacbio, PacBioTestData, PAGIT, paleomix, PAML, panaroo, pandas, pandaseq, pandoc, PanPhlAn, Panseq, Parsnp, PASA, PASTEC, PAUP*, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pbmm2, PBSuite, PCAngsd, pcre, pcre2, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, peppro, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan, PhyloPhlAn2, phylophlan3, PhyML, Picard, pigz, Pilon, Pindel, piPipes, PIQ, PlasFlow, platanus, Platypus, plink, plink2, Plotly, popbam, PopCOGenT, PopLDdecay, Porechop, poretools, portcullis, pplacer, PRANK, preseq, primalscheme, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, Proseq2, protolite, PSASS, psmc, psutil, purge_dups, pyani, PyCogent, pycoQC, pyfaidx, pyGenomeTracks, PyMC, pyopencl, pypy, pyRAD, Pyro4, PySnpTools, python, PyTorch, PyVCF, QIIME, QIIME2, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, quickmerge, QUMA, R, RACA, racon, RADIS, RadSex, rapt, RAPTR-SV, RATT, RAxML, raxml-ng, Ray, rclone, Rcorrector, RDP Classifier, REAGO, REAPR, Red, ReferenceSeeker, Relate, RelocaTE2, Repbase, RepeatMasker, RepeatModeler, RERconverge, RFMix, RGAAT, rgdal, RGI, Rgtsvm, ripgrep, rJava, RNAMMER, rnaQUAST, Rnightlights, Roary, Rockhopper, rphast, Rqtl, Rqtl2, RSEM, RSeQC, RStudio, rtfbs_db, ruby, sabre, SaguaroGW, salmon, Sambamba, samblaster, sample, SampleTracker, samplot, samtabix, Samtools, Satsuma, Satsuma2, SCALE, scanorama, scikit-learn, Scoary, scythe, seaborn, SecretomeP, selscan, Sentieon, seqkit, SeqPrep, seqtk, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, SHAPEIT4, shasta, Shiny, shore, SHOREmap, shortBRED, SHRiMP, sickle, SignalP, SimPhy, simuPOP, singularity, sinto, sistr_cmd, SKESA, skewer, SLiM, SLURM, smcpp, smoove, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, SNAPP, snATAC, SNeP, Sniffles, snippy, snp-sites, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SomaticSniper, sorted_grep, spaceranger, SPAdes, SPALN, SparCC, SPARTA, sqlite, SRA Toolkit, srst2, stacks, Stacks 2, stairway-plot, stampy, STAR, Starcode, statmodels, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, 
stringMLST, StringTie, STRUCTURE, Structure_threader, supernova, SURPI, sutta, SV-plaudit, SVDetect, SVseq2, svtools, svtyper, SWAMP, SweepFinder, sweepsims, tabix, Taiji, Tandem Repeats Finder (TRF), tardis, TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tbl2asn, tcoffee, TensorFlow, TEToolkit, TEtranscripts, texlive, tfTarget, ThermoRawFileParser, TMHMM, tmux, Tomahawk, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, treeCl, treemix, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, tRNAscan-SE, UCSC Kent utilities, UMAP, UMI-tools, Unicycler, UniRep, unrar, usearch, Variant Effect Predictor, VarScan, VCF-kit, vcf2diploid, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, VizBin, vmatch, vsearch, vt, WASP, wgs-assembler (Celera), Wise2 (Genewise), Xander_assembler, yaha

Details for atac-seq-pipeline

Name: atac-seq-pipeline
Version: 1.9.3
OS: Linux
About: This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data.
Added: 10/27/2018 9:50:44 AM
Updated: 5/4/2021 5:34:03 PM
Link: https://github.com/ENCODE-DCC/atac-seq-pipeline
Notes:

Part 1. Run the pipeline with Singularity (the easiest)

Here is how to run the test data set provided by the pipeline developer (run the software in a "screen" persistent session).
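If you are not familiar with "screen", a minimal way to start a persistent session (the session name "atac" used here is just an example) is:

screen -S atac        # start a named persistent session and run the commands below inside it
# detach with Ctrl-a d; reattach later with:
screen -r atac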

export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH

mkdir /workdir/$USER
cd /workdir/$USER/
cp -r /programs/atac-seq-pipeline-1.9.3 /workdir/$USER

#download test data set
wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_caper.json

#process test dataset
export ATACROOT=/workdir/$USER/atac-seq-pipeline-1.9.3

caper run $ATACROOT/atac.wdl -i ENCSR356KRQ_subsampled_caper.json --singularity --singularity-cachedir $ATACROOT

If no output directory is specified, the output directory is "atac". The result files are under "atac", in the execution directory; the files are documented at https://encode-dcc.github.io/wdl-pipelines/output_atac.html.

# After the work is finished, organize output results with croo

export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
cd atac
ls -l 
cd xxxxxxx  #replace xxxxxxx with the run directory you get from "ls -l"
croo metadata.json
qc2tsv qc/qc.json  > qc.tsv

The results should be in the directories "peak", "qc", and "signal", together with the croo* report files and the QC table qc.tsv.
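To take a quick look at the results you can, for example, view the tab-delimited QC table as aligned columns (this is just one way to inspect it):

column -t -s$'\t' qc.tsv | less -S   # browse qc.tsv; quit with "q"
ls peak qc signal                    # list the organized result files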

When you run your real data, you need to:

1) Prepare the reference genome file.

  • In the directory "atac-seq-pipeline-1.9.3" there is a script "build_genome_data.sh". Edit this script, setting "YOUR_OWN_GENOME" (your genome name), REGEX_BFILT_PEAK_CHR_NAME=".*" (optional), MITO_CHR_NAME="chrM" (if the mitochondrial chromosome name is not known, remove "chrM" including the two quotation marks), REF_FA="https://some.where.com/your.genome.fa.gz" (change to the full file path of the genome fasta file), and BLACKLIST= (add a path if available, otherwise keep it empty). A sketch of these edits is shown at the end of this step.
  • Run these commands to create the genome database. (When done, you should see a directory /workdir/$USER/genomedb containing a .tsv file.)
source /home/$USER/miniconda3/bin/activate encode-atac-seq-pipeline

cd /workdir/$USER/atac-seq-pipeline-1.9.3

./build_genome_data.sh YOUR_OWN_GENOME /workdir/$USER/genomedb

* I modified the script build_genome_data.sh to work with a local genome fasta file. If you are working with human/mouse data, see

 https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/build_genome_database.md
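As a rough sketch of the edits described in the first bullet above (all values are hypothetical and the exact layout of build_genome_data.sh may differ; substitute your own genome name and fasta path):

# inside build_genome_data.sh, in the section for "YOUR_OWN_GENOME"
REGEX_BFILT_PEAK_CHR_NAME=".*"            # optional, the default usually works
MITO_CHR_NAME="chrM"                      # remove "chrM" (including the quotation marks) if the name is unknown
REF_FA="/workdir/$USER/my_genome.fa.gz"   # full path (or URL) of the genome fasta file
BLACKLIST=                                # path to a blacklist BED if available, otherwise keep it empty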

2) Prepare the input JSON file.

 https://github.com/ENCODE-DCC/atac-seq-pipeline  (section under "Input JSON file")
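As a minimal sketch only (the file name, sample paths, and replicate layout below are hypothetical; consult the "Input JSON file" section linked above for the full list of keys):

# point atac.genome_tsv at the .tsv file created by build_genome_data.sh in step 1
cat > my_experiment.json << EOF
{
    "atac.title" : "My ATAC-seq experiment",
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/workdir/$USER/genomedb/my_genome.tsv",
    "atac.paired_end" : true,
    "atac.fastqs_rep1_R1" : [ "/workdir/$USER/fastq/rep1_R1.fastq.gz" ],
    "atac.fastqs_rep1_R2" : [ "/workdir/$USER/fastq/rep1_R2.fastq.gz" ]
}
EOF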

3) The number of CPUs per task is specified in the file atac.wdl. In most cases there is no need to change it (for example, the default setting for the aligner bowtie2 is 6 cores per job). However, you may want to restrict the number of simultaneous jobs when running the caper command (e.g., limit to at most 4 jobs with --max-concurrent-tasks 4); otherwise, all available cores will be used.
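For example, a run on your own data (the JSON file name here is hypothetical) limited to 4 concurrent tasks could look like:

caper run $ATACROOT/atac.wdl -i my_experiment.json --singularity --singularity-cachedir $ATACROOT --max-concurrent-tasks 4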

 

Part 2. Install/run the pipeline with conda (alternative way)

Here are the instructions to install the pipeline with conda. (It could take 2 hours; run these steps in a "screen" persistent session.) Instructions for running the pre-installed v1.7.0 are also provided further below.

1. Install Conda if you do not have it in your home directory. You can follow this page (part 1) to install Conda: https://biohpc.cornell.edu/lab/doc/Software_Installation_exercises1.html
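If Miniconda is not installed yet, one way to set it up (a minimal sketch using the standard batch installer; the BioHPC page above describes the recommended procedure) is:

cd /workdir/$USER
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /home/$USER/miniconda3   # install into your home directory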

2. Install the latest version of atac-seq-pipeline. (There is currently (2021/05/03) an issue with bowtie2 in Conda; a temporary fix is to downgrade tbb after installation. I included this step in the commands below and will remove it once Conda fixes the bowtie2 problem.)

source /home/$USER/miniconda3/bin/activate
mkdir /workdir/$USER
cd /workdir/$USER
git clone https://github.com/ENCODE-DCC/atac-seq-pipeline.git
cd atac-seq-pipeline   # the install scripts are inside the cloned repository
bash scripts/uninstall_conda_env.sh
bash scripts/install_conda_env.sh mamba
conda activate encode-atac-seq-pipeline
conda install tbb=2020.2
pip install caper croo

#keep a copy of the atac.wdl file in your home directory. This file might be useful later
cp -r /workdir/$USER/atac-seq-pipeline/atac.wdl /home/$USER/ 

3. Run the software through Cromwell on a local computer.

#set environment

source /home/$USER/miniconda3/bin/activate encode-atac-seq-pipeline

# run test data.

The number of CPU cores per task can be set in the atac.wdl file (under "# group: resource_parameter"). The maximum number of concurrent tasks can be restricted with the --max-concurrent-tasks parameter.

cd /workdir/$USER
wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_caper.json

# atac.wdl is in the cloned repository (or use the copy saved in your home directory)
caper run atac-seq-pipeline/atac.wdl -i ENCSR356KRQ_subsampled_caper.json

 

 

*Here are the instructions to run version 1.7.0 (4/2020) installed on BioHPC and process a test data set. Run the software using "screen"; it could take 30 min for the test data. (Instructions for installing the latest version are given above.)

## Activate the conda environment

source /programs/miniconda3/bin/activate encode-atac-seq-pipeline

##Test run

cd /workdir/$USER
cp /programs/atac-seq-pipeline-1.7.0/atac.wdl .

wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_caper.json

caper run ./atac.wdl -i ENCSR356KRQ_subsampled_caper.json

##When working with your own data, you need to create your own .json file. Please read the documentation, https://github.com/ENCODE-DCC/atac-seq-pipeline, under the section "Input JSON file"

## qc2tsv (https://github.com/ENCODE-DCC/qc2tsv#installation) and Croo (https://github.com/ENCODE-DCC/croo#installation) are available under the same conda environment.

To run the latest version of the atac-seq pipeline, please install the software in your home directory. We are not able to maintain multiple versions simultaneously.

