BioHPC Cloud Software
There are 1209 software titles installed in BioHPC Cloud. The software is available on all machines unless stated otherwise in the notes. The complete list of programs is below; please click on a title to see details and instructions. A tabular list of software is available here.
Please read the details and instructions before running any program; they may contain important information on how to properly use the software in BioHPC Cloud.
3D Slicer, 3d-dna, 454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, AF_unmasked, AFProfile, AGAT, agrep, albacore, Alder, AliTV-Perl interface, AlleleSeq, ALLMAPS, ALLPATHS-LG, Alphafold, Alphafold3, alphapickle, Alphapulldown, AlphScore, AMOS, AMPHORA, amplicon.py, AMRFinder, analysis, ANGSD, AnnotaPipeline, Annovar, ant, antiSMASH, anvio, apollo, arcs, ARGweaver, aria2, ariba, Arlequin, ART, ASEQ, aspera, assembly-stats, ASTRAL, atac-seq-pipeline, ataqv, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, atom, ATSAS, Augustus, AWS command line interface, AWS v2 Command Line Interface, axe, axel, BA3, BactSNP, bakta, bamsnap, bamsurgeon, bamtools, bamUtil, barcode_splitter, BarNone, Basset, BayeScan, Bayescenv, bayesR, baypass, bazel, BBMap/BBTools, BCFtools, BCL convert, bcl2fastq, BCP, bdbag, Beagle, beagle-lib, BEAST, BEAST X, Beast2, bed2diffs, bedops, BEDtools, bettercallsal, bfc, bgc, bgen, bicycle, BiG-SCAPE, bigQF, bigtools, bigWig, bioawk, biobakery, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, biscuit, Bismark, Blackbird, blasr, BLAST, BLAST_to_BED, blast2go, BLAT, BlobToolKit, BLUPF90, BMGE, bmtagger, bonito, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BRBseqTools, BreedingSchemeLanguage, breseq, brocc, BSBolt, bsmap, BSseeker2, btyper3, BUSCO, BUSCO Phylogenomics, BWA, bwa-mem2, bwa-meth, bwtool, cactus, CAFE, CAFE5, caffe, cagee, canu, Canvas, CAP3, caper, CarveMe, catch, cBar, CBSU RNAseq, CCMetagen, CCTpack, cd-hit, cdbfasta, cdo, CEGMA, CellRanger, cellranger-arc, cellranger-atac, cellranger-dna, centrifuge, centroFlye, CFM-ID, CFSAN SNP pipeline, CheckM, CheckM2, chimera, ChimeraTE, chimerax, chip-seq-pipeline, chromosomer, Circlator, Circos, Circuitscape, CITE-seq-Count, ClermonTyping, clues, CLUMPP, clust, Clustal Omega, CLUSTALW, Cluster, cmake, CMSeq, CNVnator, coinfinder, colabfold, CombFold, Comparative-Annotation-Toolkit, compat, CONCOCT, Conda, Cooler, copyNumberDiff, cortex_var, CoverM, crabs, CRISPRCasFinder, CRISPResso, crispron, Cromwell, CrossMap, CRT, cuda, Cufflinks, curatedMetagenomicDataTerminal, cutadapt, cuteSV, Cytoscape, dadi, dadi-1.6.3_modif, dadi-cli, danpos, DAS_Tool, dashing, DBSCAN-SWA, dDocent, DeconSeq, Deepbinner, deeplasmid, DeepTE, deepTools, Deepvariant, defusion, delly, DESMAN, destruct, DETONATE, dfast, diamond, dipcall, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, dnmtools, Docker, dorado, DRAM, dREG, dREG.HD, drep, Drop-seq, dropEst, dropSeqPipe, dsk, dssat, Dsuite, dTOX, duphold, DWGSIM, dynare, ea-utils, earlgrey, ecCodes, ecopcr, ecoPrimers, ectyper, EDGE, edirect, EDTA, eems, EgaCryptor, EGAD, eggnog-mapper, EIGENSOFT, elai, ElMaven, EMBLmyGFF3, EMBOSS, EMIRGE, Empress, enfuse, EnTAP, entropy, epa-ng, ephem, epic2, ermineJ, ete3, EukDetect, EukRep, EVE, EVM, exabayes, exonerate, ExpansionHunterDenovo-v0.8.0, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastAAI, FastANI, fastcluster, fastGEAR, FastME, FastML, fastp, FastQ Screen, fastq-multx-1.4.3, fastq_demux, fastq_pair, fastq_species_detector, FastQC, fastqsplitter, fastsimcoal2, fastspar, fastStructure, FastTree, FASTX, fcs, feems, feh, FFmpeg, fgbio, ficle, figaro, Fiji, Filtlong, fineRADstructure, fineSTRUCTURE, FIt-SNE, FlaGs2, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, FRANz, freebayes, FSA, funannotate, FunGene Pipeline, FunOMIC, G-PhoCS, GADMA, GAEMR, Galaxy, Galaxy in Docker, GATK, gatk4, 
gatk4amplicon.py, gblastn, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, GeMoMa, GENECONV, geneid, GeneMark, GeneRax, Genespace, genomad, Genome STRiP, Genome Workbench, GenomeMapper, Genomescope, GenomeThreader, genometools, GenomicConsensus, genozip, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, gfaviz, GffCompare, gffread, giggle, git, glactools, GlimmerHMM, GLIMPSE, GLnexus, Globus connect personal, GMAP/GSNAP, gmx_MMPBSA, GNU Compilers, GNU parallel, go-perl, GO2MSIG, GONE, GoShifter, gradle, graftM, grammy, GraPhlAn, graphtyper, graphviz, greenhill, GRiD, gridss, Grinder, grocsvs, GROMACS, GroopM, GSEA, gsort, GTDB-Tk, GTFtools, Gubbins, gunc, GUPPY, gvcftools, hail, hal, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, haplostrips, HaploSync, HapSeq2, harpy, HarvestTools, haslr, hdf5, helixer, hget, hh-suite, HiC-Pro, hic_qc, HiCExplorer, HiFiAdapterFilt, hifiasm, hificnv, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, htslib, https://github.com/CVUA-RRW/RRW-PrimerBLAST, hugin, humann, HUMAnN2, hybpiper, HyLiTE, Hyper-Gen, hyperopt, HyPhy, hyphy-analyses, iAssembler, IBDLD, IBDNe, IBDseq, idba, IDBA-UD, idemux, IDP-denovo, idr, idseq, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, impute5, IMSA-A, INDELseek, infernal, Infomap, inspector, inStrain, inStrain_lite, InStruct, Intel MKL, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, isoseq, JaBbA, jags, Jane, java, jbrowse, JCVI, jellyfish, jsalignon/cactus, juicer, julia, jupyter, jupyterlab, kaiju, kallisto, Kent Utilities, keras, khmer, kinfin, king, kma, KMC, KmerFinder, KmerGenie, kneaddata, kraken, KrakenTools, KronaTools, kSNP, kWIP, LACHESIS, lammps, LAPACK, lapels, LAST, lastz, lcMLkin, LDAK, LDBlockShow, LDhat, LeafCutter, leeHom, lep-anchor, Lep-MAP3, LEVIATHAN, lftp, Liftoff, lifton, Lighter, LinkedSV, LINKS, localcolabfold, LocARNA, LocusZoom, lofreq, longranger, Loupe, LS-GKM, LTR_retriever, LUCY, LUCY2, LUMPY, lyve-SET, m6anet, Macaulay2, MACE, MACS, MaCS simulator, MACS2, macs3, maffilter, MAFFT, mafTools, MAGeCK, MAGeCK-VISPR, Magic-BLAST, magick, MAGScoT, MAKER, manta, mapDamage, mapquik, MAQ, MARS, MASH, mashtree, Mashtree, MaSuRCA, MATLAB, Matlab_runtime, Mauve, MaxBin, MaxQuant, McClintock, mccortex, mcl, MCscan, MCScanX, mdust, medaka, medusa, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, merqury, MetaBAT, MetaBinner, MetaboAnalystR, MetaCache, MetaCRAST, metaCRISPR, metamaps, MetAMOS, MetaPathways, MetaPhlAn, metapop, metaron, MetaVelvet, MetaVelvet-SL, metaWRAP, methpipe, mfeprimer, MGmapper, MicrobeAnnotator, microtrait, MIDAS, MiFish, Migrate-n, mikado, MinCED, minigraph, Minimac3, Minimac4, minimap2, miniprot, mira, miRDeep2, mirge3, miRquant, MISO, MITE-Hunter, MITObim, MitoFinder, mitohelper, MitoHiFi, mity, MiXCR, MixMapper, MKTest, mlift, mlst, MMAP, MMSEQ, MMseqs2, MMTK, MobileElementFinder, modeltest, MODIStsp-2.0.5, module, moments, momi, MoMI-G, mongo, mono, monocle3, mosdepth, mothur, MrBayes, mrcanavar, mrsFAST, msdial, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMandCo, MUMmer, mummer2circos, muscle, MUSIC, Mutation-Simulator, muTect, myte, MZmine, nag-compiler, namfinder, nanocompore, nanofilt, NanoPlot, Nanopolish, nanovar, ncbi_datasets, ncftp, ncl, NECAT, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, NextPolish2, nf-core/rnaseq, ngmlr, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NGSNGS, NgsRelate, ngsTools, NGSUtils, NINJA, NLR-Annotator, NLR-Parser, 
NLRtracker, Novoalign, NovoalignCS, nQuire, NRSA, NuDup, numactl, nvidia-docker, nvtop, Oases, OBITools, Octave, OMA, Oneflux, OpenBLAS, openmpi, openslide, openssl, ORFeus, orthodb-clades, OrthoFinder, orthologr, Orthomcl, pacbio, PacBioTestData, PAGIT, pairtools, pal2nal, paleomix, PAML, panacus, panaroo, pandas, pandaseq, pandoc, pangene, PanPhlAn, Panseq, pantools, Parsnp, PASA, PASTEC, PAUP*, pauvre, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pblat, pbmm2, PBSuite, pbsv, pbtk, PCAngsd, pcre, pcre2, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, peppro, PERL, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, pharokka, phasedibd, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan*, PhyloPhlAn2, phylophlan3, phyluce, PhyML, phyx, Picard, PICRUSt2, pigz, Pilon, Pindel, piPipes, PIQ, pixy, PlasFlow, platanus, Platypus, plink, plink2, Plotly, plotsr, plumed, pocp, Point Cloud Library, popbam, PopCOGenT, PopLDdecay, Porechop, poretools, portcullis, POUTINE, pplacer, PRANK, preseq, pretext-suite, primalscheme, primer3, PrimerBLAST, PrimerPooler, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, Proseq2, ProtExcluder, protolite, PSASS, psmc, psutil, pullseq, purge_dups, pyani, PyCogent, pycoQC, pyfaidx, pyGenomeTracks, PyMC, pymol-open-source, pyopencl, pypy, pyRAD, pyrho, Pyro4, pyseer, PySnpTools, python, PyTorch, PyVCF, qapa, qcat, QIIME, QIIME2, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, quickmerge, QUMA, QuPath, R, RACA, racon, rad_haplotyper, RADIS, RadSex, RagTag, rapt, RAPTR-SV, RATT, raven, RAxML, raxml-ng, Ray, rck, rclone, Rcorrector, RDP Classifier, REAGO, REAPR, Rebaler, reCOGnizer, Red, ReferenceSeeker, regenie, regtools, Relate, RelocaTE2, Repbase, RepeatMasker, RepeatModeler, RERconverge, ReSeq, resistify, RevBayes, RFdiffusion, RFMix, RGAAT, rgdal, RGI, Rgtsvm, Ribotaper, ripgrep, rJava, rMATS, RNAMMER, rnaQUAST, Rnightlights, roadies, Roary, Rockhopper, rohan, RoseTTAFold-All-Atom, RoseTTAFold2NA, rphast, Rqtl, Rqtl2, RSAT, RSEM, RSeQC, RStudio, rtfbs_db, ruby, run_dbcan, sabre, SaguaroGW, salmon, SALSA, Sambamba, samblaster, sample, SampleTracker, samplot, samtabix, Samtools, Satsuma, Satsuma2, SCALE, scanorama, SCE-VCF, scikit-learn, Scoary, scoary-2, scTE, scythe, seaborn, SEACR, SecretomeP, segul, self-assembling-manifold, selscan, seqfu, seqkit, SeqPrep, seqtk, SequelTools, sequenceTubeMap, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, SHAPEIT4, SHAPEIT5, shasta, Shiny, shoelaces, shore, SHOREmap, shortBRED, SHRiMP, sickle, sift4g, SignalP, SimPhy, simuPOP, sina, SINGER, singularity, sinto, sirius, sistr_cmd, skani, SKESA, skewer, SLiM, SLURM, smap, smash, smcpp, smoove, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, snapatac2, SNAPP, SnapTools, snATAC, SNeP, Sniffles, snippy, snp-sites, snpArcher, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SoloTE, SomaticSniper, songbird, sorted_grep, spaceranger, SPAdes, SPALN, SparCC, sparsehash, SPARTA, speedseq, split-fasta, SQANTI3, sqlite, SqueezeMeta, SQuIRE, SRA Toolkit, srst2, ssantichaivekin/empress, stacks, Stacks 2, stairway-plot, stampy, STAR, staramr, Starcode, statmodels, stellarscope, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, Struo2, stylegan2-ada-pytorch, subread, sumatra, supernova, suppa, SURPI, surpyvor, SURVIVOR, sutta, SV-plaudit, SVaBA, SVclone, SVDetect, svengine, SVseq2, svtools, svtyper, svviz2, SWAMP, sweed, SweepFinder, SweepFinder2, sweepsims, 
swiss2fasta.py, sword, syri, tabix, tagdust, Taiji, tama, Tandem Repeats Finder (TRF), tardis, TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tax_myPHAGE, tbl2asn, tcoffee, TE-Aid, telescope, TELR, TensorFlow, TEToolkit, TEtranscripts, texlive, TFEA, tfTarget, thermonucleotideBLAST, ThermoRawFileParser, TMHMM, tmux, Tomahawk, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, tree, treeCl, treemix, treePL, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, TrioCNV2, tRNAscan-SE, Trycycler, UBCG2, UCSC Kent utilities, ullar, ultra, ultraplex, UMAP, UMI-tools, umi-transfer, UMIScripts, Unicycler, UniRep, unitig-caller, unrar, usearch, VALET, valor, vamb, variabel, Variant Effect Predictor, VarScan, VCF-kit, vcf2diploid, vcf2phylip, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, Vicuna, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, visidata, vispr, VizBin, vmatch, vscode, vsearch, vt, WASP, webin-cli, wget, wgs-assembler (Celera), WGSassign, What_the_Phage, wiggletools, windowmasker, wine, Winnowmap, Wise2 (Genewise), wombat, Xander_assembler, xpclr, yaha, yahs, yap
Details for atac-seq-pipeline (If the copy-pasted commands do not work, use this tool to remove unwanted characters)
Name: atac-seq-pipeline
Version: 2.2.2
OS: Linux
About: This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data.
Added: 10/27/2018 9:50:44 AM
Updated: 12/5/2023 12:19:43 PM
Link: https://github.com/ENCODE-DCC/atac-seq-pipeline
Notes: Run the pipeline in a "screen" persistent session. If you have run a previous version of caper and atac-seq-pipeline, or if you run into problems with this pipeline, delete the .caper directory to reset caper ("rm -fr $HOME/.caper").
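For example, a persistent session can be started and later resumed like this (the session name "atac" is just an illustration):
screen -S atac    # start a named session and run the pipeline commands inside it
# detach with Ctrl-a d; log back in later and reattach with:
screen -r atac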
Instructions to run latest version (for v2.1.1 and v1.10.0 see below)
1. Install the latest caper, croo, and qc2tsv in your home directory, clone the pipeline from GitHub, and download the Singularity image
pip install caper croo qc2tsv --upgrade
mkdir /workdir/$USER
cd /workdir/$USER
git clone https://github.com/ENCODE-DCC/atac-seq-pipeline.git
cd atac-seq-pipeline
wget -O atac-seq-pipeline.sif https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.2.2.sif
2. Prepare reference genome database
1) If you work with human or mouse data, the ENCODE project provides a pre-built genome database. Go to this page; under the "Reference genome" section you will find the URL of the reference genome database, for example "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv". You will need this URL later.
2) If you work with other species, follow the instructions below to prepare the reference genome database.
- Edit the script "/workdir/$USER/atac-seq-pipeline/scripts/build_genome_data.sh", setting the values for GENOME, DEST_DIR, TSV, and MITO_CHR_NAME (see the sketch after the commands below). Make sure the chromosome names match between the genome files. For plant genomes, you might want to create a new genome fasta file with the mitochondrion and chloroplast merged, and call it a mitochondrial genome.
- Run these commands to create the genome database. After it is done, you should see a directory /workdir/$USER/$DEST_DIR with a .tsv file inside. You will need this .tsv file later.
cd /workdir/$USER/atac-seq-pipeline
cp /PATH/TO/your.genome.fa.gz ./
singularity run --bind $PWD --pwd $PWD atac-seq-pipeline.sif ./scripts/build_genome_data.sh
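The variable block you edit in build_genome_data.sh might look like the sketch below; the variable names come from the instructions above, but all values are illustrative placeholders for a hypothetical genome.
GENOME="mygenome"                      # a label for this genome build (placeholder)
DEST_DIR="/workdir/$USER/genomedb"     # where the genome database will be written
TSV="${DEST_DIR}/${GENOME}.tsv"        # the .tsv file you will reference later in my.json
MITO_CHR_NAME="chrM"                   # must match a sequence name in your genome fasta
# For a plant genome, you might first concatenate the organelle sequences, e.g.:
#   zcat mito.fa.gz chloroplast.fa.gz | gzip > merged_organelles.fa.gz
# and rename the resulting headers so that MITO_CHR_NAME can identify them.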
3. Put your ATAC-seq data files (*.fastq.gz) into the directory /workdir/$USER/atac-seq-pipeline
4. Prepare a .json text file that specifies all input files, and keep it in /workdir/$USER/atac-seq-pipeline
- You can start from this example file (for local files, replace the URL with a file or directory name). Detailed documentation of the JSON file can be found on this page (the section under "Input JSON file"); a minimal sketch follows.
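As an illustration, a minimal input JSON for one paired-end replicate might look like the sketch below. The key names follow the pipeline's input JSON documentation, but the title, paths, and file names are placeholders to replace with your own ("netid" stands for your user ID); atac.genome_tsv can point either to the .tsv built in step 2 or to an ENCODE URL such as the hg38.tsv above.
cat > my.json <<'EOF'
{
    "atac.title" : "Example ATAC-seq run",
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/workdir/netid/genomedb/mygenome.tsv",
    "atac.paired_end" : true,
    "atac.fastqs_rep1_R1" : [ "/workdir/netid/atac-seq-pipeline/rep1_R1.fastq.gz" ],
    "atac.fastqs_rep1_R2" : [ "/workdir/netid/atac-seq-pipeline/rep1_R2.fastq.gz" ]
}
EOF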
5. Run pipeline
You might want to restrict the number of simultaneous jobs when running the caper command (e.g. limit to at most 4 jobs with --max-concurrent-tasks 4); otherwise, all available cores on the server will be used.
export PATH=~/.local/bin:$PATH
caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif --max-concurrent-tasks 4
6. Summarize results
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from ls -l
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.
Instructions to run v2.1.1
Run the pipeline in a "screen" persistent session.
1. Install caper version 2.1.3 in your home directory, copy atac-seq-pipeline to the server you are working on, and set up the environment
pip install caper==2.1.3 croo qc2tsv --upgrade
mkdir /workdir/$USER
cd /workdir/$USER
cp -r /programs/atac-seq-pipeline-2.1.1 /workdir/$USER
2. Prepare reference genome database
1) If you work with human or mouse data, the ENCODE project provides a pre-built genome database. Go to this page; under the "Reference genome" section you will find the URL of the reference genome database, for example "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv". You will need this URL later.
2) If you work with other species, follow the instructions below to prepare the reference genome database.
- Edit the script "/workdir/$USER/atac-seq-pipeline-2.1.1/build_genome_data_mod.sh", setting the values for GENOME, DEST_DIR, REF_FA, and MITO_CHR_NAME; the rest of the parameters are optional. Note that you need to use full paths for the file name and the destination directory, for example /workdir/$USER/atac-seq-pipeline-2.1.1/genomedb and /workdir/$USER/atac-seq-pipeline-2.1.1/mygenome.fasta. Make sure the chromosome names match between the genome files. For plant genomes, you might want to create a new genome fasta file with the mitochondrion and chloroplast merged, and call it a mitochondrial genome.
- Run these commands to create the genome database. After it is done, you should see a directory /workdir/$USER/$DEST_DIR with a .tsv file inside. You will need this .tsv file later.
cd /workdir/$USER/atac-seq-pipeline-2.1.1
cp /PATH/TO/your.genome.fa.gz ./
singularity exec --bind $PWD --pwd $PWD atac-seq-pipeline.sif ./build_genome_data_mod.sh
3. Put your ATAC-seq data files (*.fastq.gz) into the directory /workdir/$USER/atac-seq-pipeline-2.1.1
4. Prepare a .json text file that specifies all input files, and keep it in /workdir/$USER/atac-seq-pipeline-2.1.1
- You can start from this example file (for local files, replace the URL with a file or directory name). Detailed documentation of the JSON file can be found on this page (the section under "Input JSON file"); see the minimal sketch in the latest-version instructions above.
5. Set the number of CPUs per task.
Optionally, you can modify the file /workdir/$USER/atac-seq-pipeline-2.1.1/atac.wdl and change the number of CPUs per task (under "group: resource_parameter"). In most cases there is no need to change it (for example, the default for the bowtie2 aligner is 6 cores per job, which works well). However, you might want to restrict the number of simultaneous jobs when running the caper command (e.g. limit to at most 4 jobs with --max-concurrent-tasks 4); otherwise, all available cores on the server will be used.
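For instance, the kind of change involved might look like the line below; the exact declaration in atac.wdl may differ, so treat this purely as a hypothetical illustration.
# in atac.wdl, under "group: resource_parameter" (hypothetical declaration):
#   Int align_cpu = 6    # lower this value to use fewer cores per alignment job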
6. Run pipeline
export PATH=~/.local/bin:$PATH
caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif
7. Summarize results
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from ls -l
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.
Run pipeline with example data files
Here is how to run the test data set provided by the pipeline developers (run the software in a "screen" persistent session)
export PATH=~/.local/bin:$PATH
export ATACROOT=/workdir/$USER/atac-seq-pipeline-2.1.1
mkdir /workdir/$USER
cd /workdir/$USER/
cp -r /programs/atac-seq-pipeline-2.1.1 /workdir/$USER
#download test data set
wget https://raw.githubusercontent.com/ENCODE-DCC/atac-seq-pipeline/master/example_input_json/ENCSR356KRQ_subsampled.json
#process test dataset
caper run $ATACROOT/atac.wdl -i ENCSR356KRQ_subsampled.json --singularity $ATACROOT/atac-seq-pipeline.sif
If no output directory is specified, the output directory is atac. The result files are under atac, in per-run execution directories; they are documented at https://encode-dcc.github.io/wdl-pipelines/output_atac.html.
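Before croo is run, the raw output tree looks roughly like the sketch below; cromwell names the run directory with a random workflow ID (the "xxxxxxx" in the commands below), and the call-* task names shown here are illustrative.
atac/
  <workflow-id>/        # the run directory, one per pipeline run
    call-align/ ...     # one call-* directory per pipeline task
    call-call_peak/ ...
    metadata.json       # read by croo to organize the results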
# After the work is finished, organize output results with croo
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from "ls -l"
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.
Instructions to run v1.10.0
1. Copy the software directory to the server you are working on, and set up the environment
export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
export version=1.10.0
mkdir /workdir/$USER
cd /workdir/$USER
cp -r /programs/atac-seq-pipeline-${version} /workdir/$USER
2. Prepare reference genome database
1) If you work with human or mouse data, the ENCODE project provides a pre-built genome database. Go to this page; under the "Reference genome" section you will find the URL of the reference genome database, for example "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv". You will need this URL later.
2) If you work with other species, follow the instructions below to prepare the reference genome database.
- Edit the script "/workdir/$USER/atac-seq-pipeline-1.10.0/build_genome_data_mod.sh", setting the values for GENOME, DEST_DIR, REF_FA, and MITO_CHR_NAME; the rest of the parameters are optional. Make sure the chromosome names match between the genome files. For plant genomes, you might want to create a new genome fasta file with the mitochondrion and chloroplast merged, and call it a mitochondrial genome.
- Run these commands to create the genome database. After it is done, you should see a directory /workdir/$USER/$DEST_DIR with a .tsv file inside. You will need this .tsv file later.
cd /workdir/$USER/atac-seq-pipeline-1.10.0
cp /PATH/TO/your.genome.fa.gz ./
singularity exec atac-seq-pipeline.sif ./build_genome_data_mod.sh
3. Put your ATAC-seq data files (*.fastq.gz) into the directory /workdir/$USER/atac-seq-pipeline-1.10.0
4. Prepare a .json text file that specifies all input files, and keep it in /workdir/$USER/atac-seq-pipeline-1.10.0.
- You can start from this example file (for local files, replace the URL with a file or directory name). Detailed documentation of the JSON file can be found on this page (the section under "Input JSON file").
5. Set the number of CPUs per task.
Optionally, you can modify the file /workdir/$USER/atac-seq-pipeline-1.10.0/atac.wdl and change the number of CPUs per task (under "group: resource_parameter"), as illustrated in the v2.1.1 instructions above. In most cases there is no need to change it (for example, the default for the bowtie2 aligner is 6 cores per job, which works well). However, you might want to restrict the number of simultaneous jobs when running the caper command (e.g. limit to at most 4 jobs with --max-concurrent-tasks 4); otherwise, all available cores on the server will be used.
6. Run pipeline
caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif
7. Summarize results
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from ls -l
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.
Run pipeline with example data files
Here is how to run the test data set provided by the pipeline developers (run the software in a "screen" persistent session)
export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
export version=1.10.0
mkdir /workdir/$USER
cd /workdir/$USER/
cp -r /programs/atac-seq-pipeline-${version} /workdir/$USER
#download test data set
wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_caper.json
#process test dataset
export ATACROOT=/workdir/$USER/atac-seq-pipeline-${version}
caper run $ATACROOT/atac.wdl -i ENCSR356KRQ_subsampled_caper.json --singularity $ATACROOT/atac-seq-pipeline.sif
If no output directory is specified, the output directory is atac. The result files are under atac, in per-run execution directories (as illustrated above); they are documented at https://encode-dcc.github.io/wdl-pipelines/output_atac.html.
# After the work is finished, organize output results with croo
export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from "ls -l"
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.