 

BioHPC Cloud: User Guide

 

 


BioHPC Cloud Software

There are 1209 software titles installed in BioHPC Cloud. The software is available on all machines unless stated otherwise in the notes. A complete list of programs is below; click on a title to see details and instructions. A tabular list of software is available here.

Please read the details and instructions before running any program; they may contain important information on how to properly use the software in BioHPC Cloud.

3D Slicer, 3d-dna, 454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, AF_unmasked, AFProfile, AGAT, agrep, albacore, Alder, AliTV-Perl interface, AlleleSeq, ALLMAPS, ALLPATHS-LG, Alphafold, Alphafold3, alphapickle, Alphapulldown, AlphScore, AMOS, AMPHORA, amplicon.py, AMRFinder, analysis, ANGSD, AnnotaPipeline, Annovar, ant, antiSMASH, anvio, apollo, arcs, ARGweaver, aria2, ariba, Arlequin, ART, ASEQ, aspera, assembly-stats, ASTRAL, atac-seq-pipeline, ataqv, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, atom, ATSAS, Augustus, AWS command line interface, AWS v2 Command Line Interface, axe, axel, BA3, BactSNP, bakta, bamsnap, bamsurgeon, bamtools, bamUtil, barcode_splitter, BarNone, Basset, BayeScan, Bayescenv, bayesR, baypass, bazel, BBMap/BBTools, BCFtools, BCL convert, bcl2fastq, BCP, bdbag, Beagle, beagle-lib, BEAST, BEAST X, Beast2, bed2diffs, bedops, BEDtools, bettercallsal, bfc, bgc, bgen, bicycle, BiG-SCAPE, bigQF, bigtools, bigWig, bioawk, biobakery, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, biscuit, Bismark, Blackbird, blasr, BLAST, BLAST_to_BED, blast2go, BLAT, BlobToolKit, BLUPF90, BMGE, bmtagger, bonito, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BRBseqTools, BreedingSchemeLanguage, breseq, brocc, BSBolt, bsmap, BSseeker2, btyper3, BUSCO, BUSCO Phylogenomics, BWA, bwa-mem2, bwa-meth, bwtool, cactus, CAFE, CAFE5, caffe, cagee, canu, Canvas, CAP3, caper, CarveMe, catch, cBar, CBSU RNAseq, CCMetagen, CCTpack, cd-hit, cdbfasta, cdo, CEGMA, CellRanger, cellranger-arc, cellranger-atac, cellranger-dna, centrifuge, centroFlye, CFM-ID, CFSAN SNP pipeline, CheckM, CheckM2, chimera, ChimeraTE, chimerax, chip-seq-pipeline, chromosomer, Circlator, Circos, Circuitscape, CITE-seq-Count, ClermonTyping, clues, CLUMPP, clust, Clustal Omega, CLUSTALW, Cluster, cmake, CMSeq, CNVnator, coinfinder, colabfold, CombFold, Comparative-Annotation-Toolkit, compat, CONCOCT, Conda, Cooler, copyNumberDiff, cortex_var, CoverM, crabs, CRISPRCasFinder, CRISPResso, crispron, Cromwell, CrossMap, CRT, cuda, Cufflinks, curatedMetagenomicDataTerminal, cutadapt, cuteSV, Cytoscape, dadi, dadi-1.6.3_modif, dadi-cli, danpos, DAS_Tool, dashing, DBSCAN-SWA, dDocent, DeconSeq, Deepbinner, deeplasmid, DeepTE, deepTools, Deepvariant, defusion, delly, DESMAN, destruct, DETONATE, dfast, diamond, dipcall, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, dnmtools, Docker, dorado, DRAM, dREG, dREG.HD, drep, Drop-seq, dropEst, dropSeqPipe, dsk, dssat, Dsuite, dTOX, duphold, DWGSIM, dynare, ea-utils, earlgrey, ecCodes, ecopcr, ecoPrimers, ectyper, EDGE, edirect, EDTA, eems, EgaCryptor, EGAD, eggnog-mapper, EIGENSOFT, elai, ElMaven, EMBLmyGFF3, EMBOSS, EMIRGE, Empress, enfuse, EnTAP, entropy, epa-ng, ephem, epic2, ermineJ, ete3, EukDetect, EukRep, EVE, EVM, exabayes, exonerate, ExpansionHunterDenovo-v0.8.0, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastAAI, FastANI, fastcluster, fastGEAR, FastME, FastML, fastp, FastQ Screen, fastq-multx-1.4.3, fastq_demux, fastq_pair, fastq_species_detector, FastQC, fastqsplitter, fastsimcoal2, fastspar, fastStructure, FastTree, FASTX, fcs, feems, feh, FFmpeg, fgbio, ficle, figaro, Fiji, Filtlong, fineRADstructure, fineSTRUCTURE, FIt-SNE, FlaGs2, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, FRANz, freebayes, FSA, funannotate, FunGene Pipeline, FunOMIC, G-PhoCS, GADMA, GAEMR, Galaxy, Galaxy in Docker, GATK, gatk4, 
gatk4amplicon.py, gblastn, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, GeMoMa, GENECONV, geneid, GeneMark, GeneRax, Genespace, genomad, Genome STRiP, Genome Workbench, GenomeMapper, Genomescope, GenomeThreader, genometools, GenomicConsensus, genozip, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, gfaviz, GffCompare, gffread, giggle, git, glactools, GlimmerHMM, GLIMPSE, GLnexus, Globus connect personal, GMAP/GSNAP, gmx_MMPBSA, GNU Compilers, GNU parallel, go-perl, GO2MSIG, GONE, GoShifter, gradle, graftM, grammy, GraPhlAn, graphtyper, graphviz, greenhill, GRiD, gridss, Grinder, grocsvs, GROMACS, GroopM, GSEA, gsort, GTDB-Tk, GTFtools, Gubbins, gunc, GUPPY, gvcftools, hail, hal, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, haplostrips, HaploSync, HapSeq2, harpy, HarvestTools, haslr, hdf5, helixer, hget, hh-suite, HiC-Pro, hic_qc, HiCExplorer, HiFiAdapterFilt, hifiasm, hificnv, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, htslib, https://github.com/CVUA-RRW/RRW-PrimerBLAST, hugin, humann, HUMAnN2, hybpiper, HyLiTE, Hyper-Gen, hyperopt, HyPhy, hyphy-analyses, iAssembler, IBDLD, IBDNe, IBDseq, idba, IDBA-UD, idemux, IDP-denovo, idr, idseq, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, impute5, IMSA-A, INDELseek, infernal, Infomap, inspector, inStrain, inStrain_lite, InStruct, Intel MKL, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, isoseq, JaBbA, jags, Jane, java, jbrowse, JCVI, jellyfish, jsalignon/cactus, juicer, julia, jupyter, jupyterlab, kaiju, kallisto, Kent Utilities, keras, khmer, kinfin, king, kma, KMC, KmerFinder, KmerGenie, kneaddata, kraken, KrakenTools, KronaTools, kSNP, kWIP, LACHESIS, lammps, LAPACK, lapels, LAST, lastz, lcMLkin, LDAK, LDBlockShow, LDhat, LeafCutter, leeHom, lep-anchor, Lep-MAP3, LEVIATHAN, lftp, Liftoff, lifton, Lighter, LinkedSV, LINKS, localcolabfold, LocARNA, LocusZoom, lofreq, longranger, Loupe, LS-GKM, LTR_retriever, LUCY, LUCY2, LUMPY, lyve-SET, m6anet, Macaulay2, MACE, MACS, MaCS simulator, MACS2, macs3, maffilter, MAFFT, mafTools, MAGeCK, MAGeCK-VISPR, Magic-BLAST, magick, MAGScoT, MAKER, manta, mapDamage, mapquik, MAQ, MARS, MASH, mashtree, Mashtree, MaSuRCA, MATLAB, Matlab_runtime, Mauve, MaxBin, MaxQuant, McClintock, mccortex, mcl, MCscan, MCScanX, mdust, medaka, medusa, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, merqury, MetaBAT, MetaBinner, MetaboAnalystR, MetaCache, MetaCRAST, metaCRISPR, metamaps, MetAMOS, MetaPathways, MetaPhlAn, metapop, metaron, MetaVelvet, MetaVelvet-SL, metaWRAP, methpipe, mfeprimer, MGmapper, MicrobeAnnotator, microtrait, MIDAS, MiFish, Migrate-n, mikado, MinCED, minigraph, Minimac3, Minimac4, minimap2, miniprot, mira, miRDeep2, mirge3, miRquant, MISO, MITE-Hunter, MITObim, MitoFinder, mitohelper, MitoHiFi, mity, MiXCR, MixMapper, MKTest, mlift, mlst, MMAP, MMSEQ, MMseqs2, MMTK, MobileElementFinder, modeltest, MODIStsp-2.0.5, module, moments, momi, MoMI-G, mongo, mono, monocle3, mosdepth, mothur, MrBayes, mrcanavar, mrsFAST, msdial, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMandCo, MUMmer, mummer2circos, muscle, MUSIC, Mutation-Simulator, muTect, myte, MZmine, nag-compiler, namfinder, nanocompore, nanofilt, NanoPlot, Nanopolish, nanovar, ncbi_datasets, ncftp, ncl, NECAT, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, NextPolish2, nf-core/rnaseq, ngmlr, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NGSNGS, NgsRelate, ngsTools, NGSUtils, NINJA, NLR-Annotator, NLR-Parser, 
NLRtracker, Novoalign, NovoalignCS, nQuire, NRSA, NuDup, numactl, nvidia-docker, nvtop, Oases, OBITools, Octave, OMA, Oneflux, OpenBLAS, openmpi, openslide, openssl, ORFeus, orthodb-clades, OrthoFinder, orthologr, Orthomcl, pacbio, PacBioTestData, PAGIT, pairtools, pal2nal, paleomix, PAML, panacus, panaroo, pandas, pandaseq, pandoc, pangene, PanPhlAn, Panseq, pantools, Parsnp, PASA, PASTEC, PAUP*, pauvre, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pblat, pbmm2, PBSuite, pbsv, pbtk, PCAngsd, pcre, pcre2, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, peppro, PERL, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, pharokka, phasedibd, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan*, PhyloPhlAn2, phylophlan3, phyluce, PhyML, phyx, Picard, PICRUSt2, pigz, Pilon, Pindel, piPipes, PIQ, pixy, PlasFlow, platanus, Platypus, plink, plink2, Plotly, plotsr, plumed, pocp, Point Cloud Library, popbam, PopCOGenT, PopLDdecay, Porechop, poretools, portcullis, POUTINE, pplacer, PRANK, preseq, pretext-suite, primalscheme, primer3, PrimerBLAST, PrimerPooler, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, Proseq2, ProtExcluder, protolite, PSASS, psmc, psutil, pullseq, purge_dups, pyani, PyCogent, pycoQC, pyfaidx, pyGenomeTracks, PyMC, pymol-open-source, pyopencl, pypy, pyRAD, pyrho, Pyro4, pyseer, PySnpTools, python, PyTorch, PyVCF, qapa, qcat, QIIME, QIIME2, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, quickmerge, QUMA, QuPath, R, RACA, racon, rad_haplotyper, RADIS, RadSex, RagTag, rapt, RAPTR-SV, RATT, raven, RAxML, raxml-ng, Ray, rck, rclone, Rcorrector, RDP Classifier, REAGO, REAPR, Rebaler, reCOGnizer, Red, ReferenceSeeker, regenie, regtools, Relate, RelocaTE2, Repbase, RepeatMasker, RepeatModeler, RERconverge, ReSeq, resistify, RevBayes, RFdiffusion, RFMix, RGAAT, rgdal, RGI, Rgtsvm, Ribotaper, ripgrep, rJava, rMATS, RNAMMER, rnaQUAST, Rnightlights, roadies, Roary, Rockhopper, rohan, RoseTTAFold-All-Atom, RoseTTAFold2NA, rphast, Rqtl, Rqtl2, RSAT, RSEM, RSeQC, RStudio, rtfbs_db, ruby, run_dbcan, sabre, SaguaroGW, salmon, SALSA, Sambamba, samblaster, sample, SampleTracker, samplot, samtabix, Samtools, Satsuma, Satsuma2, SCALE, scanorama, SCE-VCF, scikit-learn, Scoary, scoary-2, scTE, scythe, seaborn, SEACR, SecretomeP, segul, self-assembling-manifold, selscan, seqfu, seqkit, SeqPrep, seqtk, SequelTools, sequenceTubeMap, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, SHAPEIT4, SHAPEIT5, shasta, Shiny, shoelaces, shore, SHOREmap, shortBRED, SHRiMP, sickle, sift4g, SignalP, SimPhy, simuPOP, sina, SINGER, singularity, sinto, sirius, sistr_cmd, skani, SKESA, skewer, SLiM, SLURM, smap, smash, smcpp, smoove, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, snapatac2, SNAPP, SnapTools, snATAC, SNeP, Sniffles, snippy, snp-sites, snpArcher, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SoloTE, SomaticSniper, songbird, sorted_grep, spaceranger, SPAdes, SPALN, SparCC, sparsehash, SPARTA, speedseq, split-fasta, SQANTI3, sqlite, SqueezeMeta, SQuIRE, SRA Toolkit, srst2, ssantichaivekin/empress, stacks, Stacks 2, stairway-plot, stampy, STAR, staramr, Starcode, statmodels, stellarscope, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, Struo2, stylegan2-ada-pytorch, subread, sumatra, supernova, suppa, SURPI, surpyvor, SURVIVOR, sutta, SV-plaudit, SVaBA, SVclone, SVDetect, svengine, SVseq2, svtools, svtyper, svviz2, SWAMP, sweed, SweepFinder, SweepFinder2, sweepsims, 
swiss2fasta.py, sword, syri, tabix, tagdust, Taiji, tama, Tandem Repeats Finder (TRF), tardis, TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tax_myPHAGE, tbl2asn, tcoffee, TE-Aid, telescope, TELR, TensorFlow, TEToolkit, TEtranscripts, texlive, TFEA, tfTarget, thermonucleotideBLAST, ThermoRawFileParser, TMHMM, tmux, Tomahawk, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, tree, treeCl, treemix, treePL, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, TrioCNV2, tRNAscan-SE, Trycycler, UBCG2, UCSC Kent utilities, ullar, ultra, ultraplex, UMAP, UMI-tools, umi-transfer, UMIScripts, Unicycler, UniRep, unitig-caller, unrar, usearch, VALET, valor, vamb, variabel, Variant Effect Predictor, VarScan, VCF-kit, vcf2diploid, vcf2phylip, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, Vicuna, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, visidata, vispr, VizBin, vmatch, vscode, vsearch, vt, WASP, webin-cli, wget, wgs-assembler (Celera), WGSassign, What_the_Phage, wiggletools, windowmasker, wine, Winnowmap, Wise2 (Genewise), wombat, Xander_assembler, xpclr, yaha, yahs, yap

Details for atac-seq-pipeline (If the copy-pasted commands do not work, use this tool to remove unwanted characters)

Name: atac-seq-pipeline
Version: 2.2.2
OS: Linux
About: This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data.
Added: 10/27/2018 9:50:44 AM
Updated: 12/5/2023 12:19:43 PM
Link: https://github.com/ENCODE-DCC/atac-seq-pipeline
Notes:

Run the pipeline in a "screen" persistent session. If you have run a previous version of caper and atac-seq-pipeline, or if you run into problems with this pipeline, delete the .caper directory to reset caper ("rm -fr $HOME/.caper").
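
For example, you can start a named persistent session, detach from it, and reattach later:

screen -S atac        # start a named session
# run the pipeline commands inside the session
# press Ctrl-a then d to detach; the session keeps running
screen -r atac        # reattach to the session later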

Instructions to run the latest version (for v2.1.1 and v1.10.0, see below)

1. Install caper, croo, and qc2tsv in your home directory, clone the pipeline, and download the Singularity image

pip install caper croo qc2tsv --upgrade

mkdir -p /workdir/$USER    # -p: no error if the directory already exists
cd /workdir/$USER

git clone https://github.com/ENCODE-DCC/atac-seq-pipeline.git
cd atac-seq-pipeline

wget -O atac-seq-pipeline.sif https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.2.2.sif

2. Prepare reference genome database

1) If you work with human or mouse data, the ENCODE project provides pre-built genome databases. Go to this page; under the "Reference genome" section you will find the URL of the reference genome database, for example "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv". You will need this URL later.

2) If you work with other species, follow the instructions below to prepare the reference genome database.

  • Edit the script "/workdir/$USER/atac-seq-pipeline/scripts/build_genome_data.sh", setting the values for GENOME, DEST_DIR, TSV, and MITO_CHR_NAME (a sketch with example values follows the commands below). Make sure that the chromosome names match between the genome files. For plant genomes, you may want to create a new genome fasta file with the mitochondrion and chloroplast merged, and treat it as the mitochondrial genome.
  • Run these commands to create the genome database. After it finishes, you should see a directory /workdir/$USER/$DEST_DIR with a .tsv file inside. You will need this .tsv file later.
cd /workdir/$USER/atac-seq-pipeline
cp /PATH/TO/your.genome.fa.gz ./
singularity run --bind $PWD --pwd $PWD atac-seq-pipeline.sif ./scripts/build_genome_data.sh
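
For reference, here is a sketch of the variable settings at the top of build_genome_data.sh. The values below are hypothetical examples, not defaults, and must be adjusted for your genome:

GENOME="mygenome"                      # hypothetical label for this genome build
DEST_DIR="/workdir/$USER/genomedb"     # full path where the database will be written
TSV="$DEST_DIR/mygenome.tsv"           # hypothetical path of the .tsv file referenced later in the input JSON
MITO_CHR_NAME="chrM"                   # mitochondrial chromosome name in your fasta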

3. Put your ATAC-seq data files (*.fastq.gz) into the directory /workdir/$USER/atac-seq-pipeline

4. Prepare a .json text file to specify all input files, and keep it in /workdir/$USER/atac-seq-pipeline

  • You can modify this example file (for local files, replace the URL with a file or directory name); a minimal sketch is shown below. Detailed documentation of the JSON file can be found on this page (the section under "Input JSON file").
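
A minimal sketch of such a file for a single paired-end replicate, written here with a heredoc; the title and fastq file names are placeholders, and the key names follow the pipeline's input JSON documentation:

cat > my.json <<'EOF'
{
    "atac.title" : "my ATAC-seq experiment",
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv",
    "atac.fastqs_rep1_R1" : [ "rep1_R1.fastq.gz" ],
    "atac.fastqs_rep1_R2" : [ "rep1_R2.fastq.gz" ],
    "atac.paired_end" : true
}
EOF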

 

5. Run the pipeline
You may want to restrict the number of simultaneous jobs when running the caper command (e.g., limit to at most 4 jobs with --max-concurrent-tasks 4); otherwise, all available cores on the server will be used.

export PATH=~/.local/bin:$PATH
caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif --max-concurrent-tasks 4

 

6. Summarize results

cd atac
ls -l
cd xxxxxxx         #replace xxxxxxx with the run directory you get from ls -l
croo metadata.json
qc2tsv qc/qc.json > qc.tsv

The results should be in the directories "peak", "qc", and "signal", the report files croo*, and the QC table qc.tsv.
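
If you prefer not to look up the run directory by hand, the steps above can be scripted; this sketch assumes the most recently modified subdirectory of atac is the run you just finished:

cd atac
cd "$(ls -td */ | head -1)"    # enter the newest run directory
croo metadata.json
qc2tsv qc/qc.json > qc.tsv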

 

 

Instructions to run v2.1.1

Run the pipeline in a "screen" persistent session.

1. Install caper version 2.1.3 in your home directory, copy atac-seq-pipeline to the server you are working on, and set up the environment

pip install caper==2.1.3 croo qc2tsv --upgrade

mkdir -p /workdir/$USER
cd /workdir/$USER
cp -r /programs/atac-seq-pipeline-2.1.1 /workdir/$USER

 

2. Prepare reference genome database

1) If you work with human or mouse data, the ENCODE project provides pre-built genome databases. Go to this page; under the "Reference genome" section you will find the URL of the reference genome database, for example "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv". You will need this URL later.

2) If you work with other species, follow the instructions below to prepare the reference genome database.

  • Edit the script "/workdir/$USER/atac-seq-pipeline-2.1.1/build_genome_data_mod.sh", setting the values for GENOME, DEST_DIR, REF_FA, and MITO_CHR_NAME; the rest of the parameters are optional. Use full paths for the reference file and destination directory, for example /workdir/$USER/atac-seq-pipeline-2.1.1/genomedb and /workdir/$USER/atac-seq-pipeline-2.1.1/mygenome.fasta. Make sure that the chromosome names match between the genome files. For plant genomes, you may want to create a new genome fasta file with the mitochondrion and chloroplast merged, and treat it as the mitochondrial genome.
  • Run these commands to create the genome database. After it finishes, you should see a directory /workdir/$USER/$DEST_DIR with a .tsv file inside. You will need this .tsv file later.
cd /workdir/$USER/atac-seq-pipeline-2.1.1
cp /PATH/TO/your.genome.fa.gz ./
singularity exec --bind $PWD --pwd $PWD atac-seq-pipeline.sif ./build_genome_data_mod.sh

 

3. Put your ATAC-seq data files (*.fastq.gz) into the directory /workdir/$USER/atac-seq-pipeline-2.1.1

 

4. Prepare a .json text file to specify all input files, and keep it in /workdir/$USER/atac-seq-pipeline-2.1.1

  • You can modify this example file (for local files, replace the URL with a file or directory name); a minimal sketch appears in the latest-version instructions above. Detailed documentation of the JSON file can be found on this page (the section under "Input JSON file").

5. Set the number of CPUs per task.
Optionally, you can modify the file /workdir/$USER/atac-seq-pipeline-2.1.1/atac.wdl and change the number of CPUs per task (under "group: resource_parameter"). In most cases there is no need to change it (for example, the default setting for the bowtie2 aligner is 6 cores per job, which is good). However, you may want to restrict the number of simultaneous jobs when running the caper command (e.g., limit to at most 4 jobs with --max-concurrent-tasks 4, as shown below); otherwise, all available cores on the server will be used.
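
For example, a run command capped at 4 concurrent tasks (the same flag used in the latest-version instructions above):

caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif --max-concurrent-tasks 4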

 

6. Run the pipeline

export PATH=~/.local/bin:$PATH
caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif

 

7. Summarize results

cd atac
ls -l
cd xxxxxxx         #replace xxxxxxx with the run directory you get from ls -l
croo metadata.json
qc2tsv qc/qc.json > qc.tsv

The results should be in the directories "peak", "qc", and "signal", the report files croo*, and the QC table qc.tsv.

 

Run pipeline with example data files

Here is how to run the test data set provided by the pipeline developer (run the software in a "screen" persistent session):

export PATH=~/.local/bin:$PATH
export ATACROOT=/workdir/$USER/atac-seq-pipeline-2.1.1 

mkdir -p /workdir/$USER
cd /workdir/$USER/
cp -r /programs/atac-seq-pipeline-2.1.1 /workdir/$USER

#download test data set
wget https://raw.githubusercontent.com/ENCODE-DCC/atac-seq-pipeline/master/example_input_json/ENCSR356KRQ_subsampled.json

#process test dataset
caper run $ATACROOT/atac.wdl -i ENCSR356KRQ_subsampled.json --singularity $ATACROOT/atac-seq-pipeline.sif

If no output directory is specified, the output directory is atac. The result files under atac are in the execution directories; they are documented at https://encode-dcc.github.io/wdl-pipelines/output_atac.html.

# After the work is finished, organize output results with croo

cd atac
ls -l 
cd xxxxxxx  #replace xxxxxxx with the run directory you get from "ls -l"
croo metadata.json
qc2tsv qc/qc.json  > qc.tsv

The results should be in the directories "peak", "qc", and "signal", the report files croo*, and the QC table qc.tsv.

 

Instructions to run v1.10.0

1. Copy the software directory to the server you are working on, and set up the environment

export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
export version=1.10.0

mkdir -p /workdir/$USER
cd /workdir/$USER
cp -r /programs/atac-seq-pipeline-${version} /workdir/$USER

 

2. Prepare reference genome database

1) If you work with human or mouse data, the ENCODE project provides pre-built genome databases. Go to this page; under the "Reference genome" section you will find the URL of the reference genome database, for example "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv". You will need this URL later.

2) If you work with other species, follow the instructions below to prepare the reference genome database.

  • Edit the script "/workdir/$USER/atac-seq-pipeline-1.10.0/build_genome_data_mod.sh", setting the values for GENOME, DEST_DIR, REF_FA, and MITO_CHR_NAME; the rest of the parameters are optional. Make sure that the chromosome names match between the genome files. For plant genomes, you may want to create a new genome fasta file with the mitochondrion and chloroplast merged, and treat it as the mitochondrial genome.
  • Run these commands to create the genome database. After it finishes, you should see a directory /workdir/$USER/$DEST_DIR with a .tsv file inside. You will need this .tsv file later.
cd /workdir/$USER/atac-seq-pipeline-1.10.0
cp /PATH/TO/your.genome.fa.gz ./
singularity exec atac-seq-pipeline.sif ./build_genome_data_mod.sh

 

3. Put your ATAC-seq data files (*.fastq.gz) into the directory /workdir/$USER/atac-seq-pipeline-1.10.0

 

4. Prepare a .json text file to specify all input files, and keep it in /workdir/$USER/atac-seq-pipeline-1.10.0.

  • You can modify this example file (for local files, replace the URL with a file or directory name); a minimal sketch appears in the latest-version instructions above. Detailed documentation of the JSON file can be found on this page (the section under "Input JSON file").

5. Set the number of CPUs per task.
Optionally, you can modify the file /workdir/$USER/atac-seq-pipeline-1.10.0/atac.wdl and change the number of CPUs per task (under "group: resource_parameter"). In most cases there is no need to change it (for example, the default setting for the bowtie2 aligner is 6 cores per job, which is good). However, you may want to restrict the number of simultaneous jobs when running the caper command (e.g., limit to at most 4 jobs with --max-concurrent-tasks 4); otherwise, all available cores on the server will be used.

 

6. Run the pipeline

caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif

 

7. Summarize results

cd atac
ls -l
cd xxxxxxx         #replace xxxxxxx with the run directory you get from ls -l
croo metadata.json
qc2tsv qc/qc.json > qc.tsv

The results should be in the directories "peak", "qc", and "signal", the report files croo*, and the QC table qc.tsv.

 

Run pipeline with example data files

Here is how to run the test data set provided by the pipeline developer (run the software in a "screen" persistent session):

export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
export version=1.10.0

mkdir -p /workdir/$USER
cd /workdir/$USER/
cp -r /programs/atac-seq-pipeline-${version} /workdir/$USER

#download test data set
wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_caper.json

#process test dataset
export ATACROOT=/workdir/$USER/atac-seq-pipeline-${version}

caper run $ATACROOT/atac.wdl -i ENCSR356KRQ_subsampled_caper.json --singularity $ATACROOT/atac-seq-pipeline.sif

If no output directory is specified, the output directory is atac. The result files under atac are in the execution directories; they are documented at https://encode-dcc.github.io/wdl-pipelines/output_atac.html.

# After the work is finished, organize output results with croo

export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
cd atac
ls -l 
cd xxxxxxx  #replace xxxxxxx with the run directory you get from "ls -l"
croo metadata.json
qc2tsv qc/qc.json  > qc.tsv

The results should be in the directories "peak", "qc", and "signal", the report files croo*, and the QC table qc.tsv.

 



 
