BioHPC Cloud Software
There are 1209 software titles installed in BioHPC Cloud. The software is available on all machines unless stated otherwise in the notes. The complete list of programs is below; please click on a title to see details and instructions. A tabular list of software is available here.
Please read the details and instructions before running any program; they may contain important information on how to properly use the software in BioHPC Cloud.
3D Slicer, 3d-dna, 454 gsAssembler or gsMapper, a5, ABRicate, ABruijn, ABySS, AdapterRemoval, adephylo, Admixtools, Admixture, AF_unmasked, AFProfile, AGAT, agrep, albacore, Alder, AliTV-Perl interface, AlleleSeq, ALLMAPS, ALLPATHS-LG, Alphafold, Alphafold3, alphapickle, Alphapulldown, AlphScore, AMOS, AMPHORA, amplicon.py, AMRFinder, analysis, ANGSD, AnnotaPipeline, Annovar, ant, antiSMASH, anvio, apollo, arcs, ARGweaver, aria2, ariba, Arlequin, ART, ASEQ, aspera, assembly-stats, ASTRAL, atac-seq-pipeline, ataqv, athena_meta, ATLAS, Atlas-Link, ATLAS_GapFill, atom, ATSAS, Augustus, AWS command line interface, AWS v2 Command Line Interface, axe, axel, BA3, BactSNP, bakta, bamsnap, bamsurgeon, bamtools, bamUtil, barcode_splitter, BarNone, Basset, BayeScan, Bayescenv, bayesR, baypass, bazel, BBMap/BBTools, BCFtools, BCL convert, bcl2fastq, BCP, bdbag, Beagle, beagle-lib, BEAST, BEAST X, Beast2, bed2diffs, bedops, BEDtools, bettercallsal, bfc, bgc, bgen, bicycle, BiG-SCAPE, bigQF, bigtools, bigWig, bioawk, biobakery, biobambam, Bioconductor, biom-format, BioPerl, BioPython, Birdsuite, biscuit, Bismark, Blackbird, blasr, BLAST, BLAST_to_BED, blast2go, BLAT, BlobToolKit, BLUPF90, BMGE, bmtagger, bonito, Boost, Bowtie, Bowtie2, BPGA, Bracken, BRAKER, BRAT-NextGen, BRBseqTools, BreedingSchemeLanguage, breseq, brocc, BSBolt, bsmap, BSseeker2, btyper3, BUSCO, BUSCO Phylogenomics, BWA, bwa-mem2, bwa-meth, bwtool, cactus, CAFE, CAFE5, caffe, cagee, canu, Canvas, CAP3, caper, CarveMe, catch, cBar, CBSU RNAseq, CCMetagen, CCTpack, cd-hit, cdbfasta, cdo, CEGMA, CellRanger, cellranger-arc, cellranger-atac, cellranger-dna, centrifuge, centroFlye, CFM-ID, CFSAN SNP pipeline, CheckM, CheckM2, chimera, ChimeraTE, chimerax, chip-seq-pipeline, chromosomer, Circlator, Circos, Circuitscape, CITE-seq-Count, ClermonTyping, clues, CLUMPP, clust, Clustal Omega, CLUSTALW, Cluster, cmake, CMSeq, CNVnator, coinfinder, colabfold, CombFold, Comparative-Annotation-Toolkit, compat, CONCOCT, Conda, Cooler, copyNumberDiff, cortex_var, CoverM, crabs, CRISPRCasFinder, CRISPResso, crispron, Cromwell, CrossMap, CRT, cuda, Cufflinks, curatedMetagenomicDataTerminal, cutadapt, cuteSV, Cytoscape, dadi, dadi-1.6.3_modif, dadi-cli, danpos, DAS_Tool, dashing, DBSCAN-SWA, dDocent, DeconSeq, Deepbinner, deeplasmid, DeepTE, deepTools, Deepvariant, defusion, delly, DESMAN, destruct, DETONATE, dfast, diamond, dipcall, diploSHIC, discoal, Discovar, Discovar de novo, distruct, DiTASiC, DIYABC, dnmtools, Docker, dorado, DRAM, dREG, dREG.HD, drep, Drop-seq, dropEst, dropSeqPipe, dsk, dssat, Dsuite, dTOX, duphold, DWGSIM, dynare, ea-utils, earlgrey, ecCodes, ecopcr, ecoPrimers, ectyper, EDGE, edirect, EDTA, eems, EgaCryptor, EGAD, eggnog-mapper, EIGENSOFT, elai, ElMaven, EMBLmyGFF3, EMBOSS, EMIRGE, Empress, enfuse, EnTAP, entropy, epa-ng, ephem, epic2, ermineJ, ete3, EukDetect, EukRep, EVE, EVM, exabayes, exonerate, ExpansionHunterDenovo-v0.8.0, eXpress, FALCON, FALCON_unzip, Fast-GBS, fasta, FastAAI, FastANI, fastcluster, fastGEAR, FastME, FastML, fastp, FastQ Screen, fastq-multx-1.4.3, fastq_demux, fastq_pair, fastq_species_detector, FastQC, fastqsplitter, fastsimcoal2, fastspar, fastStructure, FastTree, FASTX, fcs, feems, feh, FFmpeg, fgbio, ficle, figaro, Fiji, Filtlong, fineRADstructure, fineSTRUCTURE, FIt-SNE, FlaGs2, flash, flash2, flexbar, Flexible Adapter Remover, Flye, FMAP, FragGeneScan, FragGeneScan, FRANz, freebayes, FSA, funannotate, FunGene Pipeline, FunOMIC, G-PhoCS, GADMA, GAEMR, Galaxy, Galaxy in Docker, GATK, gatk4, 
gatk4amplicon.py, gblastn, Gblocks, GBRS, gcc, GCTA, GDAL, gdc-client, GEM library, GEMMA, GeMoMa, GENECONV, geneid, GeneMark, GeneRax, Genespace, genomad, Genome STRiP, Genome Workbench, GenomeMapper, Genomescope, GenomeThreader, genometools, GenomicConsensus, genozip, gensim, GEOS, germline, gerp++, GET_PHYLOMARKERS, gfaviz, GffCompare, gffread, giggle, git, glactools, GlimmerHMM, GLIMPSE, GLnexus, Globus connect personal, GMAP/GSNAP, gmx_MMPBSA, GNU Compilers, GNU parallel, go-perl, GO2MSIG, GONE, GoShifter, gradle, graftM, grammy, GraPhlAn, graphtyper, graphviz, greenhill, GRiD, gridss, Grinder, grocsvs, GROMACS, GroopM, GSEA, gsort, GTDB-Tk, GTFtools, Gubbins, gunc, GUPPY, gvcftools, hail, hal, HapCompass, HAPCUT, HAPCUT2, hapflk, HaploMerger, Haplomerger2, haplostrips, HaploSync, HapSeq2, harpy, HarvestTools, haslr, hdf5, helixer, hget, hh-suite, HiC-Pro, hic_qc, HiCExplorer, HiFiAdapterFilt, hifiasm, hificnv, HISAT2, HMMER, Homer, HOTSPOT, HTSeq, htslib, https://github.com/CVUA-RRW/RRW-PrimerBLAST, hugin, humann, HUMAnN2, hybpiper, HyLiTE, Hyper-Gen, hyperopt, HyPhy, hyphy-analyses, iAssembler, IBDLD, IBDNe, IBDseq, idba, IDBA-UD, idemux, IDP-denovo, idr, idseq, IgBLAST, IGoR, IGV, IMa2, IMa2p, IMAGE, ImageJ, ImageMagick, Immcantation, impute2, impute5, IMSA-A, INDELseek, infernal, Infomap, inspector, inStrain, inStrain_lite, InStruct, Intel MKL, InteMAP, InterProScan, ipyrad, IQ-TREE, iRep, isoseq, JaBbA, jags, Jane, java, jbrowse, JCVI, jellyfish, jsalignon/cactus, juicer, julia, jupyter, jupyterlab, kaiju, kallisto, Kent Utilities, keras, khmer, kinfin, king, kma, KMC, KmerFinder, KmerGenie, kneaddata, kraken, KrakenTools, KronaTools, kSNP, kWIP, LACHESIS, lammps, LAPACK, lapels, LAST, lastz, lcMLkin, LDAK, LDBlockShow, LDhat, LeafCutter, leeHom, lep-anchor, Lep-MAP3, LEVIATHAN, lftp, Liftoff, lifton, Lighter, LinkedSV, LINKS, localcolabfold, LocARNA, LocusZoom, lofreq, longranger, Loupe, LS-GKM, LTR_retriever, LUCY, LUCY2, LUMPY, lyve-SET, m6anet, Macaulay2, MACE, MACS, MaCS simulator, MACS2, macs3, maffilter, MAFFT, mafTools, MAGeCK, MAGeCK-VISPR, Magic-BLAST, magick, MAGScoT, MAKER, manta, mapDamage, mapquik, MAQ, MARS, MASH, mashtree, Mashtree, MaSuRCA, MATLAB, Matlab_runtime, Mauve, MaxBin, MaxQuant, McClintock, mccortex, mcl, MCscan, MCScanX, mdust, medaka, medusa, megahit, MeGAMerge, MEGAN, MELT, MEME Suite, MERLIN, merqury, MetaBAT, MetaBinner, MetaboAnalystR, MetaCache, MetaCRAST, metaCRISPR, metamaps, MetAMOS, MetaPathways, MetaPhlAn, metapop, metaron, MetaVelvet, MetaVelvet-SL, metaWRAP, methpipe, mfeprimer, MGmapper, MicrobeAnnotator, microtrait, MIDAS, MiFish, Migrate-n, mikado, MinCED, minigraph, Minimac3, Minimac4, minimap2, miniprot, mira, miRDeep2, mirge3, miRquant, MISO, MITE-Hunter, MITObim, MitoFinder, mitohelper, MitoHiFi, mity, MiXCR, MixMapper, MKTest, mlift, mlst, MMAP, MMSEQ, MMseqs2, MMTK, MobileElementFinder, modeltest, MODIStsp-2.0.5, module, moments, momi, MoMI-G, mongo, mono, monocle3, mosdepth, mothur, MrBayes, mrcanavar, mrsFAST, msdial, msld, MSMC, msprime, MSR-CA Genome Assembler, msstats, MSTMap, mugsy, MultiQC, multiz-tba, MUMandCo, MUMmer, mummer2circos, muscle, MUSIC, Mutation-Simulator, muTect, myte, MZmine, nag-compiler, namfinder, nanocompore, nanofilt, NanoPlot, Nanopolish, nanovar, ncbi_datasets, ncftp, ncl, NECAT, Nemo, Netbeans, NEURON, new_fugue, Nextflow, NextGenMap, NextPolish2, nf-core/rnaseq, ngmlr, NGS_data_processing, NGSadmix, ngsDist, ngsF, ngsLD, NGSNGS, NgsRelate, ngsTools, NGSUtils, NINJA, NLR-Annotator, NLR-Parser, 
NLRtracker, Novoalign, NovoalignCS, nQuire, NRSA, NuDup, numactl, nvidia-docker, nvtop, Oases, OBITools, Octave, OMA, Oneflux, OpenBLAS, openmpi, openslide, openssl, ORFeus, orthodb-clades, OrthoFinder, orthologr, Orthomcl, pacbio, PacBioTestData, PAGIT, pairtools, pal2nal, paleomix, PAML, panacus, panaroo, pandas, pandaseq, pandoc, pangene, PanPhlAn, Panseq, pantools, Parsnp, PASA, PASTEC, PAUP*, pauvre, pb-assembly, pbalign, pbbam, pbh5tools, PBJelly, pblat, pbmm2, PBSuite, pbsv, pbtk, PCAngsd, pcre, pcre2, PeakRanger, PeakSplitter, PEAR, PEER, PennCNV, peppro, PERL, PfamScan, pgap, PGDSpider, ph5tools, Phage_Finder, pharokka, phasedibd, PHAST, phenopath, Phobius, PHRAPL, PHYLIP, PhyloCSF, phyloFlash, phylophlan*, PhyloPhlAn2, phylophlan3, phyluce, PhyML, phyx, Picard, PICRUSt2, pigz, Pilon, Pindel, piPipes, PIQ, pixy, PlasFlow, platanus, Platypus, plink, plink2, Plotly, plotsr, plumed, pocp, Point Cloud Library, popbam, PopCOGenT, PopLDdecay, Porechop, poretools, portcullis, POUTINE, pplacer, PRANK, preseq, pretext-suite, primalscheme, primer3, PrimerBLAST, PrimerPooler, prinseq, prodigal, progenomics, progressiveCactus, PROJ, prokka, Proseq2, ProtExcluder, protolite, PSASS, psmc, psutil, pullseq, purge_dups, pyani, PyCogent, pycoQC, pyfaidx, pyGenomeTracks, PyMC, pymol-open-source, pyopencl, pypy, pyRAD, pyrho, Pyro4, pyseer, PySnpTools, python, PyTorch, PyVCF, qapa, qcat, QIIME, QIIME2, QTCAT, Quake, Qualimap, QuantiSNP2, QUAST, quickmerge, QUMA, QuPath, R, RACA, racon, rad_haplotyper, RADIS, RadSex, RagTag, rapt, RAPTR-SV, RATT, raven, RAxML, raxml-ng, Ray, rck, rclone, Rcorrector, RDP Classifier, REAGO, REAPR, Rebaler, reCOGnizer, Red, ReferenceSeeker, regenie, regtools, Relate, RelocaTE2, Repbase, RepeatMasker, RepeatModeler, RERconverge, ReSeq, resistify, RevBayes, RFdiffusion, RFMix, RGAAT, rgdal, RGI, Rgtsvm, Ribotaper, ripgrep, rJava, rMATS, RNAMMER, rnaQUAST, Rnightlights, roadies, Roary, Rockhopper, rohan, RoseTTAFold-All-Atom, RoseTTAFold2NA, rphast, Rqtl, Rqtl2, RSAT, RSEM, RSeQC, RStudio, rtfbs_db, ruby, run_dbcan, sabre, SaguaroGW, salmon, SALSA, Sambamba, samblaster, sample, SampleTracker, samplot, samtabix, Samtools, Satsuma, Satsuma2, SCALE, scanorama, SCE-VCF, scikit-learn, Scoary, scoary-2, scTE, scythe, seaborn, SEACR, SecretomeP, segul, self-assembling-manifold, selscan, seqfu, seqkit, SeqPrep, seqtk, SequelTools, sequenceTubeMap, Seurat, sf, sgrep, sgrep sorted_grep, SHAPEIT, SHAPEIT4, SHAPEIT5, shasta, Shiny, shoelaces, shore, SHOREmap, shortBRED, SHRiMP, sickle, sift4g, SignalP, SimPhy, simuPOP, sina, SINGER, singularity, sinto, sirius, sistr_cmd, skani, SKESA, skewer, SLiM, SLURM, smap, smash, smcpp, smoove, SMRT Analysis, SMRT LINK, snakemake, snap, SnapATAC, snapatac2, SNAPP, SnapTools, snATAC, SNeP, Sniffles, snippy, snp-sites, snpArcher, SnpEff, SNPgenie, SNPhylo, SNPsplit, SNVPhyl, SOAP2, SOAPdenovo, SOAPdenovo-Trans, SOAPdenovo2, SoloTE, SomaticSniper, songbird, sorted_grep, spaceranger, SPAdes, SPALN, SparCC, sparsehash, SPARTA, speedseq, split-fasta, SQANTI3, sqlite, SqueezeMeta, SQuIRE, SRA Toolkit, srst2, ssantichaivekin/empress, stacks, Stacks 2, stairway-plot, stampy, STAR, staramr, Starcode, statmodels, stellarscope, STITCH, STPGA, StrainPhlAn, strawberry, Strelka, stringMLST, StringTie, STRUCTURE, Structure_threader, Struo2, stylegan2-ada-pytorch, subread, sumatra, supernova, suppa, SURPI, surpyvor, SURVIVOR, sutta, SV-plaudit, SVaBA, SVclone, SVDetect, svengine, SVseq2, svtools, svtyper, svviz2, SWAMP, sweed, SweepFinder, SweepFinder2, sweepsims, 
swiss2fasta.py, sword, syri, tabix, tagdust, Taiji, tama, Tandem Repeats Finder (TRF), tardis, TargetP, TASSEL 3, TASSEL 4, TASSEL 5, tax_myPHAGE, tbl2asn, tcoffee, TE-Aid, telescope, TELR, TensorFlow, TEToolkit, TEtranscripts, texlive, TFEA, tfTarget, thermonucleotideBLAST, ThermoRawFileParser, TMHMM, tmux, Tomahawk, TopHat, Torch, traitRate, Trans-Proteomic Pipeline (TPP), TransComb, TransDecoder, TRANSIT, transrate, TRAP, tree, treeCl, treemix, treePL, Trim Galore!, trimal, trimmomatic, Trinity, Trinotate, TrioCNV2, tRNAscan-SE, Trycycler, UBCG2, UCSC Kent utilities, ullar, ultra, ultraplex, UMAP, UMI-tools, umi-transfer, UMIScripts, Unicycler, UniRep, unitig-caller, unrar, usearch, VALET, valor, vamb, variabel, Variant Effect Predictor, VarScan, VCF-kit, vcf2diploid, vcf2phylip, vcfCooker, vcflib, vcftools, vdjtools, Velvet, vep, VESPA, vg, Vicuna, ViennaRNA, VIP, viral-ngs, virmap, VirSorter, VirusDetect, VirusFinder 2, visidata, vispr, VizBin, vmatch, vscode, vsearch, vt, WASP, webin-cli, wget, wgs-assembler (Celera), WGSassign, What_the_Phage, wiggletools, windowmasker, wine, Winnowmap, Wise2 (Genewise), wombat, Xander_assembler, xpclr, yaha, yahs, yap
Details for atac-seq-pipeline (If the copy-pasted commands do not work, use this tool to remove unwanted characters)
Name: atac-seq-pipeline
Version: 2.2.2
OS: Linux
About: This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data.
Added: 10/27/2018 9:50:44 AM
Updated: 12/5/2023 12:19:43 PM
Link: https://github.com/ENCODE-DCC/atac-seq-pipeline
Notes: Run the pipeline in a "screen" persistent session. If you have run a previous version of caper and atac-seq-pipeline, or if you run into problems with this pipeline, delete the .caper directory to reset caper ("rm -fr $HOME/.caper").
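For example, a persistent session can be started and later resumed like this (the session name "atac" is just an illustration):
screen -S atac    # start a named session and run the pipeline commands inside it
# detach with Ctrl-a d; log back in later and reattach with:
screen -r atac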
Instructions to run latest version (for v2.1.1 and v1.10.0 see below)
1. Install the latest caper, croo, and qc2tsv in your home directory, clone the pipeline from GitHub, and download the Singularity image
pip install caper croo qc2tsv --upgrade
mkdir /workdir/$USER
cd /workdir/$USER
git clone https://github.com/ENCODE-DCC/atac-seq-pipeline.git
cd atac-seq-pipeline
wget -O atac-seq-pipeline.sif https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.2.2.sif
2. Prepare reference genome database
1) If you work with human or mouse data, the ENCODE project provides a pre-built genome database. Go to this page; under the "Reference genome" section you will find the URL of the reference genome database, for example "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv". You will need this URL later.
2) If you work with other species, follow the instructions below to prepare the reference genome database.
- Edit the script "/workdir/$USER/atac-seq-pipeline/scripts/build_genome_data.sh", setting the values for GENOME, DEST_DIR, TSV, and MITO_CHR_NAME (see the sketch after the commands below). Make sure the chromosome names match between the genome files. For plant genomes, you might want to create a new genome fasta file with the mitochondrion and chloroplast merged, and call it a mitochondrial genome.
- Run these commands to create the genome database. After it is done, you should see a directory /workdir/$USER/$DEST_DIR with a .tsv file inside. You will need this .tsv file later.
cd /workdir/$USER/atac-seq-pipeline
cp /PATH/TO/your.genome.fa.gz ./
singularity run --bind $PWD --pwd $PWD atac-seq-pipeline.sif ./scripts/build_genome_data.sh
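The variable block you edit in build_genome_data.sh might look like the sketch below; the variable names come from the instructions above, but all values are illustrative placeholders for a hypothetical genome.
GENOME="mygenome"                      # a label for this genome build (placeholder)
DEST_DIR="/workdir/$USER/genomedb"     # where the genome database will be written
TSV="${DEST_DIR}/${GENOME}.tsv"        # the .tsv file you will reference later in my.json
MITO_CHR_NAME="chrM"                   # must match a sequence name in your genome fasta
# For a plant genome, you might first concatenate the organelle sequences, e.g.:
#   zcat mito.fa.gz chloroplast.fa.gz | gzip > merged_organelles.fa.gz
# and rename the resulting headers so that MITO_CHR_NAME can identify them.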
3. Put your ATAC-seq data files (*.fastq.gz) into the directory /workdir/$USER/atac-seq-pipeline
4. Prepare a .json text file that specifies all input files, and keep it in /workdir/$USER/atac-seq-pipeline
- You can start from this example file (for local files, replace the URL with a file or directory name). Detailed documentation of the JSON file can be found on this page (the section under "Input JSON file"); a minimal sketch follows.
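As an illustration, a minimal input JSON for one paired-end replicate might look like the sketch below. The key names follow the pipeline's input JSON documentation, but the title, paths, and file names are placeholders to replace with your own ("netid" stands for your user ID); atac.genome_tsv can point either to the .tsv built in step 2 or to an ENCODE URL such as the hg38.tsv above.
cat > my.json <<'EOF'
{
    "atac.title" : "Example ATAC-seq run",
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/workdir/netid/genomedb/mygenome.tsv",
    "atac.paired_end" : true,
    "atac.fastqs_rep1_R1" : [ "/workdir/netid/atac-seq-pipeline/rep1_R1.fastq.gz" ],
    "atac.fastqs_rep1_R2" : [ "/workdir/netid/atac-seq-pipeline/rep1_R2.fastq.gz" ]
}
EOF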
5. Run pipeline
You might want to restrict the number of simultaneous jobs when running the caper command (e.g. limit to at most 4 jobs with --max-concurrent-tasks 4); otherwise, all available cores on the server will be used.
export PATH=~/.local/bin:$PATH
caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif --max-concurrent-tasks 4
6. Summarize results
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from ls -l
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.
Instructions to run v2.1.1
Run the pipeline in a "screen" persistent session.
1. Install caper version 2.1.3 in your home directory, copy atac-seq-pipeline to the server you are working on, and set up the environment
pip install caper==2.1.3 croo qc2tsv --upgrade
mkdir /workdir/$USER
cd /workdir/$USER
cp -r /programs/atac-seq-pipeline-2.1.1 /workdir/$USER
2. Prepare reference genome database
1) If you work with human or mouse data, the ENCODE project provides a pre-built genome database. Go to this page; under the "Reference genome" section you will find the URL of the reference genome database, for example "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv". You will need this URL later.
2) If you work with other species, follow the instructions below to prepare the reference genome database.
- Edit the script "/workdir/$USER/atac-seq-pipeline-2.1.1/build_genome_data_mod.sh", setting the values for GENOME, DEST_DIR, REF_FA, and MITO_CHR_NAME; the rest of the parameters are optional. Note that you need to use full paths for the file name and the destination directory, for example /workdir/$USER/atac-seq-pipeline-2.1.1/genomedb and /workdir/$USER/atac-seq-pipeline-2.1.1/mygenome.fasta. Make sure the chromosome names match between the genome files. For plant genomes, you might want to create a new genome fasta file with the mitochondrion and chloroplast merged, and call it a mitochondrial genome.
- Run these commands to create the genome database. After it is done, you should see a directory /workdir/$USER/$DEST_DIR with a .tsv file inside. You will need this .tsv file later.
cd /workdir/$USER/atac-seq-pipeline-2.1.1
cp /PATH/TO/your.genome.fa.gz ./
singularity exec --bind $PWD --pwd $PWD atac-seq-pipeline.sif ./build_genome_data_mod.sh
3. Put your ATAC-seq data files (*.fastq.gz) into the directory /workdir/$USER/atac-seq-pipeline-2.1.1
4. Prepare a .json text file that specifies all input files, and keep it in /workdir/$USER/atac-seq-pipeline-2.1.1
- You can start from this example file (for local files, replace the URL with a file or directory name). Detailed documentation of the JSON file can be found on this page (the section under "Input JSON file"); see the minimal sketch in the latest-version instructions above.
5. Set the number of CPUs per task.
Optionally, you can modify the file /workdir/$USER/atac-seq-pipeline-2.1.1/atac.wdl and change the number of CPUs per task (under "group: resource_parameter"). In most cases there is no need to change it (for example, the default for the bowtie2 aligner is 6 cores per job, which works well). However, you might want to restrict the number of simultaneous jobs when running the caper command (e.g. limit to at most 4 jobs with --max-concurrent-tasks 4); otherwise, all available cores on the server will be used.
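For instance, the kind of change involved might look like the line below; the exact declaration in atac.wdl may differ, so treat this purely as a hypothetical illustration.
# in atac.wdl, under "group: resource_parameter" (hypothetical declaration):
#   Int align_cpu = 6    # lower this value to use fewer cores per alignment job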
6. Run pipeline
export PATH=~/.local/bin:$PATH
caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif
7. Summarize results
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from ls -l
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.
Run pipeline with example data files
Here is how to run the test data set provided by the pipeline developers (run the software in a "screen" persistent session)
export PATH=~/.local/bin:$PATH
export ATACROOT=/workdir/$USER/atac-seq-pipeline-2.1.1
mkdir /workdir/$USER
cd /workdir/$USER/
cp -r /programs/atac-seq-pipeline-2.1.1 /workdir/$USER
#download test data set
wget https://raw.githubusercontent.com/ENCODE-DCC/atac-seq-pipeline/master/example_input_json/ENCSR356KRQ_subsampled.json
#process test dataset
caper run $ATACROOT/atac.wdl -i ENCSR356KRQ_subsampled.json --singularity $ATACROOT/atac-seq-pipeline.sif
If no output directory is specified, the output directory is atac. The result files are under atac, in per-run execution directories; they are documented at https://encode-dcc.github.io/wdl-pipelines/output_atac.html.
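Before croo is run, the raw output tree looks roughly like the sketch below; cromwell names the run directory with a random workflow ID (the "xxxxxxx" in the commands below), and the call-* task names shown here are illustrative.
atac/
  <workflow-id>/        # the run directory, one per pipeline run
    call-align/ ...     # one call-* directory per pipeline task
    call-call_peak/ ...
    metadata.json       # read by croo to organize the results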
# After the work is finished, organize output results with croo
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from "ls -l"
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.
Instructions to run v1.10.0
1. Copy the software directory to the server you are working on, and set up the environment
export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
export version=1.10.0
mkdir /workdir/$USER
cd /workdir/$USER
cp -r /programs/atac-seq-pipeline-${version} /workdir/$USER
2. Prepare reference genome database
1) If you work with human or mouse data, the ENCODE project provides a pre-built genome database. Go to this page; under the "Reference genome" section you will find the URL of the reference genome database, for example "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv". You will need this URL later.
2) If you work with other species, follow the instructions below to prepare the reference genome database.
- Edit the script "/workdir/$USER/atac-seq-pipeline-1.10.0/build_genome_data_mod.sh", setting the values for GENOME, DEST_DIR, REF_FA, and MITO_CHR_NAME; the rest of the parameters are optional. Make sure the chromosome names match between the genome files. For plant genomes, you might want to create a new genome fasta file with the mitochondrion and chloroplast merged, and call it a mitochondrial genome.
- Run these commands to create the genome database. After it is done, you should see a directory /workdir/$USER/$DEST_DIR with a .tsv file inside. You will need this .tsv file later.
cd /workdir/$USER/atac-seq-pipeline-1.10.0
cp /PATH/TO/your.genome.fa.gz ./
singularity exec atac-seq-pipeline.sif ./build_genome_data_mod.sh
3. Put your ATAC-seq data files (*.fastq.gz) into the directory /workdir/$USER/atac-seq-pipeline-1.10.0
4. Prepare a .json text file that specifies all input files, and keep it in /workdir/$USER/atac-seq-pipeline-1.10.0.
- You can start from this example file (for local files, replace the URL with a file or directory name). Detailed documentation of the JSON file can be found on this page (the section under "Input JSON file").
5. Set the number of CPUs per task.
Optionally, you can modify the file /workdir/$USER/atac-seq-pipeline-1.10.0/atac.wdl and change the number of CPUs per task (under "group: resource_parameter"), as illustrated in the v2.1.1 instructions above. In most cases there is no need to change it (for example, the default for the bowtie2 aligner is 6 cores per job, which works well). However, you might want to restrict the number of simultaneous jobs when running the caper command (e.g. limit to at most 4 jobs with --max-concurrent-tasks 4); otherwise, all available cores on the server will be used.
6. Run pipeline
caper run atac.wdl -i my.json --singularity atac-seq-pipeline.sif
7. Summarize results
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from ls -l
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.
Run pipeline with example data files
Here is how to run the test data set provided by the pipeline developers (run the software in a "screen" persistent session)
export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
export version=1.10.0
mkdir /workdir/$USER
cd /workdir/$USER/
cp -r /programs/atac-seq-pipeline-${version} /workdir/$USER
#download test data set
wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_caper.json
#process test dataset
export ATACROOT=/workdir/$USER/atac-seq-pipeline-${version}
caper run $ATACROOT/atac.wdl -i ENCSR356KRQ_subsampled_caper.json --singularity $ATACROOT/atac-seq-pipeline.sif
If no output directory is specified, the output directory is atac. The result files are under atac, in per-run execution directories (as illustrated above); they are documented at https://encode-dcc.github.io/wdl-pipelines/output_atac.html.
# After the work is finished, organize output results with croo
export PYTHONPATH=/programs/caper/lib/python3.6/site-packages:/programs/caper/lib64/python3.6/site-packages
export PATH=/programs/caper/bin:$PATH
cd atac
ls -l
cd xxxxxxx #replace xxxxxxx with the run directory you get from "ls -l"
croo metadata.json
qc2tsv qc/qc.json > qc.tsv
The results should be in the directories "peak", "qc", and "signal", along with the croo* report files and the QC table qc.tsv.