Bulk Segregation Data Analysis Pipeline
Setting up the pipeline
- Install BWA and Samtools on your computer (only Linux versions of BWA and
Samtools are currently available).
- Download the Perl script, pools_analysis.pl.
- Edit the first section of the Perl script in a text editor, following the
instructions in its comment lines.
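Before running the pipeline, it can help to confirm that the required programs are reachable. The following Python sketch checks that perl, bwa and samtools can be found on your PATH. The executable names are assumptions based on the setup steps, so adjust REQUIRED_TOOLS if your installation uses different names.

```python
import shutil

# Executables the pipeline is assumed to need: perl to run pools_analysis.pl,
# plus the aligner and the SAM/BAM toolkit installed during setup.
REQUIRED_TOOLS = ("perl", "bwa", "samtools")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("perl, bwa and samtools are all on PATH")
```

shutil.which only tests whether an executable with that name is on PATH. It does not check versions, so a successful check does not guarantee the pipeline will run.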
Running the pipeline
- Run the command:
    perl pools_analysis.pl controlpool.fastq.gz mutantpool.fastq.gz
- When the run finishes, three Excel files are generated: filtered1.xls,
filtered2.xls and filtered3.xls. They contain the filtered results in order
from least to most stringent. Their format is documented at
http://samtools.sourceforge.net/samtools.shtml. Start with filtered3.xls,
where you should find a cluster of sites linked to the causative mutation.
Because EMS mutagenesis typically introduces 4 to 8 SNPs per Mb and the
mapping resolution is about 2 Mb, you would normally see a cluster of roughly
10 to 20 SNPs with no recombination between them.
- ANNOVAR can be used to annotate the SNPs, but this is generally not
necessary: you can inspect the mutations directly in IGV. Most of the
mutations fall in intergenic regions, and only a small subset lie in coding
regions.
- EMS mutagenesis sometimes produces large insertions or deletions. In that
case filtered3.xls will show the mapped region, but no candidate causative
mutation can be found there. Inspect filtered2.xls instead, because it
includes the deleted regions.
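The cluster search in filtered3.xls can also be automated with a sliding window. The following Python sketch assumes that the file is plain tab-delimited text with the chromosome in column 1 and the 1-based position in column 2, as in samtools pileup output. Check your own file and change the column indices if they differ. The 2 Mb default window mirrors the mapping resolution quoted above. The function names are illustrative and are not part of the pipeline.

```python
from collections import defaultdict


def read_sites(path):
    """Collect positions per chromosome from a tab-delimited results file.

    Assumes column 1 is the chromosome and column 2 the 1-based position
    (as in samtools pileup output). Rows whose second field is not an
    integer, such as header lines, are skipped.
    """
    sites = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 2 or not fields[1].isdigit():
                continue
            sites[fields[0]].append(int(fields[1]))
    return sites


def densest_window(positions, window=2_000_000):
    """Return (count, first, last) for the `window`-bp span holding most sites.

    Two-pointer sweep over the sorted positions: each site is tried as the
    left edge, and the right pointer advances while sites stay in the window.
    """
    positions = sorted(positions)
    best = (0, None, None)
    right = 0
    for left, start in enumerate(positions):
        while right < len(positions) and positions[right] - start < window:
            right += 1
        count = right - left
        if count > best[0]:
            best = (count, start, positions[right - 1])
    return best


def best_cluster(sites, window=2_000_000):
    """Return (chromosome, count, first, last) for the densest window genome-wide."""
    best = None
    for chrom, positions in sites.items():
        count, first, last = densest_window(positions, window)
        if best is None or count > best[1]:
            best = (chrom, count, first, last)
    return best
```

For example, best_cluster(read_sites("filtered3.xls")) reports the chromosome, the number of sites, and the first and last positions of the densest 2 Mb window. That interval is a natural place to start looking in IGV.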
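Only a small subset of the mutations fall in coding regions. A quick way to shorten the candidate list before opening IGV is therefore to keep only the sites inside coding intervals. The sketch below is a hypothetical helper, not part of the pipeline. It assumes you have already extracted sorted, non-overlapping, 1-based (start, end) coding intervals per chromosome, for example from your genome's annotation file. Overlapping gene models would need to be merged first.

```python
from bisect import bisect_right


def split_by_coding(sites, coding):
    """Split (chromosome, position) sites into (coding, non-coding) lists.

    `coding` maps each chromosome to sorted, non-overlapping, 1-based
    inclusive (start, end) intervals. A binary search on the interval starts
    finds the only interval that could contain each position.
    """
    starts = {chrom: [start for start, _ in ivs] for chrom, ivs in coding.items()}
    inside, outside = [], []
    for chrom, pos in sites:
        ivs = coding.get(chrom, [])
        # Index of the last interval starting at or before pos (-1 if none).
        i = bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and pos <= ivs[i][1]:
            inside.append((chrom, pos))
        else:
            outside.append((chrom, pos))
    return inside, outside
```

The sites that land in the first list are the ones most worth checking by eye. A causative mutation can still lie outside coding sequence, for example in a splice site or promoter, so the second list should not be discarded.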
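A large deletion may not appear as a SNP at all. One way it could show up is as a run of consecutive positions that are covered in the control pool but have no reads in the mutant pool. The sketch below illustrates that idea and is not a description of what the pipeline does internally. It assumes you have per-position depths for both pools, sorted by chromosome and position. These could come, for instance, from samtools depth -a run on the two alignments, if you kept them. The thresholds min_control_depth and min_length are hypothetical defaults to tune for your data.

```python
from typing import Iterable, List, Tuple

# (chromosome, position, control_depth, mutant_depth)
Record = Tuple[str, int, int, int]


def candidate_deletions(records: Iterable[Record], min_control_depth: int = 5,
                        min_length: int = 100) -> List[Tuple[str, int, int]]:
    """Return (chromosome, start, end) runs with control coverage but no mutant reads.

    Records must be sorted by chromosome and position. A run ends at a gap in
    positions, a change of chromosome, or a position that fails the depth test.
    """
    runs = []
    current = None  # [chromosome, start, end] of the run being extended
    for chrom, pos, control_depth, mutant_depth in records:
        hit = mutant_depth == 0 and control_depth >= min_control_depth
        if hit and current and current[0] == chrom and pos == current[2] + 1:
            current[2] = pos  # extend the open run by one base
            continue
        # The open run (if any) has ended: keep it only if it is long enough.
        if current and current[2] - current[1] + 1 >= min_length:
            runs.append(tuple(current))
        current = [chrom, pos, pos] if hit else None
    if current and current[2] - current[1] + 1 >= min_length:
        runs.append(tuple(current))
    return runs
```

The reported runs are only candidates. Low mappability or repetitive sequence can also leave gaps in coverage, so compare each run against the mapped interval from filtered3.xls and inspect it in IGV.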