Commonly Used Genome Databases on BioHPC Cloud Computers
The BioHPC lab computers keep copies of some commonly used reference
genomes. Unless otherwise noted, the reference genomes are from the
databases maintained by Illumina
(http://tophat.cbcb.umd.edu/igenomes.shtml), which include BWA and
Bowtie1&2 indexes as well as annotation files from UCSC, NCBI and
Ensembl. The complete set of local reference genomes is in the
directory /shared_data/genome_db. Because the /shared_data directory
resides on the network file server, make sure that you copy the files
to /workdir before you use them.
To copy a directory from /shared_data to /workdir, use the “cp -r”
command. For example, to run Tophat with the human NCBI genome
databases, you will need to copy the Bowtie2Index and WholeGenomeFasta
directories and the GTF files to /workdir:
cp -r /shared_data/genome_db/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2Index/ /workdir/myUserName/
cp -r /shared_data/genome_db/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/ /workdir/myUserName/
cp /shared_data/genome_db/Homo_sapiens/NCBI/build37.2/Annotation/Genes/*gtf /workdir/myUserName/
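The three copy commands above could be wrapped in a small shell function so that the same steps can be repeated for another genome build. This is only a sketch: the function name stage_reference and the variables GENOME_DB and DEST are hypothetical, and it assumes that the build directory has the same Sequence/ and Annotation/Genes/ layout as the human NCBI example.

```shell
# Hypothetical helper: stage the Tophat inputs for one genome build.
# GENOME_DB and DEST default to the paths used in the example above;
# set them to other values to use a different source or destination.
GENOME_DB="${GENOME_DB:-/shared_data/genome_db}"
DEST="${DEST:-/workdir/myUserName}"

stage_reference() {
    # $1: build directory relative to GENOME_DB,
    #     e.g. Homo_sapiens/NCBI/build37.2
    local build="$GENOME_DB/$1"
    mkdir -p "$DEST" || return 1
    # Copy the two index/sequence directories recursively
    cp -r "$build/Sequence/Bowtie2Index" "$DEST/" || return 1
    cp -r "$build/Sequence/WholeGenomeFasta" "$DEST/" || return 1
    # Copy the GTF annotation files
    cp "$build"/Annotation/Genes/*gtf "$DEST/" || return 1
}

# Example call (run on a BioHPC machine):
# stage_reference Homo_sapiens/NCBI/build37.2
```

The function stops at the first failed copy and returns a non-zero status, so a missing directory is noticed before Tophat is started.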
Available on the network file server (/shared_data/genome_db)
- NCBI BLAST database (nt, nr and others - see important note about copying them below ****)
- interproscan***
- Arabidopsis_thaliana**
- Caenorhabditis_elegans
- Drosophila_melanogaster
- Homo_sapiens
- Mus_musculus
- Saccharomyces_cerevisiae
- Zea_mays**
- apple
- grape
- Taeniopygia_guttata (zebrafinch)
** The databases maintained by Illumina do not always use gene names
commonly accepted by the community. In our system, the Arabidopsis
reference genome is from TAIR. The maize reference genome agpv2 is
from maizesequence.org, and the maize reference genome agpv3 is from
Plant Ensembl.
*** Interproscan needs to be unpacked before use. Go to your
directory under /workdir and then execute "tar -xzf
/shared_data/genome_db/interproscan-5.2-45.0-64-bit.tar.gz". Your copy
of interproscan will be in the subdirectory interproscan of the
directory from which you executed the command.
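As a variation on the unpacking step above, the target directory can be given to tar directly instead of changing into it first. The sketch below uses tar's -C option for that; the function name unpack_into and the example destination /workdir/myUserName are hypothetical.

```shell
# Hypothetical helper: unpack a .tar.gz archive into a chosen directory.
unpack_into() {
    # $1: path to the .tar.gz archive
    # $2: directory to unpack into (created if it does not exist)
    mkdir -p "$2" || return 1
    # -C makes tar change to the target directory before extracting
    tar -xzf "$1" -C "$2"
}

# Example call (run on a BioHPC machine):
# unpack_into /shared_data/genome_db/interproscan-5.2-45.0-64-bit.tar.gz /workdir/myUserName
```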
**** The small databases pdbaa, pdbnt and swissprot are distributed
as masks to the nt and nr databases. Therefore, if you need swissprot
or pdbaa you also need to copy nr, and if you need pdbnt you also need
to copy nt. To copy any single database, execute the command
'cp /shared_data/genome_db/BLAST_NCBI/NNN.* /workdir/myid/mydbdir',
where NNN is the database name (e.g. nt, nr etc.).
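The dependency rule in this note can be written down as a short shell sketch that copies a database together with the nt or nr database it masks. The function names blast_db_set and copy_blast_db and the variable BLAST_DIR are hypothetical. Only the three small databases named above are handled specially, and every other name is copied on its own.

```shell
# Hypothetical helper: copy a BLAST database plus the database it masks.
BLAST_DIR="${BLAST_DIR:-/shared_data/genome_db/BLAST_NCBI}"

blast_db_set() {
    # Print the database names that have to be copied for database $1
    case "$1" in
        swissprot|pdbaa) echo "$1 nr" ;;   # masks of nr
        pdbnt)           echo "$1 nt" ;;   # mask of nt
        *)               echo "$1" ;;
    esac
}

copy_blast_db() {
    # $1: database name (e.g. nt, nr, swissprot)
    # $2: destination directory under /workdir
    local db
    mkdir -p "$2" || return 1
    for db in $(blast_db_set "$1"); do
        # Same pattern as 'cp .../BLAST_NCBI/NNN.* destination'
        cp "$BLAST_DIR/$db".* "$2/" || return 1
    done
}

# Example call (run on a BioHPC machine):
# copy_blast_db swissprot /workdir/myid/mydbdir
```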