
BioHPC Cloud: User Guide



Commonly Used Genome Databases on BioHPC Cloud Computers

The BioHPC lab computers keep copies of some commonly used reference genomes. Unless otherwise noted, the reference genomes come from the databases maintained by Illumina, which include BWA and Bowtie 1 and 2 indexes as well as annotation files from UCSC, NCBI and Ensembl. The complete set of local reference genomes is in the directory /shared_data/genome_db. Because the /shared_data directory is mounted from the network file server, copy the files to /workdir before you use them.

To copy a directory from /shared_data to /workdir, use the “cp -r” command. For example, to run Tophat with the human NCBI genome databases, copy the Bowtie2Index and WholeGenomeFasta directories and the GTF files to your directory under /workdir:

cp -r /shared_data/genome_db/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2Index/  /workdir/myUserName/

cp -r /shared_data/genome_db/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/ /workdir/myUserName/

cp /shared_data/genome_db/Homo_sapiens/NCBI/build37.2/Annotation/Genes/*gtf  /workdir/myUserName/
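The three copy commands above can also be wrapped in a small shell function, so that the same steps work for any genome build that follows the same directory layout. This is only a sketch: the function name stage_tophat_inputs and its two arguments are hypothetical, and it assumes the build directory contains Sequence/Bowtie2Index, Sequence/WholeGenomeFasta and Annotation/Genes as in the human NCBI example.

```shell
# Sketch: copy the Tophat inputs for one genome build into a working directory.
# stage_tophat_inputs is a hypothetical helper, not part of BioHPC.
#   $1 = build directory, e.g. /shared_data/genome_db/Homo_sapiens/NCBI/build37.2
#   $2 = destination,     e.g. /workdir/myUserName
stage_tophat_inputs() {
    build_dir="$1"
    dest_dir="$2"

    # Create the destination if it does not exist yet.
    mkdir -p "$dest_dir" || return 1

    # Copy the index and FASTA directories, then the GTF annotation files.
    cp -r "$build_dir/Sequence/Bowtie2Index" "$dest_dir/" || return 1
    cp -r "$build_dir/Sequence/WholeGenomeFasta" "$dest_dir/" || return 1
    cp "$build_dir"/Annotation/Genes/*gtf "$dest_dir/" || return 1
}
```

For the example above, the call would be: stage_tophat_inputs /shared_data/genome_db/Homo_sapiens/NCBI/build37.2 /workdir/myUserName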

Available on the network file server (/shared_data/genome_db)

  • NCBI BLAST database (nt, nr and others - see important note about copying them below ****)

  • interproscan***

  • Arabidopsis_thaliana**

  • Caenorhabditis_elegans

  • Drosophila_melanogaster

  • Homo_sapiens

  • Mus_musculus

  • Saccharomyces_cerevisiae

  • Zea_mays**

  • apple

  • grape

  • Taeniopygia_guttata (zebrafinch)

** The databases maintained by Illumina do not always use gene names commonly accepted by the community. In our system, the Arabidopsis reference genome is from TAIR. The maize reference genome agpv2 is from, the maize reference genome agpv3 is from Plant Ensembl.

*** Interproscan must be unpacked before use. Go to your directory under /workdir and execute "tar -xzf /shared_data/genome_db/interproscan-5.2-45.0-64-bit.tar.gz". Your copy of interproscan will be in the subdirectory interproscan of the directory from which you executed the command.
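The unpacking step can be sketched as a function that extracts the archive into a chosen working directory without first changing into it. The function name unpack_interproscan is hypothetical; the -C option tells tar which directory to extract into.

```shell
# Sketch: unpack the interproscan archive into a working directory.
# unpack_interproscan is a hypothetical helper, not part of BioHPC.
#   $1 = archive,  e.g. /shared_data/genome_db/interproscan-5.2-45.0-64-bit.tar.gz
#   $2 = work dir, e.g. /workdir/myUserName
unpack_interproscan() {
    archive="$1"
    work_dir="$2"

    # Fail early with a message if the archive path is wrong.
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }

    mkdir -p "$work_dir" || return 1

    # -x extract, -z gunzip, -f archive file, -C extract into work_dir.
    tar -xzf "$archive" -C "$work_dir"
}
```

For example: unpack_interproscan /shared_data/genome_db/interproscan-5.2-45.0-64-bit.tar.gz /workdir/myUserName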

**** The small databases pdbaa, pdbnt and swissprot are distributed as masks over the nt and nr databases. Therefore, if you need swissprot or pdbaa you must also copy nr, and if you need pdbnt you must also copy nt. To copy any single database, execute the command 'cp /shared_data/genome_db/BLAST_NCBI/NNN.* /workdir/myid/mydbdir', where NNN is the database name (e.g. nt or nr).
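The dependency rule in the note above (swissprot and pdbaa need nr, pdbnt needs nt) can be captured in a short function, so the parent database is never forgotten. This is a sketch under assumptions: the function name copy_blast_db and its argument order are hypothetical, and it relies only on the NNN.* file naming described in the note.

```shell
# Sketch: copy one BLAST database plus the parent database it is a mask of.
# copy_blast_db is a hypothetical helper, not part of BioHPC.
#   $1 = database name, e.g. nt, nr, swissprot, pdbaa, pdbnt
#   $2 = source dir,    e.g. /shared_data/genome_db/BLAST_NCBI
#   $3 = destination,   e.g. /workdir/myid/mydbdir
copy_blast_db() {
    db_name="$1"
    blast_dir="$2"
    dest_dir="$3"

    # Masked databases also require the database they are a mask of.
    case "$db_name" in
        swissprot|pdbaa) needed="$db_name nr" ;;
        pdbnt)           needed="$db_name nt" ;;
        *)               needed="$db_name" ;;
    esac

    mkdir -p "$dest_dir" || return 1

    # Copy every file belonging to each required database (NNN.*).
    for name in $needed; do
        cp "$blast_dir/$name".* "$dest_dir/" || return 1
    done
}
```

For example, copy_blast_db swissprot /shared_data/genome_db/BLAST_NCBI /workdir/myid/mydbdir copies both the swissprot and the nr files. Note that nr and nt are large, so the copy may take some time.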


