
BioHPC Cloud: User Guide


BioHPC Cloud Storage Guide


BioHPC Cloud storage is divided into two parts: local storage (/workdir or /SSD) and networked storage (home directories and storage group directories). This document considers networked storage.

Every user has access to networked storage through their home directory. Networked storage is available on all workstations and login nodes. Data can be transferred to and from networked storage without any reservation; all you need is an active Lab user account. The best way to transfer data is to use the scp or sftp protocol (a common Windows client is FileZilla). For a step-by-step explanation of data transfer, please refer to "Access". It is also possible to use Globus Online to transfer data to and from your home directory (or group directories); see Globus at BioHPC Cloud and Using Globus to Share Data for details.

You can share your files with other BioHPC Cloud users by setting file/directory permissions. You can also share data with external users via Globus (see Using Globus to Share Data); sharing via Globus works with people who do not have BioHPC Cloud accounts.
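
Permissions are normally set with the standard chmod command; the same idea can be sketched in Python using only the standard library. This is an illustration only: the function name and the example path are hypothetical, and it assumes that the file's group already contains the users you want to share with and that every parent directory is traversable by that group.

```python
import os
import stat


def share_with_group(path: str) -> None:
    """Add group read permission to a file or directory.

    Directories also get the group execute bit, which is what allows
    group members to enter them and reach the files inside.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)  # current permission bits
    mode |= stat.S_IRGRP                        # group may read
    if os.path.isdir(path):
        mode |= stat.S_IXGRP                    # group may traverse
    os.chmod(path, mode)


# Hypothetical usage (the path is an assumption, not a real location):
# share_with_group("/home/myuser/shared_results")
```

The function only adds bits and never removes any, so permissions already granted to the owner or to others stay unchanged.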

Currently the BioHPC Cloud storage system size is 945TB+233TB, i.e. 1.15PB. The storage is implemented as a 4-server Gluster cluster (233TB) and a 13-server Lustre cluster (945TB). A limited amount of free storage is available to each BioHPC user; its size depends on the user's status (see "Quotas" below). Any user can purchase extra storage for 98.00 per TB per year, which is one of the lowest storage prices available anywhere. We can offer this low price because we buy the storage in large quantities and only recover the actual cost, i.e. hardware, computer room and maintenance costs.

BioHPC networked storage does not include backups. Backups are available as a separate service - details are available on this page.


Quotas are set according to the following algorithm.

  1. User DOES NOT have access to paid storage

    1. User is associated with an active Lab Credit Account. Home directory storage limit is 200 GB.

    2. User is associated with an active hosted hardware resource. Home directory storage limit is 200 GB.

    3. User is NOT associated with an active Lab Credit Account or hosted hardware. Home directory storage limit is 20 GB.

  2. User DOES have access to paid storage

    1. User purchased storage for home directory. Home directory storage limit is set by the user during storage purchase.

    2. User's home directory belongs to a storage group. Home directory storage limit is set by the group admin, up to the maximum group storage quota.

    3. User has access to a storage group, but their home directory does not belong to it. User can store data in the storage group directory up to the maximum group quota; the home directory storage quota is set as in point 1 above.
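
The decision steps above can be sketched in Python. This is only an illustration of the listed rules, not the actual BioHPC implementation: the function and parameter names are hypothetical, and checking a purchased home quota before a group-assigned one is an assumption, since the list does not say which takes precedence.

```python
from typing import Optional

FREE_QUOTA_ASSOCIATED_GB = 200  # active Lab Credit Account or hosted hardware
FREE_QUOTA_DEFAULT_GB = 20      # neither of the above


def home_quota_gb(
    has_lab_credit_account: bool,
    has_hosted_hardware: bool,
    purchased_home_quota_gb: Optional[int] = None,
    group_home_quota_gb: Optional[int] = None,
) -> int:
    """Return the home directory limit in GB under the rules listed above.

    purchased_home_quota_gb: value chosen by the user at purchase (case 2.1).
    group_home_quota_gb: value set by the group admin when the home
        directory belongs to a storage group (case 2.2).
    Leave both as None for case 1 and for case 2.3, where the home
    directory is outside the group and the free quota applies.
    """
    if purchased_home_quota_gb is not None:
        return purchased_home_quota_gb
    if group_home_quota_gb is not None:
        return group_home_quota_gb
    if has_lab_credit_account or has_hosted_hardware:
        return FREE_QUOTA_ASSOCIATED_GB
    return FREE_QUOTA_DEFAULT_GB


if __name__ == "__main__":
    print(home_quota_gb(True, False))   # 200
    print(home_quota_gb(False, False))  # 20
```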

Free storage quotas cannot be combined, added to purchased storage or used for multiple accounts. They exist only to make sure users can carry out common computations without purchasing extra storage. NOTE: free storage quotas are enforced and cannot be changed by the user; to change the quota, the user must purchase storage credits. Purchased storage quotas are informational only and are called the "warning threshold"; see the discussion below.

Managing and purchasing your storage

You can check your storage status on the My Storage page; usage and quotas are updated daily at about 5am. If your storage is over the quota, or if your paid storage is about to expire, you will be notified by e-mail.

You can also check your storage from the Linux command line while logged into any BioHPC machine. The traditional command "du" is too slow for networked directories with many files; the in-house command "lfs-du" (Large-File-System du) is much faster and can be used to see the size of any files or directories owned by members of your group. The command takes a list of files or directories and returns the total size of each argument. Unlike the My Storage page, where results are updated daily, lfs-du provides almost real-time results (there may be a delay of several seconds before file system changes are reflected in its output).

You can purchase storage yourself by clicking the "Add or modify storage" button(s) on your "My Storage" page; there is a separate button for each storage space you own. Typically each user owns their home directory, but you may want to create a separate storage space for sharing with other users (a storage group). If you are a member of such a storage group, you will also see it on the "My Storage" page. You can purchase storage using a Cornell Account or a credit card. Credit cards are processed by the Campus Store, whose transaction fee is 5%. If you would like to purchase storage using an invoice or purchase order (PO), please contact us first.

Each paid storage space has a "warning threshold" associated with it. If it is exceeded, a warning e-mail will be sent to the storage owner. You can change the warning threshold without buying any storage using the "Add/Modify storage" page (from "My Storage"): select 0 units to purchase and, if you have any purchased storage left, you can adjust the warning threshold value. It cannot be lowered below your actual storage used.

There are several ways to organize your networked storage, summarized below. Please note that the different ways can be combined (e.g. you can add storage to your home directory AND have access to an additional storage group).

  1. Add storage to your home directory. You can add storage in 1 TB-year chunks (98.00 each) and then decide your quota (e.g. add 2 x 1 TB-years, set the quota at 1 TB, and your storage will expire in 2 years).

  2. Create a storage group and move your home directory there. This option is especially attractive for research groups, since all members of the group can share the storage quota. The group PI needs to contact us first to create the group and to move the home directories of all involved users there. The group can then be managed by the PI (or a designated person): users can be added or removed, and storage can be added or renewed the same way as home directory storage.

  3. Create a storage group for group storage. As in point 2, a group of users can share the storage group, except that their home directories stay where they are. The group PI needs to contact us to create the group. The group can then be managed by the PI (or a designated person): users can be added or removed, and storage can be added or renewed the same way as home directory storage.

Storage can only be purchased in 1 TB-year chunks and must be paid for up front; the expiration date depends on your actual storage use.

The system works as follows: you can buy as many 1 TB-year chunks as you want, and the expiration date is computed based on your current storage. You are charged based on actual storage: each day the amount of TB-years used is subtracted from your storage credits. For example, if you have 2 TB of storage and you bought a 1 TB-year storage credit, then each day 0.005479 TB-year (2 TB / 365 days) will be subtracted from your storage credit, and your estimated expiration date will be computed from the remaining storage credits and the amount of data you actually keep. If you remove some data and, for example, reduce it to 1.5 TB, then each day 0.004110 TB-year will be subtracted. See the explanation below for more details.
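
The daily accounting described above can be sketched as follows. This is a minimal illustration, assuming a 365-day year (which matches the 0.005479 figure quoted for 2 TB) and using hypothetical function names; it is not the actual billing code.

```python
import math

DAYS_PER_YEAR = 365  # assumption: inferred from 2 TB -> 0.005479 TB-year per day


def daily_charge_tb_years(stored_tb: float) -> float:
    """TB-years subtracted from the storage credit for one day of use."""
    return stored_tb / DAYS_PER_YEAR


def estimated_days_left(credit_tb_years: float, stored_tb: float) -> float:
    """Days until the credit runs out if the stored amount stays constant."""
    if stored_tb <= 0:
        return math.inf  # assumption: nothing stored means nothing is charged
    return credit_tb_years * DAYS_PER_YEAR / stored_tb


if __name__ == "__main__":
    print(f"{daily_charge_tb_years(2.0):.6f}")  # 0.005479
    print(estimated_days_left(1.0, 2.0))        # 182.5
    print(estimated_days_left(2.0, 1.0))        # 730.0
```

With 2 TB stored, a 1 TB-year credit lasts about half a year; with two 1 TB-year credits and 1 TB stored, it lasts 730 days, i.e. two years. Because the charge follows the amount actually stored, removing data pushes the estimated expiration date further out.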

The warning threshold is for informational purposes only. You can set it at any level you want using the storage purchase page: choose "Warning threshold change" as your purchase type or select 0 TB-years to add. You will be notified when the storage grows above the warning threshold, so you can take steps to avoid paying for unwanted storage, but the threshold will not block the growth; it only warns you. After the warning e-mail is sent, the warning threshold is automatically adjusted to the closest level higher than the current storage.


Data Safety

Each storage array component of our storage cluster is either RAID6 or raidz2 (the RAID6 equivalent in ZFS). Each file is localized, i.e. stored on one component server, so the total data safety is equivalent to that of a single RAID6/RAID7 storage array. In practical terms, a simultaneous failure of two hard drives in each of the component servers will NOT cause any data loss, and will not even disrupt data access. Periodic scans are carried out to find and correct bit rot.

Networked Lab storage does NOT include backups. It is the user's responsibility to make sure critical or irreplaceable data is mirrored or backed up to another physical location; keeping two copies of the same data on the same networked storage is NOT a proper backup. Backups are available as a separate service - details are available on this page.


