 

BioHPC Cloud: User Guide

 


BioHPC Cloud Storage Guide

Overview

BioHPC Cloud storage is divided into two parts: local storage (/workdir or /SSD) and networked storage (home directories and storage group directories). This document covers networked storage.

Every user has access to networked storage through their home directory. Networked storage is available on all workstations and login nodes. Data can be transferred to and from networked storage without a reservation; all you need is an active BioHPC user account. The best way to transfer data is with the scp or sftp protocol (a common Windows client is FileZilla). For a step-by-step explanation of data transfer, please refer to "Access".
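scp transfers are easy to script. The sketch below only builds scp command lines in Python (it does not run them). The host name `login-node.example.edu` and the user name `netid` are placeholders, not real BioHPC values; take the actual login node from "Access".

```python
import shlex
from typing import List

# Placeholder host: replace with the BioHPC login node listed on the "Access" page.
LOGIN_HOST = "login-node.example.edu"


def scp_upload_command(user: str, local_path: str, remote_dir: str,
                       recursive: bool = False, host: str = LOGIN_HOST) -> List[str]:
    """Build (but do not run) an scp command that copies local_path into remote_dir."""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")  # -r copies a whole directory tree
    cmd += [local_path, f"{user}@{host}:{remote_dir}"]
    return cmd


def scp_download_command(user: str, remote_path: str, local_dir: str,
                         recursive: bool = False, host: str = LOGIN_HOST) -> List[str]:
    """Build an scp command that copies remote_path from networked storage into local_dir."""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")
    cmd += [f"{user}@{host}:{remote_path}", local_dir]
    return cmd


if __name__ == "__main__":
    cmd = scp_upload_command("netid", "reads.fastq.gz", "/home/netid/project/")
    print(shlex.join(cmd))  # shell-quoted command line, ready to paste into a terminal
    # To run it from Python instead: subprocess.run(cmd, check=True)
```

Passing the command as a list (rather than one string) avoids shell-quoting problems with file names that contain spaces.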

You can share your files with other BioHPC Cloud users by setting file/directory permissions. For external users (without a BioHPC account), you can share data via Globus (Using Globus to Share Data), or you can create a temporary guest account.
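Sharing by permissions comes down to Unix permission bits. The sketch below grants the file's group read access (plus execute on directories, so they can be entered) across a directory tree, similar to `chmod -R g+rX` except that it never adds execute permission to regular files. It assumes the files already belong to the group you want to share with; the function names are hypothetical. Group members also need execute permission on every parent directory (such as your home directory) to reach the shared files.

```python
import os
import stat


def share_with_group(path: str) -> None:
    """Add group read permission to path, plus group execute if it is a directory.

    Owner and world permission bits are left unchanged.
    """
    mode = os.stat(path).st_mode
    extra = stat.S_IRGRP
    if stat.S_ISDIR(mode):
        extra |= stat.S_IXGRP  # 'x' lets group members enter the directory
    os.chmod(path, stat.S_IMODE(mode) | extra)


def share_tree_with_group(root: str) -> None:
    """Apply share_with_group to root and everything below it, skipping symlinks."""
    share_with_group(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # chmod on a link would change its target
                share_with_group(full)
```

For example, a directory with mode 700 containing a file with mode 600 ends up as 750 and 640.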

Currently, the BioHPC Cloud storage system totals 945 TB + 233 TB = 1,178 TB (about 1.2 PB). It is implemented as a 4-server Gluster cluster (233 TB) and a 13-server Lustre cluster (945 TB). A limited amount of free storage is available to each BioHPC user; its size depends on the user's status (see "Free storage" below). Any user can purchase extra storage for $98.00 per TB per year, one of the lowest storage prices available anywhere. We can offer this low price because we buy storage in large chunks and recover only the actual cost, i.e. hardware, computer room, and maintenance costs.

BioHPC networked storage does not include backups. We strongly encourage all users to develop and implement a backup plan. BioHPC does provide backups as a separate service; details are available on this page.


Free storage

All users are granted a modest amount of free storage for their home directories, to ensure that they can perform basic operations without needing to purchase storage. The amount of free storage is determined as follows:

  1. Users associated with active BioHPC credit accounts or hosted servers, or who have purchased additional paid storage, receive 200 GB of free storage for their home directory.
  2. All other users receive 20 GB of free storage.
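The two rules above amount to a simple lookup. In this sketch the function and parameter names are hypothetical (BioHPC does not expose such an API); it also computes how far a home directory is over its free allocation, which is the part that would need cleanup or purchased storage.

```python
def free_allocation_gb(has_active_credit_account: bool,
                       has_hosted_server: bool,
                       has_paid_storage: bool) -> int:
    """Free home-directory storage (GB) under the two rules above."""
    if has_active_credit_account or has_hosted_server or has_paid_storage:
        return 200  # rule 1
    return 20  # rule 2


def excess_gb(home_usage_gb: float, allocation_gb: float) -> float:
    """Usage above the free allocation (0 if the home directory is within it)."""
    return max(0.0, home_usage_gb - allocation_gb)
```

For instance, a user with none of the three qualifying conditions and a 35 GB home directory is 15 GB over the 20 GB allocation.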

Purchasing storage

If you need more storage than the free allocation, you can purchase storage credits.

  • Storage rates and charging

    Storage may be purchased on the My Storage page using a Cornell account, a credit card, or a purchase order (pre-authorized users only). Storage credits are purchased in units of 'terabyte-years', at a cost of $98.00 per TB-year (sold in whole units only). At BioHPC, you pay only for the storage you actually use. For example, 1 TB-year of storage buys 1 TB of storage for one year, 2 TB for half a year, or 0.5 TB for 2 years. Your storage credit balance is updated every day based on a daily snapshot of your actual usage.

    If you run out of storage credits, you will be informed by email and asked to purchase additional credits. Your balance will continue to go further negative unless your storage usage falls below your free storage allocation. If you do not address a negative balance promptly, your account will be locked, and eventually your data will be deleted.

  • Purchased storage directory

    There are two options to consider when you purchase storage:

    1. Purchase storage for a home directory: This can be done by navigating to the My Storage page and clicking 'Add or modify home directory storage'. With this option, you will only be charged for usage in your home directory that exceeds your free storage allocation.
    2. Purchase storage for a shared directory: This is a good option for shared storage space within a lab, or for a group project. To get started with this option, contact BioHPC and request the creation of a storage group. You will need to choose a group name and a group owner. If the name of the group is abc123Lab, the storage directory will be /home/abc123Lab, and the group owner will be able to add or remove group members on the My Groups page. All group members will see the directory listed on the My Storage page and can purchase storage credits for it using the link 'Add or modify abc123Lab storage'.

      Home directories in shared directories: It can be convenient to move the home directories of users to a shared directory with purchased storage. This way, only one storage purchase is necessary to account for the storage space used by multiple users. Additionally, it can keep home directories of group members organized in a single location.

      • Group owners can move a group member's home to group storage by going to My Groups and clicking 'Group users'; each user listed there has a link to move their home into group storage (or to remove it from group storage).
      • Home directories are moved by creating a symbolic link from /home/username to /home/abc123Lab/username. In this way, the move is fairly invisible to the user, and the user can access their home at either of these paths.
      • When a user's home is moved to paid group storage, they still receive up to 200 GB of free storage for their home directory. The size of the user's home, up to the user's free storage allocation, is subtracted from total group storage usage each day before the group storage credit balance is updated.
      • Note: it is not possible to move a home directory to group storage if the home directory contains purchased storage. In this case, contact BioHPC staff to help combine the storage accounts first.

      Permissions in shared directories: See this document for instructions on giving all group members read permission within the shared directory.
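The daily accounting described in this section can be sketched as follows. This is a minimal illustration, assuming a TB-year is consumed in 365 equal daily slices and that 1 TB = 1000 GB (neither detail is stated on this page); the function names are hypothetical. A home directory is charged only for usage above its free allocation, and a group directory is credited with each moved home's size, capped at that member's free allocation.

```python
from typing import Iterable, Tuple

DAYS_PER_YEAR = 365  # assumption: one TB-year is deducted in 365 daily slices
GB_PER_TB = 1000     # assumption: decimal units


def home_daily_charge(usage_gb: float, free_gb: float) -> float:
    """TB-years deducted for one day's snapshot of a home directory.

    Only usage above the free allocation is billable.
    """
    billable_tb = max(0.0, usage_gb - free_gb) / GB_PER_TB
    return billable_tb / DAYS_PER_YEAR


def group_daily_charge(group_usage_gb: float,
                       moved_homes: Iterable[Tuple[float, float]]) -> float:
    """TB-years deducted for one day's snapshot of a group directory.

    moved_homes holds (home_size_gb, free_allocation_gb) for every member whose
    home was moved into the group directory; each home's size, capped at that
    member's free allocation, is subtracted before charging.
    """
    credited_gb = sum(min(size, free) for size, free in moved_homes)
    billable_tb = max(0.0, group_usage_gb - credited_gb) / GB_PER_TB
    return billable_tb / DAYS_PER_YEAR
```

For example, a group directory holding 3,000 GB with two moved homes of 150 GB and 500 GB (each with a 200 GB allocation) is credited 150 + 200 = 350 GB, leaving 2.65 TB billable, so each day deducts 2.65/365 TB-years. Likewise, 2 TB billable for half a year consumes exactly 1 TB-year, matching the example in "Storage rates and charging".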

Quotas

Quotas are set for home directories and paid storage directories. All quotas are soft: this means that you are not prevented from writing to the filesystem when the quota is exceeded.

  • For unpaid home directories, your quota is either 200 GB or 20 GB (see 'Free storage' above). If you exceed the quota, you will receive frequent email notifications and are expected to address the issue promptly. If you do not remove excess data or purchase storage credits, after some time your account will be locked, and eventually your data will be deleted.
  • For paid storage, you can set your own quota. These quotas are informational only. When you exceed the quota, you will receive an email notification, and the quota will then be increased automatically. The quota therefore acts as a 'warning threshold', designed to keep you aware of how much storage you are paying for.
    • To change your warning threshold, go to the My Storage page and click the 'Add or modify storage' button under the appropriate directory. You can choose to purchase 0 units and change only the Warning Threshold. The threshold must be higher than your current storage usage.
    • By default, quota warning emails for group storage are sent to all group members. To change this, contact BioHPC staff.
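The soft-quota behaviour for paid storage can be summarised in two checks. This is a sketch with hypothetical function names; it models only the rules stated above (a new threshold must exceed current usage, and exceeding the threshold triggers an email), not the amount by which BioHPC automatically raises the quota afterwards.

```python
def validate_threshold(new_threshold_tb: float, current_usage_tb: float) -> None:
    """Reject a warning threshold that is not higher than current usage."""
    if new_threshold_tb <= current_usage_tb:
        raise ValueError(
            f"threshold ({new_threshold_tb} TB) must exceed current usage "
            f"({current_usage_tb} TB)"
        )


def needs_warning(usage_tb: float, threshold_tb: float) -> bool:
    """Soft-quota check: the result only decides whether an email is due.

    Nothing here blocks writes, matching the soft quotas described above.
    """
    return usage_tb > threshold_tb
```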

Checking storage usage

You can check your storage usage and storage credit balance (for purchased storage) on the My Storage page; balances are updated once daily. For purchased storage, you will also see an 'expiration date': this is simply a projection of when you will run out of credits if your storage usage does not change.
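The 'expiration date' projection can be reproduced roughly as shown below. This sketch assumes credits are consumed at (billable TB)/365 TB-years per day and that the result is rounded down to whole days; the function name and the rounding are assumptions, not BioHPC's published method.

```python
import datetime as dt
import math
from typing import Optional

DAYS_PER_YEAR = 365  # assumption: one TB-year is consumed in 365 daily slices


def projected_expiration(balance_tb_years: float, billable_tb: float,
                         today: dt.date) -> Optional[dt.date]:
    """Date on which the credit balance reaches zero if usage stays constant.

    Returns None when nothing is billable (the balance never runs down).
    """
    if billable_tb <= 0:
        return None
    if balance_tb_years <= 0:
        return today  # already out of credits
    days_left = math.floor(balance_tb_years * DAYS_PER_YEAR / billable_tb)
    return today + dt.timedelta(days=days_left)
```

With 1 TB-year of credit and 2 TB billable, starting on 2024-01-01, the sketch projects 182 whole days of credit, i.e. 2024-07-01. Because the projection assumes constant usage, it moves every day that your usage changes.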

You can also check your storage from the Linux command line while logged in to any BioHPC machine. The traditional "du" command may be too slow for networked directories with many files; the in-house command "lfs-du" (Large-File-System du) is much faster and can report the size of any files or directories owned by members of your group. The command takes a list of files or directories and returns the total size of each argument. Unlike the My Storage page, which is updated daily, lfs-du gives near-real-time results (file system changes may take several seconds to be reflected in its output).
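To see why a plain du-style scan is slow, the sketch below walks a directory tree the way a simple du does: one `lstat()` call per file. On a networked file system each such call typically needs a request to a server, so directories with many files take a long time. This only illustrates the slow approach; how lfs-du obtains its faster results is not described here, and this code does not reproduce it. It reports apparent file size (similar to GNU `du --apparent-size`), not allocated disk blocks.

```python
import os
import sys


def tree_size_bytes(path: str) -> int:
    """Sum apparent file sizes under path (or return the size of a single file).

    Every file costs one lstat() call, which is what makes this slow on
    networked storage with many files. Symlinks are counted, not followed.
    """
    if not os.path.isdir(path):
        return os.lstat(path).st_size
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


if __name__ == "__main__":
    # Like lfs-du, take a list of files/directories and report each one's total.
    for arg in sys.argv[1:]:
        print(f"{tree_size_bytes(arg)}\t{arg}")
```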

Data Safety

Networked lab storage does NOT include backups: only one copy of each file is stored. It is each user's responsibility to make sure critical or irreplaceable data is mirrored or backed up to another physical location; keeping two copies of the same data on the same networked storage is NOT a proper backup. Backups are available as a separate service; details are available on this page.

Each storage array in our storage cluster is either RAID6 or raidz2 (the ZFS equivalent of RAID6). Each file is localized, i.e. stored on a single component server, so the data safety of any one file is equivalent to that of a single RAID6 storage array. In practical terms, two hard drives can fail simultaneously in each of the component servers without causing any data loss, or even any disruption of data access. The health of the disks is monitored constantly by BioHPC staff, and periodic scans are carried out to find and correct bit rot.
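The failure tolerance described above can be stated as a small rule. This sketch assumes one RAID6/raidz2 array per component server and uses made-up server names (`oss01`, `oss02`, ...); it ignores other risks such as bit rot or read errors during a rebuild.

```python
from typing import Dict

PARITY_DRIVES = 2  # RAID6 / raidz2 keep two drives' worth of parity per array


def array_survives(failed_drives: int) -> bool:
    """An array keeps all of its data while at most two member drives have failed."""
    return failed_drives <= PARITY_DRIVES


def file_is_safe(file_server: str, failures: Dict[str, int]) -> bool:
    """A file lives on one server, so only that server's array matters.

    failures maps (hypothetical) server names to their current failed-drive count.
    """
    return array_survives(failures.get(file_server, 0))
```

For example, with two failed drives on `oss01` and three on `oss02`, files on `oss01` are still intact, while files on `oss02` would be lost; files on all other servers are unaffected.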

While this arrangement is robust, we cannot guarantee the safety of your data. We have not yet experienced a third simultaneous disk failure (the case that would cause data loss) on a Lustre server, but the probability of it occurring is not negligible. A much more common cause of data loss, however, is users deleting their own data by mistake. We therefore strongly recommend implementing a backup policy for important data.
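As one possible starting point for a backup policy, the sketch below copies a directory tree to a destination and verifies each copy with a SHA-256 checksum. It assumes `dest` is on different physical storage (for example, an external drive or storage at another site), since a second copy on the same networked storage is not a backup. It is a one-shot mirror with no versioning or incremental logic, and it is unrelated to BioHPC's backup service; the function names are hypothetical.

```python
import hashlib
import os
import shutil


def sha256sum(path: str, chunk: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def backup_tree(src: str, dest: str) -> int:
    """Copy every file under src into dest (same relative layout) and verify it.

    Returns the number of files copied; raises RuntimeError on a checksum mismatch.
    """
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            s = os.path.join(dirpath, name)
            d = os.path.join(target_dir, name)
            shutil.copy2(s, d)  # copies contents plus timestamps and permission bits
            if sha256sum(s) != sha256sum(d):
                raise RuntimeError(f"checksum mismatch for {s}")
            copied += 1
    return copied
```

Verifying each copy catches silent corruption during the transfer; keeping the checksums alongside the backup would also let you re-check it later.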

 

 
