 

BioHPC Lab:
User Guide

 


BioHPC Lab Storage Guide

Overview

BioHPC Lab storage is divided into two parts: local storage (/workdir or /SSD) and networked storage (home directories and storage group directories). This document covers networked storage.

Every user has access to networked storage through their home directory. Networked storage is available on all workstations and login nodes. Data can be transferred to and from networked storage without any reservation; all you need is an active Lab user account. The best way to transfer data is with scp or sftp (a common Windows client is FileZilla). For a step-by-step explanation of data transfer, please refer to "Access". It is also possible to use Globus Online to transfer data to and from your home directory (or group directories); see Globus at BioHPC Lab and Using Globus to Share Data for details.

You can share your files with other BioHPC Lab users by setting file and directory permissions. You can also share data with external users via Globus (see Using Globus to Share Data); sharing via Globus works with people who do not have BioHPC Lab accounts.
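Sharing through permissions can be scripted. The following is a minimal sketch using Python's standard os and stat modules; it adds group read access to everything under a directory. The function name and the example path are hypothetical, and it assumes your collaborators belong to the Unix group that owns the files.

```python
import os
import stat

def share_with_group(top):
    """Add group read access (plus traverse access on directories) under `top`."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Directories need read + execute so group members can list and enter them.
        mode = os.stat(dirpath).st_mode
        os.chmod(dirpath, mode | stat.S_IRGRP | stat.S_IXGRP)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; chmod would follow them to the target
            mode = os.stat(path).st_mode
            os.chmod(path, mode | stat.S_IRGRP)

# Hypothetical usage (the path is an example only):
# share_with_group("/home/your_user_id/shared_results")
```

Note that the group also needs execute (traverse) permission on every parent directory of the shared directory, otherwise the files remain unreachable.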

The BioHPC Lab storage system currently totals 700 TB + 313 TB, i.e. about 1 PB. The storage is implemented as a 5-server Gluster cluster (313 TB) and a 12-server Lustre cluster. A limited amount of free storage is available to each Lab user; its size depends on the user's status (see "Quotas" below). Any user can purchase extra storage for $91.35 per TB per year, which is one of the lowest storage prices available anywhere. We can offer this low price because we buy storage in bulk and only recover the actual cost, i.e. hardware, computer room and maintenance costs.

Lab networked storage does not include backups. The backups are available as a separate service - details are available on this page.

Quotas

Quotas are set according to the following algorithm.

  1. User DOES NOT have access to paid storage

    1. User is associated with an active Lab Credit Account. Home directory storage limit is 200 GB.

    2. User is associated with an active hosted hardware resource. Home directory storage limit is 200 GB.

    3. User is NOT associated with an active Lab Credit Account or hosted hardware. Home directory storage limit is 20 GB.

  2. User DOES have access to paid storage

    1. User purchased storage for home directory. Home directory storage limit is set by the user during storage purchase.

    2. User's home directory belongs to a storage group. Home directory storage limit is set by the group admin, up to the maximum group storage quota.

    3. User has access to a storage group, but their home directory does not belong to it. User can store data in the storage group directory up to the maximum group quota; the home directory storage quota is set as in point 1 above.

Free storage quotas cannot be combined, added to purchased storage, or used for multiple accounts. They exist only to make sure users can carry out common computations without purchasing extra storage.
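The quota rules above can be sketched as a small decision function. This is an illustration only, not the Lab's actual implementation: the class, field and function names are hypothetical, and the order in which the two paid cases are checked is an assumption, since a home directory is expected to fall under only one of them.

```python
from dataclasses import dataclass
from typing import Optional

FREE_QUOTA_SPONSORED_GB = 200  # active Lab Credit Account or hosted hardware
FREE_QUOTA_DEFAULT_GB = 20     # neither of the above

@dataclass
class UserStorage:
    # All field names are hypothetical, chosen for this sketch.
    has_active_credit_account: bool = False
    has_active_hosted_hardware: bool = False
    purchased_home_quota_gb: Optional[float] = None  # set by the user at purchase
    group_home_quota_gb: Optional[float] = None      # set by the group admin

def home_quota_gb(user: UserStorage) -> float:
    """Return the home directory quota in GB following the rules in the text."""
    # Paid storage: home directory belongs to a storage group.
    if user.group_home_quota_gb is not None:
        return user.group_home_quota_gb
    # Paid storage: user purchased storage for the home directory.
    if user.purchased_home_quota_gb is not None:
        return user.purchased_home_quota_gb
    # No paid storage for the home directory (this also covers users who only
    # have access to a separate storage group directory).
    if user.has_active_credit_account or user.has_active_hosted_hardware:
        return FREE_QUOTA_SPONSORED_GB
    return FREE_QUOTA_DEFAULT_GB
```

For instance, home_quota_gb(UserStorage(has_active_credit_account=True)) gives 200, and a user with neither a credit account nor hosted hardware gets 20.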

Managing and purchasing your storage

You can check your storage status on the My Storage page; usage and quotas are updated daily at about 5am. If your storage is close to or over the quota, or if your paid storage is about to expire, you will be notified by e-mail. If your storage is over the quota, you will not be able to add any more data.

You can purchase storage yourself by clicking the "Add or modify storage" button(s) on your "My Storage" page; there is a separate button for each storage space you own. Typically each user owns their home directory, but you may want to create a separate storage space for sharing with other users (a storage group). If you are a member of such a storage group, you will also see it on the "My Storage" page. You can purchase storage using a Cornell Account or a credit card. Credit cards are processed by the Campus Store, whose transaction fee is 5%. If you would like to purchase storage using an invoice or purchase order (PO), please contact us first.

You can also change the quota without buying any storage using the "Add/Modify storage" page (reached from "My Storage"): select 0 units to purchase and, if you have any purchased storage left, you can adjust the quota value. The quota cannot be lowered below the amount of storage you are actually using.

There are several ways to organize your networked storage, summarized below. Please note that the different ways can be combined (e.g. you can add storage to your home directory AND have access to an additional storage group).

  1. Add storage to your home directory. You can add storage in 1 TB-year chunks ($91.35 each) and then decide your quota (e.g. add 2 x 1 TB-year, set the quota at 1 TB, and your storage will expire in 2 years).

  2. Create a storage group and move your home directory there. This option is especially attractive for research groups, since all members of the group can share the storage quota. The group PI needs to contact us first to create the group and to move the home directories of all involved users there. The group can then be managed by the PI (or a designated person): users can be added or removed, and storage can be added or renewed the same way as home directory storage.

  3. Create a storage group for group storage. As in point 2, a group of users can share the storage group, except that their home directories stay where they were. The group PI needs to contact us to create the group. The group can then be managed by the PI (or a designated person): users can be added or removed, and storage can be added or renewed the same way as home directory storage.

Storage can be purchased only in 1 TB-year chunks, and it must be paid for up front. The system works like this: you can buy as many 1 TB-year chunks as you want and then set the quota at the level you want; the expiration date is computed from the two.

For example, you can buy 30 x 1 TB-year chunks and set the quota to 30 TB; the storage will then last for 1 year, at which point you will need to buy storage again. If you buy 60 x 1 TB-year chunks and set the quota to 30 TB, it will last 2 years. You can change the quota at any time; the remaining TB-years (not rounded) are used to compute the new expiration date. You can add TB-year units at any time (with or without changing the quota), and you can lower your quota at any time (pushing back your expiration date as a result), but you cannot get a refund (i.e. convert the remaining TB-year units back to money).
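The relationship between TB-year units, quota and expiration date can be sketched in a few lines of Python. This is an illustration of the arithmetic described above, not the code the Lab runs: the function names are hypothetical, and the 365-day year is an assumption about how time is counted.

```python
from datetime import date, timedelta

DAYS_PER_YEAR = 365  # assumption: the real system may count years differently

def expiration_date(tb_years_left, quota_tb, today):
    """Date on which the remaining TB-years run out at the given quota."""
    if quota_tb <= 0:
        raise ValueError("quota must be positive")
    days = tb_years_left / quota_tb * DAYS_PER_YEAR
    return today + timedelta(days=days)

def remaining_tb_years(tb_years_bought, quota_tb, days_elapsed):
    """TB-years left after holding a fixed quota for some number of days."""
    return tb_years_bought - quota_tb * days_elapsed / DAYS_PER_YEAR

start = date(2025, 1, 1)                      # arbitrary example start date
one_year = expiration_date(30, 30, start)     # 30 units at a 30 TB quota
two_years = expiration_date(60, 30, start)    # 60 units at a 30 TB quota

# After one year at 30 TB, 30 of the 60 units remain; halving the quota
# to 15 TB stretches them over two more years.
left = remaining_tb_years(60, 30, 365)
new_expiry = expiration_date(left, 15, date(2026, 1, 1))
```

With these inputs, one_year is 2026-01-01, two_years is 2027-01-01, and new_expiry is 2028-01-01.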

If you need extra storage for a short time, you can raise your quota temporarily and lower it again when the extra space is no longer needed. This uses more of your TB-year units, but only as many as needed: usage is computed from quota and time, with TB-years counted as floating point numbers. For example, an additional 3 TB of quota for 6 months will cost you 1.5 TB-year units. Your usage of TB-year units depends solely on the quota you set; essentially, you pay for reserving a certain amount of storage.
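The cost of a temporary quota increase can be checked with a short calculation. The sketch below sums quota multiplied by time over consecutive periods; the function name is hypothetical, the 2 TB baseline quota is an invented example, and counting time in whole months is a simplification.

```python
def tb_years_used(periods):
    """Sum the TB-years consumed over a list of (quota_tb, months) periods."""
    return sum(quota_tb * months / 12 for quota_tb, months in periods)

# Hypothetical baseline: a 2 TB quota held for a full year.
baseline = tb_years_used([(2, 12)])

# Same year, but with the quota raised by 3 TB (to 5 TB) for 6 months.
with_increase = tb_years_used([(2, 6), (5, 6)])

extra = with_increase - baseline  # TB-years attributable to the increase
```

Here extra comes out to 1.5 TB-years, matching the 3 TB for 6 months example in the text.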

These TB-year concepts can be visualized as a rectangle whose vertical side is your quota and whose horizontal side is the storage time (from now to the expiration date). Your purchased TB-years correspond to the area of the rectangle: you can make it longer horizontally, but then it must also be made shorter vertically to preserve the area. Both rectangles below correspond to the same storage purchase (6 TB-years).

                   

As with computing hours on Lab Credit Accounts, you are charged for the reservation of storage: the TB-years you purchased are consumed according to your quota, NOT the amount of data actually stored.

Data Safety

Each storage array component of our storage cluster is either RAID6 or raidz2 (the RAID6 equivalent in ZFS). Each file is localized, i.e. stored on one component server, so the overall data safety is equivalent to that of a single RAID6/raidz2 storage array. In practical terms, this means that a simultaneous failure of two hard drives in each of the component servers will NOT cause any data loss, and in fact will not even disrupt data access. Periodic scans are carried out to find and correct bit rot.

Networked Lab storage does NOT include backups. It is the user's responsibility to make sure critical or irreplaceable data is mirrored or backed up to another physical location; keeping two copies of the same data on the same networked storage is NOT a proper backup. Backups are available as a separate service - details are available on this page.

 

 
