The purpose of our backup system is to create and periodically update snapshots of selected directories
while retaining, for some time, copies of files that have been deleted or changed.
In order to use our backup system, the user needs to purchase backup storage (see pricing).
For increased data safety, the backup storage servers are located in Weill Hall,
separately from the rest of our computational infrastructure.
After purchasing backup storage, the user specifies one or more directories they wish to
back up. Each such directory becomes a backup root. A typical example would be your home directory,
although it is also possible to specify other directories, such as a subdirectory of your home directory, or
a directory located on one of the hosted servers. Each backup root
is backed up entirely (recursively, with all files and subdirectories) except for subdirectories or files the user has excluded.
When a directory is backed up for the first time, the entire directory (except exclusions)
will be copied to the backup server, i.e., its current snapshot will be created, reflecting the directory's
state at backup time. The next time the backup runs, this current snapshot will be updated, i.e., files removed,
added, or changed by the user in the meantime in the source directory will also be removed from, added to, or changed
in the current snapshot.
However, the files that have been removed, as well as previous versions of those that changed, will be saved on
the backup server in a backup snapshot labeled with backup date and time. The backup snapshot contains only
files that have been changed or removed by the user from the source directory since the previous backup cycle.
Subsequent backup cycles will update the current snapshot, create new dated backup snapshots, and remove the
older ones. This process is illustrated in the figure below.
Thus, the backup server will always contain the current snapshot, reflecting the state of the directory at the time of
the latest backup, plus a number of dated backup snapshots containing files changed or removed between previous backup cycles.
These snapshots make it possible to retrieve older versions of files whenever needed.
The maximum age of the backup snapshots to be kept as well as backup frequency are configurable by the user.
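To make the update rules above concrete, here is a minimal Python sketch of a single backup cycle. It is an illustration under simplifying assumptions, not the actual backup software: snapshots are modeled as in-memory dictionaries mapping a relative path to file contents, and the function name backup_cycle is hypothetical.

```python
# Hypothetical model of one backup cycle (illustration only, not the real
# backup software). A snapshot is a dict: relative path -> file contents.

def backup_cycle(
    current: dict[str, bytes], source: dict[str, bytes]
) -> tuple[dict[str, bytes], dict[str, bytes]]:
    """Return (new_current, bak) after one backup cycle.

    new_current mirrors the source directory; bak holds the previous
    contents of files that were changed or removed since the last cycle.
    """
    bak = {}
    for path, old_content in current.items():
        if path not in source or source[path] != old_content:
            bak[path] = old_content  # removed or changed: keep the old copy
    new_current = dict(source)  # additions, changes and removals all mirrored
    return new_current, bak


current = {"a.txt": b"v1", "b.txt": b"same", "c.txt": b"old"}
source = {"a.txt": b"v2", "b.txt": b"same", "d.txt": b"new"}
current, bak = backup_cycle(current, source)
print(sorted(current))  # ['a.txt', 'b.txt', 'd.txt']
print(bak)              # {'a.txt': b'v1', 'c.txt': b'old'}
```

On the very first run the current snapshot is empty, so the backup snapshot comes out empty and the whole source is copied, matching the first-backup behavior described above.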
Parameters controlling backup
Backup is controlled by three parameters, set individually for each backup root directory:
- Retention: age (in days) of the oldest version of the backup root directory to be kept
- Frequency: backup frequency (e.g., setting this to 3 means backup of this directory will be run every 3 days)
- MinSave: minimum number of old versions of the directory that are always kept, regardless of age
(this prevents all previous versions from being erased if the source directory is not changed for longer
than Retention days)
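The sketch below shows one plausible reading of how Retention, Frequency, and MinSave interact. The exact pruning rule used by the backup server is not documented here, so the functions snapshots_to_keep and backup_due are hypothetical: a dated backup snapshot is assumed to be kept if it is at most Retention days old or is among the MinSave newest, and a backup is assumed to be due once Frequency days have passed since the last one.

```python
from datetime import date

# Hypothetical pruning/scheduling rules; the real server's logic may differ.

def snapshots_to_keep(snapshot_dates, today, retention, min_save):
    """Return the dated backup snapshots kept after pruning, newest first.

    A snapshot survives if it is at most `retention` days old, or if it is
    one of the `min_save` newest snapshots (kept regardless of age).
    """
    newest_first = sorted(snapshot_dates, reverse=True)
    return [
        d for i, d in enumerate(newest_first)
        if (today - d).days <= retention or i < min_save
    ]


def backup_due(last_backup, today, frequency):
    """True once at least `frequency` days have passed since the last backup."""
    return (today - last_backup).days >= frequency


today = date(2024, 3, 31)
snaps = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 20)]
# Only 2024-03-20 is within 30 days, but MinSave=2 also keeps 2024-02-01.
print(snapshots_to_keep(snaps, today, retention=30, min_save=2))
print(backup_due(date(2024, 3, 28), today, frequency=3))  # True
```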
Besides setting these parameters, the user can also specify exclusions - files and/or subdirectories of the
backup root to be omitted from the backup process.
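As an illustration of how an excluded subdirectory also covers everything beneath it, here is a hypothetical helper is_excluded. It assumes exclusions are given as paths relative to the backup root; how the backup system stores exclusions internally is not specified here.

```python
from pathlib import PurePosixPath

def is_excluded(rel_path, exclusions):
    """True if rel_path (relative to the backup root) is excluded.

    An exclusion covers the named file or subdirectory itself and, for a
    subdirectory, everything beneath it.
    """
    path = PurePosixPath(rel_path)
    for ex in exclusions:
        ex_path = PurePosixPath(ex)
        if path == ex_path or ex_path in path.parents:
            return True
    return False


exclusions = ["scratch", "data/tmp.log"]
print(is_excluded("scratch/run1/big.dat", exclusions))  # True
print(is_excluded("scratch2/notes.txt", exclusions))    # False
print(is_excluded("data/tmp.log", exclusions))          # True
```

Comparing whole path components (rather than string prefixes) keeps an exclusion of scratch from accidentally matching scratch2.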
Typical backup scenarios
Depending on your needs, you may consider two basic backup strategies: back up most, exclude some
and back up some, exclude most.
- back up most, exclude some: Specify some top-level directory (such as your home directory)
as backup root, possibly with a few exclusions. The advantage is that all changes you make to this
directory (except excluded parts) will be reflected in the backup without any extra effort on your part.
However, if you add some large files which you did not really intend to back up but forget to exclude,
they will be copied to the backup server and you will be charged for the space-time they occupy.
- back up some, exclude most: Back up only one (or more) individual subdirectories of your home directory,
the content of which you consider most important. To do this, you need to specify this subdirectory (rather
than your entire home directory) as backup root. The advantage is that changes you make outside of backup root
will not clutter the backup. However, if any of these changes are important
but you forget to copy or move them into backup root, they will not be reflected in the backup.
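Before choosing the back up most, exclude some strategy, it may help to check which large files a candidate backup root contains, so you can exclude the ones you do not need. The sketch below (the function name large_files is hypothetical) walks a directory with Python's standard os.walk and lists regular files above a size threshold, largest first. It skips symbolic links, since the backup does not follow them.

```python
import os

def large_files(root, min_bytes):
    """List (size, path) of regular files under root of at least min_bytes, largest first."""
    found = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks are not backed up, so skip them
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
            if size >= min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)
```

For example, large_files(os.path.expanduser("~"), 10**9) would list the files of 1 GB or more in your home directory; the threshold is an arbitrary choice.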
How to purchase backup storage
- First-time users must start by purchasing backup storage by clicking on the
Purchase Backup Credit button at the bottom of the My Storage page.
- Backup storage is purchased in 1 TB-year increments, as with
main storage. How long your purchased storage will last depends on
the backup size, in the same way as main storage usage relates to its quota (see the
main storage page for details). This means that if you purchase 1 TB-year of backup
storage and your backup size is 0.5 TB, your 1 TB-year of purchased backup storage will last 2 years.
If your backup size is 2 TB, your 1 TB-year purchase will expire after 6 months.
- Backup storage used to date is calculated daily and reported on your My Storage page.
The remaining backup storage is recomputed accordingly.
- A default name is given to your new Backup Credit Account after you accept the purchase and an invoice is created
(the name may be changed after the purchase from the status table on the My Storage page).
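The arithmetic above can be written out as follows. This is a sketch assuming that usage is charged in proportion to backup size times elapsed time with a 365-day year; the function names are hypothetical and the exact daily accounting shown on the My Storage page may differ in detail.

```python
DAYS_PER_YEAR = 365  # assumption: the exact accounting convention is not documented


def credit_duration_years(credit_tb_years, backup_size_tb):
    """Years a backup credit lasts if the backup size stays constant."""
    return credit_tb_years / backup_size_tb


def charge_one_day(remaining_tb_years, backup_size_tb):
    """Remaining credit (TB-years) after one day at the given backup size."""
    return remaining_tb_years - backup_size_tb / DAYS_PER_YEAR


print(credit_duration_years(1.0, 0.5))  # 2.0 (years)
print(credit_duration_years(1.0, 2.0))  # 0.5 (years), i.e. about 6 months
```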
Specify backup root directories
- Click on the Manage Backup button at the bottom of the My Storage page.
- Enter the directory you want to back up (backup root)
in the text box and click on the Add Directory to Backup button.
- Use the default Server setting, Network Storage, for directories that start with /home.
- You may change "Network Storage" in the Server text box to the name of any specific server which you
can access and where you have files that require backup.
This typically applies to BioHPC Lab hosted servers.
- Once a backup root directory has been added, you will be able to edit the
Retention, Frequency and MinSave parameters (click the Edit button)
and add or remove exclusions (click Manage Excludes). Click Stop Backup to stop backing up
this directory and remove it from your list of backup roots (of course, this operation does not
in any way affect the source directory).
- Repeat the above steps for all directories you would like backed up.
Exclude directories and/or files from backup roots
- Click on the Manage Excludes button to list the contents of the backup root directory.
- Click on the Exclude checkbox to exclude a file or a subdirectory from backup.
- Enter a subdirectory by clicking on its link, then exclude files and/or subdirectories within it, etc.
- Exclusions can be removed by clicking on the Remove Exclude button, or by unchecking
a box on the directory listing.
Checking the status of your backup account
Once the backup root directories are configured, the My Storage page will
contain a summary of your backup storage account, updated daily.
Check this page regularly.
You will be notified by e-mail when your purchased backup storage is about
to run out. If needed, purchase more credit, or reduce the backup size by adding more
exclusions or removing backup roots you no longer need (Manage Backup button).
Accessing your backup
Backup directories are exported from the backup server and mounted on our two login nodes, cbsulogin (cbsulogin.tc.cornell.edu) and cbsulogin2.
Each user-specified backup root has a corresponding location under /backups/backup1 on both login nodes.
This location reflects the owner, source server, and backup root. The picture below shows three examples, with different
parts of the path color-coded for clarity.
The first backup root is one user's home directory located on Network Storage.
The second is an example of a storage group space, also located on Network Storage under /home.
The last backup root is a directory located on a hosted server cbsubscb02.
Each of these locations is, in turn, organized into current snapshot and backup snapshot
directories. For example, listing the contents of the first of the directories above shows a directory
named current together with a number of directories named bak_*, each marked with a date.
The directory current contains the current snapshot, whereas each bak_* directory contains the files
changed or deleted between the backup cycle preceding its date and that date.
The current directory and each of the bak_* directories contain the actual files and
directories being backed up; in the first example above, this is the directory home/bukowski and its backed-up content.
The files on the backup mounts can be listed, browsed, and viewed using regular Linux commands (cd,
less, cat, text editors) or the graphical File Manager tool (if connected to cbsulogin or cbsulogin2 via VNC).
File permissions on the backup mounts are the same as those in the source directories, except that write permission is always revoked.
The same tools can be used to retrieve files from backup (just copy the files you want from backup directories
to wherever you need them).
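Since the current snapshot and every bak_* snapshot mirror the same directory structure, all saved copies of a given file can be found by checking the same relative path in each of them. The sketch below does this; find_versions is a hypothetical helper, backup_location stands for the location under /backups/backup1 that corresponds to your backup root, and it assumes that sorting the bak_* names alphabetically puts them in date order (check this against the actual directory names).

```python
import glob
import os

def find_versions(backup_location, rel_path):
    """Return existing copies of rel_path: current snapshot first, then bak_* newest first.

    backup_location is the directory under /backups/backup1 for one backup root.
    Assumes bak_* names sort chronologically (they carry the snapshot date).
    """
    candidates = [os.path.join(backup_location, "current", rel_path)]
    for bak_dir in sorted(glob.glob(os.path.join(backup_location, "bak_*")), reverse=True):
        candidates.append(os.path.join(bak_dir, rel_path))
    return [p for p in candidates if os.path.exists(p)]
```

For example, find_versions(backup_location, "home/bukowski/notes.txt") would return the paths of the existing copies of that (made-up) file; a chosen copy can then be retrieved with shutil.copy2(src, dst) or a plain cp.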
How much disk space will your backup take on the backup server? It all depends on how many changes your source directory
undergoes each backup cycle. If the changes (i.e., the size of files being added, deleted, or changed) are small,
the bulk of the backup will be concentrated in the current snapshot, while the bak_* snapshots
(each containing only changes) will be negligibly small. In such a case your backup size will be close
to the size of the source directory. On the other hand, if a lot of changes are made every day, the size of
each of the bak_* directories may become close to the size of the current snapshot. Since about Retention/Frequency
bak_* snapshots are kept alongside the current one, the total size of your backup will then be about
[size of the source directory] × (Retention/Frequency + 1).
In practice, of course, the actual size will be somewhere between these two extremes.
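The two extremes described above can be turned into a quick estimate. The function below is only a back-of-the-envelope helper (its name is hypothetical); it returns the lower bound, close to the source size, and the upper bound, [size of the source directory] × (Retention/Frequency + 1).

```python
def backup_size_range(source_tb, retention_days, frequency_days):
    """Rough (low, high) bounds on the total backup size, in TB."""
    low = source_tb  # few changes: bak_* snapshots are negligible
    # many changes: each of ~Retention/Frequency bak_* snapshots is as large as current
    high = source_tb * (retention_days / frequency_days + 1)
    return low, high


print(backup_size_range(0.5, 30, 3))  # (0.5, 5.5)
```

With 1 TB-year of backup credit, this example range corresponds to a credit lasting anywhere from 2 years (at 0.5 TB) down to roughly 2 months (at 5.5 TB, 1/5.5 ≈ 0.18 years).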
A few words of caution
- Avoid triggering big backup events. A big (and mostly unnecessary) backup event may happen
if one or more large files are moved around between subdirectories or just renamed. If a large file
within backup root is moved or renamed, it will be transferred (with the new name) to the current snapshot
on backup server while its previous copy (with old name, but otherwise identical) will be saved
in a bak_* snapshot. The result: extra network traffic during backup and a doubled backup storage charge for that file until the bak_* snapshot holding the old copy is removed.
- Avoid backing up the same directory multiple times. This may happen, for example, when you back up
your directory located within your lab's storage group which itself is already backed up entirely
(i.e., without exclusions) by your lab manager.
- The backup does not follow symlinks. If a directory you are backing up contains symbolic links (shortcuts)
to files located elsewhere, these files will not be backed up unless they are included explicitly in some
backup root directory.
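Because symlinked files are not backed up, it can be worth auditing a backup root for links that point outside it. Here is a minimal sketch using only Python's standard library; the function name external_symlinks is hypothetical. It reports each symbolic link whose resolved target lies outside the backup root; links pointing inside the root are skipped because their targets are backed up anyway (assuming those targets are not excluded).

```python
import os

def external_symlinks(root):
    """List (link, resolved_target) for symlinks under root that point outside it."""
    root_real = os.path.realpath(root)
    found = []
    # os.walk does not descend into symlinked directories by default,
    # but such links still appear in dirnames, so check both lists.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.path.realpath(path)
                if os.path.commonpath([root_real, target]) != root_real:
                    found.append((path, target))
    return sorted(found)
```

For example, external_symlinks(os.path.expanduser("~")) would list such links in your home directory; each listed target is backed up only if it lies inside some other backup root.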