Working with files
The DCC has several shared file systems available to all users of the cluster. General-purpose partitions are hosted on Isilon network-attached storage arrays connected at 40 Gbps or 10 Gbps.
Sensitive data is not permitted on cluster storage.
| Volume | Size | Description | Backups |
|---|---|---|---|
| /work | 650 TB | Unpartitioned, high-speed volume shared across all users. Files older than 75 days are purged automatically. | None |
| /home | 10 GB | Use for personal scripts and other environment setup. When a user's Duke ID is deactivated, their access and home directory are automatically removed from the cluster. | None |
| /group | 1 TB, expandable for a fee | Private to each lab group; working space for applications and projects that last longer than 75 days. | 7-day snapshot |
| /datacommons | Fee-based | Archival storage that can be mounted on the DCC to ease transfer of data into computational space and of results out to long-term storage. Because I/O is not as performant as cluster storage, jobs should not be configured with file access that causes excessive reads/writes to Data Commons storage. | Optional 30-day backup |
How should I use DCC storage?
To optimize the performance of the cluster and make your utilization efficient, we recommend the following:
- /home -> personal scripts and configuration files, environment setup information
- /group -> software installations, lab-specific scripts, and moderately sized data sets or intermediate results that are needed for longer than 75 days.
- /work -> large data sets under analysis and intermediate results. In the root folder, create your own folder for your use with:
mkdir /work/<netid>
Remember: Files older than 75 days are automatically purged!
/datacommons -> long term storage for source data and results data
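The recommendations above amount to a simple staging workflow: pull source data from long-term storage into /work, compute there, and copy results back. A hypothetical sketch follows; the /tmp defaults only make the example self-contained, and on the DCC SRC and DEST would point into /datacommons/&lt;groupname&gt; while WORKDIR would be your /work/&lt;netid&gt; folder.

```shell
# Hypothetical sketch of the recommended data flow; paths are placeholders.
SRC="${SRC:-/tmp/demo-src}"
WORKDIR="${WORKDIR:-/tmp/demo-work}"
DEST="${DEST:-/tmp/demo-dest}"
mkdir -p "$SRC" "$WORKDIR" "$DEST"
echo "raw data" > "$SRC/sample.dat"                # stand-in for archived source data

cp "$SRC/sample.dat" "$WORKDIR/"                   # 1. stage input into fast working space
tr '[:lower:]' '[:upper:]' \
  < "$WORKDIR/sample.dat" > "$WORKDIR/result.dat"  # 2. placeholder analysis step
cp "$WORKDIR/result.dat" "$DEST/"                  # 3. copy results back to long-term storage
rm "$WORKDIR/sample.dat" "$WORKDIR/result.dat"     # 4. tidy /work yourself; don't rely on the purge
```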
Some DCC compute nodes also have a /scratch volume that is local to the compute node. This can be used when highly performant storage is needed during a job, but data should be deleted at the completion of the job. /scratch is not available on every node, and sizes vary, so use of this requires strong working knowledge of the nodes on the DCC.
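On nodes that do have /scratch, a job script can stage data there and remove it when the job finishes. A minimal sketch, assuming a writable /scratch (the fallback to /tmp is only so the example runs anywhere):

```shell
# Hypothetical sketch: use node-local scratch during a job and always clean up.
SCRATCH_BASE="/scratch"
[ -d "$SCRATCH_BASE" ] || SCRATCH_BASE="/tmp"      # fall back when /scratch is absent
JOB_TMP="$(mktemp -d "$SCRATCH_BASE/job.XXXXXX")"  # private working directory for this job
trap 'rm -rf "$JOB_TMP"' EXIT                      # delete scratch data when the job ends
echo "fast local I/O" > "$JOB_TMP/input.txt"       # stand-in for staged input
# ... run the I/O-heavy part of the job against $JOB_TMP here ...
cat "$JOB_TMP/input.txt"
```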
Viewing usage
View your current /home utilization with:
du -hd1 /hpc/home/<netid>
$ du -hd1 /hpc/home/kk338
1.5M    /hpc/home/kk338/.config
32K     /hpc/home/kk338/Desktop
175K    /hpc/home/kk338/.vnc
138K    /hpc/home/kk338/.dbus
32K     /hpc/home/kk338/.singularity
173K    /hpc/home/kk338/.ipython
56K     /hpc/home/kk338/.java
80K     /hpc/home/kk338/.ssh
80K     /hpc/home/kk338/bin
6.0K    /hpc/home/kk338/R
104K    /hpc/home/kk338/danai
698K    /hpc/home/kk338/.cache
11M     /hpc/home/kk338/ondemand
3.4M    /hpc/home/kk338/.local
195K    /hpc/home/kk338/.jupyter
22M     /hpc/home/kk338/.comsol
28K     /hpc/home/kk338/.conda
160K    /hpc/home/kk338/.gnupg
200K    /hpc/home/kk338/tutorial
383K    /hpc/home/kk338/.matlab
243M    /hpc/home/kk338
View your current group volume size and amount used with:
df -h /hpc/group/<groupname>
$ df -h /hpc/group/rescomp
Filesystem                                 Size  Used Avail Use% Mounted on
oit-nas-fe13.dscr.duke.local:/hpc-rescomp  1.0T  390G  635G  39% /hpc/group/rescomp
View your current usage of /work with:
storage-report -u <netid>
$ storage-report -u tm103
Report data: Thu Nov 18 18:00:00 EST 2021 - Fri Nov 19 02:31:06 EST 2021
Directory: /work
Report type: User tm103
2.7TiB  /work/tm103/
15MiB   /work/tmp_dir/
152KiB  /work/pgi/
View your current Data Commons volume size and amount used with:
df -h /datacommons/<groupname>
$ df -h /datacommons/plusds
Filesystem                                              Size  Used Avail Use% Mounted on
oit-nas-fe13dc.dscr.duke.local:/ifs/datacommons/plusds  12T   12T   716G  95% /datacommons/plusds
SCP (Secure Copy)
You must be connected to the Duke network; from off campus, use the Duke VPN.
The general syntax to copy a file to the DCC is (push):
$ scp <localpath/filename> <netid>@dcc-login.oit.duke.edu:<dccpath>
The general syntax to copy a file from the DCC is (pull):
$ scp <netid>@dcc-login.oit.duke.edu:<dccpath/filename> <localpath>
While you can use scp -r to recursively copy all of the files in a directory, we recommend rsync for transferring a large number of files.
When executing SCP for files to or from the DCC, MFA is required and will default to your first available option. If you are having trouble with MFA and SCP, get help with MFA, or bypass MFA altogether by setting up and using SSH keys to access the DCC.
Sample command and output pushing a file from my workstation to my group directory:
kk338@CDSS-5630 ~ % scp jobs.txt email@example.com:/hpc/group/rescomp/kk338
Enter passphrase for key '/Users/kk338/.ssh/id_rsa':
jobs.txt                                      100% 3847KB   8.4MB/s   00:00
Sample command and output pulling a file from my group directory to my local workstation (note the use of . to denote the current working directory):
kk338@CDSS-5630 ~ % scp firstname.lastname@example.org:/hpc/group/rescomp/kk338/DailyUsage.xlsx .
Enter passphrase for key '/Users/kk338/.ssh/id_rsa':
DailyUsage.xlsx                               100%   40KB 361.1KB/s   00:00
In these examples, SSH keys are used to simplify the login process.
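Generating and installing such a key is a two-step process. A hedged sketch follows: in practice the key would live in ~/.ssh, KEYDIR is a placeholder so the example is self-contained, and you should normally protect the key with a passphrase rather than -N "".

```shell
# Hypothetical sketch of SSH key setup for password-free transfers.
# KEYDIR is a placeholder; on a real workstation you would use ~/.ssh.
KEYDIR="${KEYDIR:-$(mktemp -d)}"
ssh-keygen -q -t ed25519 -f "$KEYDIR/id_ed25519_dcc" -N ""   # generate a key pair
# Install the public key on the DCC (prompts for your credentials once):
# ssh-copy-id -i "$KEYDIR/id_ed25519_dcc.pub" <netid>@dcc-login.oit.duke.edu
```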
Sample rsync commands for a large number of files (push, then pull):
rsync -rP dir1/ email@example.com:.
rsync -rP firstname.lastname@example.org:~/dir1 .
For large data transfers, external transfers, and repetitive transfers, we recommend the use of Globus.