
Working with files

Overview

The DCC provides several shared file systems to all users of the cluster. General partitions reside on Isilon network-attached storage arrays connected at 40 Gbps or 10 Gbps.

Sensitive data is not permitted on cluster storage.

Path | Size | Description | Backups
/work | 650 TB | Unpartitioned, high-speed volume shared across all users. Files older than 75 days are purged automatically. | None
/hpc/home/<netid> | 10 GB | Use for personal scripts and other environment setup. When a user's Duke ID is deactivated, their access and home directory are automatically removed from the cluster. | None
/hpc/group/<groupname> | 1 TB, expandable for a fee | Private to each lab group; working space for applications and projects that last longer than 75 days. | 7-day snapshot
/datacommons/<groupname> | Fee-based | Archival storage that can be mounted to the DCC to ease transfer of data to computational space and of results out to long-term storage. I/O is slower than on cluster storage, so do not configure jobs in a way that causes excessive reads and writes to Data Commons storage. | Optional 30-day backup

How should I use DCC storage?

To optimize the performance of the cluster and make your utilization efficient, we recommend the following:

  • /home -> personal scripts and configuration files, environment setup information
  • /group -> software installations, lab specific scripts, moderately sized data sets or intermediate results that are needed for longer than 75 days.
  • /work -> large data sets under analysis, intermediate results. Create your own folder at the top level of /work with: mkdir /work/<netid>

    Remember: Files older than 75 days are automatically purged!

  • /datacommons -> long term storage for source data and results data
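
Because /work is purged automatically, it can help to check periodically which files are getting close to the limit. The following is a minimal sketch, not an official DCC tool. The function name list_old_files and the 60-day warning threshold are hypothetical, and it assumes the purge is based on file modification time, which this page does not state.

```shell
#!/usr/bin/env bash
# List regular files under a directory that were last modified more than
# a given number of days ago.
# Assumption: the /work purge looks at modification time; this page only
# says "older than 75 days".
list_old_files() {
    local dir="$1"
    local days="${2:-60}"   # hypothetical warning threshold below the 75-day purge
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, list_old_files /work/<netid> 60 prints the files you may want to move to /group or /datacommons before they are purged.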

Some DCC compute nodes also have a /scratch volume that is local to the compute node. This can be used when highly performant storage is needed during a job, but data should be deleted at the completion of the job. /scratch is not available on every node, and sizes vary, so use of this requires strong working knowledge of the nodes on the DCC.
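
Since /scratch is not present on every node, a job script can check for it before relying on it. The sketch below shows one possible pattern. The function name make_job_dir and the fallback to TMPDIR (or /tmp) are assumptions for illustration, not DCC policy.

```shell
#!/usr/bin/env bash
# Create a per-job working directory and print its path.
# Prefers the given base (node-local /scratch by default) when it exists
# and is writable; otherwise falls back to $TMPDIR, or /tmp if unset.
make_job_dir() {
    local base="${1:-/scratch}"
    if [ ! -d "$base" ] || [ ! -w "$base" ]; then
        base="${TMPDIR:-/tmp}"
    fi
    mktemp -d "$base/job.XXXXXX"
}

# Typical use inside a job script:
#   workdir=$(make_job_dir)
#   trap 'rm -rf "$workdir"' EXIT   # delete the data when the job ends
```

The trap in the usage comment removes the directory when the script exits, which matches the expectation that /scratch data is deleted at the completion of the job.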

Viewing usage in...

.../home

View your current utilization with: du -hd1 /hpc/home/<netid>

$ du -hd1 /hpc/home/kk338
1.5M    /hpc/home/kk338/.config
32K     /hpc/home/kk338/Desktop
175K    /hpc/home/kk338/.vnc
138K    /hpc/home/kk338/.dbus
32K     /hpc/home/kk338/.singularity
173K    /hpc/home/kk338/.ipython
56K     /hpc/home/kk338/.java
80K     /hpc/home/kk338/.ssh
80K     /hpc/home/kk338/bin
6.0K    /hpc/home/kk338/R
104K    /hpc/home/kk338/danai
698K    /hpc/home/kk338/.cache
11M     /hpc/home/kk338/ondemand
3.4M    /hpc/home/kk338/.local
195K    /hpc/home/kk338/.jupyter
22M     /hpc/home/kk338/.comsol
28K     /hpc/home/kk338/.conda
160K    /hpc/home/kk338/.gnupg
200K    /hpc/home/kk338/tutorial
383K    /hpc/home/kk338/.matlab
243M    /hpc/home/kk338
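
To compare the total against the 10 GB home directory size, the du output can be turned into a percentage. This is a rough sketch only. The function name usage_percent is hypothetical, and it assumes the limit is 10 GiB (10485760 KiB) measured the same way du measures disk usage, which may differ from how the storage system accounts for it.

```shell
#!/usr/bin/env bash
# Print the integer percentage of a quota used by a directory tree.
# quota_kb defaults to 10 * 1024 * 1024 KiB; treating the 10 GB home size
# as 10 GiB is an assumption.
usage_percent() {
    local dir="$1"
    local quota_kb="${2:-10485760}"
    local used_kb
    used_kb=$(du -sk "$dir" | awk '{print $1}')   # total size in KiB
    echo $(( used_kb * 100 / quota_kb ))
}
```

For example, usage_percent /hpc/home/<netid> prints a whole number; integer division rounds down, so small directories report 0.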

.../group

View your current group volume size and amount used with: df -h /hpc/group/<groupname>

$ df -h /hpc/group/rescomp
Filesystem                                 Size  Used Avail Use% Mounted on
oit-nas-fe13.dscr.duke.local:/hpc-rescomp  1.0T  390G  635G  39% /hpc/group/rescomp

.../work

View your current usage of /work with: storage-report -u <netid>

$ storage-report -u tm103
Report data: Thu Nov 18 18:00:00 EST 2021 - Fri Nov 19 02:31:06 EST 2021
Directory:   /work
Report type: User tm103

2.7TiB  /work/tm103/
15MiB   /work/tmp_dir/
152KiB  /work/pgi/

.../datacommons

View your current volume size and amount used with: df -h /datacommons/<groupname>

$ df -h /datacommons/plusds
Filesystem                                              Size  Used Avail Use% Mounted on
oit-nas-fe13dc.dscr.duke.local:/ifs/datacommons/plusds   12T   12T  716G  95% /datacommons/plusds

Transferring files

SCP (Secure Copy)

You must be connected to the Duke network. From off campus, connect through the VPN first.

The general syntax to copy a file to the DCC (push) is:

$ scp <localpath.file> <netid>@dcc-login.oit.duke.edu:<dccpath>

The general syntax to copy a file from the DCC (pull) is:

$ scp <netid>@dcc-login.oit.duke.edu:<dccpath.filename> <localpath>

While you can use scp -r to recursively copy all of the files in a directory, we recommend the use of rsync for a large number of files.

When you use scp to copy files to or from the DCC, MFA is required and defaults to your first available option. If you are having trouble with MFA and scp, get help with MFA, or bypass MFA altogether by setting up and using SSH keys to access the DCC.

Sample command and output pushing a file from my workstation to my group directory:

kk338@CDSS-5630 ~ % scp jobs.txt kk338@dcc-login.oit.duke.edu:/hpc/group/rescomp/kk338
Enter passphrase for key '/Users/kk338/.ssh/id_rsa': 
jobs.txt                                      100% 3847KB   8.4MB/s   00:00

Sample command and output pulling a file from my group directory to my local workstation (note the use of . to denote current working directory):

kk338@CDSS-5630 ~ % scp kk338@dcc-login.oit.duke.edu:/hpc/group/rescomp/kk338/DailyUsage.xlsx .
Enter passphrase for key '/Users/kk338/.ssh/id_rsa': 
DailyUsage.xlsx                                               100%   40KB 361.1KB/s   00:00

In these examples, SSH keys are used to simplify the login process.

rsync

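Use rsync for large files or large numbers of files. To push the contents of a local directory to your DCC home directory:

rsync -rP dir1/ netid@dcc-login-02.oit.duke.edu:.

or, to pull a directory from your DCC home directory to the current local directory:

rsync -rP netid@dcc-login.oit.duke.edu:~/dir1 .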
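
In the rsync commands above, -r recurses into directories and -P is shorthand for --partial --progress, so an interrupted transfer keeps its partial files and can continue when the same command is re-run. A trailing slash on the source (dir1/) copies the contents of the directory, while omitting it (dir1) copies the directory itself. The wrapper below is a sketch for illustration. The function name dcc_push and the DRY_RUN and DCC_HOST variables are hypothetical, not part of the DCC tooling.

```shell
#!/usr/bin/env bash
# Push a local path to the DCC with rsync -rP.
# Arguments: <source> <netid> <destination path on the DCC>
# DCC_HOST overrides the login host; DRY_RUN=1 prints the command
# instead of running it, so you can check the paths first.
dcc_push() {
    local src="$1"
    local netid="$2"
    local dest="$3"
    local host="${DCC_HOST:-dcc-login.oit.duke.edu}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "rsync -rP $src ${netid}@${host}:${dest}"
    else
        rsync -rP "$src" "${netid}@${host}:${dest}"
    fi
}
```

For example, DRY_RUN=1 dcc_push dir1/ <netid> /work/<netid> shows the exact command before any data is transferred.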

Globus

For large data transfers, external transfers, and repetitive transfers, we recommend the use of Globus.
