Overview
The Duke Compute Cluster consists of machines that the University has provided for community use and that researchers have purchased to conduct their research. The equipment is housed in enterprise-grade data centers on Duke’s West Campus. Cluster hardware is heterogeneous, varying with installation date and the standards current at the time of purchase.
General users have access to common nodes purchased by the University and low-priority access to researcher-purchased nodes. Researchers who have provided equipment have high-priority access to their own nodes in addition to general access. Low-priority consumption of cycles greatly increases the overall efficiency of the cluster and gives all users the benefit of accessing more than their own nodes’ cycles when they need it. Low-priority jobs yield to high-priority jobs.
Appropriate Use
Users of the cluster agree to only run jobs that relate to the research mission of Duke University. Use of the cluster for the following activities is prohibited:
- Financial gain
- Commercial or business use
- Unauthorized use or storage of copyright-protected or proprietary resources
- Unauthorized mining of data on or off campus (including many web scraping techniques)
Data Security and Privacy
Users of the cluster are responsible for the data they introduce to the cluster and must follow all applicable Duke (including IRB), school, and departmental policies on data management and data use. Security and compliance provisions on the cluster are sufficient to meet the Duke data classification standard for public or restricted data. Use of sensitive data (e.g. legally protected data such as PHI or FERPA-covered records) or data bound by certain restrictions in data use agreements is not allowed. Data that has been appropriately de-identified or obfuscated may potentially be introduced to the cluster without violating data use agreements or government regulations.
As a shared resource, privacy on the cluster is constrained and users of the cluster must conduct themselves in a manner that respects other researchers’ privacy. Cluster support staff have access to all data on the cluster and may inspect elements of the system from time to time. Metadata on the cluster and utilization by group (and sometimes user) will be made available to all cluster users and Duke stakeholders.
Getting a DCC Account
Duke researchers are granted access to the DCC by the point of contact for their group or lab. If you need help finding your point of contact, check the list of current DCC groups.
Group points of contact manage DCC membership through our self-service portal: rtoolkits.web.duke.edu. If you are a faculty researcher and are interested in setting up a new group, either to purchase equipment for the cluster or to try out the cluster using common and scavenger access, contact us.
Any user with a Duke netid can be added to the DCC.
Purchasing resources for the DCC
For research groups needing access to additional computing resources, PIs may purchase compute nodes from current standard options to add to the cluster.
There are no operating costs for managing and housing PI-purchased compute nodes that are part of the standard cluster installation. Owners have priority access to the computing resources they purchase and can also access additional nodes (at low priority) when their research requires more capacity.
Research groups interested in purchasing computing nodes or additional storage should email Research Computing.
Best Practices for Use of Shared Resources
Cluster users are working in a shared environment and must adhere to usage best practices to ensure good performance for all cluster users.
Computational Work (Jobs) on Shared Resources
All computational work should be submitted to the cluster through the job scheduler (SLURM); running jobs on the login nodes is an abuse of the system. Common partition resources should be used judiciously, and groups with sustained needs should purchase nodes for high-priority access. Use of the scavenger partitions is encouraged for bursting, large workloads, and other short-term needs. Long-running jobs on the common and scavenger partitions are discouraged, both for fairness to other users and because node failures and scheduled maintenance may require interruption of processes. Check-pointing is good computing practice for long-running jobs on all partitions.
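As an illustration of check-pointing, the sketch below periodically writes intermediate state to disk so that an interrupted or preempted job can resume where it left off instead of starting over. The checkpoint file name, step count, and save interval are placeholders chosen for the example, not DCC conventions.

```python
import os
import pickle

CHECKPOINT = "checkpoint.pkl"   # hypothetical file name for this sketch
TOTAL_STEPS = 1000
SAVE_EVERY = 50

def load_state():
    """Resume from a previous checkpoint if one exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "result": 0}

def save_state(state):
    """Write the checkpoint atomically so an interrupted write is never loaded."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_state()
for step in range(state["step"], TOTAL_STEPS):
    state["result"] += step          # placeholder for the real computation
    state["step"] = step + 1
    if state["step"] % SAVE_EVERY == 0:
        save_state(state)            # a requeued job resumes from here

save_state(state)
print("final result:", state["result"])
```

With this pattern, resubmitting the same job after a preemption or node failure picks up from the last saved step rather than repeating completed work.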
Cluster Shared Storage Resources (/work and /scratch)
DCC storage is designed and optimized for very large data under active computation, not for long-term data storage. Labs requiring long-term data storage may upgrade their group storage or add additional storage at a cost; see our pricing.
To keep processing overhead low and operations fast on shared storage, there are no backups and no logging of usage actions. Because these areas are susceptible to data loss, users of the cluster should retain a copy of their irreplaceable data in a separate location and should remove results from shared space frequently.
Capacity is at a premium, and users should clean up and remove their own data at the conclusion of their computation. Additionally, to prevent shared volumes from filling up, files older than 75 days on /work will be purged on the 1st and 15th of every month (a sketch for identifying such files appears after the list below). Notifications will not be sent. Touching files expressly to avoid the purge process is prohibited. If storage utilization reaches levels that could impact users, the following procedure will be used:
- If utilization exceeds 80%, notice will be sent to the top storage users advising that the volume is approaching capacity and asking them to save essential results to lab storage and delete the files least impactful to ongoing work
- If utilization exceeds 90%, files from the notified top storage users will be purged until utilization is back to 80%
- If the above efforts do not reduce utilization, a general purge will be run off-cycle, decreasing the file-age threshold as needed; notifications will be sent to all /work users
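As a sketch of the housekeeping described above, the short script below lists files in a /work directory whose modification time is older than a chosen cutoff, so they can be copied to lab storage or deleted before a purge. The directory path is a hypothetical placeholder and the cutoff is only an example value; adjust both to your own area and needs.

```python
import time
from pathlib import Path

# List files older than a cutoff so they can be saved elsewhere or removed.
# The path below is a hypothetical placeholder, not a DCC default.
WORK_DIR = Path("/work/your_netid")   # hypothetical /work area
CUTOFF_DAYS = 75                      # example value matching the documented purge window
cutoff = time.time() - CUTOFF_DAYS * 86400

for path in WORK_DIR.rglob("*"):
    if path.is_file() and path.stat().st_mtime < cutoff:
        print(path)   # review the list, then copy off or delete as appropriate
```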
Users who require exceptional use of /work (>20TB for more than 1 week) must notify rescomputing@duke.edu. Purge practices will change over time based on the needs of managing the cluster.