Research Cluster

The Research Computing group provides a Red Hat Linux based HPC environment that includes systems of various capabilities serving a variety of campus research communities. Our research cluster, Starlight, uses Slurm for job scheduling. The Slurm partitions in our Research Computing environment are outlined below.
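
To see the partitions that are currently defined, along with their time limits and node counts, you can run the sinfo command from one of the interactive nodes (the exact output will change over time):

    sinfo -s

Each row shows a partition name, its availability, its time limit, its node counts (allocated/idle/other/total), and its node list.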

ORION

Orion is our general-purpose Slurm partition, made up of a mix of Intel Xeon and AMD EPYC compute nodes running Red Hat Enterprise Linux 9.2, and is available for use in any faculty-sponsored research project. UNC Charlotte faculty and graduate student researchers may fill out the account request form to request access to the system. For more information about submitting jobs to Orion, check out the Orion (Slurm) User Notes; a minimal example batch script follows the hardware list below.

110 nodes / 6512 cores

  • 76 nodes with
    • Dual 24-Core Intel Xeon Gold 6248R CPU @ 3.00GHz (48 cores)
    • 384GB RAM (8GB / core)
    • 100GBit EDR Infiniband Interconnect
  • 10 nodes with
    • Dual 32-Core AMD EPYC 7502 CPU @ 2.5GHz (up to 3.35GHz Max Boost); (64 cores)
    • 512GB RAM (8GB / core)
    • 100GBit HDR100 Infiniband Interconnect
  • 20 nodes with
    • Dual 48-Core AMD EPYC 9454 CPU @ 2.75GHz (up to 3.8GHz Max Boost); (96 cores)
    • 784GB RAM (8.1GB / core)
    • 100GBit HDR100 Infiniband Interconnect
  • 1 node with
    • Dual 24-Core Intel Xeon Gold 6248R CPU @ 3.00GHz (48 cores)
    • 1.5TB RAM (31.25GB / core)
    • 100Gbit EDR Infiniband Interconnect
  • 1 node with
    • Quad 16-Core Intel Xeon E7-4850 v4 CPU @ 2.10GHz (64 cores)
    • 4TB RAM (62.5GB / core)
    • 100GBit EDR Infiniband Interconnect
  • 1 node with
    • Quad Intel 32-Core Xeon Gold 6448H CPU @ 2.4GHz (128 cores)
    • 4TB RAM (31.25GB / core)
    • 100GBit HDR100 Infiniband Interconnect
  • 1 node with
    • Dual 32-Core AMD EPYC 7542 CPU @ 2.9GHz (up to 3.4GHz Max Boost) (64 cores)
    • 4TB RAM (62.5GB / core)
    • 200GBit HDR Infiniband Interconnect
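
As a quick illustration, a minimal Orion batch script might look like the sketch below. The job name, core count, memory, time limit, and program are placeholders to replace with your own values, and the exact partition name should be confirmed with sinfo; see the Orion (Slurm) User Notes for full details.

    #!/bin/bash
    #SBATCH --job-name=my_orion_job      # placeholder job name
    #SBATCH --partition=Orion            # general compute partition described above
    #SBATCH --nodes=1                    # run on a single node
    #SBATCH --ntasks=1                   # one task
    #SBATCH --cpus-per-task=8            # placeholder core count
    #SBATCH --mem=64G                    # placeholder memory request
    #SBATCH --time=24:00:00              # placeholder wall-clock limit

    srun ./my_program                    # placeholder executable

Submit the script with sbatch (for example, sbatch my_orion_job.sh) and monitor it with squeue -u $USER.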

GPU

GPU is a general-use Slurm partition made up of several GPU compute nodes and is available for use in any faculty-sponsored research project. For more information about submitting jobs to the GPU partition, check out the “Submitting a GPU Job” section in the Orion & GPU (Slurm) User Notes; an example script follows the node list below.

13 nodes / 60 NVIDIA GPUs / 544 computing cores:

  • 2 “V100S” GPU nodes with
    • Dual 8-Core Intel Xeon Silver 4215R CPU @ 3.20GHz (16 cores)
    • 192GB RAM (12GB / core)
    • 1 node w/ 8 x NVIDIA Tesla V100S Tensor Core GPUs (32GB HBM2 RAM per GPU)
    • 1 node w/ 4 x NVIDIA Tesla V100S Tensor Core GPUs (32GB HBM2 RAM per GPU)
    • 100Gbit EDR Infiniband Interconnect
  • 2 “A100” GPU nodes with
    • Dual 16-Core Intel Xeon Gold 6326 CPU @ 2.90GHz (32 cores)
    • 256GB RAM (8GB / core)
    • 4 x NVIDIA A100 Tensor Core GPUs (80GB HBM2e RAM per GPU)
    • 100Gbit HDR Infiniband Interconnect
  • 1 “HGX A100” GPU node with
    • Dual 64-Core AMD EPYC 7742 CPU @ 2.25GHz (up to 3.4GHz Boost); 128 cores total
    • 1TB RAM (8GB / core)
    • 8 x NVIDIA A100 Tensor Core GPUs (40GB HBM2 RAM per GPU)
    • 8-way NVLink
    • 200Gbit HDR Infiniband Interconnect
  • 6 “A40” GPU nodes with
    • Dual 16-Core Intel Xeon Gold 6326 CPU @ 2.90GHz (32 cores / node)
    • 256GB RAM (8GB / core)
    • 4 x NVIDIA A40 Data Center GPUs (48GB GDDR6 ECC RAM per GPU)
    • 100Gbit HDR100 Infiniband Interconnect
  • 2 “L40S” GPU nodes with
    • Dual 32-Core Intel Xeon Platinum 8362 CPU @ 2.80GHz (64 cores)
    • 512GB RAM (8GB / core)
    • 4 x NVIDIA L40S Data Center GPUs (48GB GDDR6 ECC RAM per GPU)
    • 100Gbit HDR100 Infiniband Interconnect
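
As a sketch, a GPU batch script requests GPUs through Slurm's generic resource (GRES) syntax in addition to CPUs and memory. The GPU count, time limit, and program below are placeholders, and the exact partition and GRES type names should be confirmed against the user notes or sinfo output.

    #!/bin/bash
    #SBATCH --job-name=my_gpu_job        # placeholder job name
    #SBATCH --partition=GPU              # GPU partition described above
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8            # placeholder CPU count
    #SBATCH --gres=gpu:1                 # request one GPU; add a type if this system requires one
    #SBATCH --time=12:00:00              # placeholder wall-clock limit

    nvidia-smi                           # confirm which GPU(s) the job was assigned

    srun ./my_gpu_program                # placeholder executable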

NEBULA and NEBULA_GPU

Nebula and Nebula_GPU are virtual partitions that run as a lower-priority overlay to ensure maximum system efficiency by using cores from multiple partitions, including faculty-sponsored systems. The resources in these partitions will vary over time. We encourage you to try these partitions for short-running jobs (less than 48 hours); an example batch script follows the list below. Please keep in mind:

  • Nebula and Nebula_GPU jobs have a maximum run time of 48 hours
  • The available nodes in these partitions will vary, but you can always see the current node list with the “pestat” command:
    • pestat -p Nebula
    • pestat -p Nebula_GPU
  • The available GPUs in the Nebula_GPU partition will vary, but you can always see what GPUs are available with the “sinfo” command:
    • sinfo -p Nebula_GPU -o "%14N %6c %8m %34f %20G"
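
For example, a short job could target Nebula with a time limit inside the 48-hour maximum; the job name, core count, and program below are placeholders.

    #!/bin/bash
    #SBATCH --job-name=short_job         # placeholder job name
    #SBATCH --partition=Nebula           # lower-priority overlay partition
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4            # placeholder core count
    #SBATCH --time=47:59:00              # must stay within the 48-hour partition limit

    srun ./my_program                    # placeholder executable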

DATA STORAGE

URC provides a unified storage environment that is shared across the research computing systems. The interactive nodes (hpc.charlotte.edu) and the compute nodes all access this storage over the network as though it were local to each node.

Each user is provided with:

  • a 500 GB home directory at /users/username, which is backed up for disaster recovery
  • 5TB of temporary scratch space (up to 10TB upon request) at /scratch/username

Scratch space is for holding temporary data needed by currently running jobs only and is not meant to hold critical data long term. Note that scratch is not backed up and certain failures or user errors will result in data loss. DO NOT store important data in scratch. If scratch fills, URC staff may delete older data.

The home directory (/users/username) is the default working directory when initially logged into the interactive nodes and for any node running a batch job in Slurm.
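
A common pattern, sketched below, is to stage working data from the home directory into scratch, run the job there, and copy any results worth keeping back to the backed-up home directory before the job ends. The partition, file names, and program are placeholders.

    #!/bin/bash
    #SBATCH --job-name=scratch_example   # placeholder job name
    #SBATCH --partition=Orion            # placeholder partition
    #SBATCH --ntasks=1
    #SBATCH --time=04:00:00              # placeholder wall-clock limit

    # Create a per-job directory in scratch and work there
    WORKDIR=/scratch/$USER/$SLURM_JOB_ID
    mkdir -p "$WORKDIR"
    cp "$HOME/input.dat" "$WORKDIR/"     # placeholder input file
    cd "$WORKDIR"

    srun ./my_program input.dat          # placeholder executable

    # Copy results back to the backed-up home directory, then clean up scratch
    cp results.out "$HOME/"              # placeholder output file
    rm -rf "$WORKDIR"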

Shared storage volumes in /projects are available to research groups upon request, subject to available space. They must be requested by, and are owned by, a faculty member.

NEVER modify the permissions on your home (/users/username) or /scratch directory. If you need assistance, please contact us.

Any user can request access to a high-performance temporary space designed to alleviate bottlenecks when jobs frequently read and write files. For many applications this storage is at least an order of magnitude faster than our other storage. It is also our most expensive storage and is actively managed by an automated process to free up space. Individual temporary space can be requested by any user, and faculty members can also request project temporary space.

The path for the high-performance temporary space is /vast/temp/username. The /vast/temp space is a volatile temporary space that is not backed up and where files not recently used will be deleted by the system. The system operates with a goal of not deleting /vast/temp files that have been used within the past 30 days, but under heavy load the retention time may be shorter. Users are responsible for making sure important data is moved to the directories that are backed up (/users/username and /projects).
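
For I/O-intensive jobs, the same staging pattern can be used with the high-performance space, keeping in mind that anything left in /vast/temp may be removed automatically. The sketch below assumes you have been granted an individual space under /vast/temp; the file names and program are placeholders.

    # Inside a batch script, after the #SBATCH directives:
    FASTDIR=/vast/temp/$USER/$SLURM_JOB_ID
    mkdir -p "$FASTDIR"
    cp "$HOME/large_input.dat" "$FASTDIR/"       # placeholder input file
    cd "$FASTDIR"

    srun ./my_io_heavy_program large_input.dat   # placeholder executable

    # Move results to a backed-up location before the job ends
    cp results.out "$HOME/"                      # placeholder output file
    rm -rf "$FASTDIR"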

Data Protection

All URC storage has device-level redundancy, using RAID or other methods, to protect against media failures. This does not protect against system failures or user errors, so it is important for users to be aware of which storage areas have additional backup protection.

Although URC backs up some file spaces, be sure to maintain an additional copy of critical data outside of the HPC environment.

The home directories and /projects directories each have two sets of backups, daily and weekly. These are rolling backups going back 7 days (daily) and 4 weeks (weekly).

The backup approach for each type of storage space is summarized below:

  • /users/username directories have a 7-day backup and 4-week backup.
  • /projects have a 7-day backup and 4-week backup.
  • /scratch/username directories are NOT backed up.
  • /vast/temp is NOT backed up AND files not recently accessed are subject to automated system deletion.


In addition to our general purpose Slurm partitions, we manage and provide infrastructure support for a number of cluster partitions that were purchased by individual faculty or research groups to meet their specific needs. These resources include:

DRACO

10 nodes / 416 cores:

  • 8 nodes with
    • Dual 18-Core Intel Xeon Gold 6154 CPU @ 3.00GHz (36 cores / node)
    • 388GB RAM (10.7GB / core)
    • 100Gbit EDR Infiniband Interconnect
  • 2 nodes with
    • Quad 16-Core Intel Xeon E7-4850 v4 CPU @ 2.10GHz (64 cores / node)
    • 4TB RAM (64GB / core)

PISCES

6 nodes / 216 cores:

  • 6 nodes with:
    • Dual 18-Core Intel Xeon Gold 6154 CPU @ 3.00GHz (36 cores / node)
    • 388GB RAM (10.7GB / core)
    • 100Gbit EDR Infiniband Interconnect

SERPENS

13 nodes / 612 computing cores:

  • 12 nodes with:
    • Dual 24-Core Intel Xeon Gold 6248R CPU @ 3.00GHz (48 cores / node)
    • 384GB RAM (8GB / core)
    • 100GBit EDR Infiniband Interconnect
  • 1 (Interactive) node with:
    • Dual 18-Core Intel Xeon Gold 6154 CPU @ 3.00GHz (36 cores / node)
    • 384GB RAM (10.67GB / core)
  • 182TB dedicated, usable RAID storage

PEGASUS

4 nodes / 164 cores:

  • 1 node with
    • Dual 18-Core Intel Xeon Gold 6154 CPU @ 3.00GHz (36 cores / node)
    • 388GB RAM (10.7GB / core)
    • 100Gbit EDR Infiniband Interconnect
  • 2 nodes with
    • Dual 16-Core Intel Xeon E5-2697A v4 CPU @ 2.6GHz (32 cores / node)
    • 256GB RAM (8GB / core)
    • 100Gbit EDR Infiniband Interconnect
  • 1 node with
    • Dual 32-Core AMD EPYC 7513 CPU @ 2.6GHz (up to 3.9GHz Max Boost) (64 cores / node)
    • 4TB RAM (62.5GB / core)
    • 100GBit HDR Infiniband Interconnect

HERCULES

2 nodes / 52 cores:

  • 1 node with
    • Dual 10-Core Intel Xeon Silver 4114 CPU @ 2.20GHz (20 cores)
    • 192GB RAM (9.6GB / core)
    • 2 x NVIDIA Titan V GPUs (12GB HBM2 RAM per GPU)
    • 100GBit EDR Infiniband Interconnect
  • 1 node with
    • Single 32-Core AMD EPYC 7502 CPU @ 2.5GHz (up to 3.35GHz Max Boost); (32 cores)
    • 256GB RAM (8GB / core)
    • 2 x NVIDIA A100 Tensor Core GPUs (40GB HBM2e RAM per GPU)
    • 100GBit EDR Infiniband Interconnect