Migrating to Red Hat 9
We are in the process of upgrading from Red Hat Enterprise Linux 8 to 9. Along with the OS upgrade, we have installed the latest version of Slurm and updated many of the applications. We have built a completely new cluster environment based on Red Hat 9, while the current production environment based on Red Hat 8 remains intact.
The Red Hat 9 based cluster is now open to all users, so you can try your compute jobs in the new environment and confirm that your codes work before the old Red Hat 8 based cluster is turned off.
Here are some notes that may help you with your migration to Red Hat 9:
-----
- To submit jobs to the new Red Hat 9 based cluster, SSH to hpc9.charlotte.edu instead of hpc.charlotte.edu (both login commands are shown for reference after these notes).
- You will still have access to the current Red Hat 8 based cluster; simply SSH to hpc.charlotte.edu (instead of hpc9), just like you’ve always done.
- There are currently two compute partitions in the Red Hat 9 cluster: Orion and GPU. The 512 active CPU core limit in Orion still applies, just as it does in the Red Hat 8 Orion partition.
- The Red Hat 9 based Orion partition is made up of our newest 96-core AMD EPYC-based compute nodes, so adjust your jobs accordingly (a sample submit script appears after these notes). We will soon upgrade the 48-core Intel Xeon-based compute nodes from Red Hat 8 to Red Hat 9 and incorporate them into the new environment.
- The GPU partition currently has only 2 GPU compute nodes, each with 4 NVIDIA L40S GPUs (see the example GPU job after these notes). As we upgrade the Red Hat 8 based GPU nodes, this partition will grow.
- Both of these partitions will continue to grow, and the Red Hat 8 partitions will shrink over time as we upgrade and migrate compute nodes from Red Hat 8 to 9.
- If you have submit scripts from the “old” cluster that don’t seem to work on Red Hat 9, please check a couple of things:
- Take a close look at the environment modules you are loading. Many application versions have changed, so your script may need to be updated to reflect the version(s) we now have installed. Also double-check the names of the environment modules; some have changed, even within the same version.
- If an environment module that you use is missing, please let us know right away. Some applications we had on Red Hat 8 would not install or compile under Red Hat 9, and there may be others that we have not yet migrated. Let us know as soon as possible so that we can try to get them working on Red Hat 9.
- If you load any environment modules in your startup scripts (.bash_profile or .bashrc) that do not exist in Red Hat 9, you will receive an error like this:
- ERROR: Unable to locate a modulefile for 'xxxxx/1.0'
- If this happens, list the available environment modules for the applications you would like to preload, see what versions are available, and update your .bash_profile or .bashrc accordingly (the module commands for doing this are shown after these notes).
- If you SSH into the HPC cluster using Windows PowerShell, you can run into an issue when connecting to the new Red Hat 9 based cluster. It turns out Windows ships an outdated OpenSSH client that has a MAC (message authentication code) incompatibility with the newer OpenSSH server on Red Hat 9. More info and a workaround at the following links:
- https://serverfault.com/questions/994646/ssh-on-windows-corrupted-mac-on-input
- https://www.nrel.gov/hpc/announcements/posts/windows-ssh-workaround.html
- If you have this issue, here are a few solutions:
- Use PuTTY ( https://www.putty.org/ )
- Use MobaXterm ( https://mobaxterm.mobatek.net/ )
- If you’d like to use SSH in the Windows Terminal, add a switch to your SSH command to force a compatible MAC algorithm:
ssh -m hmac-sha2-512 <username>@hpc9.charlotte.edu
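If you don't want to type the -m switch every time, one way to make the workaround permanent (assuming your Windows OpenSSH client supports the hmac-sha2-512 MAC) is to add an entry to your SSH config file, typically C:\Users\<username>\.ssh\config:

Host hpc9.charlotte.edu
    MACs hmac-sha2-512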
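For reference, the plain login commands for the two environments look like this (replace <username> with your own username):

ssh <username>@hpc9.charlotte.edu     # new Red Hat 9 based cluster
ssh <username>@hpc.charlotte.edu      # current Red Hat 8 based cluster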
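Next, a minimal sample submit script for the new Orion partition, as mentioned in the notes above. The module and program names below are placeholders, not actual module names on hpc9; substitute whatever 'module avail' shows:

#!/bin/bash
#SBATCH --job-name=rh9_test
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=96     # the new AMD EPYC nodes have 96 cores each
#SBATCH --time=01:00:00

module load openmpi              # placeholder; check 'module avail' on hpc9 for the exact name/version
srun ./my_program                # placeholder program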
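Along the same lines, a minimal sketch of a job for the new GPU partition; the --gres line is standard Slurm syntax, and the module and program names are again placeholders:

#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --partition=GPU
#SBATCH --gres=gpu:1             # each node has 4 NVIDIA L40S GPUs; request up to 4
#SBATCH --time=01:00:00

module load cuda                 # placeholder; check 'module avail cuda' on hpc9
./my_gpu_program                 # placeholder program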
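Finally, the module commands mentioned above for checking what is installed on the Red Hat 9 side, where 'xxxxx' stands for whichever application you need:

module avail                     # list every module installed on hpc9
module avail xxxxx               # list the available versions of one application
module load xxxxx/<version>      # load one, then update .bash_profile or .bashrc to match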