Recommendations for HPC Environment Design
SLURM Setup (From Controller VM)
Create a group for the slurm VM (add at least slurm1 as a node in the group; setting additional groups of services,cluster,domain allows for more diverse group management):
    metal configure group slurm
Customise the slurm1 node configuration (set the primary IP address to 10.10.0.6):
    metal configure node slurm1
Create /var/lib/metalware/repo/config/slurm1.yaml with the following network and server definition:
    slurm:
      is_server: true
Add the following to /var/lib/metalware/repo/config/domain.yaml (set server to the hostname of the SLURM VM):
    slurm:
      server: slurm1
      is_server: false
      mungekey: ff9a5f673699ba8928bbe009fb3fe3dead3c860c
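The mungekey value above is an example. Since MUNGE relies on a secret shared between the server and its clients, a site would normally generate its own value. A minimal sketch, assuming the field accepts any 40-character hexadecimal string like the example (the source does not confirm the accepted format):

```shell
# Generate 20 random bytes and print them as 40 lowercase hex characters.
# Assumption: the mungekey field accepts any hex string of this length.
mungekey="$(head -c 20 /dev/urandom | od -An -tx1 | tr -d ' \n')"
echo "mungekey: ${mungekey}"
```

The printed line can then be pasted into the slurm: block of domain.yaml in place of the example key.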
Additionally, add the following to the setup: namespace list in /var/lib/metalware/repo/config/domain.yaml:
    - /opt/alces/install/scripts/06-slurm.sh
Download the slurm.sh script to the above location:
    mkdir -p /opt/alces/install/scripts/
    cd /opt/alces/install/scripts/
    wget -O 06-slurm.sh https://raw.githubusercontent.com/alces-software/knowledgebase/release/2017.1/epel/7/slurm/slurm.sh
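Because wget -O opens the output file before the transfer starts, a failed download can leave an empty file behind. The helper below is a hypothetical sanity check (verify_script is not part of the original tooling) and assumes the downloaded script is written for bash:

```shell
# Hypothetical helper: confirm a downloaded script is non-empty and parses as shell.
verify_script() {
    script="$1"
    if [ ! -s "$script" ]; then
        echo "empty or missing: $script" >&2
        return 1
    fi
    # bash -n parses the script without executing any of it.
    bash -n "$script"
}

# Example: verify_script /opt/alces/install/scripts/06-slurm.sh
```

The function returns non-zero if the file is missing, empty, or contains a syntax error, so it can be chained directly after the wget command.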
Build the SLURM RPMs for the custom repo (/opt/alces/repo/custom/Packages); a guide to building the SLURM RPMs can be found in the SLURM documentation. Once the packages have been moved to the custom repo directory, rebuild the repo with:
    createrepo custom
Follow Client Deployment Example to set up the SLURM node.
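The RPM build and repo rebuild described above can be sketched as follows. This is an illustration under assumptions, not the documented procedure: the tarball name is a placeholder, the rpmbuild output directory assumes the default ~/rpmbuild tree on x86_64, and createrepo is given the full path, which is equivalent to running createrepo custom from /opt/alces/repo. Commands are only printed unless RUN=1 is set.

```shell
# Sketch only: SLURM_TARBALL and the rpmbuild output path are assumptions.
SLURM_TARBALL="${SLURM_TARBALL:-slurm-VERSION.tar.bz2}"   # placeholder file name
REPO_ROOT="/opt/alces/repo"

# Print each command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Build binary RPMs straight from the release tarball.
run rpmbuild -ta "$SLURM_TARBALL"
# Move the resulting packages into the custom repo.
run cp "$HOME"/rpmbuild/RPMS/x86_64/slurm-*.rpm "$REPO_ROOT/custom/Packages/"
# Regenerate the repo metadata.
run createrepo "$REPO_ROOT/custom"
```

Running the block as is prints the three commands for review; exporting RUN=1 and SLURM_TARBALL executes them.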
Note
All systems that are built will have SLURM installed and the SLURM daemon running, which allows each node to submit and run jobs. If this is not desired, the service can be permanently stopped and disabled by running systemctl disable slurmd && systemctl stop slurmd on any node that should no longer run SLURM.
Modules Setup (From Deployment VM)
The environment modules software allows users to change paths in their shell environment (such as PATH) dynamically.
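As an illustration of what this means in practice, the comments below list common module commands, and the plain-shell lines imitate the effect of a modulefile that prepends a directory to PATH. The module name example/1.0 and the directory under /opt/apps are hypothetical.

```shell
# Typical usage once environment modules is installed (module names are hypothetical):
#   module avail               # list available modules
#   module load example/1.0    # adjust PATH and related variables
#   module list                # show currently loaded modules
#   module unload example/1.0  # undo the changes

# Plain-shell illustration of what prepending to PATH on load amounts to.
APP_BIN="/opt/apps/example/1.0/bin"   # hypothetical install location
OLD_PATH="$PATH"
PATH="$APP_BIN:$PATH"                 # "load": application directory comes first
echo "${PATH%%:*}"                    # prints the first PATH entry
PATH="$OLD_PATH"                      # "unload": restore the previous PATH
```

The point of the modules software is that users get this behaviour per session and per application, without editing their profile files by hand.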
Create a group for the modules VM (add at least apps1 as a node in the group; setting additional groups of services,cluster,domain allows for more diverse group management):
    metal configure group apps
Customise the apps1 node configuration (set the primary IP address to 10.10.0.7):
    metal configure node apps1
Create /var/lib/metalware/repo/config/apps1.yaml with the following network and server definition:
    modules:
      is_server: true
Add the following to /var/lib/metalware/repo/config/domain.yaml (set server to the IP of the apps VM):
    modules:
      server: 10.10.0.7
      directory: /opt/apps
      is_server: false
Additionally, add the following to the setup: namespace list in /var/lib/metalware/repo/config/domain.yaml:
    - /opt/alces/install/scripts/07-modules.sh
Download the modules.sh script to the above location:
    mkdir -p /opt/alces/install/scripts/
    cd /opt/alces/install/scripts/
    wget -O 07-modules.sh https://raw.githubusercontent.com/alces-software/knowledgebase/release/2017.1/epel/7/modules/modules.sh
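Before deploying, it can help to confirm that every script listed in domain.yaml actually exists at the path given. The helper below is a hypothetical sketch (check_setup_scripts is not part of Metalware) that uses a simple text match rather than a real YAML parser, so it picks up any list entry that is an absolute path ending in .sh:

```shell
# Hypothetical helper: report which script paths listed in a YAML file exist on disk.
check_setup_scripts() {
    config_file="$1"
    missing=0
    # Match list entries of the form "- /absolute/path.sh" (text match, not YAML parsing).
    for script in $(sed -n 's|^[[:space:]]*-[[:space:]]*\(/.*\.sh\)[[:space:]]*$|\1|p' "$config_file"); do
        if [ -f "$script" ]; then
            echo "ok:      $script"
        else
            echo "missing: $script"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_setup_scripts /var/lib/metalware/repo/config/domain.yaml
```

A non-zero return value means at least one listed script has not been downloaded to its expected location.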
Follow Client Deployment Example to set up the apps node.
Note
The apps directory can be set up on the storage node if one was created. This allows all NFS exports to come from a centralised server.