Custom installation for WUR HPC Cluster
This is a repo containing roles and playbooks for installing a new WUR HPC cluster from scratch.
It consists of the following main components:
- A Pacemaker config for HA masters
- A DHCP/DNS/TFTP server config for cluster management
- A node booting environment (SALI)
- A scheduler config (Slurm)
- Configuration of an HPC software environment (EasyBuild)
Plus configs for building chroots for the compute nodes. See the included roles, and refer to the playbooks for examples.
Please note that the order of the roles is important: some tasks are prerequisites for others, so each role contains the configuration that ties it to the roles that run before it. A sketch of a head-node play with this ordering is shown below.
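As a minimal sketch of that ordering (the role names below are illustrative placeholders, not necessarily the roles shipped in this repo), a head-node play might look like:

```yaml
# Hypothetical head-node play; the role names are placeholders and should be
# replaced with the actual roles in this repository.
- hosts: headnodes
  become: true
  roles:
    - pacemaker_ha        # HA masters first
    - dhcp_dns_tftp       # cluster management services (DHCP/DNS/TFTP)
    - sali                # node booting environment
    - slurm_controller    # scheduler
    - easybuild           # HPC software environment
```

Because later roles assume the services configured by the earlier ones, reordering this list is likely to break the installation.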
Requirements
Your initial head node will need:
- An 'internal' network connection (cluster facing, for DHCP/DNS/SALI booting)
- An 'external' network connection (internet/site network facing)
- A 'BMC' network connection (for connecting to out-of-band (OOB) machine management interfaces)
- A shared block device for NFS sharing of the home directories/software environment
  - This should currently be set up as LVM storage, with an available volume group (VG) named 'Vshared'
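As a minimal sketch of preparing that device (assuming the shared block device appears as /dev/sdb and that the community.general collection is installed; both are assumptions, not requirements stated by this repo):

```yaml
# Hypothetical task to prepare the shared block device for the NFS exports.
# /dev/sdb is an assumed device name; replace it with your actual device.
- hosts: headnodes
  become: true
  tasks:
    - name: Create the 'Vshared' volume group on the shared block device
      community.general.lvg:
        vg: Vshared
        pvs: /dev/sdb
```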
There are extra roles for, e.g., joining an external LDAP/AD-compatible site infrastructure; see the main config for details.
Playbook contents
The playbook is designed so that the admin has one document detailing the exact layout of the entire cluster. This precludes hiding variables away in multiple different places, which rapidly makes the configuration hard to oversee. There is one main exception: site-specific secrets (such as the AD join bind credentials) should be kept in a separate file so that they do not end up in any local copy of this file you might share. An illustrative split is sketched below.
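As an illustrative sketch of that split (the file name, variable names, and role name below are assumptions, not fixed by this repo), the main playbook spells out the cluster layout inline and pulls secrets in from a separate file:

```yaml
# Main playbook: the cluster layout is declared here in plain sight,
# while secrets live in a separate file (e.g. encrypted with ansible-vault).
- hosts: headnodes
  become: true
  vars:
    cluster_name: examplecluster      # assumed variable, for illustration only
    internal_network: 10.0.0.0/16     # assumed variable, for illustration only
  vars_files:
    - secrets.yml                     # e.g. the AD join bind credentials; keep out of shared copies
  roles:
    - cluster_base                    # hypothetical role name
```

The secrets file can be protected with `ansible-vault encrypt secrets.yml` and excluded from any copy of the main playbook that is shared.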