Installation Requirements¶
System Requirements¶
A Determined cluster has the following requirements.
Software¶
The Determined agent and master nodes must be configured with Ubuntu 16.04, Ubuntu 18.04, or CentOS 7, or with MacOS 10.13 or higher.
The agent nodes must have Docker installed.
To run jobs with GPUs, the Nvidia drivers must be installed on each Determined agent. Determined requires version >= 384.81 of the Nvidia drivers. (The Nvidia drivers can be installed as part of installing CUDA, but the rest of the CUDA toolkit is not required.)
Hardware¶
The Determined master node should be configured with at least 4 CPUs (Intel Broadwell or later), 8GB of RAM, and 200GB of free disk space. Note that the Determined master does not use GPUs.
Each Determined agent node should be configured with at least 2 CPUs (Intel Broadwell or later), 4GB of RAM, and 50GB of free disk space. If using GPUs, Nvidia GPUs with compute capability 3.7 or greater are required (e.g., K80, P100, V100, GTX 1080, GTX 1080 Ti, TITAN, TITAN XP).
Mac hardware 2010 or higher can also run a master and agent.
Note
Most of the disk space required by the master is due to the experiment metadata database; if PostgreSQL is set up on a different machine, the disk space requirements for the master are minimal (~100MB).
Installing Docker¶
Every agent node must have Docker installed to allow it to run containerized workloads.
Install on Linux¶
Install the latest release of Docker and nvidia-docker2.
On Ubuntu:
sudo apt-get update && sudo apt-get install -y software-properties-common curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" curl -fsSL https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - distribution=$(. /etc/os-release;echo $ID$VERSION_ID) curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list sudo apt-get update && sudo apt-get install -y --no-install-recommends docker-ce nvidia-docker2 sudo systemctl reload docker sudo usermod -aG docker $USER
On CentOS:
sudo yum install -y yum-utils device-mapper-persistent-data lvm2 sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo distribution=$(. /etc/os-release;echo $ID$VERSION_ID) curl -fsSL https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo sudo yum install -y docker-ce nvidia-docker2 sudo systemctl start docker
Log out and start a new terminal session. Verify that the current user is in the
docker
group and that thenvidia
Docker runtime is installed:groups docker info | grep Runtimes
If using CentOS 7, enable the persistent storage of journalctl log messages so that logs are saved on machine reboot.
sudo mkdir /var/log/journal sudo systemd-tmpfiles --create --prefix /var/log/journal sudo systemctl restart systemd-journald