Requirements

System Requirements

A Determined cluster has the following requirements.

Software

  • The Determined agent and master nodes must be configured with Ubuntu 20.04 or later, Enterprise Linux 7 (such as AlmaLinux, Red Hat Enterprise Linux, or Rocky Linux), or macOS 10.13 or later.

  • The agent nodes must have Docker installed.

  • To run jobs with GPUs, each Determined agent must have NVIDIA driver version 450.80 or later installed. The drivers can be installed as part of a CUDA installation, but the rest of the CUDA toolkit is not required.

  • Determined supports the active (not end-of-life) Python versions.
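
You can check the requirements above from a shell before installing anything. The script below is a minimal sketch, not a Determined tool. It assumes a POSIX shell with `awk` and reads `/etc/os-release` (or `sw_vers` on macOS). It detects Enterprise Linux by looking for `rhel` in `ID`/`ID_LIKE`, which is a heuristic. Each version is compared against the minimums stated on this page: Ubuntu 20.04, Enterprise Linux 7, macOS 10.13, Docker 20.10 (from the installation steps), and NVIDIA driver 450.80. The Python requirement is not checked because it is not a fixed version number.

```shell
#!/bin/sh
# Preflight check for the software requirements (a sketch).
# Prints OK/FAIL/WARN/SKIP lines and changes nothing on the system.

# version_ge A B: succeed if dotted version A is >= B.
# awk's numeric conversion drops suffixes such as "21+dfsg1" -> 21.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# report LABEL VERSION MINIMUM: print one OK/FAIL line.
report() {
  if version_ge "$2" "$3"; then
    echo "OK: $1 $2"
  else
    echo "FAIL: $1 $2 (need $3 or later)"
  fi
}

check_os() {
  if [ "$(uname -s)" = Darwin ]; then
    report macOS "$(sw_vers -productVersion)" 10.13
    return
  fi
  if [ ! -r /etc/os-release ]; then
    echo "SKIP: /etc/os-release not found; check the OS version manually"
    return
  fi
  # Read the fields in subshells so they do not leak into this script.
  id=$(. /etc/os-release && echo "$ID")
  like=$(. /etc/os-release && echo "${ID_LIKE:-}")
  ver=$(. /etc/os-release && echo "${VERSION_ID:-0}")
  if [ "$id" = ubuntu ]; then
    report Ubuntu "$ver" 20.04
  else
    case " $id $like " in
      *" rhel "*) report "Enterprise Linux ($id)" "$ver" 7 ;;
      *) echo "WARN: '$id' is not one of the listed distributions" ;;
    esac
  fi
}

check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "FAIL: docker not found (required on agent nodes)"
    return
  fi
  # The server version is only visible when the daemon is reachable.
  if v=$(docker version --format '{{.Server.Version}}' 2>/dev/null) && [ -n "$v" ]; then
    report "Docker Engine" "$v" 20.10
  else
    echo "WARN: docker is installed but the daemon is not reachable"
  fi
}

check_nvidia_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "SKIP: nvidia-smi not found (drivers are needed only for GPU jobs)"
    return
  fi
  v=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  report "NVIDIA driver" "$v" 450.80
}

check_os
check_docker
check_nvidia_driver
```

Run it on each master and agent node, for example as `sh preflight.sh`. The file name is arbitrary.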

Hardware

  • The Determined master node should be configured with at least four Intel Broadwell or later CPU cores, 8GB of RAM, and 200GB of free disk space. The Determined master node does not need GPUs.

  • Each Determined agent node should be configured with at least two Intel Broadwell or later CPU cores, 4GB of RAM, and 50GB of free disk space. If you are using GPUs, NVIDIA GPUs with compute capability 6.0 or greater are required. These include P100, V100, A100, RTX 2080 Ti, RTX 3090, TITAN X, and TITAN XP.

Most of the master's disk space requirement comes from the experiment metadata database. If PostgreSQL runs on a different machine, the master needs only minimal disk space (about 100MB).
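
A similar sketch can compare a machine against the hardware minimums above. It assumes `getconf`, `df`, and `/proc/meminfo` are available (or `sysctl` on macOS). It reports memory and disk in decimal gigabytes, truncated, and checks free space on `/` only. It does not check CPU generation. It passes the `compute_cap` query field to `nvidia-smi`, but that field exists only in newer driver releases, so the script prints a fallback message when the query fails. If PostgreSQL runs on another machine, you can ignore a disk-space warning for the master.

```shell
#!/bin/sh
# Hardware check against the listed minimums (a sketch).
# Usage: sh hwcheck.sh master|agent    (defaults to agent)

# meets_minimum ROLE CORES MEM_GB DISK_GB: succeed if all three values
# meet the minimums for ROLE (master: 4/8/200, agent: 2/4/50).
meets_minimum() {
  case "$1" in
    master) min_cores=4 min_mem=8 min_disk=200 ;;
    agent)  min_cores=2 min_mem=4 min_disk=50 ;;
    *) echo "unknown role: $1" >&2; return 2 ;;
  esac
  [ "$2" -ge "$min_cores" ] && [ "$3" -ge "$min_mem" ] && [ "$4" -ge "$min_disk" ]
}

role=${1:-agent}
cores=$(getconf _NPROCESSORS_ONLN)
if [ "$(uname -s)" = Darwin ]; then
  mem_gb=$(( $(sysctl -n hw.memsize) / 1000000000 ))
else
  # MemTotal is in KiB; convert to decimal GB and truncate.
  mem_gb=$(awk '/^MemTotal:/ { printf "%d", $2 * 1024 / 1e9 }' /proc/meminfo)
fi
# Free space on / in decimal GB. Point df at the volume that will hold
# the database or Docker images instead if it is mounted elsewhere.
disk_gb=$(df -Pk / | awk 'NR == 2 { printf "%d", $4 * 1024 / 1e9 }')

echo "role=$role cores=$cores memory=${mem_gb}GB free_disk=${disk_gb}GB"
if meets_minimum "$role" "$cores" "$mem_gb" "$disk_gb"; then
  echo "OK: meets the $role minimums"
else
  echo "WARN: below the $role minimums"
fi

# GPU compute capability (agents with GPUs only).
if command -v nvidia-smi >/dev/null 2>&1; then
  if caps=$(nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader 2>/dev/null); then
    printf '%s\n' "$caps" | awk -F', ' '{
      status = ($2 + 0 >= 6.0) ? "OK" : "FAIL"
      print status ": " $1 " (compute capability " $2 ")"
    }'
  else
    echo "SKIP: this nvidia-smi cannot report compute capability"
  fi
fi
```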

Install Docker

Docker is a dependency of several Determined system components. For example, every agent node must have Docker installed to run containerized workloads.

Install on Linux

  1. Install Docker. Docker version 20.10 or later is required on the machine where the agent is running.

    On Ubuntu:

    sudo apt update && sudo apt install -y ca-certificates curl gnupg
    sudo install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    
    sudo apt update && sudo apt install -y --no-install-recommends docker-ce
    sudo usermod -aG docker $USER
    sudo systemctl reload docker
    

    On Enterprise Linux:

    sudo dnf install -y dnf-plugins-core device-mapper-persistent-data lvm2
    sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    
    sudo dnf install -y docker-ce
    sudo usermod -aG docker $USER
    sudo systemctl enable --now docker
    
  2. If the machine has GPUs that you want to use with Determined, install the NVIDIA Container Toolkit to allow Docker to run containers that use the GPUs. For more information, see the NVIDIA documentation.

    On Ubuntu:

    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list > /dev/null
    
    sudo apt update
    sudo apt install -y nvidia-container-toolkit
    sudo systemctl restart docker
    

    On Enterprise Linux:

    curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    
    sudo dnf install -y nvidia-container-toolkit
    sudo systemctl restart docker
    
  3. Log out and start a new terminal session so that the docker group membership takes effect.

  4. Verify that the current user is in the docker group (the output of groups should include docker) and, if the machine has GPUs, that Docker can start a container that uses them:

    groups
    # Run only on machines with GPUs:
    docker run --gpus all --rm debian:10-slim nvidia-smi
    
  5. If you are using Enterprise Linux, enable persistent storage for journald log messages so that logs are preserved across reboots:

    sudo mkdir -p /var/log/journal
    sudo systemd-tmpfiles --create --prefix /var/log/journal
    sudo systemctl restart systemd-journald
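
After you finish the steps above, you can bundle the checks from steps 4 and 5 into one script. This is a sketch. It reuses the `debian:10-slim` test image from step 4 and treats the presence of `nvidia-smi` on the host as the sign that the machine has GPUs. The `in_group` helper is hypothetical and is not part of Docker or Determined. The journal check assumes journald's default `Storage=auto` setting.

```shell
#!/bin/sh
# Post-install checks for the Linux steps (a sketch).

# in_group NAME GROUP...: succeed if NAME is one of the GROUP words.
in_group() {
  want=$1
  shift
  for g in "$@"; do
    if [ "$g" = "$want" ]; then
      return 0
    fi
  done
  return 1
}

# Step 4: group membership only takes effect in a new login session.
# $(id -Gn) is left unquoted on purpose so each group name is one word.
if in_group docker $(id -Gn); then
  echo "OK: $(id -un) is in the docker group"
else
  echo "FAIL: $(id -un) is not in the docker group in this session"
fi

# Step 4 (GPU machines): only attempted when the host driver is present.
if command -v nvidia-smi >/dev/null 2>&1; then
  if docker run --gpus all --rm debian:10-slim nvidia-smi >/dev/null 2>&1; then
    echo "OK: Docker started a container with GPU access"
  else
    echo "FAIL: GPU container did not start; check the NVIDIA Container Toolkit"
  fi
else
  echo "SKIP: nvidia-smi not found on the host; GPU check not run"
fi

# Step 5 (Enterprise Linux): with journald's default Storage=auto,
# logs persist across reboots only if /var/log/journal exists.
if [ -d /var/log/journal ]; then
  echo "OK: /var/log/journal exists"
else
  echo "WARN: /var/log/journal is missing; journal logs will not persist"
fi
```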
    

Install on macOS

  1. Install Docker for macOS by following the Docker documentation. The Docker documentation describes system requirements, chipset dependencies, and installation steps.

  2. Start Docker:

    open /Applications/Docker.app
    

Docker on macOS does not support containers that use GPUs, so Determined agents on macOS can run only CPU-based workloads.
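
The `open` command returns as soon as Docker Desktop is launched, before the engine accepts commands. A script that calls `docker` immediately afterwards can therefore fail. The minimal sketch below waits until `docker info` succeeds. The `wait_for` helper and the 120-second limit are arbitrary choices for illustration.

```shell
#!/bin/sh
# Start Docker Desktop on macOS and wait until the engine responds.

# wait_for SECONDS CMD...: rerun CMD about once per second until it
# succeeds (return 0) or SECONDS have passed (return 1).
wait_for() {
  limit=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$limit" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

if [ "$(uname -s)" = Darwin ]; then
  open /Applications/Docker.app
  # 120 seconds is an arbitrary limit; the first launch can be slower.
  if wait_for 120 docker info; then
    echo "Docker is ready"
  else
    echo "Docker did not respond within 120 seconds" >&2
  fi
fi
```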

Install on Windows Subsystem for Linux (WSL)

Follow the steps for installing Docker on Linux, running the commands inside your WSL distribution.
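
The Linux steps call `systemctl`, which works only when WSL runs systemd as PID 1. Recent WSL releases support this through a `[boot]` section with `systemd=true` in `/etc/wsl.conf`. The sketch below detects WSL from `/proc/version`, because WSL kernels include "microsoft" in their version string. It then reports whether systemd is running. The detection heuristic and the `is_wsl` helper are assumptions, not an official API.

```shell
#!/bin/sh
# Report whether this shell runs under WSL and whether systemd is PID 1.

# is_wsl TEXT: succeed if TEXT (normally the contents of /proc/version)
# names a WSL kernel, e.g. "...-microsoft-standard-WSL2" or "...-Microsoft".
is_wsl() {
  case "$1" in
    *[Mm]icrosoft*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_wsl "$(cat /proc/version 2>/dev/null)"; then
  pid1=$(ps -p 1 -o comm=)
  if [ "$pid1" = systemd ]; then
    echo "OK: WSL with systemd; the systemctl commands in the Linux steps should work"
  else
    echo "WARN: WSL without systemd (PID 1 is $pid1); systemctl commands will fail"
    echo "Add 'systemd=true' under '[boot]' in /etc/wsl.conf, then run 'wsl --shutdown' from Windows"
  fi
else
  echo "Not running under WSL"
fi
```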