Installation Requirements

System Requirements

A Determined cluster has the following requirements.

Software

  • The Determined agent and master nodes must run Ubuntu 16.04, Ubuntu 18.04, CentOS 7, or macOS 10.13 or later.

  • The agent nodes must have Docker installed.

  • To run jobs with GPUs, the Nvidia drivers must be installed on each Determined agent. Determined requires Nvidia driver version 384.81 or later. (The Nvidia drivers can be installed as part of installing CUDA, but the rest of the CUDA toolkit is not required.)
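
One way to confirm the driver requirement on an agent is to ask nvidia-smi for the installed driver version and compare it against 384.81. The script below is a minimal sketch and is not part of Determined: the version_ge helper is a hypothetical name introduced here, and the script assumes that nvidia-smi is on the PATH whenever a driver is installed.

```shell
#!/bin/sh
# Sketch: compare the installed Nvidia driver version against the 384.81 minimum.

# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Fields are compared numerically, left to right; missing fields count as 0.
# (Hypothetical helper, not a Determined command.)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        for (i = 1; i <= (n > m ? n : m); i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# nvidia-smi prints one driver version per GPU; the first line is enough.
driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)

if [ -z "$driver" ]; then
    echo "No Nvidia driver detected; this agent can only run CPU jobs."
elif version_ge "$driver" 384.81; then
    echo "Nvidia driver $driver meets the 384.81 minimum."
else
    echo "Nvidia driver $driver is older than 384.81; upgrade before running GPU jobs."
fi
```

The comparison is done field by field rather than as a decimal number, so a version such as 450.80.02 is handled correctly.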

Hardware

  • The Determined master node should be configured with at least 4 CPUs (Intel Broadwell or later), 8GB of RAM, and 200GB of free disk space. Note that the Determined master does not use GPUs.

  • Each Determined agent node should be configured with at least 2 CPUs (Intel Broadwell or later), 4GB of RAM, and 50GB of free disk space. If using GPUs, Nvidia GPUs with compute capability 3.7 or greater are required (e.g., K80, P100, V100, A100, GTX 1080, GTX 1080 Ti, TITAN, TITAN XP).

Note

Most of the disk space required by the master is due to the experiment metadata database; if PostgreSQL is set up on a different machine, the disk space requirements for the master are minimal (~100MB).
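
The hardware minimums above can be checked with a short script before installing anything. The following is a sketch for Linux hosts only, under the assumption that nproc, /proc/meminfo, and df are available; the at_least and check_hardware names are illustrative and are not Determined commands.

```shell
#!/bin/sh
# Sketch: compare a Linux host against the recommended hardware minimums.

# at_least LABEL ACTUAL REQUIRED: print a result line; fail if ACTUAL < REQUIRED.
at_least() {
    if [ "$2" -ge "$3" ]; then
        echo "ok:   $1 = $2 (need >= $3)"
    else
        echo "FAIL: $1 = $2 (need >= $3)"
        return 1
    fi
}

# check_hardware MIN_CPUS MIN_RAM_GB MIN_DISK_GB [PATH]
check_hardware() {
    status=0
    cpus=$(nproc)
    # MemTotal is reported in kB and is slightly below the installed RAM,
    # so round to the nearest GB instead of truncating.
    ram_gb=$(awk '/^MemTotal:/ { printf "%d", ($2 + 524288) / 1048576 }' /proc/meminfo)
    # Free space in GB on the filesystem that holds PATH (default: /).
    disk_gb=$(df -Pk "${4:-/}" | awk 'NR == 2 { printf "%d", $4 / 1048576 }')
    at_least "CPUs" "$cpus" "$1" || status=1
    at_least "RAM (GB)" "$ram_gb" "$2" || status=1
    at_least "free disk (GB)" "$disk_gb" "$3" || status=1
    return "$status"
}

# Agent minimums; use "check_hardware 4 8 200" on the master node.
check_hardware 2 4 50 || echo "This host is below the recommended agent minimums."
```

The script does not check the CPU generation or the GPU compute capability; those still need to be confirmed against the hardware specifications.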

Installing Docker

Docker is a dependency of several components of the Determined system. For example, every agent node must have Docker installed to allow it to run containerized workloads.

Install on Linux

  1. Install Docker. Determined requires Docker 19.03 or newer on the machine where the agent is running.

    On Ubuntu:

    sudo apt-get update && sudo apt-get install -y software-properties-common
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    
    sudo apt-get update && sudo apt-get install -y --no-install-recommends docker-ce
    sudo systemctl reload docker
    sudo usermod -aG docker $USER
    

    On CentOS:

    sudo yum install -y yum-utils device-mapper-persistent-data lvm2
    sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    
    sudo yum install -y docker-ce
    sudo systemctl start docker
    
  2. If the machine has GPUs that you would like to use with Determined, install the Nvidia Container Toolkit to allow Docker to run containers that use them. For more information, see the Nvidia documentation.

    On Ubuntu:

    curl -fsSL https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update
    
    sudo apt-get install -y --no-install-recommends nvidia-container-toolkit
    sudo systemctl restart docker
    

    On CentOS:

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -fsSL https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
    
    sudo yum install -y nvidia-container-toolkit
    sudo systemctl restart docker
    
  3. Log out and start a new terminal session. Verify that the current user is in the docker group and, if the machine has GPUs, that Docker can start a container using them.

    groups
    docker run --gpus all --rm debian:10-slim nvidia-smi
    
  4. If using CentOS 7, enable persistent storage of journalctl log messages so that logs are preserved across machine reboots.

    sudo mkdir -p /var/log/journal
    sudo systemd-tmpfiles --create --prefix /var/log/journal
    sudo systemctl restart systemd-journald
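
The non-GPU parts of the Linux steps above can be verified in one pass. The script below is a sketch under a few assumptions: the docker CLI is on the PATH, the check runs in a fresh login session, and the docker_version_ok, in_group, and report helpers are hypothetical names introduced here. The journal directory check only matters on CentOS 7.

```shell
#!/bin/sh
# Sketch: post-install checks for a Linux agent (helper names are illustrative).

# docker_version_ok VERSION: succeed if VERSION (e.g. "19.03.12") is 19.03 or newer.
# awk's numeric conversion ignores suffixes such as "+dfsg1" or "-ce".
docker_version_ok() {
    awk -v v="$1" 'BEGIN {
        split(v, p, ".")
        exit !(p[1] + 0 > 19 || (p[1] + 0 == 19 && p[2] + 0 >= 3))
    }'
}

# in_group GROUP: succeed if the current session belongs to GROUP.
in_group() {
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# report DESCRIPTION COMMAND...: run COMMAND quietly and print ok or FAIL.
report() {
    what=$1; shift
    if "$@" >/dev/null 2>&1; then echo "ok:   $what"; else echo "FAIL: $what"; fi
}

# Empty if the Docker daemon is not installed or not reachable.
server_version=$(docker version --format '{{.Server.Version}}' 2>/dev/null) || true

report "Docker daemon is 19.03 or newer (found: ${server_version:-none})" \
    docker_version_ok "$server_version"
report "current session is in the docker group" in_group docker
report "persistent journal directory exists (CentOS 7 only)" test -d /var/log/journal
```

If the group check fails right after running usermod, the session most likely predates the change; log out and back in before re-running it.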
    

Install on macOS

  1. Install Homebrew.

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    
  2. Install Docker via Homebrew.

    brew install --cask docker
    
  3. Start Docker.

    open /Applications/Docker.app
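
Docker on macOS takes some time to start after the application is opened, and docker commands fail until the daemon is ready. The following sketch waits for it; the wait_until helper is a hypothetical name introduced here, and the 120-second limit is an arbitrary choice rather than a documented value.

```shell
#!/bin/sh
# Sketch: wait for Docker on macOS to finish starting before using the docker CLI.

# wait_until SECONDS COMMAND...: retry COMMAND about once per second until it
# succeeds; give up after roughly SECONDS seconds. (Hypothetical helper.)
wait_until() {
    remaining=$1; shift
    until "$@" >/dev/null 2>&1; do
        [ "$remaining" -gt 0 ] || return 1
        remaining=$((remaining - 1))
        sleep 1
    done
}

# Only attempt this on macOS, where Docker runs as a desktop application.
if [ "$(uname)" = "Darwin" ]; then
    open /Applications/Docker.app
    # "docker info" fails until the daemon accepts connections.
    if wait_until 120 docker info; then
        echo "Docker is ready."
    else
        echo "Docker did not become ready within about 120 seconds." >&2
    fi
fi
```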
    

Note

Docker on macOS does not support running containers that use GPUs. As a result, Determined agents on macOS will only be able to run CPU-based workloads.