Install Determined Using Docker¶
Preliminary Setup¶
Install Docker on all machines in the cluster. If the agents have GPUs, ensure that the nvidia-docker2 installation on each one is working as expected.
Pull the official Docker images for PostgreSQL and Hasura. We recommend using the versions listed below.
docker pull postgres:10 docker pull hasura/graphql-engine:v1.1.0
These images are not provided by Determined AI; please see their respective Docker Hub pages (PostgreSQL, Hasura) for more information.
Pull the Docker image for the master or agent on each machine (replace
VERSION
with a valid Determined version, such as the current version, 0.12.3):docker pull determinedai/determined-master:VERSION docker pull determinedai/determined-agent:VERSION
Configuring and Starting the Cluster¶
Determined Master and Agents¶
Configuration values can come from a file, environment variables, or command-line arguments. To get a default configuration file for the master, which contains a listing of the available options and descriptions for them, run:
id="$(docker create determinedai/determined-master:VERSION)"
docker cp "$id":/etc/determined/master.yaml .
docker rm "$id"
Then edit the master.yaml
configuration file as appropriate and start the
master with the edited configuration:
docker run -v "$PWD"/master.yaml:/etc/determined/master.yaml determinedai/determined-master:VERSION
The process for starting an agent is analogous, with the caveat that the agent
container must bind mount the host’s Docker daemon socket. This allows the
determinedai/determined-agent
container to orchestrate the containers that
execute trials and notebook servers:
docker run -v /var/run/docker.sock:/var/run/docker.sock \
-v "$PWD"/agent.yaml:/etc/determined/agent.yaml determinedai/determined-agent:VERSION
Environment variables and command-line arguments can be passed as usual for Docker:
docker run -e DET_DB_HOST=the-db-host determinedai/determined-master:VERSION --db-port=5432
docker run -v /var/run/docker.sock:/var/run/docker.sock \
-e DET_MASTER_HOST=the-master-host determinedai/determined-agent:VERSION run --master-port=8080
Note that the agent requires run
as the first argument if any
arguments are provided.
By default, the agent will use all the GPUs on the machine to run
Determined tasks; this behavior can be changed at startup using the
NVIDIA_VISIBLE_DEVICES
environment variable. GPUs can also be disabled and enabled at runtime
using the det slot disable
and det slot enable
CLI commands,
respectively.
Docker Networking for Master and Agents¶
As with any Docker container, the networking mode of the master and
agent can be changed using the --network
option to docker run
.
In particular, host mode networking (--network host
) can be useful
to optimize performance and in situations where a container needs to
handle a large range of ports, as it does not require network address
translation (NAT) and no “userland-proxy” is created for each port.
The host networking driver only works on Linux hosts, and is not supported on Docker Desktop for Mac, Docker Desktop for Windows, or Docker EE for Windows Server.
See Docker’s documentation for more details.
Managing the Cluster¶
By default, docker run
will run in the foreground, so that a
container can be stopped simply by pressing Control-C. If you wish to
keep Determined running for the long term, consider running the
containers detached and/or
with restart policies.
Using our deployment tool is also an
option.