Install Determined Using det deploy

This document shows how to deploy Determined locally or in a production cluster using the det deploy command-line tool, which automates the process of starting Determined as a collection of Docker containers. det deploy can also be used to install Determined on the cloud; for more information, see the AWS and GCP installation guides, respectively.

In a typical production setup, the master and agents will run on separate machines. They can also run on a single machine, which is especially useful for local development. This guide provides instructions for both scenarios.

Preliminary Setup

Note

Docker must be installed to use det deploy for local installations. Please refer to our installation steps for Docker.

Install determined python package by running

pip install determined

Configuring and Starting the Cluster

A configuration file is needed to set important values in the master, such as where to save model checkpoints. For information about how to create a configuration file, see Cluster Configuration. There are also sample configuration files available.

Note

det deploy will use a default configuration file if you don’t provide one. It also transparently manages PostgreSQL along with the master, so the configuration options related to those services do not need to be set.

Deploying a Single-Node Cluster

For local development or small clusters (such as a GPU workstation), you may wish to install both a master and an agent on the same node. To do this, run one of the following commands:

# If the machine has GPUs:
det deploy local cluster-up

# If the machine doesn't have GPUs:
det deploy local cluster-up --no-gpu

This will start a master and an agent on that machine. To verify that the master is running, navigate to http://<master-hostname>:8080 in a browser, which should bring up the Determined WebUI. If you’re using your local machine, for example, navigate to http://localhost:8080.

In the WebUI, navigate to the Cluster page, where you should now see slots available (either CPU or GPU, depending on what hardware is available on the machine).

For single-agent clusters launched with:

det deploy local cluster-up --auto-bind-mount <absolute directory path>

the cluster will automatically make the specified directory available to tasks on the cluster as ./shared_fs. If --auto-bind-mount is not specified, the cluster will default to mounting your home directory. This will allow you to access your local preferences and any relevant files stored in the specified directory with the cluster’s notebooks, shells, and tensorboard tasks. To disable this feature, use:

det deploy local cluster-up --no-auto-bind-mount

For production deployments, you’ll want to use a cluster configuration file. To provide this configuration file to det deploy, use:

det deploy local cluster-up --master-config-path <path to master.yaml>

If you want to create more than one agent locally, you can use:

det deploy local cluster-up --agents <number of agents>

Stopping a Single-Node Cluster

To stop a Determined cluster, on the machine where a Determined cluster is currently running, run

det deploy local cluster-down

Note

det deploy local cluster-down will not remove any agents created with det deploy local agent-up. To remove these agents, use det deploy local agent-down.

Deploying a Standalone Master

In many cases, your Determined cluster will consist of multiple nodes. In this case you will need to start a master and agents separately. In order to start a standalone master, run:

det deploy local master-up

Note

For production deployments, you’ll want to use a cluster configuration file. To provide this configuration file to det deploy, use the flag --master-config-path <path to master.yaml>.

To stop a running master, run:

det deploy local master-down

Deploying Agents

To deploy a standalone agent on a machine, run one of the following commands:

# If the machine has GPUs:
det deploy local agent-up <master_hostname>

# If the machine doesn't have GPUs:
det deploy local agent-up --no-gpu <master_hostname>

This will create an agent on that machine. To verify whether it has successfully connected to the master, navigate to the WebUI and check whether slots have appeared on the Cluster page.

To launch the agent into a specific resource pool, use the --agent-resource-pool flag:

det deploy local agent-up --agent-resource-pool=<resource_pool> <master_hostname>

For more information about resource pools, see Resource Pools.

To stop a running agent, run:

det deploy local agent-down