Shortcuts

Install Determined Using det-deploy

This document shows how to deploy Determined locally or in a production cluster using the det-deploy command-line tool, which automates the process of starting Determined as a collection of Docker containers.

In a typical production setup, the master and agents will run on separate machines. They can also run on a single machine, which is especially useful for local development. This guide provides instructions for both scenarios.

Preliminary Setup

Install det-deploy by running

pip install determined-deploy

Configuring and Starting the Cluster

A configuration file is needed to set important values in the master, such as where to save model checkpoints. For information about how to create a configuration file, see Cluster Configuration. There are also sample configuration files available.

Note

det-deploy will use a default configuration file if you don’t provide one. It also transparently manages PostgreSQL along with the master, so the configuration options related to those services do not need to be set.

Deploying a Single-Node Cluster

For local development or small clusters (such as a GPU workstation), you may wish to install both a master and an agent on the same node. To do this, run one of the following commands:

# If the machine has GPUs:
det-deploy local cluster-up

# If the machine doesn't have GPUs:
det-deploy local cluster-up --no-gpu

This will start a master and an agent on that machine. To verify that the master is running, navigate to http://<master-hostname>:8080 in a browser, which should bring up the Determined WebUI. If you’re using your local machine, for example, navigate to http://localhost:8080.

In the WebUI, navigate to the Cluster page, where you should now see slots available (either CPU or GPU, depending on what hardware is available on the machine).

For production deployments, you’ll want to use a cluster configuration file. To provide this configuration file to det-deploy, use:

det-deploy local cluster-up --master-config-path <path to master.yaml>

If you want to create more than one agent locally, you can use:

det-deploy local cluster-up --agents <number of agents>

Stopping a Single-Node Cluster

To stop a Determined cluster, on the machine where a Determined cluster is currently running, run

det-deploy local cluster-down

Note

det-deploy local cluster-down will not remove any agents created with det-deploy local agent-up. To remove these agents, use det-deploy local agent-down.

Deploying a Standalone Master

In many cases, your Determined cluster will be split across multiple nodes. In this case you will need to start a master and agents separately. In order to start a standalone master, run:

det-deploy local master-up

Note

For production deployments, you’ll want to use a cluster configuration file. To provide this configuration file to det-deploy, use the flag --master-config-path <path to master.yaml>.

To stop a running master, run:

det-deploy local master-down

Deploying Agents

To deploy a standalone agent on a machine, run one of the following commands:

# If the machine has GPUs:
det-deploy local agent-up <master_hostname>

# If the machine doesn't have GPUs:
det-deploy local agent-up --no-gpu <master_hostname>

This will create an agent on that machine. To verify whether it has successfully connected to the master, navigate to the WebUI and check whether slots have appeared on the Cluster page.

To stop a running agent, run:

det-deploy local agent-down