Install Determined Using det-deploy
This document shows how to deploy Determined locally or in a production cluster using the det-deploy command-line tool, which automates the process of starting Determined as a collection of Docker containers.
In a typical production setup, the master and agents will run on separate machines. They can also run on a single machine, which is especially useful for local development. This guide provides instructions for both scenarios.
Configuring and Starting the Cluster
A configuration file is needed to set important values in the master, such as where to save model checkpoints. For information about how to create a configuration file, see Cluster Configuration. There are also sample configuration files available.
Note
det-deploy will use a default configuration file if you don’t provide one. It also transparently manages PostgreSQL along with the master, so the configuration options related to the database do not need to be set.
Deploying a Single-Node Cluster
For local development or small clusters (such as a GPU workstation), you may wish to install both a master and an agent on the same node. To do this, run one of the following commands:
# If the machine has GPUs:
det-deploy local cluster-up
# If the machine doesn't have GPUs:
det-deploy local cluster-up --no-gpu
This will start a master and an agent on that machine. To verify that the master is running, navigate to http://<master-hostname>:8080 in a browser, which should bring up the Determined WebUI. For example, if you are using your local machine, navigate to http://localhost:8080.
In the WebUI, navigate to the Cluster page, where you should now see slots available (either CPU or GPU, depending on what hardware is available on the machine).
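The GPU/CPU choice and the browser check above can be scripted. The following is a minimal POSIX shell sketch, not part of det-deploy. It assumes that nvidia-smi being on the PATH is a reasonable proxy for "this machine has GPUs", that curl is installed, and that the WebUI answers on port 8080. The function names cluster_up_auto and wait_for_master, and the DET_DEPLOY variable (set it to echo for a dry run), are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only. Assumptions (none of these are det-deploy features):
#  - DET_DEPLOY is a hypothetical override for dry runs (DET_DEPLOY=echo);
#    by default the real det-deploy binary is called.
#  - nvidia-smi on PATH is treated as a rough proxy for "has GPUs".
#  - curl is installed and the master serves the WebUI on port 8080.

# Start a single-node cluster, adding --no-gpu when no NVIDIA tooling is found.
cluster_up_auto() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    "${DET_DEPLOY:-det-deploy}" local cluster-up
  else
    "${DET_DEPLOY:-det-deploy}" local cluster-up --no-gpu
  fi
}

# Poll URL about once per second for up to LIMIT attempts.
# Returns 0 as soon as the master answers, 1 if it never does.
wait_for_master() {
  url="${1:-http://localhost:8080}"
  limit="${2:-60}"
  attempts=0
  until curl --silent --fail --max-time 5 --output /dev/null "$url"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge "$limit" ]; then
      echo "master did not respond at $url after $limit attempts" >&2
      return 1
    fi
    sleep 1
  done
  echo "master is up at $url"
}
```

For example, cluster_up_auto && wait_for_master http://localhost:8080 120 starts the cluster and then waits for the WebUI for roughly two minutes. Polling with curl only confirms that the master answers HTTP requests; checking the Cluster page for slots is still the way to confirm that the agent has connected.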
For production deployments, you’ll want to use a cluster configuration file. To provide this configuration file to det-deploy, use:
det-deploy local cluster-up --master-config-path <path to master.yaml>
If you want to create more than one agent locally, you can use:
det-deploy local cluster-up --agents <number of agents>
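The two options above can be wrapped with some input checking. This is a hedged POSIX shell sketch: it assumes that --master-config-path and --agents can be passed together in one cluster-up call, and the function name cluster_up_prod and the DET_DEPLOY dry-run variable are hypothetical.

```shell
#!/bin/sh
# Sketch: start a single-node cluster with a master config and several agents.
# DET_DEPLOY is a hypothetical override for dry runs (DET_DEPLOY=echo);
# by default the real det-deploy binary is called.
# Assumption: --master-config-path and --agents can be combined in one call.

# Usage: cluster_up_prod PATH_TO_MASTER_YAML NUM_AGENTS [EXTRA_ARGS...]
#   EXTRA_ARGS are passed through unchanged, e.g. --no-gpu on CPU-only hosts.
cluster_up_prod() {
  if [ "$#" -lt 2 ]; then
    echo "usage: cluster_up_prod PATH_TO_MASTER_YAML NUM_AGENTS [EXTRA_ARGS...]" >&2
    return 1
  fi
  config="$1"
  num_agents="$2"
  shift 2
  # Check the file up front so a typo fails before anything is started.
  if [ ! -f "$config" ]; then
    echo "master config not found: $config" >&2
    return 1
  fi
  # Accept only positive integers for the agent count.
  case "$num_agents" in
    ''|*[!0-9]*)
      echo "NUM_AGENTS must be a positive integer: $num_agents" >&2
      return 1
      ;;
  esac
  if [ "$num_agents" -lt 1 ]; then
    echo "NUM_AGENTS must be at least 1" >&2
    return 1
  fi
  "${DET_DEPLOY:-det-deploy}" local cluster-up \
    --master-config-path "$config" --agents "$num_agents" "$@"
}
```

For example, cluster_up_prod ./master.yaml 2 --no-gpu would start a CPU-only cluster with two local agents, where ./master.yaml is a placeholder path.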
Stopping a Single-Node Cluster
To stop a Determined cluster, run the following command on the machine where the cluster is currently running:
det-deploy local cluster-down
Note
det-deploy local cluster-down will not remove any agents created with det-deploy local agent-up. To remove these agents, use det-deploy local agent-down.
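Because cluster-down leaves agents started with agent-up running, a full teardown needs both commands. The sketch below is a hypothetical helper (teardown_local, with the DET_DEPLOY dry-run variable as an assumption) that runs both and reports failure if either command fails; it does not assume agent-down succeeds when there are no agents to stop.

```shell
#!/bin/sh
# Sketch: stop everything det-deploy started on this machine: the cluster
# from cluster-up plus any agents from agent-up, which cluster-down leaves.
# DET_DEPLOY is a hypothetical override for dry runs (DET_DEPLOY=echo).
teardown_local() {
  status=0
  "${DET_DEPLOY:-det-deploy}" local cluster-down || status=1
  # Still try agent-down even if cluster-down failed, so nothing is left over.
  "${DET_DEPLOY:-det-deploy}" local agent-down || status=1
  return "$status"
}
```

Running teardown_local on the machine returns a non-zero status if either command failed, so it can be chained with && in larger scripts.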
Deploying a Standalone Master
In many cases, your Determined cluster will be split across multiple nodes. In this case, you need to start the master and agents separately. To start a standalone master, run:
det-deploy local master-up
Note
For production deployments, you’ll want to use a cluster configuration file. To provide this configuration file to det-deploy, use the flag --master-config-path <path to master.yaml>.
To stop a running master, run:
det-deploy local master-down
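The choice between the default configuration and a production master.yaml can be captured in one helper. This is a minimal sketch; the function name master_up and the DET_DEPLOY dry-run variable are hypothetical, and only the master-up subcommand and --master-config-path flag shown above are used.

```shell
#!/bin/sh
# Sketch: start a standalone master, passing --master-config-path only when
# a config file is given. DET_DEPLOY is a hypothetical dry-run override.
# Usage: master_up [PATH_TO_MASTER_YAML]
master_up() {
  if [ "$#" -eq 0 ]; then
    # No file given: det-deploy falls back to its default configuration.
    "${DET_DEPLOY:-det-deploy}" local master-up
  elif [ -f "$1" ]; then
    "${DET_DEPLOY:-det-deploy}" local master-up --master-config-path "$1"
  else
    echo "master config not found: $1" >&2
    return 1
  fi
}
```

For example, master_up alone is suitable for a quick test, while master_up /path/to/master.yaml is the production form.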
Deploying Agents
To deploy a standalone agent on a machine, run one of the following commands:
# If the machine has GPUs:
det-deploy local agent-up <master_hostname>
# If the machine doesn't have GPUs:
det-deploy local agent-up --no-gpu <master_hostname>
This will create an agent on that machine. To verify whether it has successfully connected to the master, navigate to the WebUI and check whether slots have appeared on the Cluster page.
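When there are several agent machines, agent-up has to be run on each of them. The sketch below does this from a single shell over SSH. This is an assumption, not something det-deploy provides: it presumes every host is reachable with ssh, already has det-deploy installed, and has GPUs (CPU-only hosts would need agent-up --no-gpu instead). The function name start_remote_agents and the SSH override variable (set it to echo for a dry run) are hypothetical.

```shell
#!/bin/sh
# Sketch: start one agent on each listed host, all pointing at one master.
# Assumptions: hosts are reachable over SSH, have det-deploy installed, and
# have GPUs. SSH is a hypothetical override for dry runs (SSH=echo).
# Usage: start_remote_agents MASTER_HOSTNAME AGENT_HOST...
start_remote_agents() {
  if [ "$#" -lt 2 ]; then
    echo "usage: start_remote_agents MASTER_HOSTNAME AGENT_HOST..." >&2
    return 1
  fi
  master="$1"
  shift
  for host in "$@"; do
    echo "starting agent on $host" >&2
    # Stop at the first host that fails so the problem is easy to spot.
    "${SSH:-ssh}" "$host" det-deploy local agent-up "$master" || return 1
  done
}
```

For example, start_remote_agents master.example gpu-1 gpu-2 (placeholder hostnames) would start agents on gpu-1 and gpu-2 that register with master.example; the Cluster page should then show slots from both machines.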
To stop a running agent, run:
det-deploy local agent-down