Install Determined Using det-deploy¶
This document shows how to deploy Determined locally or in a production
cluster using the det-deploy
command-line tool, which automates the
process of starting Determined as a collection of Docker containers.
In a typical production setup, the master and agents will run on separate machines. They can also run on a single machine, which is especially useful for local development. This guide provides instructions for both scenarios.
Configuring and Starting the Cluster¶
A configuration file is needed to set important values in the master,
such as where to save model checkpoints. For information about how to
create a configuration file, see Cluster Configuration. There are also
sample configuration files available. The
configuration file you create must be named master.yaml
(but it can be in any
directory).
Note
det-deploy
will use a default configuration file if you don’t provide one.
det-deploy
also transparently manages PostgreSQL and Hasura along with
the master, so the configuration options related to those services do
not need to be set.
Deploying a Single Node Cluster¶
For local development or small clusters (such as a GPU workstation), you may wish to to install both a master and an agent on the same node. To do this, run:
det-deploy local fixture-up
This will start a master and an agent on that machine. To verify that the
master is running, navigate to http://<master-hostname>:8080
in a
browser, which should bring up the Determined WebUI. If you’re using
your local machine, for example, navigate to http://localhost:8080
.
In the WebUI, navigate to the Cluster
page, where you should now see
slots available (either CPU or GPU, depending on what hardware is
available on the machine).
For production deployments, you’ll want to use a cluster configuration file.
To provide this configuration file to det-deploy
, use:
det-deploy local fixture-up --etc-root <directory containing master.yaml>``
If you want to create more than one agent locally, you can use:
det-deploy local fixture-up --agents <number of agents>
Stopping a Single Node Cluster¶
To stop a determined cluster, on the machine where a determined cluster is currently running, run
det-deploy local fixture-down
Note
det-deploy local fixture-down
will not remove any agents created
with det-deploy local agent-up
. To remove these agents,
use det-deploy local agent-down
Deploying a Standalone Master¶
In many cases, your Determined cluster will be split across multiple nodes. In this case you will need to start a master and agents separately. In order to start a standalone master, run:
det-deploy local master-up
Note
For production deployments, you’ll want to use a cluster configuration file.
To provide this configuration file to det-deploy
, use the flag --etc-root <directory containing master.yaml>
.
To stop a running master, run:
det-deploy local master-down
Deploying Agents¶
To deploy a standalone agent, on the machine you want to run the agent run:
det-deploy local agent-up <master_hostname>
This will create an agent on that machine. To verify whether it has
successfully connected to the master, navigate to the WebUI and check
whether slots have appeared on the Cluster
page.
To stop a running agent, run:
det-deploy local agent-down