Shortcuts

Install Determined Using Debian Packages

For systems running Ubuntu 16.04 or 18.04, we support installing the Determined master and agent using Debian packages and running them as systemd services.

Preliminary Setup

PostgreSQL

Determined uses a PostgreSQL database to store experiment and trial metadata. You may either use your Linux distribution’s package and service or a Docker container.

With either of the following installation methods, PostgreSQL will typically run with a default of 100 max_connections, which is sufficient for Determined. You may validate by issuing the following query via a database client like psql:

SHOW max_connections;

Installing PostgreSQL via apt

We recommend installing the following version of PostgreSQL:

sudo apt install postgresql-10

Next, configure a system account that Determined will use to connect to PostgreSQL. For example, to use the default postgres user but update its password:

sudo -u postgres psql postgres
postgres=# \password postgres

Finally, create a database for Determined’s use:

postgres=# CREATE DATABASE determined;

Running PostgreSQL in Docker

Pull the official Docker image for PostgreSQL. We recommend using the version listed below.

docker pull postgres:10

This image is not provided by Determined AI; please see its Docker Hub page for more information.

Start PostgreSQL as follows:

docker run \
    -d \
    --restart unless-stopped \
    --name determined-db \
    -p 5432:5432 \
    -v determined_db:/var/lib/postgresql/data \
    -e POSTGRES_DB=determined \
    -e POSTGRES_PASSWORD=<Database password> \
    postgres:10

If the master will connect to PostgreSQL via Docker networking, exposing port 5432 via the -p argument isn’t necessary; however, you may still want to expose it for administrative or debugging purposes. In order to expose the port only on the master machine’s loopback network interface, pass -p 127.0.0.1:5432:5432 instead of -p 5432:5432.

Master and Agent

  1. Go to the webpage for the latest Determined release.

  2. Download the appropriate package file, which will have the name determined-master_VERSION_linux_amd64.deb (with VERSION replaced by an actual version, such as 0.12.5).

  3. Run

    sudo apt install <path to downloaded file>
    

Before running the Determined agent, you will have to install Docker on each agent machine and, if the machine has GPUs, ensure that the Nvidia Container Toolkit is working as expected.

Apart from that, the agent follows the same process as the master, except that “master” should be replaced by “agent” everywhere it appears.

Configuring and Starting the Cluster

  1. Ensure that an instance of PostgreSQL is running and accessible from the machine or machines where the master will be run.

  2. Edit the YAML configuration files at /etc/determined/master.yaml (for the master) and /etc/determined/agent.yaml (for each agent) as appropriate for your setup. Ensure that the user, password, and database name correspond to your PostgreSQL configuration, e.g.:

    db:
      host: <PostgreSQL server IP or hostname, e.g., 127.0.0.1 if running on the master>
      port: <PostgreSQL port, e.g., 5432 by default>
      name: <Database name, e.g., determined>
      user: <PostgreSQL user, e.g., postgres>
      password: <Database password>
    
  3. Start the master.

    sudo systemctl start determined-master
    

    The master can also be run directly with the command determined-master, which may be helpful for experimenting with Determined (e.g., testing different configuration options quickly before writing them to the configuration file).

  4. Verify that the master started successfully by viewing the log:

    journalctl -u determined-master
    

    You should see logging indicating that the master can successfully connect to the database, and the last line should indicate http server started on the configured WebUI port (8080 by default). You can also validate that the WebUI is running by navigating to http://<master>:8080 with your web browser (or https://<master>:8443 if TLS is enabled). You should see No Agents on the right-hand side of the top navigation bar.

  5. Start the agent on each agent machine:

    sudo systemctl start determined-agent
    

    Similarly, the agent can be run with the command determined-agent.

  6. Verify that each agent started successfully by viewing the log:

    journalctl -u determined-agent
    

    You should see logging indicating that the agent started successfully, detected compute devices, and connected to the master. On the Determined WebUI, you should now see slots available, both on the right-hand side of the top navigation bar, and if you select the Cluster view in the left-hand navigation panel.

Managing the Cluster

To configure a service to start running automatically when its machine boots up, run sudo systemctl enable <service>, where the service is determined-master or determined-agent. You can also use sudo systemctl enable --now <service> to enable and immediately start a service in one command.

To view the logging output of a service, run journalctl -u <service>.

To manually stop a service, run sudo systemctl stop <service>.