Dynamic Agents on GCP
This document describes how to install, configure, and upgrade a deployment of PEDL with Dynamic Agents on GCP.
Compute Engine Project
The PEDL master and the PEDL agents are intended to run in the same project.
When using Dynamic Agents on GCP, PEDL identifies the Compute Engine instances that it is managing using a configurable instance label (see configuration for details). Administrators should be careful to ensure that this label is not used by other Compute Engine instances that are launched outside of PEDL; if that assumption is violated, unexpected behavior may occur.
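One way to check for such conflicts before settling on label_key and label_value is to list the instances that already carry that label. This is a minimal sketch, assuming the gcloud CLI is installed and pointed at the PEDL project; the function name list_instances_with_label and the example label pedl-role=agent are hypothetical.

```shell
# List Compute Engine instances in the active gcloud project that carry
# the given label (hypothetical helper). Any instance listed here that
# PEDL did not launch is already using the label.
list_instances_with_label() {
  local label_key="$1"    # the value you plan to use for label_key
  local label_value="$2"  # the value you plan to use for label_value
  gcloud compute instances list \
    --filter="labels.${label_key}=${label_value}" \
    --format="table(name,zone.basename(),status)"
}
```

For example, list_instances_with_label pedl-role agent should print no instances before PEDL has launched any agents; once agents are running, it should list only those agents.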
Compute Engine Images
The PEDL master node will run on a custom image that will be shared with you by Determined AI.
PEDL agent nodes will run on a custom image that will be shared with you by Determined AI.
Compute Engine Machine Types
- The PEDL master node should be deployed on a Compute Engine instance with at least 2 CPUs (Intel Broadwell or later), 4GB of RAM, and 100GB of disk storage. This corresponds to a Compute Engine n1-standard-2 or a more powerful machine type.
GCP API Access
The PEDL master needs to run as a service account that has permission to manage Compute Engine instances. There are two options:
1. Create a dedicated service account with the Compute Admin role, then configure the PEDL master to run as this account. See Compute Engine IAM roles for details on how to configure the service account.
   - For a PEDL agent to be associated with a service account, the PEDL master needs access to service accounts: ensure that the PEDL master's service account has the Service Account User role.
   - For PEDL agents to use a shared VPC, the PEDL master's service account needs the Compute Network User role.
2. Use the default service account and add the Compute Engine: Read Write scope.
Optionally, the PEDL agent may be associated with a service account.
Access scopes are the legacy method of specifying permissions for your instance. A best practice is to set the full cloud-platform access scope on the instance, then securely limit the service account's API access with Cloud IAM roles. See Access Scopes for details.
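If you choose the first option, the role grants can be scripted with gcloud. This is a minimal sketch, assuming project-level IAM bindings and, for a shared VPC, that Compute Network User should be bound on the host project; grant_pedl_master_roles and its arguments are hypothetical names.

```shell
# Grant the roles described above to the PEDL master's service account
# (hypothetical helper). Arguments are placeholders:
#   $1  project that runs the PEDL master and agents
#   $2  email of the master's service account
#   $3  optional shared VPC host project; if set, Compute Network User is
#       bound there (assumption: a project-level binding is acceptable)
grant_pedl_master_roles() {
  local project_id="$1" sa_email="$2" host_project="${3:-}"
  local role
  # Compute Admin to manage agents; Service Account User to attach a
  # service account to the agents it launches.
  for role in roles/compute.admin roles/iam.serviceAccountUser; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:${sa_email}" \
      --role="$role" || return 1
  done
  # Only needed when agents use a shared VPC.
  if [ -n "$host_project" ]; then
    gcloud projects add-iam-policy-binding "$host_project" \
      --member="serviceAccount:${sa_email}" \
      --role=roles/compute.networkUser || return 1
  fi
}
```

Project-level bindings are the simplest choice but also the broadest: Service Account User bound on the project lets the master act as any service account in it. If that is too permissive, the role can instead be bound on the agents' service account only, with gcloud iam service-accounts add-iam-policy-binding.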
Network Requirements
See Network Requirements for details.
Configuration
The PEDL cluster is configured with the master.yaml file located at /usr/local/pedl/etc on the PEDL master instance. Below you'll find an example configuration. See Cluster Configuration for details.
```yaml
provisioner:
  master_url: <scheme://host:port>
  startup_script: <startup script>
  agent_docker_network: pedl
  max_idle_agent_period: 5m
  provider: gcp
  base_config: <instance resource base configuration>
  project: <project id>
  zone: <zone>
  boot_disk_size: 200
  boot_disk_source_image: projects/<project-id>/global/images/<image-name>
  label_key: <label key for agent discovery>
  label_value: <label value for agent discovery>
  name_prefix: <name prefix>
  network_interface:
    network: projects/<project>/global/networks/<network>
    subnetwork: projects/<project>/regions/<region>/subnetworks/<subnetwork>
    external_ip: false
  network_tags: ["<tag1>", "<tag2>"]
  service_account:
    email: "<service account email>"
    scopes: ["https://www.googleapis.com/auth/cloud-platform"]
  instance_type:
    machine_type: n1-standard-32
    gpu_type: nvidia-tesla-v100
    gpu_num: 4
  max_instances: 5
```
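A machine_type or gpu_type that is not offered in the configured zone would keep agents from launching, so it can help to check the instance_type values against the zone before starting the master. This is a sketch, assuming the gcloud CLI; check_agent_instance_type is a hypothetical helper, and the zone in the usage note is only an example.

```shell
# Check that the agent machine type (and, optionally, GPU type) exist in
# the given zone (hypothetical helper). gcloud's describe commands exit
# non-zero when the resource is not found, so a failure here means the
# type is not available in that zone.
#   $1 zone, $2 machine_type, $3 gpu_type (optional)
check_agent_instance_type() {
  local zone="$1" machine_type="$2" gpu_type="${3:-}"
  # Print the vCPU count and memory (MB) of the machine type.
  gcloud compute machine-types describe "$machine_type" --zone="$zone" \
    --format="value(guestCpus,memoryMb)" || return 1
  if [ -n "$gpu_type" ]; then
    # Print the accelerator name and its maximum cards per instance,
    # which you can compare against gpu_num.
    gcloud compute accelerator-types describe "$gpu_type" --zone="$zone" \
      --format="value(name,maximumCardsPerInstance)" || return 1
  fi
}
```

For example, check_agent_instance_type us-west1-b n1-standard-32 nvidia-tesla-v100 checks the instance_type from the example configuration against the zone us-west1-b.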
How to attach a disk containing a data set to each dynamic agent
If your input data set is on a persistent disk, you can attach that disk to each dynamic agent by using the base instance configuration (base_config) together with startup commands (startup_script). Compute Engine allows a persistent disk to be attached to multiple instances at once in read-only mode, which is why the example below attaches it with mode: READ_ONLY. See REST Resource: instances for the full list of configuration options supported by GCP, and Formatting and mounting a zonal persistent disk for more examples of formatting and mounting disks in GCP.
The following example master configuration attaches a second, existing disk.
```yaml
provisioner:
  startup_script: |
    lsblk
    mkdir -p /mnt/disks/second
    mount -o discard,defaults /dev/sdb1 /mnt/disks/second
    lsblk
  provider: gcp
  base_config:
    disks:
      - mode: READ_ONLY
        boot: false
        source: zones/<zone>/disks/<the name of the existing disk>
        autoDelete: false
  boot_disk_size: 200
  boot_disk_source_image: projects/<project>/global/images/<image name>
```
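The startup_script above mounts the disk with discard,defaults even though it is attached with mode: READ_ONLY. A variant you could paste into startup_script mounts it explicitly read-only and skips the mount if the directory is already a mount point (for example, if the script runs more than once). This is a sketch under assumptions: the disk holds an ext4 file system (noload is an ext4 option that skips journal replay, which would need write access to the device), the agent image provides the util-linux mountpoint command, and mount_dataset_disk is a hypothetical name.

```shell
# Mount the shared data set disk read-only, once (hypothetical helper).
# Defaults match the device and mount point of the example above.
mount_dataset_disk() {
  local device="${1:-/dev/sdb1}" mount_point="${2:-/mnt/disks/second}"
  mkdir -p "$mount_point"
  # Nothing to do if something is already mounted there.
  if mountpoint -q "$mount_point"; then
    echo "$mount_point is already mounted" >&2
    return 0
  fi
  # ro: the disk is attached READ_ONLY; noload (ext4): skip journal replay.
  mount -o ro,noload "$device" "$mount_point"
}
```

In startup_script, define the function and then call mount_dataset_disk /dev/sdb1 /mnt/disks/second.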
If a specific non-root user needs to access the disk, run tasks as that user's POSIX UID/GID (see Running tasks as particular agent users for details) and grant that UID/GID access to the disk.
After installing the master, you can use the following commands to verify that tasks can read from the attached disk.
```shell
cat > command.yaml << EOF
bind_mounts:
  - host_path: /mnt/disks/second
    container_path: /second
EOF

# Test attached read-only disk.
pedl command run --config-file command.yaml ls -l /second
```
Installation
These instructions describe how to install PEDL for the first time; for directions on how to upgrade an existing PEDL installation, see the Upgrades section below.
Ensure that you are using the most up-to-date PEDL images. Keep the image IDs handy as we will need them later.
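If you have permission to list images in the project where the PEDL images are shared with you, gcloud can show the newest custom images first so you can note their names. This is a sketch, assuming the gcloud CLI; the project ID is a placeholder and list_latest_images is a hypothetical name. If you cannot list that project, use the image names provided by Determined AI instead.

```shell
# List custom (non-standard) images in a project, newest first
# (hypothetical helper). $1 is the project holding the PEDL images.
list_latest_images() {
  local image_project="$1"
  gcloud compute images list \
    --project="$image_project" \
    --no-standard-images \
    --sort-by=~creationTimestamp \
    --format="table(name,family,creationTimestamp)"
}
```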
To install the master, we will launch an instance from the PEDL master image.
Let's start by navigating to the Compute Engine Dashboard of the GCP Console. Click "Create Instance" and follow the instructions below:
1. Choose Machine Type: we recommend an n1-standard-2 or a more powerful machine type.
2. Configure Boot Disk:
   a. Choose Boot Disk Image: find the PEDL master image in "Images" and click "Select".
   b. Set Boot Disk Size: set Size to at least 100GB. This disk will be used to store all your experiment metadata and checkpoints. If you are upgrading a previous PEDL installation, use a snapshot of its disk or the existing disk instead.
3. Configure Identity and API access: choose the service account according to the requirements in GCP API Access.
4. Configure Firewalls: set up firewall rules according to the Network Requirements, and select Allow HTTP traffic.
5. Review and launch the instance.
6. SSH into the PEDL master and edit the config at /usr/local/pedl/etc/master.yaml according to the guide on Cluster Configuration.
7. Start the PEDL master by entering make -C /usr/local/pedl enable-master into the terminal.
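The console steps above can also be sketched with gcloud. This assumes the gcloud CLI, the default VPC network (add --network and --subnet otherwise), and an existing firewall rule that allows HTTP to instances tagged http-server, which is the tag the console's Allow HTTP traffic option applies. create_pedl_master, start_pedl_master, and all of their arguments are hypothetical; run start_pedl_master only after editing master.yaml as in step 6.

```shell
# Steps 1-5: create the master instance (hypothetical helper).
#   $1 instance name, $2 zone, $3 project holding the master image,
#   $4 master image name, $5 master service account email
create_pedl_master() {
  local name="$1" zone="$2" image_project="$3" image="$4" sa_email="$5"
  gcloud compute instances create "$name" \
    --zone="$zone" \
    --machine-type=n1-standard-2 \
    --image-project="$image_project" \
    --image="$image" \
    --boot-disk-size=100GB \
    --service-account="$sa_email" \
    --scopes=cloud-platform \
    --tags=http-server
}

# Step 7: run the start command on the master over SSH (hypothetical
# helper). $1 instance name, $2 zone.
start_pedl_master() {
  local name="$1" zone="$2"
  gcloud compute ssh "$name" --zone="$zone" \
    --command="make -C /usr/local/pedl enable-master"
}
```

The --scopes=cloud-platform choice follows the access-scope best practice described under GCP API Access: the scope is left broad and the service account's IAM roles do the limiting.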
There is no installation needed for the Agent. The PEDL master will dynamically launch PEDL agent instances based on the Cluster Configuration.
Upgrades
Upgrading an existing PEDL installation with Dynamic Agents on GCP requires the same steps as an installation without dynamic agents. See the general Upgrades documentation.