Dynamic Agents on AWS

This document describes how to install, configure, and upgrade a deployment of PEDL with Dynamic Agents on AWS. See the topic guide on dynamic agents for an overview of dynamic agents in PEDL.

System Requirements

EC2 Instance Tags

PEDL with Dynamic Agents assumes that every EC2 instance carrying the configured tag_key:tag_value pair is managed by the PEDL master (see Cluster Configuration). If this pair is not unique to your PEDL installation, the master will treat any other instance carrying the pair as one of its own agents, which can cause unexpected behavior for both your PEDL installation and those instances.
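One way to check that a tag pair is not already in use is to scan the output of an EC2 DescribeInstances call before configuring it. The sketch below is illustrative only: it works on a dictionary shaped like a DescribeInstances response (Reservations, Instances, Tags), and the function name, tag values, and instance IDs are hypothetical. Fetching the real response from AWS is left out.

```python
from typing import Any, Dict, List


def instances_with_tag(response: Dict[str, Any], tag_key: str, tag_value: str) -> List[str]:
    """Return IDs of instances in a DescribeInstances-shaped response
    that carry the given tag_key:tag_value pair."""
    matches = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Tags arrive as a list of {"Key": ..., "Value": ...} entries.
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get(tag_key) == tag_value:
                matches.append(instance["InstanceId"])
    return matches


# Hypothetical sample data standing in for a real API response.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {"InstanceId": "i-0aaa", "Tags": [{"Key": "managed-by", "Value": "pedl-prod"}]},
                {"InstanceId": "i-0bbb", "Tags": [{"Key": "managed-by", "Value": "other"}]},
                {"InstanceId": "i-0ccc"},  # untagged instance
            ]
        }
    ]
}

print(instances_with_tag(sample_response, "managed-by", "pedl-prod"))
```

If the returned list contains instances that PEDL did not launch, choose a different tag pair for this installation.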

EC2 AMIs

  • The PEDL master node will run on a custom AMI that will be shared with you by Determined AI.

  • PEDL agent nodes will run on a custom AMI that will be shared with you by Determined AI.

EC2 Instance Types

  • The PEDL master node should be deployed on an EC2 instance with at least 2 CPUs (Intel Broadwell or later), 4GB of RAM, and 100GB of disk storage. This corresponds to an EC2 t2.medium or a more powerful instance type.

  • Each PEDL agent node must run on a P2 or P3 instance type. The instance type is set in the Cluster Configuration.

Master IAM Role

The PEDL master needs to have an IAM role with the following permissions:

  • ec2:CreateTags: used to tag the PEDL agent instances that the PEDL master provisions. These tags are set in the Cluster Configuration.

  • ec2:DescribeInstances: used to find active PEDL agent instances based on tags.

  • ec2:RunInstances: used to provision PEDL agent instances.

  • ec2:TerminateInstances: used to terminate idle PEDL agent instances.

An example IAM policy with the appropriate permissions is below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:TerminateInstances",
                "ec2:CreateTags",
                "ec2:RunInstances"
            ],
            "Resource": "*"
        }
    ]
}
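A quick sanity check on a policy document can catch a missing permission before the master is launched. The following is a minimal sketch, not part of PEDL: the function name is hypothetical, and it only compares literal action names in Allow statements. It does not evaluate wildcards, Deny statements, resources, or conditions.

```python
import json
from typing import Set

# The four EC2 actions listed above for the PEDL master role.
REQUIRED_ACTIONS = {
    "ec2:CreateTags",
    "ec2:DescribeInstances",
    "ec2:RunInstances",
    "ec2:TerminateInstances",
}


def missing_actions(policy_json: str) -> Set[str]:
    """Return the required actions that no Allow statement names literally."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a plain string
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted


example_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ec2:DescribeInstances", "ec2:TerminateInstances", "ec2:CreateTags"],
        "Resource": "*",
    }],
})

print(sorted(missing_actions(example_policy)))
```

An empty result means all four actions are named explicitly; anything else lists what still needs to be added.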

If you need to attach an instance profile to the agent, add an iam:PassRole policy to the master role that allows the master to pass the agent role. For example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "<arn::agent-role>"
    }
  ]
}

See Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances for details.
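If the PassRole policy is generated by a script rather than written by hand, it can be built from the agent role ARN. This is a sketch under assumptions: the function name, account ID, and role name are made up for illustration, and the ARN check is a loose format test, not a full validation.

```python
import json
from typing import Any, Dict


def pass_role_policy(agent_role_arn: str) -> Dict[str, Any]:
    """Build a policy document that lets the master pass the given agent role."""
    # Loose sanity check only: IAM role ARNs contain ":role/".
    if not agent_role_arn.startswith("arn:") or ":role/" not in agent_role_arn:
        raise ValueError(f"not an IAM role ARN: {agent_role_arn!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": agent_role_arn,
            }
        ],
    }


# Hypothetical account ID and role name.
policy = pass_role_policy("arn:aws:iam::123456789012:role/pedl-agent")
print(json.dumps(policy, indent=2))
```

Scoping Resource to the single agent role, as in the example above, keeps the master from passing any other role.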

Network Requirements

See Network Requirements for details.

Cluster Configuration

The PEDL cluster is configured with the master.yaml file located at /usr/local/pedl/etc/ on the PEDL master instance. An example configuration is shown below. See Cluster Configuration for details.

provisioner:
  master_url: <scheme://host:port>
  startup_script: <startup script>
  agent_docker_network: pedl
  max_idle_agent_period: 5m

  provider: aws
  region: <region>
  root_volume_size: 200
  image_id: <AMI id>
  tag_key: <tag key for agent discovery>
  tag_value: <tag value for agent discovery>
  instance_name: determined-ai-agent
  ssh_key_name: <ssh key name>
  iam_instance_profile_arn: <iam instance profile arn>
  network_interface:
    public_ip: true
    security_group_id: <security group id>
    subnet_id: <subnet id>
  instance_type: p3.8xlarge
  max_instances: 5
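Some of the requirements in this document can be checked mechanically before the master is started. The sketch below is not a PEDL tool: it takes the provisioner section as an already-parsed dictionary (YAML parsing is left out), and the function name and the choice of checks are assumptions. It tests only that the provider is aws, that the instance type belongs to the P2 or P3 family, and that the tag pair is set. Treating p3dn as part of the P3 family is also an assumption.

```python
from typing import Any, Dict, List

# Instance families accepted for agents; including "p3dn" is an assumption.
AGENT_FAMILIES = ("p2", "p3", "p3dn")


def check_provisioner(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in a provisioner config."""
    problems = []
    if cfg.get("provider") != "aws":
        problems.append("provider must be 'aws'")
    # The family is the part of the instance type before the dot, e.g. "p3".
    family = str(cfg.get("instance_type", "")).split(".")[0]
    if family not in AGENT_FAMILIES:
        problems.append("instance_type must be a P2 or P3 instance")
    for key in ("tag_key", "tag_value"):
        if not cfg.get(key):
            problems.append(f"{key} must be set")
    return problems


# Hypothetical values mirroring the example configuration.
good = {
    "provider": "aws",
    "instance_type": "p3.8xlarge",
    "tag_key": "managed-by",
    "tag_value": "pedl-prod",
}
bad = dict(good, instance_type="t2.medium", tag_value="")

print(check_provisioner(good))
print(check_provisioner(bad))
```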

Installation

These instructions describe how to install PEDL for the first time; for directions on how to upgrade an existing PEDL installation, see the Upgrades section below.

Ensure that you are using the most up-to-date PEDL AMIs. Keep the AMI IDs (e.g., ami-0f4677bfc3161edc8) handy, as you will need them later.

Master

To install the master, we will launch an instance from the PEDL master AMI.

Let's start by navigating to the EC2 Dashboard of the AWS Console. Click "Launch Instance" and follow the instructions below:

  1. Choose AMI: find the PEDL Master AMI in "My AMIs" and click "Select".

  2. Choose Instance Type: we recommend a t2.medium or more powerful.

  3. Configure Instance: choose an IAM role that meets the Master IAM Role requirements.

  4. Add Storage: click Add New Volume and add an EBS volume of at least 100GB. If you are upgrading a previous PEDL installation, attach the same EBS volume that the previous installation used. This volume stores all your experiment metadata and checkpoints.

  5. Configure Security Group: choose or create a security group that meets the Network Requirements.

  6. Review and launch the instance.

  7. SSH into the PEDL master and edit the config at /usr/local/pedl/etc/master.yaml according to the guide on Cluster Configuration.

  8. Start the PEDL master by entering make -C /usr/local/pedl enable-master into the terminal.

Agent

There is no installation needed for the Agent. The PEDL master will dynamically launch PEDL agent instances based on the Cluster Configuration.

Upgrades

Upgrading an existing PEDL installation with Dynamic Agents on AWS follows the same steps as upgrading an installation without dynamic agents. See Upgrades.