Setting up an AWS Kubernetes (EKS) Cluster

Determined can be installed on a cluster that is hosted on a managed Kubernetes service such as Amazon EKS. This document describes how to set up an EKS cluster with GPU-enabled nodes. The recommended setup includes deploying a cluster with a single non-GPU node that will host the Determined master and database, and an autoscaling group of GPU nodes. After creating a suitable EKS cluster, you can then proceed with the standard instructions for installing Determined on Kubernetes.

Determined requires the Kubernetes cluster to be running version >= 1.15 and to have GPU-enabled nodes.

Prerequisites

Before setting up an EKS cluster, install the latest versions of the AWS CLI, kubectl, and eksctl on your local machine.
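
A quick way to confirm that the three tools are on the PATH before continuing is sketched below. The helper name `missing_tools` is illustrative and is not part of any of these CLIs. It only checks that each command exists, not that the installed version is current.

```shell
# Print the name of every command in the argument list that is not on PATH.
# `missing_tools` is a hypothetical helper name used only for this sketch.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: list any of the required CLIs that are not installed.
# missing_tools aws kubectl eksctl
```

If the function prints nothing, all of the listed commands were found.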

Additionally, make sure you are subscribed to the EKS-optimized AMI with GPU support. Without the subscription, node creation will fail.

Making an S3 Bucket

One resource that eksctl does not automatically create is an S3 bucket, which is necessary for Determined to store checkpoints. To quickly create an S3 bucket, use the command:

aws s3 mb s3://<bucket-name>

The bucket name needs to be specified in both the eksctl cluster config and the Determined Helm chart.
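
Because the same bucket name is reused in several files, it can help to check it against the basic S3 naming rules before creating the bucket. The following is a minimal sketch: `valid_bucket_name` and the example bucket name are hypothetical, and only the length, character set, and first and last character are checked.

```shell
# Return 0 if the name satisfies the basic S3 naming rules: 3-63 characters,
# only lowercase letters, digits, dots, and hyphens, and a letter or digit at
# both ends. Other S3 rules (for example, no IP-address-like names) are not
# checked. `valid_bucket_name` is a hypothetical helper name.
valid_bucket_name() {
  local name=$1
  local LC_ALL=C   # make a-z match ASCII lowercase letters only
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || return 1
  case $name in
    *[!a-z0-9.-]*) return 1 ;;            # contains a disallowed character
    [!a-z0-9]* | *[!a-z0-9]) return 1 ;;  # starts or ends with "." or "-"
  esac
  return 0
}

# Usage: create the bucket in the cluster's region (us-west-2 in this guide's
# template) only if the name passes the check.
# valid_bucket_name "my-det-checkpoints" \
#   && aws s3 mb "s3://my-det-checkpoints" --region us-west-2
```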

Creating the Cluster

The quickest and easiest way to deploy an EKS cluster is with eksctl. eksctl supports cluster creation with either command line arguments or a cluster config file. Below is a template config that deploys a managed node group for Determined’s master instance, as well as an autoscaling GPU node group for workers. To fill in the template, insert the cluster name and S3 bucket name.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: <cluster-name> # Specify your cluster name here
  region: us-west-2 # The default region is us-west-2
  version: "1.15" # 1.16 and 1.17 are also supported

# Cluster availability zones must be explicitly named in order for single availability zone node groups to work.
availabilityZones:
  - "us-west-2b"
  - "us-west-2c"
  - "us-west-2d"

iam:
  withOIDC: true # Enables IAM OIDC provider
  serviceAccounts:
  - metadata:
      name: checkpoint-storage-s3-bucket
      # If no namespace is set, "default" will be used.
      # Namespace will be created if it does not already exist.
      namespace: default
      labels:
        aws-usage: "determined-checkpoint-storage"
    attachPolicy: # Inline policy can be defined along with `attachPolicyARNs`
      Version: "2012-10-17"
      Statement:
      - Effect: Allow
        Action:
        - "s3:ListBucket"
        Resource: 'arn:aws:s3:::<bucket-name>' # Name of the previously created bucket
      - Effect: Allow
        Action:
        - "s3:GetObject"
        - "s3:PutObject"
        - "s3:DeleteObject"
        Resource: 'arn:aws:s3:::<bucket-name>/*'
  - metadata:
      name: cluster-autoscaler
      namespace: kube-system
      labels:
        aws-usage: "determined-cluster-autoscaler"
    attachPolicy:
      Version: "2012-10-17"
      Statement:
      - Effect: Allow
        Action:
        - "autoscaling:DescribeAutoScalingGroups"
        - "autoscaling:DescribeAutoScalingInstances"
        - "autoscaling:DescribeLaunchConfigurations"
        - "autoscaling:DescribeTags"
        - "autoscaling:SetDesiredCapacity"
        - "autoscaling:TerminateInstanceInAutoScalingGroup"
        - "ec2:DescribeLaunchTemplateVersions"
        Resource: '*'

managedNodeGroups:
  - name: managed-m5-2xlarge
    instanceType: m5.2xlarge
    availabilityZones:
      - us-west-2b
      - us-west-2c
      - us-west-2d
    minSize: 1
    maxSize: 2
    volumeSize: 200
    iam:
      withAddonPolicies:
        autoScaler: true
        cloudWatch: true
    ssh:
      allow: true # will use ~/.ssh/id_rsa.pub as the default ssh key
    labels:
      nodegroup-type: m5.2xlarge
      nodegroup-role: cpu-worker
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/<cluster-name>: "owned" # Must match the cluster name set in metadata
      k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: m5.2xlarge
      k8s.io/cluster-autoscaler/node-template/label/nodegroup-role: cpu-worker

nodeGroups:
  - name: p2-8xlarge-us-west-2b
    instanceType: p2.8xlarge # 8 GPUs per machine
    # Restrict to a single AZ to optimize data transfer between instances
    availabilityZones:
      - us-west-2b
    minSize: 0
    maxSize: 2
    volumeSize: 200
    volumeType: gp2
    iam:
      withAddonPolicies:
        autoScaler: true
        cloudWatch: true
    ssh:
      allow: true # This will use ~/.ssh/id_rsa.pub as the default ssh key.
    labels:
      nodegroup-type: p2.8xlarge-us-west-2b
      nodegroup-role: gpu-worker
      # https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#special-note-on-gpu-instances
      k8s.amazonaws.com/accelerator: nvidia-tesla-k80
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/<cluster-name>: "owned" # Must match the cluster name set in metadata
      k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: p2.8xlarge-us-west-2b
      k8s.io/cluster-autoscaler/node-template/label/nodegroup-role: gpu-worker

The cluster specified above allows users to run experiments on untainted p2.8xlarge instances with minor additions to their experiment configs. To create a cluster with tainted instances, see the Node Taints section below.
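
The placeholders in the template can be filled in by hand or with a small script. The sketch below assumes the template has been saved to a file. The function name `render_cluster_config`, the file names, and the example cluster and bucket names are all hypothetical.

```shell
# Print a copy of a config template with every <cluster-name> and
# <bucket-name> placeholder replaced by the given values.
# `render_cluster_config` is a hypothetical helper name.
render_cluster_config() {
  local template=$1 cluster=$2 bucket=$3
  sed -e "s|<cluster-name>|${cluster}|g" \
      -e "s|<bucket-name>|${bucket}|g" \
      "$template"
}

# Usage: save the template as cluster-template.yaml, then run
# render_cluster_config cluster-template.yaml det-eks my-det-checkpoints > cluster.yaml
```

Writing the result to a separate file keeps the template reusable for other clusters.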

To launch the cluster with eksctl, run:

eksctl create cluster --config-file <cluster config yaml>

Note

For an experiment to run, its config must be modified to specify a service account for S3 access. An example of this is provided in the Configuring Per-Task Pod Specs section of the Custom Pod Specs guide.

Creating a kubeconfig for EKS

After creating the cluster, use kubectl to deploy apps. For kubectl to work with EKS, create or update the cluster's kubeconfig with the command:

aws eks --region <region-code> update-kubeconfig --name <cluster-name>

Enabling GPU support

To use GPU instances, the NVIDIA Kubernetes device plugin needs to be installed. Use the following command to install the plugin:

# Deploy a DaemonSet that enables the GPUs.
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
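
Once the plugin is running, GPU nodes advertise an `nvidia.com/gpu` resource. The following sketch totals that resource across nodes from a kubectl table. The helper name `total_gpus` is hypothetical. Because the GPU node group in the template has minSize: 0, the total may be 0 until a GPU node has been started.

```shell
# Sum the GPU column of a NAME/GPU table read from standard input, skipping
# the header row and nodes that report <none>.
# `total_gpus` is a hypothetical helper name.
total_gpus() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }'
}

# Usage: list allocatable GPUs per node, then total them.
# kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" | total_gpus
```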

Enabling Autoscaler

Lastly, EKS requires manual deployment of an autoscaler. Save the following configuration in a new file such as determined-autoscaler.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8085'
    spec:
      serviceAccountName: cluster-autoscaler
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: "Equal"
          value: "true"
          effect: NoSchedule
      containers:
        - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.17.3
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --scale-down-delay-after-add=5m
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-bundle.crt"

To deploy an autoscaler that works with Determined, apply the official autoscaler configuration first, then apply the custom determined-autoscaler.yaml.

# Apply the official autoscaler configuration
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-run-on-control-plane.yaml

# Apply the custom deployment
kubectl apply -f <cluster-autoscaler yaml, e.g. `determined-autoscaler.yaml`>
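
The custom deployment contains a `<cluster-name>` placeholder in its `--node-group-auto-discovery` argument, and applying the file with the placeholder still in place would leave the autoscaler unable to find the node groups. A minimal pre-flight check is sketched below. The helper name `check_placeholders` is hypothetical.

```shell
# Print every line of a file that still contains an unfilled <placeholder>.
# Exit status is 0 when the file is clean and 1 when placeholders remain.
# `check_placeholders` is a hypothetical helper name.
check_placeholders() {
  if grep -n '<[a-z][a-z-]*>' "$1"; then
    return 1
  fi
  return 0
}

# Usage: check the file, apply it, then confirm that the deployment exists.
# check_placeholders determined-autoscaler.yaml \
#   && kubectl apply -f determined-autoscaler.yaml \
#   && kubectl -n kube-system get deployment cluster-autoscaler
```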

Changes to Experiment Config

To run an experiment with EKS, the experiment config needs up to two additions. A service account must be specified so that Determined can save checkpoints to S3, and, if any nodes are tainted, matching tolerations must be listed so that the experiment can be scheduled. An example of the necessary changes is shown here:

environment:
  pod_spec:
    ...
    spec:
      ...
      serviceAccountName: checkpoint-storage-s3-bucket
      # Tolerations should only be included if nodes are tainted
      tolerations:
        - key: <tainted-group-key, e.g. p2.8xlarge-us-west-2b>
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"

Details about pod configuration can be found in Per-Task Pod Specs.

Changes to Determined

Following the deployment of EKS, make sure that the necessary changes to Determined have been applied so that experiments can run successfully. These changes include adding the created S3 bucket to Determined’s Helm chart and specifying a service account in the default pod specs. When modifying the Helm chart to include S3, no keys or endpoint URLs are needed. Additionally, if running on tainted nodes, add pod tolerations to the experiment spec so that experiment pods can be scheduled on those nodes.

Managing an EKS Cluster

Node Taints

For general instructions on adding taints and tolerations to nodes, please see the Taints and Tolerations section in our Guide to Kubernetes. There, you can find an explanation of taints and tolerations, as well as instructions for using kubectl to add them to existing clusters.

It is important to note that if you use EKS to create nodes with taints, you must also add tolerations using kubectl; otherwise, Kubernetes will be unable to schedule pods on the tainted node.

To taint nodes, users will need to add a taint type and a tag to the node group specified in the cluster config from Creating the Cluster. An example of the modifications is shown for a p2.8xlarge node group:

- name: p2-8xlarge-us-west-2b
  ...
  taints:
    p2.8xlarge-us-west-2b: "true:NoSchedule"
  ...
  tags:
    ...
    k8s.io/cluster-autoscaler/node-template/taint/p2.8xlarge-us-west-2b: "true:NoSchedule"

Furthermore, tainting requires changes to the GPU-enabling DaemonSet and further additions to the experiment config. First, to change the DaemonSet, save a copy of the official version and make the following additions to its tolerations:

spec:
  tolerations:
  ...
  - key: p2.8xlarge-us-west-2b
    operator: Exists
    effect: NoSchedule

To modify the experiment config to run on tainted nodes, refer to the Changes to Experiment Config section.
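
After the tainted node group is running, it can be useful to confirm which nodes actually carry the taint. The sketch below filters a kubectl table by taint key. The helper name `nodes_with_taint` is hypothetical, and the key shown in the usage line is the one from the example above.

```shell
# Read a NAME/TAINTS table from standard input and print the names of nodes
# whose comma-separated taint keys include the given key.
# `nodes_with_taint` is a hypothetical helper name.
nodes_with_taint() {
  awk -v key="$1" 'NR > 1 {
    n = split($2, keys, ",")
    for (i = 1; i <= n; i++) if (keys[i] == key) { print $1; break }
  }'
}

# Usage: list each node's taint keys, then keep the nodes tainted with the key.
# kubectl get nodes -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key' \
#   | nodes_with_taint p2.8xlarge-us-west-2b
```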