Operating PEDL with Kubernetes

This document describes how to install, configure, and upgrade a PEDL deployment that is running on Kubernetes.

Concepts

In a standard ("bare metal") installation of PEDL, each PEDL agent runs workloads by launching containers via the Docker daemon on its own machine. Some customers prefer using PEDL in this mode because it does not require installing or configuring a third-party cluster manager or container orchestration system.

PEDL also supports Kubernetes-based deployments; this can be convenient for customers that already use Kubernetes for container orchestration. In this mode, PEDL is installed as a Helm package.

Prerequisites

  • Kubernetes 1.8+

Installing the Chart

To install the chart with the release name my-release:

$ helm install --name my-release deploy/kubernetes/pedl-*.tgz

This command deploys the PEDL master and an accompanying PostgreSQL database on the Kubernetes cluster with the default configuration. The Configuration section lists the parameters that can be set during installation.

Tip: List all releases using helm list.

Upgrading the Chart

To upgrade the my-release deployment:

helm upgrade my-release deploy/kubernetes/pedl-*.tgz

Uninstalling the Chart

To uninstall/delete the my-release deployment:

$ helm delete my-release

Managing PEDL Agents

Agent pods are managed via Kubernetes DaemonSets and node label selectors.

To add PEDL agents to the cluster, label the desired nodes with determined.ai/pedl-agent-gpus:

kubectl label nodes <node-name> determined.ai/pedl-agent-gpus=<count>

The count can be any value between 1 and agent.maxGPUSlots.
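
As a minimal sketch of the range check described above, the helper below validates a GPU count before printing the corresponding label command. The function name pedl_label_cmd, the MAX_GPU_SLOTS variable, and the node name gpu-node-1 are hypothetical, and the range is assumed to be inclusive at both ends.

```shell
# Assumed to mirror agent.maxGPUSlots (chart default: 16).
MAX_GPU_SLOTS=16

# Hypothetical helper: print the kubectl command for a valid GPU count,
# or fail with a message if the count is out of range.
pedl_label_cmd() {
  node="$1"
  count="$2"
  # Reject empty or non-numeric counts.
  case "$count" in
    ''|*[!0-9]*)
      echo "count must be a positive integer" >&2
      return 1
      ;;
  esac
  # Assumption: the allowed range is inclusive at both ends.
  if [ "$count" -lt 1 ] || [ "$count" -gt "$MAX_GPU_SLOTS" ]; then
    echo "count must be between 1 and $MAX_GPU_SLOTS" >&2
    return 1
  fi
  echo "kubectl label nodes $node determined.ai/pedl-agent-gpus=$count"
}

pedl_label_cmd gpu-node-1 4
```

The helper only prints the command, so it can be reviewed before it is run against the cluster.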

When agent.enableCPUScheduling is enabled, CPU shares can be scheduled by labeling nodes with determined.ai/pedl-agent-cpus instead:

kubectl label nodes <node-name> determined.ai/pedl-agent-cpus=<count>

To remove an agent from a node, remove the label from that node:

kubectl label nodes <node-name> determined.ai/pedl-agent-gpus-

Configuration

The following table lists the configurable parameters of the PEDL chart and their default values.

| Parameter | Description | Default |
| --- | --- | --- |
| masterImage | PEDL master image repository | determinedai/pedl-master |
| agentImage | PEDL agent image repository | determinedai/pedl-agent |
| agent.enableCPUScheduling | Enable agents to schedule tasks on CPUs | false |
| agent.maxCPUSlots | The maximum CPUs agents will reserve for trial runner slots | 0 |
| agent.maxGPUSlots | The maximum GPUs agents will reserve for trial runner slots | 16 |
| agent.resources | PEDL agent resource limits and requests | limits: {cpu: 1, memory: 2Gi}, requests: {cpu: 0.1, memory: 256Mi} |
| trialRunner.network | The Docker network mode of the trial runner | bridge |
| trialRunner.uid | The UID to use when running a trial runner container | nil |
| trialRunner.gid | The GID to use when running a trial runner container | nil |
| imagePullPolicy | Image pull policy | IfNotPresent |
| resources | PEDL master resource limits and requests | limits: {cpu: 2, memory: 8Gi}, requests: {cpu: 1, memory: 2Gi} |
| nodeSelector | Node labels for pod assignment | {} |
| tolerations | Toleration labels for pod assignment | [] |
| registry.server | Determined AI registry server | https://index.docker.io/v1/ |
| registry.user | Determined AI registry username | determinedaicustomer |
| registry.password | Determined AI registry password | aPGMABpTTW6Aj2LtseRZCnVD9W3kJvtsJNVzrapD |
| registry.email | Determined AI registry email | hello@determined.ai |
| service.type | ClusterIP, NodePort, or LoadBalancer | ClusterIP |
| service.port | External port available to pods in the cluster | 8080 |
| service.externalIPs | External IP addresses connected to the service | nil |
| service.nodePort | Exposed node port for service type NodePort | nil |
| service.clusterIP | Manual IP address assigned to the service | nil |
| service.loadBalancerIP | Manual IP address assigned to the load balancer | nil |
| service.loadBalancerSourceRanges | Restrict traffic through the load balancer to the client IP ranges | nil |
| service.annotations | Additional annotations to append to the service | nil |
| ingress.enabled | Enable ingress controller resource | false |
| ingress.annotations | Specify ingress class | nil |
| ingress.hosts | PEDL master hostnames | nil |
| ingress.tls | TLS certificates associated with an Ingress | nil |
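
Several of these parameters can be overridden together by collecting them in a values file. The sketch below assumes the chart follows the usual Helm convention in which a dotted parameter name such as agent.maxGPUSlots maps to a nested YAML key. The file name pedl-values.yaml and the chosen values are illustrative assumptions, not recommendations.

```shell
# Write a hypothetical values file that overrides a few chart defaults.
cat > pedl-values.yaml <<'EOF'
# Illustrative overrides only; adjust to the cluster at hand.
agent:
  enableCPUScheduling: true
  maxCPUSlots: 4
  maxGPUSlots: 8
service:
  type: NodePort
  nodePort: 30080
EOF

# Show the result for review.
cat pedl-values.yaml
```

The file could then be passed at install time with helm install --name my-release -f pedl-values.yaml deploy/kubernetes/pedl-*.tgz. A single parameter can also be set inline, for example with --set agent.maxGPUSlots=8.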

PostgreSQL Configuration

The following table lists the configurable parameters of the PostgreSQL dependency and their default values.

N.B. These parameters must be set under the postgresql prefix (e.g. postgresql.image).

| Parameter | Description | Default |
| --- | --- | --- |
| image | postgres image repository | postgres |
| imageTag | postgres image tag | 10.7 |
| imagePullPolicy | Image pull policy | Always if imageTag is latest, else IfNotPresent |
| imagePullSecrets | Image pull secrets | nil |
| resources | Postgres resource limits and requests | limits: {cpu: 2, memory: 8GB}, requests: {cpu: 1, memory: 2GB} |
| postgresUser | Username of new user to create | pedl |
| postgresPassword | Password for the new user | pedl |
| postgresDatabase | Name for new database to create | pedl |
| postgresInitdbArgs | Initdb arguments | nil |
| schedulerName | Name of an alternate scheduler | nil |
| postgresConfig | Runtime config parameters | nil |
| persistence.enabled | Use a PVC to persist data | true |
| persistence.existingClaim | Provide an existing PersistentVolumeClaim | nil |
| persistence.storageClass | Storage class of backing PVC | nil (uses alpha storage class annotation) |
| persistence.accessMode | Use volume as ReadOnly or ReadWrite | ReadWriteOnce |
| persistence.annotations | Persistent Volume annotations | {} |
| persistence.size | Size of data volume | 100Gi |
| persistence.subPath | Subdirectory of the volume to mount | postgresql-db |
| persistence.mountPath | Mount path of data volume | /var/lib/postgresql/data/pgdata |
| resources | CPU/Memory resource requests/limits | Memory: 256Mi, CPU: 100m |
| service.externalIPs | External IPs to listen on | [] |
| service.port | TCP port | 5432 |
| nodeSelector | Node labels for pod assignment | {} |
| affinity | Affinity settings for pod assignment | {} |
| tolerations | Toleration labels for pod assignment | [] |
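
To illustrate the postgresql prefix, the sketch below writes a values file in which the PostgreSQL parameters are nested under a top-level postgresql key. It assumes the standard Helm layout for subchart values. The file name pedl-postgres-values.yaml and the values themselves are hypothetical.

```shell
# Write a hypothetical values file for the PostgreSQL dependency.
# Every key sits under "postgresql:", matching the required prefix.
cat > pedl-postgres-values.yaml <<'EOF'
postgresql:
  postgresPassword: change-me   # placeholder; replace with a real secret
  persistence:
    size: 200Gi
EOF

# Show the result for review.
cat pedl-postgres-values.yaml
```

The same override can be expressed inline, for example with --set postgresql.persistence.size=200Gi.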

Persistence

The PEDL chart mounts a Persistent Volume for the PostgreSQL instance. If the PersistentVolumeClaim should not be managed by the chart, set postgresql.persistence.existingClaim to the name of an existing claim.

Existing PersistentVolumeClaims

  1. Create the PersistentVolume
  2. Create the PersistentVolumeClaim
  3. Install the chart:

$ helm install --set postgresql.persistence.existingClaim=PVC_NAME pedl-*.tgz
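
Steps 1 and 2 could be sketched as a single manifest holding a statically bound PersistentVolume and PersistentVolumeClaim. Everything specific here is an assumption for illustration: the names pedl-postgres-pv and pedl-postgres-pvc, the hostPath backing store at /mnt/pedl-postgres, and the Retain reclaim policy. The size and access mode mirror the chart defaults for persistence.size and persistence.accessMode.

```shell
# Write a hypothetical manifest with a PersistentVolume and a claim bound to it.
cat > pedl-postgres-pv.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pedl-postgres-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""          # empty class: no dynamic provisioning
  hostPath:
    path: /mnt/pedl-postgres    # assumption: single-node test storage
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pedl-postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  volumeName: pedl-postgres-pv  # bind to the volume defined above
  resources:
    requests:
      storage: 100Gi
EOF

# Show the result for review.
cat pedl-postgres-pv.yaml
```

The manifest could be applied with kubectl apply -f pedl-postgres-pv.yaml, after which pedl-postgres-pvc would take the place of PVC_NAME in the install command. A hostPath volume is only suitable for single-node testing; a production cluster would normally use network-attached storage.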