Determined AI Documentation
version 0.21.0

Model Developer Guide

How Determined Works

Learn about core concepts, key features, and system architecture.

Preparing Container Environment

Resources for preparing your container environment.

Preparing Data

What is the best way to load data into your ML models? This depends on several factors...

Using a Training API

Learn how to work with Training APIs and configure your distributed training experiments.

Hyperparameter Tuning

Conceptual information about why hyperparameter tuning can be challenging and why it's important.

Submitting an Experiment

Find out how to run an experiment by providing a launcher.

Debugging Models

Step-by-step instructions for debugging your models.

Managing Models

Model management involves using and deleting checkpoints, archiving experiments, and managing trained models.

Best Practices

General tips for writing trial definitions and best practices for separating configuration from code.

Copyright © 2023, Determined AI