version 0.22.0

Model Developer Guide

How Determined Works

Learn about core concepts, key features, and system architecture.

Preparing Container Environment

Resources for preparing your container environment.

Preparing Data

What is the best way to load data into your ML models? This depends on several factors...

Using a Training API

Learn how to work with Training APIs and configure your distributed training experiments.

Hyperparameter Tuning

Conceptual information about why hyperparameter tuning can be challenging and why it's important.

Submitting Experiments

Find out how to run an experiment by providing a launcher.

Debugging Models

Step-by-step instructions for debugging your models.

Managing Models

Model management involves using and deleting checkpoints, archiving experiments, and managing trained models.

Best Practices

General tips for trial definitions, plus best practices for separating configuration from code.

By hello@determined.ai

© Copyright 2023, Determined AI.