version 0.23.0

Model Developer Guide

How Determined Works

Learn about core concepts, key features, and system architecture.

Preparing Container Environment

Resources for preparing your container environment.

Preparing Data

What is the best way to load data into your ML models? This depends on several factors...

Using a Training API

Learn how to work with Training APIs and configure your distributed training experiments.

Hyperparameter Tuning

Conceptual information about why hyperparameter tuning can be challenging and why it's important.

Submitting Experiments

Find out how to run an experiment by providing a launcher.

Debugging Models

Step-by-step instructions for debugging your models.

Managing Models

Model management involves using and deleting checkpoints, archiving experiments, and managing trained models.

Best Practices

General tips for writing trial definitions, and best practices for separating configuration from code.

By hello@determined.ai

© Copyright 2023, Determined AI.