Distributed Training with Determined

Learn how to use distributed training in Determined to speed up the training of a single trial.

In Concepts of Distributed Training, you’ll learn about the following topics:

  • How Determined distributed training works

  • Reducing computation and communication overhead

  • Training effectively with large batch sizes

  • Model characteristics that affect performance

  • Debugging performance bottlenecks

  • Optimizing training
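One concept worth illustrating is training with large batch sizes: when a trial is distributed across many slots, the effective global batch size grows, and a common heuristic is to scale the learning rate linearly with the batch-size increase. The helper below is a hypothetical sketch of that heuristic, not a Determined API:

```python
def scale_learning_rate(base_lr: float, base_batch_size: int,
                        global_batch_size: int) -> float:
    """Linear scaling heuristic: scale the learning rate in
    proportion to the growth of the global batch size.

    This is a common rule of thumb for large-batch training,
    not a function provided by Determined.
    """
    return base_lr * (global_batch_size / base_batch_size)


# Example: 8 slots each keeping the original per-slot batch of 32
# gives a global batch of 256, so the learning rate scales by 8x.
scaled = scale_learning_rate(0.1, 32, 256)
print(scaled)
```

In practice, large-batch training often also benefits from a warmup period before applying the fully scaled learning rate; the linked concepts page covers these trade-offs in more depth.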

Visit Implementing Distributed Training to learn how to set up distributed training, including the following:

  • Connectivity considerations for multi-machine training

  • Configuration including slots per trial and global batch size

  • Considerations for concurrent data downloads

  • Details to be aware of regarding scheduler behavior

  • Accelerating inference workloads
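As a rough illustration of the configuration bullet above, a Determined experiment configuration typically sets the number of slots under `resources` and the global batch size as a hyperparameter. The fragment below is a minimal sketch, with hypothetical values chosen for illustration; consult the configuration reference for the full set of options:

```yaml
# Hypothetical experiment configuration fragment (values are illustrative).
resources:
  slots_per_trial: 8        # number of accelerators used by the trial
hyperparameters:
  global_batch_size: 256    # total batch size across all slots
```

With 8 slots, each slot processes a per-slot batch of 32 in this example (256 / 8).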

Additional Resources: