Determined is an open-source Deep Learning Training Platform that incorporates cutting-edge research and years of practical experience to help deep learning teams train models more quickly, easily share GPU resources, and collaborate effectively. Determined lets deep learning engineers focus on building and training models at scale without worrying about DevOps or writing custom code for common tasks like fault tolerance or experiment tracking. More information about Determined can be found on the website.
What does Determined handle for you?
Frameworks such as TensorFlow and PyTorch work well for a single researcher with a single GPU. You can think of Determined as a platform that bridges the gap between those tools and the challenges that arise when doing deep learning at scale, as teams, clusters, and data sets all grow in size.
Determined’s key capabilities include:
high-performance distributed training without any additional changes to your model code
intelligent hyperparameter optimization based on cutting-edge research
flexible GPU scheduling, including resizing training jobs on the fly and automatically managing cloud resources on AWS and GCP
built-in experiment tracking, metrics storage, and visualization
automatic fault tolerance for deep learning training jobs
To use Determined, you can continue using popular deep learning frameworks such as TensorFlow and PyTorch; you just need to modify your model code to implement the Determined API.
To install Determined, please follow the installation instructions. Determined can be installed on the public cloud, on an on-premises cluster, or on a local development machine.
Each user should also install the Determined command-line tools on every machine they will use to access Determined.
We recommend you follow our Quick Start Guide if you’re new to Determined.
Next, learn more about our Experiment APIs by following our tutorials. If you're using TensorFlow Keras or Estimator and want to get started quickly with minimal changes to your source code, follow our Native API tutorial:
If you're using PyTorch or want finer-grained control over the training loop, follow our Trial API tutorials in your preferred framework:
Use the links below to start learning more about Determined’s capabilities.
Reproducibility, TensorBoard, and Notebooks
The Determined documentation is divided into five main categories:
Tutorials are simple step-by-step guides to getting started with different areas of Determined. They are a good place to begin using the product.
Topic guides discuss concepts and topics at a high level, providing background information and explanation.
Reference guides contain technical reference material for our APIs. They describe how to use each API; however, they assume you have a working understanding of Determined's key concepts.
How-to guides take you through the steps needed to address key use cases. You can think of them as advanced tutorials that assume some familiarity with Determined's key concepts.
System administration guides take you through what’s needed to set up and configure the Determined system.