Workload Alerting

Workload alerting lets you monitor the state of your experiments and share important information with your team. Because webhooks can be scoped to a whole workspace or to specific experiments, you can detect issues early while keeping a good signal-to-noise ratio.

Note

To use this experimental feature, enable “Webhook Improvement” in user settings.

Key Concepts

  • Webhook Trigger options: “All experiments in Workspace” and “Specific experiment(s) with matching configuration”

  • Webhook Exclusion

  • Trigger Types: COMPLETED, ERROR, TASKLOG, CUSTOM

  • Alert Levels: INFO, WARN, DEBUG, ERROR

For detailed information on supported triggers and example usage, see Notifications.

Creating Webhooks

Non-admin users with Editor or higher permissions can configure webhooks within their workspace. To create a webhook:

  1. Navigate to the Webhooks section in the WebUI.

  2. Select New Webhook.

  3. In the New Webhook dialog:

    • Select your Workspace

    • Name your webhook

    • Paste the webhook URL (e.g., from Zapier)

    • Set Type to either Default or Slack

    • Select the Trigger event (COMPLETED, ERROR, TASKLOG, or CUSTOM)

    • Choose the Trigger by option: “All experiments in Workspace” or “Specific experiment(s) with matching configuration”

    • If you chose “Specific experiment(s) with matching configuration”, note the Webhook Name so you can reference it in experiment configurations

  4. Click Create Webhook.
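Before pointing a webhook at a production service, it can help to see what actually arrives at the URL. The following is a minimal sketch of a local receiver using only the Python standard library. It makes no assumption about the payload schema and simply records whatever body is posted. The names `WebhookHandler`, `start_receiver`, and `received_payloads` are hypothetical, and the receiver is only useful if the Determined master can reach the address it listens on.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Payloads received so far; kept in memory purely for inspection.
received_payloads = []


class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts POST requests and records their bodies."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:
            # Keep non-JSON bodies as text rather than rejecting them.
            payload = {"raw": body.decode("utf-8", errors="replace")}
        received_payloads.append(payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def start_receiver(port=0):
    """Start the receiver on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_receiver(8080)` and inspecting `received_payloads` after an experiment finishes shows the raw data a webhook delivers, which is a reasonable starting point before wiring the URL to Slack or Zapier.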

Deleting Webhooks

To delete a webhook, select the more-options menu to the right of the webhook record to expand the available actions, then choose the delete action.

Editing Webhooks

To edit a webhook, select the more-options menu to the right of the webhook record to expand the available actions, then choose the edit action.

Note

Determined supports editing only the URL of a webhook. To modify any other attribute, delete and recreate the webhook.

Use Cases

Webhooks in Determined cover several monitoring and alerting needs. The following use cases show the most common configurations.

Case 1: Share Simple State on All Experiments in Workspace

This use case is ideal for teams that want to maintain a broad overview of all experiments running in a workspace, ensuring that no important updates are missed.

  1. Create a webhook with the “All experiments in Workspace” option.

  2. Select the desired trigger events (COMPLETED, ERROR, TASKLOG).

  3. All experiments in the workspace will now trigger this webhook unless explicitly excluded.

Case 2: Exclude Specific Experiments from Triggering Webhooks

During active development or debugging, you may want to prevent certain experiments from triggering alerts to reduce noise and focus on specific tasks.

  1. Edit the experiment configuration:

    integrations:
      webhooks:
        exclude: true
    
  2. Run the experiment and verify that no webhooks are triggered.
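If experiment configurations are built or modified programmatically, the exclusion setting above can be applied to a config dictionary before submission. The sketch below assumes the configuration has already been loaded into a plain Python dict; the helper name `exclude_from_webhooks` is hypothetical and not part of Determined.

```python
import copy


def exclude_from_webhooks(config):
    """Return a copy of an experiment config dict with webhooks excluded.

    Sets integrations.webhooks.exclude to True, creating the
    intermediate keys if they are missing. The input is not modified.
    """
    updated = copy.deepcopy(config)
    integrations = updated.setdefault("integrations", {})
    webhooks = integrations.setdefault("webhooks", {})
    webhooks["exclude"] = True
    return updated


# Example: mark a debugging run so it does not trigger workspace webhooks.
debug_config = exclude_from_webhooks({"name": "debug-run"})
```

Returning a copy keeps the original configuration reusable for runs that should still trigger webhooks.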

Case 3: Customizable Monitoring for Specific Experiments

For critical experiments or those requiring special attention, you can set up custom monitoring to receive tailored alerts based on specific conditions or events in your code.

  1. Create a webhook with the “Specific experiment(s) with matching configuration” option and “CUSTOM” trigger type.

  2. Note the Webhook Name.

  3. In the experiment configuration, reference the webhook:

    integrations:
      webhooks:
        webhook_name:
          - <webhook_name>
    
  4. In your experiment code, use the core_context.alert() function to trigger the webhook:

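    import determined as det

    # Initialize the Core API context, then send a custom alert to the
    # webhooks referenced in the experiment configuration.
    with det.core.init() as core_context:
        core_context.alert(
            title="Custom Alert",
            description="This is a custom alert",
            level="INFO",
        )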
    
  5. Run the experiment and check the event log in your webhook service for the custom data.
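In practice, a custom alert is usually tied to a condition rather than sent unconditionally. The following sketch wraps the call from step 4 in a threshold check. It assumes only that `core_context.alert()` accepts `title`, `description`, and `level` as shown in step 4; the helper `alert_if_above` and its parameters are hypothetical.

```python
def alert_if_above(core_context, metric_name, value, threshold):
    """Send a WARN-level custom alert when value exceeds threshold.

    Returns True if an alert was sent, False otherwise. Assumes
    core_context.alert() takes title, description, and level.
    """
    if value <= threshold:
        return False
    core_context.alert(
        title=f"{metric_name} above threshold",
        description=f"{metric_name}={value:.4f} exceeded {threshold:.4f}",
        level="WARN",
    )
    return True
```

Inside a training loop this might be called as `alert_if_above(core_context, "val_loss", val_loss, 1.0)` after each validation step, so the webhook fires only when the metric crosses the limit.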

For more details on custom triggers, see Notifications.

Best Practices

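  • Use “Open” subscription mode (the “All experiments in Workspace” option) for general monitoring of all experiments in a workspace.

  • Use “Run specific” mode (the “Specific experiment(s) with matching configuration” option) together with CUSTOM triggers for fine-grained control over alerts for critical experiments.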

  • Use webhook exclusion for experiments under active iteration to reduce noise.

  • Review your webhook configurations regularly and remove or update any that no longer match your team's needs.