 SLA
Available on: >= 0.20.0
Assert that your workflows meet SLAs.
What is an SLA
A Service Level Agreement (SLA) is a core property of a flow that defines which behavior to trigger when an execution runs too long or fails to meet a defined assertion.
SLA types
Currently, Kestra supports the following SLA types:
- MAX_DURATION — the maximum allowed execution duration before the SLA is breached
- EXECUTION_ASSERTION — an assertion defined by a Pebble expression that must be met during the execution. If the assertion doesn't hold true, the SLA is breached.
How to use SLAs
SLAs are defined using the sla property at the root of a flow, and they declare the desired state that must be met during executions of the flow.
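As a rough sketch of the overall shape, the flow below declares one SLA of each type under the same sla property. It assumes that a flow may list more than one SLA entry, which the list syntax suggests but this page does not state explicitly. The flow id sla_combined is hypothetical, and the individual entries reuse the property names shown in the examples on this page.

```yaml
id: sla_combined            # hypothetical flow id, for illustration only
namespace: company.team
sla:
  # Breached if the execution runs longer than 8 hours
  - id: maxDuration
    type: MAX_DURATION
    duration: PT8H
    behavior: CANCEL
    labels:
      sla: miss
      reason: durationExceeded
  # Breached if the Pebble expression does not hold true
  - id: assert_output
    type: EXECUTION_ASSERTION
    assert: "{{ outputs.mytask.value == 'expected output' }}"
    behavior: FAIL
    labels:
      sla: miss
      reason: outputMismatch
tasks:
  - id: mytask
    type: io.kestra.plugin.core.debug.Return
    format: expected output
```

Giving each entry its own reason label should make it possible to tell from the execution labels which SLA was breached.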
MAX_DURATION
If a workflow execution exceeds the expected duration, an SLA can trigger corrective actions, such as cancelling the execution.
The following SLA cancels an execution if it takes more than 8 hours:
id: sla_example
namespace: company.team
sla:
  - id: maxDuration
    type: MAX_DURATION
    duration: PT8H
    behavior: CANCEL
    labels:
      sla: miss
      reason: durationExceeded
tasks:
  - id: punctual
    type: io.kestra.plugin.core.log.Log
    message: Workflow started, monitoring SLA compliance
  - id: sleepyhead
    type: io.kestra.plugin.core.flow.Sleep
    duration: PT9H
  - id: never_executed_task
    type: io.kestra.plugin.core.log.Log
    message: This task will never start because the SLA was breached
EXECUTION_ASSERTION
An SLA can also be based on an assertion that must hold true during execution. If the assertion fails, the SLA is breached.
The following SLA fails the execution if the output of mytask is not equal to "expected output":
id: sla_demo
namespace: company.team
sla:
  - id: assert_output
    type: EXECUTION_ASSERTION
    assert: "{{ outputs.mytask.value == 'expected output' }}"
    behavior: FAIL
    labels:
      sla: miss
      reason: outputMismatch
tasks:
  - id: mytask
    type: io.kestra.plugin.core.debug.Return
    format: expected output
SLA behavior
The behavior property of an SLA defines the action to take when the SLA is breached. The following behaviors are supported:
- CANCEL — cancels the execution
- FAIL — fails the execution
- NONE — logs a message
In addition, each breached SLA can set labels that can be used to filter executions or trigger follow-up actions.
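To illustrate the least intrusive option, here is a minimal sketch of an SLA that uses the NONE behavior. The flow id sla_log_only, the SLA id softDeadline, and the durations are hypothetical values chosen for illustration. Since NONE only logs a message, the execution is expected to keep running after the breach, while the labels still mark it as an SLA miss.

```yaml
id: sla_log_only            # hypothetical flow id, for illustration only
namespace: company.team
sla:
  - id: softDeadline        # hypothetical SLA id
    type: MAX_DURATION
    duration: PT1H          # breached after one hour
    behavior: NONE          # log a message instead of cancelling or failing
    labels:
      sla: miss
      reason: durationExceeded
tasks:
  - id: long_running
    type: io.kestra.plugin.core.flow.Sleep
    duration: PT2H          # deliberately longer than the SLA duration
```

This pattern may be useful when a breach should be visible in filters and dashboards without interrupting the work itself.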
Alerts on SLA breaches
To receive a Slack alert when an SLA is breached, use a Flow trigger that reacts to executions labeled with sla: miss that end in the FAILED, WARNING, or CANCELLED state:
id: sla_miss_alert
namespace: system
tasks:
  - id: send_alert
    type: io.kestra.plugin.notifications.slack.SlackIncomingWebhook
    url: "{{secret('SLACK_WEBHOOK')}}"
    messageText: "SLA breached for flow `{{trigger.namespace}}.{{trigger.flowId}}` with ID `{{trigger.executionId}}`"
triggers:
  - id: alert_on_failure
    type: io.kestra.plugin.core.trigger.Flow
    labels:
      sla: miss
    states:
      - FAILED
      - WARNING
      - CANCELLED
Best practice: Use labels with SLAs to track SLA breaches across environments, and pair them with alerting or monitoring flows for proactive response.
