Service Level Objectives
SLOs define measurable reliability targets. Rather than leaving a goal like "99.9% availability" as a vague aspiration, an SLO tracks actual performance against the target using DQL-based indicators.
Key Concepts
Term          What It Means                         Example
────────────  ────────────────────────────────────  ────────────────────────────────────
SLO           The target you commit to              99.5% availability over 7 days
SLI           The metric that measures it           Successful requests / total requests
Error budget  How much failure is allowed           0.5% = ~50 min downtime per week
Burn rate     How fast you're consuming the budget  2x = will exhaust in half the time
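The error budget figure in the table can be checked with a little arithmetic. The sketch below is only an illustration: the function name error_budget_minutes is hypothetical, and it assumes the budget is spent entirely as full downtime.

```python
def error_budget_minutes(target_percent: float, window_days: float) -> float:
    """Minutes of full downtime allowed by an SLO target over its window.

    Assumption: the whole error budget is spent as complete outages.
    """
    window_minutes = window_days * 24 * 60
    budget_fraction = (100.0 - target_percent) / 100.0
    return window_minutes * budget_fraction


budget = error_budget_minutes(99.5, 7)
print(f"{budget:.1f} minutes")  # 50.4 minutes
```

A 7-day window has 10,080 minutes, so a 0.5% budget works out to roughly 50 minutes, which matches the example above.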
Creating an SLO
- Ctrl+K → "Service-Level Objectives" → Create new
- Choose a template (recommended) or write custom DQL
- Select entity (host, service) and set target
- Save; the SLO starts tracking immediately
Built-in Templates
Template                           SLI (auto-generated DQL)
─────────────────────────────────  ──────────────────────────────────
Host CPU utilization               timeseries sli=avg(dt.host.cpu.usage)
Service availability               timeseries {total=sum(dt.service.request.count), failures=sum(dt.service.request.failure_count)}
Service performance                timeseries total=avg(dt.service.request.response_time)
K8s cluster CPU/memory efficiency  timeseries sli=avg(dt.kubernetes.cluster.cpu_usage_percent)
💡 After creating from a template, click "Edit SLI" to see the generated DQL. It is a good way to learn SLO query patterns.
SLO Tiers (ACE Best Practice)
Tier    Target  Warning  Use Case
──────  ──────  ───────  ─────────────────────────
High    99.0%   99.5%    Revenue-critical services
Medium  98.0%   99.0%    Internal business apps
Low     95.0%   98.0%    Dev/staging environments
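One way to read the tier table is as a pair of thresholds per tier. The sketch below assumes the SLO is healthy at or above the warning value, in a warning state between warning and target, and failing below the target. The names Tier, TIERS, and slo_status, and the status strings, are hypothetical and not part of any Dynatrace API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    target: float   # assumed: the SLO fails below this percentage
    warning: float  # assumed: the SLO warns below this percentage


# Values copied from the tier table above.
TIERS = {
    "high": Tier(target=99.0, warning=99.5),
    "medium": Tier(target=98.0, warning=99.0),
    "low": Tier(target=95.0, warning=98.0),
}


def slo_status(tier_name: str, sli_percent: float) -> str:
    """Classify a measured SLI value against the thresholds of a tier."""
    tier = TIERS[tier_name]
    if sli_percent < tier.target:
        return "failure"
    if sli_percent < tier.warning:
        return "warning"
    return "healthy"


print(slo_status("high", 99.2))  # warning
```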
Burn Rate Alerting
Burn rate measures how fast you're consuming your error budget. Create an anomaly detector on the SLO burn rate metric to get alerted before the SLO target is breached.
Burn Rate  Meaning                              Action
─────────  ───────────────────────────────────  ────────────────
< 1        Under budget, healthy                None
1-4        Slow burn, will miss SLO eventually  Investigate
4-10       Fast burn, urgent                    Page on-call
> 10       Critical, SLO will fail soon         Immediate action
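As a rough illustration of the table, burn rate can be thought of as the observed error rate divided by the error rate the target allows. The sketch below uses that common definition as an assumption; it does not reproduce how Dynatrace computes the metric. The function names are hypothetical, and because the table's ranges share the boundary values 4 and 10, the sketch arbitrarily treats a rate of exactly 4 as a fast burn and exactly 10 as still in the 4-10 band.

```python
def burn_rate(observed_error_percent: float, target_percent: float) -> float:
    """Observed error rate relative to the error rate the target allows."""
    allowed_error_percent = 100.0 - target_percent
    if allowed_error_percent <= 0:
        raise ValueError("a 100% target leaves no error budget")
    return observed_error_percent / allowed_error_percent


def recommended_action(rate: float) -> str:
    """Map a burn rate to the action column of the table."""
    if rate < 1:
        return "None"
    if rate < 4:
        return "Investigate"
    if rate <= 10:
        return "Page on-call"
    return "Immediate action"


rate = burn_rate(observed_error_percent=1.0, target_percent=99.5)
print(rate, recommended_action(rate))  # 2.0 Investigate
```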
Try it: Create an SLO from the "Service availability" template for your main service. Set target to 99.5%. Watch it track over the next few hours.
Error Budget & Burn Rate
- Burn rate 1.0 = consuming budget at target pace
- Burn rate > 1.0 = will breach the SLO before the period ends if that pace continues
- Dynatrace auto-calculates burn rate and raises events via Anomaly Detection
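The relationship between burn rate and the evaluation period can be sketched with simple arithmetic. The example assumes a constant burn rate from the start of the window, which real traffic rarely gives you; both function names are hypothetical.

```python
def days_until_budget_exhausted(window_days: float, rate: float) -> float:
    """Days from the start of the window until the budget runs out.

    Assumption: the burn rate stays constant for the whole window.
    """
    if rate <= 0:
        return float("inf")  # nothing is being consumed
    return window_days / rate


def remaining_budget_fraction(elapsed_days: float, window_days: float, rate: float) -> float:
    """Fraction of the error budget left after elapsed_days at a constant rate."""
    consumed = rate * elapsed_days / window_days
    return max(0.0, 1.0 - consumed)


print(days_until_budget_exhausted(7, 2.0))  # 3.5
```

At a burn rate of 1.0 the budget lasts exactly the full window; at 2.0 it is gone halfway through.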
A custom SLI query must produce an sli field that returns an array of doubles. Use timeseries for metrics and makeTimeseries for logs or spans.
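To make the "array of doubles" requirement concrete, the sketch below mimics in Python what an availability SLI might compute per time bucket from the total and failures series: one percentage per interval. The formula and the handling of empty intervals are assumptions for illustration, not the exact DQL that Dynatrace generates, and availability_sli is a hypothetical name.

```python
from typing import List, Optional


def availability_sli(total: List[float], failures: List[float]) -> List[Optional[float]]:
    """Per-interval availability in percent, one value per time bucket.

    Intervals with no requests yield None (assumed here to mean "no data").
    """
    if len(total) != len(failures):
        raise ValueError("total and failures must cover the same intervals")
    sli: List[Optional[float]] = []
    for t, f in zip(total, failures):
        sli.append(None if t == 0 else (t - f) / t * 100.0)
    return sli


# Three intervals: 1 failure in 200 requests, 0 in 100, and no traffic.
print(availability_sli([200, 100, 0], [1, 0, 0]))
```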