SLO Management
Service-Level Objectives (SLOs) track reliability targets. This module covers Service-Level Indicator (SLI) query patterns in DQL, burn rate alerting, and the official SLO templates.
SLI DQL Patterns
Custom SLIs must produce a field named sli that contains an array of double values.
Service Availability
timeseries {
total = sum(dt.service.request.count),
failures = sum(dt.service.request.failure_count)
}, by: { dt.smartscape.service }
| fieldsAdd entityName = getNodeName(dt.smartscape.service)
| fieldsAdd sli = (((total[] - failures[]) / total[]) * 100)
| fieldsRemove total, failures
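To make the array arithmetic concrete, here is a minimal Python sketch of what the sli expression computes for each time bucket. It is an illustration only, not Dynatrace code: the function name availability_sli and the sample numbers are hypothetical, and returning None for a bucket with no requests is an assumption about how empty buckets should be treated.

```python
from typing import List, Optional


def availability_sli(total: List[float], failures: List[float]) -> List[Optional[float]]:
    """Per-bucket availability in percent, mirroring ((total - failures) / total) * 100."""
    if len(total) != len(failures):
        raise ValueError("total and failures must have the same number of buckets")
    sli: List[Optional[float]] = []
    for t, f in zip(total, failures):
        # Assumption: a bucket with no requests has no defined SLI value.
        sli.append(None if t == 0 else (t - f) / t * 100)
    return sli


# Hypothetical sample: three time buckets of request and failure counts.
print(availability_sli([200, 100, 0], [2, 0, 0]))
```

Each input list stands in for one of the arrays that timeseries returns, with one element per time bucket.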
Service Error Rate by K8s Cluster
timeseries {
total = sum(dt.service.request.count),
errors = sum(dt.service.request.failure_count)
}, by: { dt.smartscape.service, k8s.cluster.name }
| filter k8s.cluster.name == "production-cluster"
| fieldsAdd errorRate = (errors[] / total[]) * 100
| fieldsAdd sli = 100 - errorRate[]
Response Time Performance
timeseries total = avg(dt.service.request.response_time),
default: 0, by: { dt.smartscape.service }
| fieldsAdd high = iCollectArray(if(total[] > (1000 * 500), total[]))
| fieldsAdd low = iCollectArray(if(total[] <= (1000 * 500), total[]))
| fieldsAdd highRespTimes = iCollectArray(if(isNull(high[]), 0, else: 1))
| fieldsAdd lowRespTimes = iCollectArray(if(isNull(low[]), 0, else: 1))
| fieldsAdd sli = 100 * (lowRespTimes[] / (lowRespTimes[] + highRespTimes[]))
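The query above effectively labels each time bucket as fast or slow, so the resulting sli is 100 for a bucket at or under the threshold and 0 otherwise. A minimal Python sketch of that classification follows. The function name and sample values are hypothetical, and it assumes, as the 1000 * 500 expression suggests, that response times are in microseconds with a 500 ms threshold.

```python
from typing import List

THRESHOLD_US = 1000 * 500  # assumed: 500 ms expressed in microseconds


def response_time_sli(avg_response_times: List[float],
                      threshold: float = THRESHOLD_US) -> List[float]:
    """Return 100 for each bucket at or under the threshold, 0 for each bucket above it."""
    sli = []
    for value in avg_response_times:
        high = 1 if value > threshold else 0   # counterpart of highRespTimes[]
        low = 1 if value <= threshold else 0   # counterpart of lowRespTimes[]
        sli.append(100 * low / (low + high))   # low + high is always 1
    return sli


# Hypothetical sample: the last bucket is 0, as a gap filled by "default: 0" would be.
print(response_time_sli([120_000, 500_000, 750_000, 0]))
```

Note that in this sketch a bucket filled with the default value 0 counts as fast, which appears to be the effect of default: 0 in the query as well.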
Error Budget & Burn Rate
Concept          What It Means
───────────────  ────────────────────────────────────────────────────
Error budget     Amount of "allowed" downtime before breaching target
Burn rate 1.0    Consuming budget exactly at target pace
Burn rate > 1.0  Will breach SLO before evaluation period ends
Burn rate < 1.0  Under budget, with room to spare
Dynatrace auto-calculates burn rate and raises events via Anomaly Detection. You can also create custom burn-rate alerts.
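As a worked illustration of the burn rate concept, here is a minimal Python sketch that uses the common definition of burn rate: observed error rate divided by the error rate the target allows. That definition is an assumption for illustration, since the exact formula Dynatrace applies is not given here, and the function names and sample numbers are hypothetical.

```python
def error_budget(target_percent: float) -> float:
    """Allowed failure fraction, e.g. a 99.9% target leaves a 0.1% budget."""
    return (100.0 - target_percent) / 100.0


def burn_rate(observed_error_fraction: float, target_percent: float) -> float:
    """Ratio of the observed error rate to the error rate the target allows."""
    budget = error_budget(target_percent)
    if budget <= 0:
        raise ValueError("a 100% target leaves no error budget")
    return observed_error_fraction / budget


def days_until_exhausted(rate: float, period_days: float) -> float:
    """Days until the budget runs out if the burn rate stays constant."""
    if rate <= 0:
        return float("inf")
    return period_days / rate


# Hypothetical sample: 0.2% of requests fail against a 99.9% target over 30 days.
rate = burn_rate(observed_error_fraction=0.002, target_percent=99.9)
print(round(rate, 2), round(days_until_exhausted(rate, period_days=30), 1))
```

With these sample numbers the burn rate comes out at roughly 2, meaning a budget sized for 30 days would last only about 15 if nothing changes.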
Official SLO Templates (7)
Template                              What It Measures
────────────────────────────────────  ────────────────────────────
Host CPU utilization                  CPU idle percentage
Service availability                  Request success rate
Service performance                   Response time threshold
K8s cluster CPU efficiency            Cluster CPU utilization
K8s cluster memory efficiency         Cluster memory utilization
K8s namespace CPU efficiency          Namespace CPU utilization
K8s namespace memory efficiency       Namespace memory utilization
Key Rules
- Use timeseries for metric-based SLOs (pre-aggregated, faster, cheaper)
- Use makeTimeseries for event/log/span-based SLOs
- getNodeName() extracts the entity display name from an entity ID
- SLO names must be unique (duplicate → 400 error)
Knowledge Check
Q: What does a burn rate of 2.0 mean?
- ✗ You have 2x the error budget remaining
- ✓ You're consuming error budget at 2x the target pace and will breach before the period ends
- ✗ Your SLO target is 2%