Migrate Classic → Platform SLOs

Hands-on

SLO Migration

Gen2 SLOs used metric-based SLIs. Gen3 SLOs use DQL-based SLIs, which are more flexible, queryable, and integrated with Grail.

🔧 Migration Step: Convert Classic SLOs

Step  Action                                  Where
────  ──────────────────────────────────────  ─────────────────────────────────────────
1     List classic SLOs                       Service-Level Objectives Classic app
2     Note each SLO's metric, target, window  Export or screenshot for reference
3     Create platform SLO from template       Service-Level Objectives app → + SLO
4     Select matching template                Service availability, response time, etc.
5     Choose entity and set target            Match your Gen2 values
6     Verify SLO value matches Gen2           Compare side-by-side for 1 week
7     Set up burn rate alerting               Create anomaly detector on SLO metric
8     Disable classic SLO after validation    Disable first; don't delete
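
Step 6 can be checked with a few lines of script once both SLO values have been noted down. The sketch below assumes the daily SLO percentages were copied by hand from each app into two lists; the function name max_drift, the sample values, and the 0.1 percentage-point tolerance are illustrative assumptions, not part of any Dynatrace tooling.

```python
def max_drift(classic, platform):
    """Return the largest absolute gap (percentage points) between paired daily values."""
    if len(classic) != len(platform):
        raise ValueError("both series must cover the same days")
    return max(abs(c - p) for c, p in zip(classic, platform))


# Hypothetical daily SLO values (%) for one week, copied from each app.
classic_week = [99.62, 99.58, 99.71, 99.40, 99.66, 99.69, 99.73]
platform_week = [99.60, 99.59, 99.70, 99.43, 99.66, 99.68, 99.74]

TOLERANCE = 0.1  # assumed acceptable gap in percentage points
drift = max_drift(classic_week, platform_week)
print(f"max drift: {drift:.2f} pp, within tolerance: {drift <= TOLERANCE}")
```

If the drift stays inside the tolerance you chose for the whole week, the platform SLO is a reasonable stand-in for the classic one.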

⚠️ Dynatrace is working on a dedicated SLO upgrade guide. Classic SLOs use functional metrics that behave differently during transformation. For now, recreate SLOs manually using the built-in templates; they generate the DQL SLI automatically.

Creating an SLO (UI)

🛠 Try it: Ctrl+K → "Service-Level Objectives" → Create new → Choose a template (Service availability, Host CPU, etc.) → Select entities → Set target (e.g., 99.5%) → Save.

Built-in Templates

Template                             What It Measures
───────────────────────────────────  ──────────────────────────────
Host CPU usage utilization           CPU idle time across hosts
Service availability                 Successful request ratio
Service performance                  Requests faster than X ms
K8s cluster CPU usage efficiency     Cluster CPU utilization
K8s cluster memory usage efficiency  Cluster memory utilization
K8s namespace CPU/memory efficiency  Namespace-level resource usage

Custom SLO with DQL

For custom SLIs, write a DQL timeseries query. The SLI must be a single metric-based timeseries aggregation:

// Host CPU idle (works: single metric aggregation)
timeseries sli = avg(dt.host.cpu.idle), by:{dt.entity.host}

// Service request count (works: single metric)
timeseries sli = sum(dt.service.request.count), by:{dt.entity.service}

⚠️ SLI limitation: Arithmetic expressions such as (count - failures) / count * 100 fail with the error "parameter has to be a metric-based timeseries aggregation". Use the built-in templates for availability SLOs; they handle the math internally.
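
For sanity-checking an availability SLO offline, the arithmetic from the note can be reproduced in a few lines. This is only the formula (count - failures) / count * 100 expressed in Python, not the DQL that the templates generate; the function name and the sample numbers are illustrative.

```python
def availability_pct(total_requests, failed_requests):
    """Availability SLI as a percentage: (count - failures) / count * 100."""
    if total_requests <= 0:
        return None  # no traffic in the interval, so the SLI is undefined
    return (total_requests - failed_requests) / total_requests * 100


# Hypothetical request counts for one evaluation window.
print(availability_pct(20_000, 50))
```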

⚠️ API gotcha: The criteria field must be an array, not an object. Sending an object causes a 400 error.
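
A minimal sketch of guarding against this mistake before a request is sent. Only the criteria field and its array requirement come from the note above; the other field names (name, target, timeframeFrom, timeframeTo) and the helper ensure_criteria_array are illustrative assumptions, so check them against the API schema for your environment.

```python
import json


def ensure_criteria_array(payload):
    """Return a copy of payload whose 'criteria' value is a list."""
    fixed = dict(payload)
    criteria = fixed.get("criteria")
    if isinstance(criteria, dict):
        fixed["criteria"] = [criteria]  # wrap a lone object in an array
    elif not isinstance(criteria, list):
        raise ValueError("payload needs a 'criteria' object or array")
    return fixed


# Hypothetical payload with the mistake: criteria given as an object.
bad_payload = {
    "name": "checkout-availability",
    "criteria": {"target": 99.5, "timeframeFrom": "now-7d", "timeframeTo": "now"},
}

print(json.dumps(ensure_criteria_array(bad_payload), indent=2))
```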

Out-of-the-Box SLO Templates

Gen3 includes pre-built SLO templates, so there is no need to write DQL from scratch:

  1. Ctrl+K → "Service-Level Objectives" → Create new
  2. Choose a template (service availability, response time, synthetic, etc.)
  3. Select your entity and target
  4. The DQL SLI is auto-generated

💡 After creating an SLO from a template, click "Edit SLI" to see the generated DQL. It is a great way to learn SLO query patterns.

ACE Best Practice: SLO Tiers

The Dynatrace ACE team recommends three SLO tiers based on business criticality:

Tier      Target   Warning   Burn Rate Threshold   Use Case
────────  ───────  ────────  ───────────────────   ─────────────────────────
High      99.0%    99.5%     4                     Revenue-critical services
Medium    98.0%    99.0%     10                    Internal business apps
Low       95.0%    98.0%     20                    Dev/staging environments
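
The tier table maps naturally onto a small lookup. The sketch below encodes the three rows and classifies an SLO value against its tier. It assumes that a value below the target counts as failing and a value between target and warning counts as a warning, which fits the table's ordering (warning above target) but is an interpretation; SloTier and evaluate are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloTier:
    target: float     # SLO target in percent
    warning: float    # warning threshold in percent
    burn_rate: float  # burn rate alert threshold


# Values copied from the tier table above.
TIERS = {
    "high": SloTier(target=99.0, warning=99.5, burn_rate=4),
    "medium": SloTier(target=98.0, warning=99.0, burn_rate=10),
    "low": SloTier(target=95.0, warning=98.0, burn_rate=20),
}


def evaluate(tier_name, slo_value):
    """Classify an SLO value (percent) as 'ok', 'warning', or 'failing'."""
    tier = TIERS[tier_name]
    if slo_value < tier.target:
        return "failing"
    if slo_value < tier.warning:
        return "warning"
    return "ok"


print(evaluate("high", 99.2))
```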

Burn Rate Alerting

Burn rate measures how fast you're consuming your error budget. A burn rate of 1 means you'll exhaust the budget exactly at the end of the window. A higher burn rate means the budget runs out sooner, so the response is more urgent.
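
One common way to compute burn rate is the observed error rate divided by the error rate the target allows. The sketch below uses that definition, which is an assumption here because the text does not give a formula; burn_rate and days_to_exhaustion are illustrative names, and the 28-day window is a sample value.

```python
def burn_rate(target_pct, observed_pct):
    """Observed error rate divided by the error rate the target allows."""
    budget = 100.0 - target_pct
    if budget <= 0:
        raise ValueError("target must be below 100%")
    return (100.0 - observed_pct) / budget


def days_to_exhaustion(window_days, rate):
    """Days until a full error budget is used up if the rate stays constant."""
    if rate <= 0:
        return float("inf")
    return window_days / rate


rate = burn_rate(target_pct=99.0, observed_pct=96.0)
print(rate, days_to_exhaustion(28, rate))
```

With a 99.0% target and 96.0% observed, the rate is 4, so a full 28-day budget would last 7 days at that pace.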

Burn Rate    Meaning                              Action
───────────  ───────────────────────────────────  ────────────────
< 1          Under budget: healthy                None
1-4          Slow burn: will miss SLO eventually  Investigate
4-10         Fast burn: urgent                    Page on-call
> 10         Critical: SLO will fail soon         Immediate action
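
The bands above can be turned into a lookup for use in scripts or runbooks. Because the table's ranges overlap at 4, the sketch below makes an assumption: a rate of exactly 4 is treated as a fast burn, and exactly 10 stays in the fast-burn band since only values above 10 are critical. The function name is hypothetical.

```python
def classify_burn_rate(rate):
    """Map a burn rate to the (meaning, action) pair from the table."""
    if rate < 1:
        return ("under budget", "none")
    if rate < 4:
        return ("slow burn", "investigate")
    if rate <= 10:
        return ("fast burn", "page on-call")
    return ("critical", "immediate action")


for sample in (0.5, 2, 4, 12):
    print(sample, classify_burn_rate(sample))
```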

💡 Create an anomaly detector that monitors SLO burn rate. When burn rate exceeds the tier threshold, trigger a workflow to notify the team. This is proactive SLO management: issues get fixed before they impact users.
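
The decision the detector makes can be sketched outside the platform as a plain comparison. The thresholds below are the per-tier burn rate thresholds from the SLO tier table; the SLO names, the readings, and the function slos_to_notify are hypothetical, and "exceeds" is read as strictly greater than.

```python
# Burn rate alert thresholds per tier, copied from the SLO tier table.
TIER_THRESHOLDS = {"high": 4, "medium": 10, "low": 20}


def slos_to_notify(readings):
    """Return names of SLOs whose burn rate exceeds their tier threshold."""
    return [
        name
        for name, (tier, rate) in readings.items()
        if rate > TIER_THRESHOLDS[tier]
    ]


# Hypothetical current burn rates: SLO name -> (tier, burn rate).
readings = {
    "checkout-availability": ("high", 6.2),
    "intranet-latency": ("medium", 6.2),
    "staging-availability": ("low", 25.0),
}
print(slos_to_notify(readings))
```

The same burn rate of 6.2 triggers a notification for the high-tier SLO but not for the medium-tier one, which is the point of tiering the thresholds.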