Migrate Metric Events → Anomaly Detectors
Anomaly Detectors (Replacing Metric Events)
Gen2 used metric events with threshold-based alerting. Gen3 uses Davis anomaly detectors: DQL-powered, with three analyzer models.
Three Analyzer Models
Model               How It Works                              Best For
─────────────────── ───────────────────────────────────────── ──────────────────────────
Static Threshold    Fixed number (e.g., CPU > 90%)            Known limits
Auto-Adaptive       Learns a baseline, alerts on              Dynamic workloads
                    deviation from normal
Seasonal Baseline   Learns daily/weekly patterns,             Traffic with patterns
                    alerts on anomalies                       (business hours, weekends)
📍 Try it: Ctrl+K → "Anomaly Detection" → "+ Anomaly detector" → paste: timeseries failures=avg(dt.service.request.failure_rate), by:{dt.entity.service}, interval:1m → Static threshold → 5% → ABOVE. This alerts when any service's error rate exceeds 5%, a real production alert you'd actually use.
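The same flow works for the other two analyzer models; only the analyzer choice changes. A hedged sketch of queries you might pair with them (the metric keys are illustrative, so substitute ones that exist in your environment):

// Auto-Adaptive: Davis learns each host's normal CPU baseline
timeseries cpu=avg(dt.host.cpu.usage), by:{dt.entity.host}, interval:1m

// Seasonal Baseline: request volume that follows business-hour patterns
timeseries requests=sum(dt.service.request.count), by:{dt.entity.service}, interval:1m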
🔧 Migration Step: Use the Built-in Transpiler
💡 Dynatrace has a built-in metric event transpiler. The Anomaly Detection app can auto-convert your classic metric events to DQL-based anomaly detectors.
Step Action                                    Where
──── ───────────────────────────────────────── ────────────────────────────────────────
1    Open Anomaly Detection app                Ctrl+K → "Anomaly Detection"
2    Enable authorization settings             ⚙ → Authorization settings → enable
3    Click "Improve metric events with DQL"    Custom alert → Improve metric events
4    Select metric events to transform         Check the ones you want to convert
5    Click "Transform"                         Auto-converts metric selector → DQL
6    Verify the converted query                ⋮ → Open with → View and execute query
7    Check the original is disabled            Transformation page → State: Disabled
⚠️ The transpiler only works with metric selector events, not metric key events. Metric key events must be manually rewritten as DQL timeseries queries. After transformation, the classic metric event is automatically disabled and the new anomaly detector is activated.
Manual Conversion (for metric key events)
For metric key events that the transpiler can't handle, convert manually (a worked sketch follows the table):
Classic Metric Event                   Gen3 Anomaly Detector
────────────────────────────────────── ─────────────────────────────────────────────────────
Metric key: builtin:host.cpu.usage     Query: timeseries avg(dt.host.cpu.usage), interval:1m
Threshold: 90                          Threshold: 90
Condition: ABOVE                       Alert condition: ABOVE
Alert delay: 3 of 5 samples            Violating samples: 3, Sliding window: 5
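Put together, a minimal sketch of the conversion (the Gen2 selector syntax is shown for comparison and may need adjusting for your event definition):

// Gen2 metric selector
builtin:host.cpu.usage:splitBy("dt.entity.host"):avg

// Gen3 DQL equivalent for the anomaly detector
timeseries avg(dt.host.cpu.usage), by:{dt.entity.host}, interval:1m

Threshold (90), condition (ABOVE), and sample counts move out of the query and into the detector's analyzer configuration.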
Creating an Anomaly Detector (UI)
- Open the Anomaly Detection app
- Click + Anomaly detector
- Write a DQL timeseries query:
  timeseries avg(dt.host.cpu.usage), interval:1m
- Choose analyzer model (Static threshold is simplest)
- Set threshold, condition (ABOVE/BELOW), and sensitivity
- Configure event template (name, type, severity)
- Save and enable
⚠️ Anomaly detector queries MUST use interval:1m. Other intervals will fail.
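For example, using the host CPU query again:

// Rejected: detector validation requires 1-minute resolution
timeseries avg(dt.host.cpu.usage), interval:5m

// Accepted
timeseries avg(dt.host.cpu.usage), interval:1m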
Naming Convention
[P1] Service Name – What's Wrong (Critical)
[P3] Service Name – What's Wrong (Warning)
Examples:
[P1] Control Center – CPU Usage Critical
[P3] apmlabs.link – Response Time Warning
[P1] All Services – Error Rate Spike
API Deep Dive
For automation, anomaly detectors use the Settings API with schema builtin:davis.anomaly-detectors:
{
"schemaId": "builtin:davis.anomaly-detectors",
"value": {
"enabled": true,
"title": "[P1] High CPU",
"analyzer": {
"name": "...StaticThresholdAnomalyDetectionAnalyzer",
"input": [
{"key": "query", "value": "timeseries avg(dt.host.cpu.usage), interval:1m"},
{"key": "threshold", "value": "90"},
{"key": "alertCondition", "value": "ABOVE"}
]
}
}
}
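A minimal sketch of pushing that payload via the classic Settings 2.0 API (the endpoint and environment scope are assumptions; check your tenant's API docs and token scopes):

POST /api/v2/settings/objects
[
  {
    "schemaId": "builtin:davis.anomaly-detectors",
    "scope": "environment",
    "value": { ... }
  }
]

where "value" is the object shown above. The endpoint accepts an array, so a P1/P3 pair can be created in a single call.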
Record-Based Alerting (New)
Since SaaS 1.337, Anomaly Detection also supports record-based alerting: you can trigger alerts on records (logs, events, spans), not just timeseries metrics. This means you can alert on specific log patterns, event counts, or span attributes directly.
💡 Timeseries-based alerting (shown above) remains the primary pattern for infrastructure and service metrics. Record-based alerting is best for log-driven alerts, security events, or business event thresholds where you need to match specific record conditions.
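A hedged sketch of the kind of record query this enables (the log field values and phrase are hypothetical):

// Alert when error logs matching a specific pattern appear
fetch logs
| filter loglevel == "ERROR" and matchesPhrase(content, "payment declined")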
ACE Best Practice: P1/P3 Pairing
💡 Every P1 (critical) alert MUST have a matching P3 (warning) alert. The P3 fires first, giving teams time to investigate before the P1 escalates.
Alert                    P3 Warning   P1 Critical
──────────────────────── ──────────── ────────────
CPU Usage                > 70%        > 90%
Memory Usage             > 80%        > 95%
Disk Usage               > 80%        > 90%
Service Response Time    > 2000ms     > 5000ms
Service Error Rate       > 1%         > 5%
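In practice the pair is two detectors sharing one query and differing only in threshold and title; a sketch for CPU, with titles following the naming convention above:

// Shared query for both detectors
timeseries avg(dt.host.cpu.usage), by:{dt.entity.host}, interval:1m

// "[P3] All Hosts – CPU Usage Warning":  Static threshold 70, ABOVE
// "[P1] All Hosts – CPU Usage Critical": Static threshold 90, ABOVE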
Sliding Window Parameters
Parameter            What It Does                                  Recommended
──────────────────── ───────────────────────────────────────────── ───────────
slidingWindow        Number of samples in the evaluation window    5
violatingSamples     How many must breach the threshold to alert   3
dealertingSamples    How many must be OK to close the alert        5
alertOnMissingData   Alert when no data arrives                    false
⚠️ Always set alertOnMissingData: false. Otherwise, maintenance windows and agent restarts trigger false alerts.
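As a sketch, these would sit alongside the other analyzer inputs in the Settings payload above (their exact placement in the schema is an assumption; verify against your schema version):

{"key": "slidingWindow", "value": "5"},
{"key": "violatingSamples", "value": "3"},
{"key": "dealertingSamples", "value": "5"},
{"key": "alertOnMissingData", "value": "false"}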