Alert Creation
Hands-On: Creating Alerts for Extensions
An extension without alerts is just data collection. Alerts turn metrics into actionable problems that generate tickets and wake people up at 3 AM. Getting them right matters.
Alert Architecture
Dynatrace metric events (alerts) use the Settings API v2 with schema builtin:anomaly-detection.metric-events. Each alert defines:
- What to watch: A metric key or selector
- When to fire: Threshold + sliding window
- What to create: Event type, title, description
- Where to attach: Entity dimension key (for problem grouping)
The Sliding Window
Never alert on a single datapoint. Use a sliding window to avoid false positives:
```yaml
model_properties:
  type: STATIC_THRESHOLD
  threshold: 90            # Fire when value exceeds 90
  alert_condition: ABOVE
  samples: 35              # Look at last 35 samples
  violating_samples: 3     # Fire if 3 of 35 exceed threshold
  dealerting_samples: 5    # Clear after 5 consecutive samples below threshold
  alert_on_no_data: false  # Don't alert when device is unreachable
```
With 1-minute polling, samples: 35 covers ~35 minutes. Requiring 3 violations means the condition must persist, not just spike once.
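To see why this matters, here is a simplified model of the window evaluation in Python. This is a sketch of the logic only, not Dynatrace's actual implementation; the function names are ours, and the defaults mirror the snippet above:

```python
from collections import deque

def should_fire(values, threshold=90, window=35, violating=3):
    """Fire when at least `violating` of the last `window` samples exceed `threshold`."""
    recent = deque(values, maxlen=window)  # sliding window keeps only the newest samples
    return sum(1 for v in recent if v > threshold) >= violating

def should_dealert(values, threshold=90, dealerting=5):
    """Clear an open alert after `dealerting` consecutive samples at or below `threshold`."""
    tail = list(values)[-dealerting:]
    return len(tail) == dealerting and all(v <= threshold for v in tail)

# A single spike is ignored, a sustained condition fires:
spike = [50] * 34 + [99]              # one violating sample out of 35
sustained = [50] * 32 + [95, 96, 97]  # three violating samples out of 35
```

With violating_samples: 1 the spike above would page someone for a transient blip; requiring 3 of 35 only fires when the condition persists.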
Priority Levels
| Priority | Event Type | Typical Use |
|---|---|---|
| P1 (Severe) | CUSTOM_ALERT | Service down, critical threshold (≥90%) |
| P2 (Critical) | CUSTOM_ALERT | High threshold (≥80%), degraded state |
| P3 (Warning) | CUSTOM_ALERT | Warning threshold (≥70%), informational |
Entity Dimension Key (CRITICAL)
The eventEntityDimensionKey determines which entity the problem is raised on. Always use the parent entity — this is where tickets get generated:
```
# CORRECT: Problem raised on the device (parent)
event_entity_dimension_key = "dt.entity.myext:device"

# WRONG: Problem raised on the interface (child) — tickets scattered
event_entity_dimension_key = "dt.entity.myext:interface"
```
Title Placeholders
Include the affected entity name in the alert title so operators know what's broken without opening the problem:
```
# For child entity alerts (interfaces, ports, etc.)
title = "[P2] {dims:dt.entity.myext:interface.name} - High Bandwidth Utilization"

# For parent entity alerts
title = "[P1] {dims:dt.entity.myext:device.name} - CPU Critical"
```
Alert Configuration Template
```json
{
  "enabled": true,
  "summary": "[P2] Extension - High CPU Alert",
  "eventEntityDimensionKey": "dt.entity.myext:device",
  "eventTemplate": {
    "title": "[P2] {dims:dt.entity.myext:device.name} - High CPU",
    "description": "CPU usage exceeded 80% threshold",
    "eventType": "CUSTOM_ALERT",
    "davisMerge": false
  },
  "modelProperties": {
    "type": "STATIC_THRESHOLD",
    "threshold": 80,
    "alertCondition": "ABOVE",
    "alertOnNoData": false,
    "dealertingSamples": 5,
    "samples": 35,
    "violatingSamples": 3
  },
  "queryDefinition": {
    "type": "METRIC_KEY",
    "metricKey": "com.dynatrace.extension.myext.cpu",
    "aggregation": "AVG"
  }
}
```
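Because a malformed config is only rejected at deployment time, it can pay to sanity-check the JSON before pushing it. A minimal validator sketch — the field list mirrors the template above, but check_alert_config is our own helper, not a Dynatrace API:

```python
REQUIRED_FIELDS = {"enabled", "summary", "eventEntityDimensionKey",
                   "eventTemplate", "modelProperties", "queryDefinition"}

def check_alert_config(config: dict) -> list:
    """Return a list of problems found in an alert config dict (empty = looks OK)."""
    problems = sorted(f"missing field: {f}" for f in REQUIRED_FIELDS - config.keys())
    mp = config.get("modelProperties", {})
    if mp.get("violatingSamples", 0) > mp.get("samples", 0):
        problems.append("violatingSamples cannot exceed samples")
    if mp.get("dealertingSamples", 1) < 1:
        problems.append("dealertingSamples must be at least 1")
    return problems
```

Run it over every alert config in CI so a missing field fails the pipeline instead of the deployment.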
func: Metrics in Alerts
For func: calculated metrics, you cannot use METRIC_KEY query type. Use METRIC_SELECTOR instead:
```json
"queryDefinition": {
  "type": "METRIC_SELECTOR",
  "metricSelector": "func:com.dynatrace.extension.myext.bandwidth_pct:splitBy(\"dt.entity.myext:interface\")"
}
```
Deployment via Settings API v2
```bash
curl -X POST "$BASE/api/v2/settings/objects" \
  -H "Authorization: Api-Token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{
    "schemaId": "builtin:anomaly-detection.metric-events",
    "scope": "environment",
    "value": { ... alert config ... }
  }]'
```
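The same deployment can be scripted. A Python sketch using only the standard library — the endpoint and schema ID come from the curl call above, while the function names and structure are ours:

```python
import json
import urllib.request

SCHEMA_ID = "builtin:anomaly-detection.metric-events"

def build_payload(alert_config: dict) -> bytes:
    # The Settings API v2 objects endpoint accepts a JSON array of settings objects.
    return json.dumps([{
        "schemaId": SCHEMA_ID,
        "scope": "environment",
        "value": alert_config,
    }]).encode("utf-8")

def post_alert(base_url: str, token: str, alert_config: dict):
    """POST one alert config to the Settings API and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/settings/objects",
        data=build_payload(alert_config),
        headers={"Authorization": f"Api-Token {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # raises on HTTP error status
        return json.load(resp)
```

Because the payload is an array, several alerts can be deployed in one request by appending more objects to the list.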
Common Mistakes
- alertOnNoData: true — fires when the device is unreachable for maintenance. Always set it to false.
- davisMerge: true — merges your alert with Davis AI problems. Set it to false to keep extension alerts separate.
- Wrong entity dimension — attaching to a child entity scatters problems across hundreds of interfaces instead of grouping on the device.
- Missing {dims:} in title — operators see "High CPU" but don't know which device without opening the problem.
- Threshold unit mismatch — querying a metric that returns bytes but setting threshold as if it's megabytes.
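Most of these mistakes are mechanical, so they can be linted before deployment. A sketch — lint_alert is our own helper, and the unit-mismatch check is omitted because it needs metric metadata:

```python
def lint_alert(config: dict) -> list:
    """Flag the common misconfigurations described above."""
    warnings = []
    if config.get("modelProperties", {}).get("alertOnNoData"):
        warnings.append("alertOnNoData is true: fires during maintenance windows")
    template = config.get("eventTemplate", {})
    if template.get("davisMerge"):
        warnings.append("davisMerge is true: merges with Davis AI problems")
    if "{dims:" not in template.get("title", ""):
        warnings.append("title lacks a {dims:} placeholder: entity name won't appear")
    return warnings
```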
Real Example: Customer Feedback
After deploying ASR alerts, the customer reported "too many CPU alerts." The root cause: three duplicate pre-existing alerts left over from a previous team. We deleted the duplicates and kept our properly configured ones with the samples: 35 sliding window.
Another customer found the Catalyst uptime threshold was 360000 instead of 3600000 (6 minutes vs 60 minutes). A single zero cost them false alerts for weeks.
🛠 Hands-On Exercise
Edit the YAML in the editor, then click "Check My Work" to validate.
Alert Configuration
This extension monitors a network switch. Create the YAML metrics and topology needed to support these alerts:
- CPU alert at P1 (≥90%), P2 (≥80%), P3 (≥70%)
- Memory alert at the same thresholds
- Uptime alert at P3 (< 1 hour = 3600000 timeticks)
Make sure:
- The parent entity type has role: default (alerts attach here)
- Metric keys are correct for the alert query type
- All metrics have proper metadata
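Once the YAML is in place, the three alert tiers differ only in priority and threshold, so they can be generated from one template instead of hand-edited three times (the hand-editing is where single-zero mistakes like the Catalyst uptime bug creep in). A sketch — metric key and entity type follow this page's myext examples; adjust both to your extension:

```python
def tiered_alerts(metric_key="com.dynatrace.extension.myext.cpu",
                  tiers=(("P1", 90), ("P2", 80), ("P3", 70))):
    """Build one STATIC_THRESHOLD alert config per (priority, threshold) tier."""
    return [{
        "enabled": True,
        "summary": f"[{prio}] Extension - CPU >= {thr}%",
        "eventEntityDimensionKey": "dt.entity.myext:device",  # parent: problems group here
        "eventTemplate": {
            "title": f"[{prio}] {{dims:dt.entity.myext:device.name}} - CPU >= {thr}%",
            "description": f"CPU usage exceeded {thr}% threshold",
            "eventType": "CUSTOM_ALERT",
            "davisMerge": False,
        },
        "modelProperties": {
            "type": "STATIC_THRESHOLD",
            "threshold": thr,
            "alertCondition": "ABOVE",
            "alertOnNoData": False,
            "dealertingSamples": 5,
            "samples": 35,
            "violatingSamples": 3,
        },
        "queryDefinition": {
            "type": "METRIC_KEY",
            "metricKey": metric_key,
            "aggregation": "AVG",
        },
    } for prio, thr in tiers]
```

The memory tiers are the same call with a different metric key; the uptime alert needs its own tiers tuple with alertCondition set to BELOW.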