Daily Health Report Workflow
A workflow that runs every morning, queries key metrics, formats a report, and emails it to the team.
Architecture
Schedule (08:00 daily)
→ DQL: Host health
→ DQL: Service errors
→ DQL: Active problems
→ DQL: SLO status
→ JavaScript: Format report
→ Email: Send to team
DQL Queries for the Report
Host Health Summary
// Without scalar:true, cpu is an array of per-interval values, so arrayAvg() can reduce it
timeseries cpu = avg(dt.host.cpu.usage), by:{dt.entity.host}
| fieldsAdd current = arrayAvg(cpu)
| fields dt.entity.host, current
Service Error Summary (24h)
timeseries {
total = sum(dt.service.request.count),
failures = sum(dt.service.request.failure_count)
}, by:{dt.entity.service}, from:now()-24h
| fieldsAdd total_sum = arraySum(total), fail_sum = arraySum(failures)
| fieldsAdd error_pct = 100.0 * fail_sum / total_sum
| fields dt.entity.service, total_sum, fail_sum, error_pct
Active Problems
fetch events, from:now()-24h
| filter event.kind == "DAVIS_PROBLEM"
| filter event.status == "ACTIVE"
| fields display_id, event.name
Workflow Setup
- Create workflow with Schedule trigger (08:00 daily)
- Add DQL tasks for each query above
- Add JavaScript task to format results into a readable report
- Add Email task with the body set to {{ result("format_report").output }}
- Set a service user as the actor
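The formatting step could look roughly like the sketch below. It is a minimal illustration, not the template mentioned in this guide: formatReport is a hypothetical helper, and it assumes each DQL task hands over an array of records keyed by the field names used in the queries above (for example "dt.entity.host" and error_pct). Fetching the task results inside the workflow is left out.

```javascript
// Hypothetical helper: builds a plain-text report from the records of the
// host, service, and problem queries. Field names mirror the DQL above.
function formatReport({ hosts = [], services = [], problems = [] }) {
  // Render a number as a percentage, or "n/a" when the value is missing.
  const pct = (value) =>
    typeof value === "number" && Number.isFinite(value)
      ? `${value.toFixed(1)}%`
      : "n/a";

  const lines = ["Daily Health Report", ""];

  lines.push(`Hosts (${hosts.length})`);
  // Highest CPU first so the busiest hosts lead the section.
  [...hosts]
    .sort((a, b) => (b.current ?? 0) - (a.current ?? 0))
    .forEach((h) => lines.push(`  ${h["dt.entity.host"]}: CPU ${pct(h.current)}`));

  lines.push("", `Services (${services.length})`);
  // Highest error percentage first.
  [...services]
    .sort((a, b) => (b.error_pct ?? 0) - (a.error_pct ?? 0))
    .forEach((s) =>
      lines.push(
        `  ${s["dt.entity.service"]}: ${s.fail_sum}/${s.total_sum} failed (${pct(s.error_pct)})`
      )
    );

  lines.push("", `Active problems (${problems.length})`);
  if (problems.length === 0) lines.push("  none");
  problems.forEach((p) => lines.push(`  ${p.display_id}: ${p["event.name"]}`));

  return lines.join("\n");
}
```

For the email expression above to resolve, the JavaScript task would need to return an object with an output property, for example { output: formatReport(...) }. That wrapper is an assumption about how the task is wired, not something this guide specifies.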
💡 We have a production-tested 11-task executive daily report workflow template with sparklines, health scores, and trend arrows. See the dt-automation skill for the full JSON template.
Try it: Open Workflows → "+ Workflow" → add a Schedule trigger (daily 09:00) → add an "Execute DQL query" task with fetch dt.entity.host | summarize total=count(), healthy=countIf(state == "RUNNING") → add a "Send email" task. You just automated your morning health check.
SLO Error Budget & Burn Rate
SLOs track an error budget, the amount of "allowed" downtime before breaching your target. Burn rate measures how fast that budget is being consumed:
- Burn rate 1.0 = consuming budget exactly at target pace
- Burn rate > 1.0 = consuming budget faster than target pace; if sustained, the SLO is breached before the evaluation period ends
- Auto-alerting: the SLO app raises burn-rate events via Anomaly Detection
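The arithmetic behind these bullets can be sketched with the common definition of burn rate: the observed error rate divided by the error rate the target allows. The function names below are hypothetical, and the SLO app's exact calculation may differ, so treat this as an illustration only.

```javascript
// Burn rate = observed error rate / error rate allowed by the SLO target.
// A result of 1.0 means the budget is consumed exactly at target pace.
function burnRate(goodEvents, totalEvents, targetPct) {
  if (totalEvents <= 0) return 0; // no traffic, so no budget was consumed
  const allowedErrorRate = 1 - targetPct / 100;
  if (allowedErrorRate <= 0) {
    throw new RangeError("targetPct must be below 100");
  }
  const observedErrorRate = (totalEvents - goodEvents) / totalEvents;
  return observedErrorRate / allowedErrorRate;
}

// Days until the budget is exhausted if the given burn rate holds
// for the whole evaluation period.
function daysToExhaustion(rate, periodDays) {
  return rate > 0 ? periodDays / rate : Infinity;
}
```

For example, with a 99.9% target, 20 failures in 10,000 requests give a burn rate of about 2, so a 30-day budget would last roughly 15 days at that pace.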
Official SLI DQL pattern (service availability):
timeseries {
total = sum(dt.service.request.count),
failures = sum(dt.service.request.failure_count)
}, by: { dt.smartscape.service }
| fieldsAdd sli = (((total[] - failures[]) / total[]) * 100)
Key: the sli field must return an array of double values. Use timeseries for metrics, makeTimeseries for logs/spans.
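To make the array requirement concrete, the sketch below reproduces in plain JavaScript what the total[] and failures[] expression does: one SLI value per time bucket. sliSeries is a hypothetical name, and returning null for buckets without traffic is an assumption for this illustration, not documented DQL behavior.

```javascript
// Element-wise equivalent of ((total[] - failures[]) / total[]) * 100:
// one percentage per time bucket, so the result is an array of doubles.
function sliSeries(total, failures) {
  if (total.length !== failures.length) {
    throw new Error("series must have the same length");
  }
  return total.map((t, i) => {
    const f = failures[i];
    // Assumption: buckets with missing data or zero traffic yield null.
    if (t == null || f == null || t === 0) return null;
    return ((t - f) / t) * 100;
  });
}
```

Called with sliSeries([100, 200, 0], [1, 0, 0]), the first two buckets come out at about 99 and 100 percent, and the empty third bucket is null.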