Incident Response Workflow
A workflow that fires on Davis problems, gathers context, and notifies the right team with actionable information.
Architecture
Davis Problem trigger
↓ DQL: Get problem details
↓ DQL: Get affected entity health
↓ DQL: Get recent changes
↓ JavaScript: Assess severity, decide action
↓ Email/Slack: Notify with context
Context Queries
Problem Details
fetch events, from:now()-1h
| filter event.kind == "DAVIS_PROBLEM"
| filter display_id == "P-XXXXX"
| fields display_id, event.name, event.status, affected_entity_ids
Entity Health (last hour)
timeseries {
cpu = avg(dt.host.cpu.usage),
mem = avg(dt.host.memory.usage)
}, by:{dt.entity.host}, from:now()-1h
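Note that timeseries returns one record per host, and each metric field (cpu, mem) is an array of values over the timeframe rather than a single number. A minimal sketch of reducing those arrays to one figure per host follows; the helper name summarizeHosts and the sample records are hypothetical and only illustrate the assumed record shape.

```javascript
// Hypothetical helper: collapse each host's cpu series into a peak and a mean.
// Assumes one record per host, with `cpu` as an array that may contain null gaps.
function summarizeHosts(records) {
  return records.map((r) => {
    const values = (r.cpu ?? []).filter((v) => v !== null && v !== undefined);
    const peak = values.length ? Math.max(...values) : null;
    const mean = values.length
      ? values.reduce((a, b) => a + b, 0) / values.length
      : null;
    return { host: r["dt.entity.host"], peakCpu: peak, meanCpu: mean };
  });
}

// Made-up sample data, not real query output.
const sampleHealth = [
  { "dt.entity.host": "HOST-A", cpu: [40, 95, null, 60] },
  { "dt.entity.host": "HOST-B", cpu: [] },
];
const hostSummary = summarizeHosts(sampleHealth);
console.log(hostSummary);
```

Whether the peak or the mean is the better signal depends on how spiky the workload is; the peak catches short saturation, while the mean is less sensitive to a single outlier.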
Recent Deployments
fetch events, from:now()-24h
| filter event.type == "CUSTOM_DEPLOYMENT"
| fields timestamp, event.name
| sort timestamp desc
| limit 5
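A 24-hour lookback can return deployments that have nothing to do with the problem. One possible refinement, sketched here under assumptions, is to keep only deployments that happened shortly before the problem started. The helper deploymentsNear, its parameters, and the assumption that timestamps arrive as ISO 8601 strings are all illustrative and not part of the workflow itself.

```javascript
// Hypothetical helper: keep deployments that occurred within `windowMinutes`
// before the problem start. Timestamps are assumed to be ISO 8601 strings.
function deploymentsNear(deployments, problemStart, windowMinutes) {
  const start = new Date(problemStart).getTime();
  const windowMs = windowMinutes * 60 * 1000;
  return deployments.filter((d) => {
    const t = new Date(d.timestamp).getTime();
    // Deployed before the problem began, and not longer ago than the window.
    return t <= start && start - t <= windowMs;
  });
}

// Made-up sample data, not real query output.
const sampleDeployments = [
  { timestamp: "2024-01-01T11:45:00Z", "event.name": "checkout v2.1" },
  { timestamp: "2024-01-01T03:00:00Z", "event.name": "nightly batch" },
  { timestamp: "2024-01-01T12:10:00Z", "event.name": "hotfix" },
];
const nearby = deploymentsNear(sampleDeployments, "2024-01-01T12:00:00Z", 60);
console.log(nearby.map((d) => d["event.name"]));
```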
Decision Logic (JavaScript Task)
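// In the JavaScript task (named "decide"; the DQL tasks are named "entity_health" and "recent_changes"):
import { execution } from '@dynatrace-sdk/automation-utils';
export default async function ({ execution_id }) {
  const ex = await execution(execution_id);
  const health = await ex.result("entity_health");
  // timeseries returns an array of values per host, so take the peak across all hosts
  const cpu = Math.max(0, ...health.records.flatMap((r) => r.cpu ?? []));
  const hasDeployment = (await ex.result("recent_changes")).records.length > 0;
  if (cpu > 90) return { action: "ESCALATE", reason: "CPU critical" };
  if (hasDeployment) return { action: "INVESTIGATE", reason: "Recent deployment" };
  return { action: "MONITOR", reason: "No obvious cause" };
}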
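To check the thresholds before wiring them into a workflow, the same rules can be factored into a pure function and run locally. This is a minimal sketch; the function name decide and its two parameters (a single CPU percentage and a deployment count) are hypothetical.

```javascript
// Hypothetical pure version of the decision rules, so they can be tested
// without a workflow execution. Rules are checked in priority order.
function decide(peakCpu, deploymentCount) {
  if (peakCpu > 90) return { action: "ESCALATE", reason: "CPU critical" };
  if (deploymentCount > 0) return { action: "INVESTIGATE", reason: "Recent deployment" };
  return { action: "MONITOR", reason: "No obvious cause" };
}

console.log(decide(95, 3).action); // ESCALATE (CPU takes precedence over deployments)
console.log(decide(50, 2).action); // INVESTIGATE
console.log(decide(50, 0).action); // MONITOR
```

The order of the checks matters: a host above 90% CPU escalates even when a recent deployment exists, so the deployment hint is only used when CPU looks normal.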
Notification Template (Jinja)
Subject: [{{ result("decide").action }}] {{ event()["event.name"] }}
Problem: {{ event()["display_id"] }}
Status: {{ event()["event.status"] }}
Action: {{ result("decide").action }}
Reason: {{ result("decide").reason }}
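When the message has to be assembled in code instead, for example to post the same text to a chat channel, the template fields map directly onto a small formatting function. The sketch below is illustrative only: buildNotification is a hypothetical helper, and the event and decision objects are made-up samples that mirror the fields used in the template.

```javascript
// Hypothetical helper: formats the same fields as the Jinja template
// into a subject line and a plain-text body.
function buildNotification(event, decision) {
  const subject = `[${decision.action}] ${event["event.name"]}`;
  const body = [
    `Problem: ${event["display_id"]}`,
    `Status: ${event["event.status"]}`,
    `Action: ${decision.action}`,
    `Reason: ${decision.reason}`,
  ].join("\n");
  return { subject, body };
}

// Made-up sample inputs.
const sampleEvent = {
  "event.name": "High CPU",
  "display_id": "P-XXXXX",
  "event.status": "ACTIVE",
};
const sampleDecision = { action: "ESCALATE", reason: "CPU critical" };
const message = buildNotification(sampleEvent, sampleDecision);
console.log(message.subject);
console.log(message.body);
```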
⚠️ Set the workflow actor to a service user, never the workflow owner. The service user needs these permissions: storage:*:read, email:emails:send, app-engine:apps:run, automation:workflows:run.
Try it: Open Workflows → "+ Workflow" → add a "Davis problem" trigger → add an "Execute DQL query" task to gather context → add a "Send email" task with the results. Every Davis problem then automatically sends you a context-rich notification.