Incident Response Workflow

Hands-on

A workflow that fires on Davis problems, gathers context, and notifies the right team with actionable information.

Architecture

Davis Problem trigger
  → DQL: Get problem details
  → DQL: Get affected entity health
  → DQL: Get recent changes
  → JavaScript: Assess severity, decide action
  → Email/Slack: Notify with context

Context Queries

Problem Details

fetch events, from:now()-1h
| filter event.kind == "DAVIS_PROBLEM"
| filter display_id == "P-XXXXX"
| fields display_id, event.name, event.status, affected_entity_ids
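
The P-XXXXX value in the query is a placeholder for a real problem ID. As a rough sketch, the query text for a specific problem could be assembled in plain JavaScript as shown here. The helper name buildProblemQuery, the lookback parameter, and the assumed ID format (P- followed by digits) are illustrative and not part of any Dynatrace SDK; the sample ID is made up.

```javascript
// Hypothetical helper: builds the "Problem Details" DQL text for one problem.
// Assumption: display IDs look like "P-" followed by digits.
function buildProblemQuery(displayId, lookback = "1h") {
  // Reject anything that is not a plain ID so it cannot alter the query.
  if (!/^P-\d+$/.test(displayId)) {
    throw new Error(`Unexpected problem display ID: ${displayId}`);
  }
  if (!/^\d+[smhd]$/.test(lookback)) {
    throw new Error(`Unexpected lookback: ${lookback}`);
  }
  return [
    `fetch events, from:now()-${lookback}`,
    `| filter event.kind == "DAVIS_PROBLEM"`,
    `| filter display_id == "${displayId}"`,
    `| fields display_id, event.name, event.status, affected_entity_ids`,
  ].join("\n");
}

// Made-up ID for illustration only.
console.log(buildProblemQuery("P-12345"));
```

Validating the ID before interpolating it keeps a malformed value from changing the shape of the query.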

Entity Health (last hour)

timeseries {
  cpu = avg(dt.host.cpu.usage),
  mem = avg(dt.host.memory.usage)
}, by:{dt.entity.host}, from:now()-1h
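
Note that timeseries returns each metric as an array of data points per series, not a single number, so downstream logic usually needs to reduce it first. A minimal sketch of that reduction, assuming records shaped like the query output above (a dt.entity.host field plus cpu and mem arrays that may contain nulls); the function name peakByHost and the sample data are made up for illustration.

```javascript
// Sketch: reduce timeseries-style records to one peak value per host.
// Assumed record shape:
//   { "dt.entity.host": string, cpu: (number|null)[], mem: (number|null)[] }
function peakByHost(records, metric) {
  const peaks = {};
  for (const record of records) {
    const values = (record[metric] ?? []).filter((v) => typeof v === "number");
    // Skip series with no data points instead of reporting -Infinity.
    if (values.length === 0) continue;
    peaks[record["dt.entity.host"]] = Math.max(...values);
  }
  return peaks;
}

// Made-up sample data for illustration only.
const sample = [
  { "dt.entity.host": "HOST-A", cpu: [41.2, null, 93.5], mem: [60, 61, 62] },
  { "dt.entity.host": "HOST-B", cpu: [null, null], mem: [30, 35, 33] },
];

console.log(peakByHost(sample, "cpu")); // { 'HOST-A': 93.5 }
console.log(peakByHost(sample, "mem")); // { 'HOST-A': 62, 'HOST-B': 35 }
```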

Recent Deployments

fetch events, from:now()-24h
| filter event.type == "CUSTOM_DEPLOYMENT"
| fields timestamp, event.name
| sort timestamp desc
| limit 5

Decision Logic (JavaScript Task)

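// In the JavaScript task (named "decide"; upstream task names "entity_health" and "recent_changes" are assumed):
import { execution } from '@dynatrace-sdk/automation-utils';

export default async function ({ execution_id }) {
  const ex = await execution(execution_id);
  // timeseries returns an array of values per series, so take the peak of the first series
  const cpuValues = (await ex.result("entity_health")).records[0]?.cpu ?? [];
  const cpu = Math.max(...cpuValues.filter((v) => v !== null));
  const hasDeployment = (await ex.result("recent_changes")).records.length > 0;

  if (cpu > 90) return { action: "ESCALATE", reason: "CPU critical" };
  if (hasDeployment) return { action: "INVESTIGATE", reason: "Recent deployment" };
  return { action: "MONITOR", reason: "No obvious cause" };
}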
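
The three rules are evaluated in priority order: a CPU breach wins over a recent deployment, and MONITOR is the fallback. Pulling them into a pure function makes that ordering easy to test outside the workflow. This is a sketch only; the function name decide, its input shape, and the cpuThreshold parameter are assumptions, and the inputs are made-up values.

```javascript
// Sketch of the decision rules as a pure function, independent of the workflow runtime.
// cpuThreshold defaults to the 90% used in the task; the parameter itself is an assumption.
function decide({ peakCpu, deploymentCount }, cpuThreshold = 90) {
  // A missing or non-numeric CPU reading never escalates on its own.
  if (Number.isFinite(peakCpu) && peakCpu > cpuThreshold) {
    return { action: "ESCALATE", reason: "CPU critical" };
  }
  if (deploymentCount > 0) {
    return { action: "INVESTIGATE", reason: "Recent deployment" };
  }
  return { action: "MONITOR", reason: "No obvious cause" };
}

console.log(decide({ peakCpu: 95, deploymentCount: 2 }).action); // ESCALATE
console.log(decide({ peakCpu: 40, deploymentCount: 2 }).action); // INVESTIGATE
console.log(decide({ peakCpu: undefined, deploymentCount: 0 }).action); // MONITOR
```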

Notification Template (Jinja)

Subject: [{{ result("decide").action }}] {{ event()["event.name"] }}

Problem: {{ event()["display_id"] }}
Status: {{ event()["event.status"] }}
Action: {{ result("decide").action }}
Reason: {{ result("decide").reason }}
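
To preview what a recipient would see, the same fields can be filled in with plain JavaScript. This is not a Jinja renderer, only a sketch that mirrors the template's field names; the function name renderNotification and the event and decision values are made up.

```javascript
// Sketch: preview of the rendered notification using the template's field names.
function renderNotification(event, decision) {
  const subject = `[${decision.action}] ${event["event.name"]}`;
  const body = [
    `Problem: ${event["display_id"]}`,
    `Status: ${event["event.status"]}`,
    `Action: ${decision.action}`,
    `Reason: ${decision.reason}`,
  ].join("\n");
  return { subject, body };
}

// Made-up event and decision values for illustration only.
const preview = renderNotification(
  { "display_id": "P-12345", "event.name": "CPU saturation", "event.status": "ACTIVE" },
  { action: "ESCALATE", reason: "CPU critical" }
);

console.log(preview.subject); // [ESCALATE] CPU saturation
console.log(preview.body);
```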

⚠️ Set actor to a service user, NEVER owner. The service user needs: storage:*:read, email:emails:send, app-engine:apps:run, automation:workflows:run.

🛠 Try it: Open Workflows → "+ Workflow" → add a "Davis problem" trigger → add an "Execute DQL query" task to gather context → add a "Send email" task with the results. Now every Davis problem automatically sends you a context-rich notification.