Service Performance Recipes
Response Time by Service
timeseries avg(dt.service.request.response_time), by:{dt.entity.service}
⚠️ Response time is in microseconds. Divide by 1000 for ms.
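If you want the chart directly in milliseconds, the series can be rescaled inline. A minimal sketch, assuming the iterative `[]` operator for element-wise arithmetic on timeseries arrays is available in your environment (the alias `rt_ms` is just an illustrative name):

```
timeseries rt = avg(dt.service.request.response_time), by:{dt.entity.service}
| fieldsAdd rt_ms = rt[] / 1000
```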
Request Throughput
timeseries sum(dt.service.request.count), by:{dt.entity.service}
Failure Rate
timeseries avg(dt.service.request.failure_rate), by:{dt.entity.service}
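If the pre-aggregated failure-rate metric is not available in your environment, the same ratio can be derived from the request counts. A sketch, assuming the `[]` element-wise operator works across two series (`fail_pct` is an illustrative alias):

```
timeseries {
  total  = sum(dt.service.request.count),
  failed = sum(dt.service.request.failure_count)
}, by:{dt.entity.service}
| fieldsAdd fail_pct = 100 * failed[] / total[]
```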
P95 Response Time
timeseries percentile(dt.service.request.response_time, 95), by:{dt.entity.service}
Golden Signals (latency, traffic, and errors in one query)
timeseries {
  latency = avg(dt.service.request.response_time),
  throughput = sum(dt.service.request.count),
  errors = sum(dt.service.request.failure_count),
  error_rate = avg(dt.service.request.failure_rate)
}, by:{dt.entity.service}
Service List with Type
fetch dt.entity.service
| fields entity.name, serviceType, tags
Slowest Endpoints (via Spans)
fetch spans
| filter span.kind == "SERVER"
| summarize avg_duration = avg(duration), cnt = count(), by:{service.name, span.name}
| sort avg_duration desc
| limit 10
Service Errors in Logs
fetch logs
| filter loglevel == "ERROR"
| summarize cnt = count(), by:{dt.entity.service}
| sort cnt desc
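The summarized table shows raw service entity IDs. To display readable names instead, a lookup can be appended; a sketch, assuming the `entityName()` function is available in your tenant:

```
fetch logs
| filter loglevel == "ERROR"
| summarize cnt = count(), by:{dt.entity.service}
| fieldsAdd service.name = entityName(dt.entity.service)
| sort cnt desc
```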
💡 The "Golden Signals" query charts latency, traffic (throughput), and errors — plus error rate — in a single query. Saturation, the fourth golden signal, isn't covered by request metrics; pull it from host and process metrics instead.
🛠 Try it: Open a Notebook → run timeseries {rt=avg(dt.service.request.response_time), err=avg(dt.service.request.failure_rate)}, by:{dt.entity.service}. Click any service name in the results to jump to its detail page.