Trace Analysis
Distributed traces are stored in the spans table. Each span represents one operation in a request's journey through your system.
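To make the span-per-operation idea concrete, the sketch below lists every span recorded for a single trace. It only reuses fields that appear in the queries on this page; the trace ID is a placeholder you would replace with a real one, and it assumes the trace falls inside the last hour.

```
// List every span recorded for one trace (replace the placeholder ID)
fetch spans, from:now()-1h
| filter trace_id == "<your-trace-id>"
| fields service.name, span.name, span.kind, duration
| sort duration desc
```

Each returned row is one operation; the slowest operations in the request appear first.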
Find Slow Requests
// Top 10 slowest server-side spans
fetch spans, from:now()-1h
| filter span.kind == "SERVER"
| fields trace_id, service.name, span.name, duration
| sort duration desc
| limit 10
Response Time Percentiles by Endpoint
fetch spans, from:now()-2h
| filter span.kind == "SERVER" and isNotNull(http.request.method)
| summarize p50=percentile(duration, 50), p95=percentile(duration, 95),
    p99=percentile(duration, 99), count=count(),
    by:{http.route}
| sort p95 desc
Error Rate by Service
// IMPORTANT: use http.response.status_code for HTTP services
// span status == "ERROR" is unreliable for HTTP services
fetch spans, from:now()-1h
| filter span.kind == "SERVER"
| summarize total=count(),
    errors=countIf(http.response.status_code >= 500 or otel.status_code == "ERROR"),
    by:{service.name}
| fieldsAdd error_rate = 100.0 * toDouble(errors) / toDouble(total)
| sort error_rate desc
⚠️ A filter on status == "ERROR" alone can return 0 errors even with thousands of 5xx responses on HTTP services. Always include http.response.status_code >= 500 when computing HTTP error rates.
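To check how large the gap is on your own data, the two signals can be counted side by side. This is a minimal sketch that reuses the fields from the query above; it assumes otel.status_code is the status field populated on your spans, and the aliases http_5xx and status_errors are illustrative names.

```
// Compare the two error signals per service (aliases are illustrative)
fetch spans, from:now()-1h
| filter span.kind == "SERVER"
| summarize http_5xx=countIf(http.response.status_code >= 500),
    status_errors=countIf(otel.status_code == "ERROR"),
    by:{service.name}
| sort http_5xx desc
```

Services where http_5xx is large while status_errors stays near zero are the ones where a status-only error rate would be misleading.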
Exception Analysis
// Extract exceptions from span events
fetch spans, from:now()-2h
| expand span.events
| filter span.events[`event.name`] == "exception"
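| fields trace_id, span.name,
    exception_type = span.events[`exception.type`],
    exception_message = span.events[`exception.message`]
// Group on the alias: span.events itself is no longer available after fields
| summarize count=count(), by:{exception_type}
| sort count desc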
💡 After expand, access event attributes with bracket notation and backtick-quoted names, for example span.events[`exception.type`], NOT span.events.exception.type.
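The same bracket syntax works in filters. As a hedged sketch, the query below narrows the expanded events to a single exception type and lists its messages; the type string is a placeholder, not a value taken from your data, and the alias message is illustrative.

```
// List messages for one exception type (the type string is a placeholder)
fetch spans, from:now()-2h
| expand span.events
| filter span.events[`event.name`] == "exception"
| filter span.events[`exception.type`] == "java.lang.NullPointerException"
| fields trace_id, span.name, message = span.events[`exception.message`]
| limit 20
```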
Database Query Analysis
// Top 20 slowest database statements by average duration
fetch spans, from:now()-1h
| filter isNotNull(db.system)
| summarize avg_dur=avg(duration), count=count(), by:{db.statement}
| sort avg_dur desc
| limit 20
Trace-to-Log Correlation
// Find logs for a specific trace
fetch logs, from:now()-1h
| filter trace_id == "abc123..."
| fields timestamp, content, loglevel
| sort timestamp asc
Throughput Timeseries
fetch spans, from:now()-2h
| filter span.kind == "SERVER"
| makeTimeseries count=count(), avg_dur=avg(duration), interval:5m
Knowledge Check
Q: After expand span.events, how do you access the exception type?
- ❌ span.events.exception.type
- ✅ span.events[`exception.type`]
- ❌ span.events->exception.type
Q: Why is status == "ERROR" unreliable for HTTP error rates?
- ✅ It can return 0 errors even with thousands of 5xx responses
- ❌ It only works for gRPC services
- ❌ It requires a special OAuth scope