Playbook: Slow Application
User reports: "The app is slow." Here's the step-by-step investigation.
💡 Start with the Problems app. Open Ctrl+K → "Problems". Davis may have already detected the root cause. If your team has created troubleshooting guides (notebooks/dashboards prefixed with [TSG]), Dynatrace Intelligence will auto-suggest relevant ones for the active problem.
Step 1: Check if Davis Already Found It
fetch events, from:now()-24h
| filter event.kind == "DAVIS_PROBLEM"
| filter event.status == "ACTIVE"
| fields display_id, event.name, affected_entity_ids
Step 2: Service Response Time
timeseries avg(dt.service.request.response_time), by:{dt.entity.service}
Look for spikes. Which service is slow? Note the entity ID.
Step 3: Is It Throughput or Latency?
timeseries {
rt = avg(dt.service.request.response_time),
count = sum(dt.service.request.count)
}, by:{dt.entity.service}
If throughput dropped AND latency spiked → likely a backend issue. If throughput is normal but latency spiked → likely a slow dependency.
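The rule of thumb above can be written down as a small classifier, for example in a script that post-processes exported timeseries values. This is a minimal Python sketch: the function name and the 20% cutoffs for "dropped" and "spiked" are assumptions chosen for illustration, not Dynatrace defaults.

```python
def classify_slowdown(throughput_change_pct: float,
                      latency_change_pct: float,
                      drop_pct: float = 20.0,
                      spike_pct: float = 20.0) -> str:
    """Apply the Step 3 rule of thumb to percentage changes versus a baseline.

    Negative throughput_change_pct means throughput went down.
    The 20% cutoffs are hypothetical defaults; tune them per service.
    """
    latency_spiked = latency_change_pct >= spike_pct
    throughput_dropped = throughput_change_pct <= -drop_pct

    if not latency_spiked:
        return "no latency spike"
    if throughput_dropped:
        return "likely backend issue"
    return "likely slow dependency"


if __name__ == "__main__":
    # Throughput fell 35% while latency rose 80%.
    print(classify_slowdown(-35.0, 80.0))
    # Throughput roughly flat while latency rose 80%.
    print(classify_slowdown(2.0, 80.0))
```

The result is only a hint about where to look next; Steps 4 to 6 still apply.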
Step 4: Check the Host
timeseries {
cpu = avg(dt.host.cpu.usage),
mem = avg(dt.host.memory.usage),
disk = avg(dt.host.disk.used.percent)
}, by:{dt.entity.host}
CPU > 90%? Memory exhausted? Disk full? Any of these can cause application slowness.
Step 5: Find Slow Traces
fetch spans
| filter span.kind == "SERVER"
| filter duration > 5s  // spans slower than 5 seconds (5,000,000,000 ns)
| fields trace.id, service.name, span.name, duration
| sort duration desc
| limit 10
Step 6: Check Logs for Errors
fetch logs, from:now()-1h
| filter loglevel == "ERROR"
| fields timestamp, content, dt.entity.service
| sort timestamp desc
| limit 20
Decision Tree
Davis found a problem? → Follow Davis root cause
↓ No
Service RT spiked? → Check host resources (Step 4)
↓ Host OK
Slow traces found? → Drill into trace waterfall
↓ No traces
Error logs present? → Fix the errors first
↓ No errors
Check external dependencies → DNS, network, third-party APIs
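The decision tree can also be read as a chain of early returns. The Python sketch below is one possible interpretation, not an official algorithm: the function and parameter names are hypothetical, and it assumes that a response-time spike on an unhealthy host means host resources are the next thing to investigate.

```python
def next_step(davis_problem: bool,
              rt_spiked: bool,
              host_ok: bool,
              slow_traces: bool,
              error_logs: bool) -> str:
    """Walk the decision tree top to bottom and return the first matching action."""
    if davis_problem:
        return "Follow Davis root cause"
    # Assumption: a spike on a host that is not OK points at host resources.
    if rt_spiked and not host_ok:
        return "Check host resources (Step 4)"
    if slow_traces:
        return "Drill into trace waterfall"
    if error_logs:
        return "Fix the errors first"
    return "Check external dependencies: DNS, network, third-party APIs"


if __name__ == "__main__":
    # No Davis problem, RT spiked, host healthy, slow traces present.
    print(next_step(False, True, True, True, False))
```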
Try it: Open the Problems app (Ctrl+K → "Problems") and check for active problems. Click any problem to see Davis's root cause analysis, which automatically correlates across hosts, services, and traces.
Official Regression Thresholds
According to Dynatrace's own troubleshooting playbooks, a regression is confirmed when:
Signal             Regression Threshold
─────────────────  ──────────────────────────────────────────────────
P95 response time  Increased >20% OR absolute >2s (>2,000,000,000 ns)
Error rate         Increased >1 percentage point
Throughput         Dropped >20% (without corresponding traffic drop)
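To make the thresholds concrete, here is a minimal Python sketch that compares a current window against a baseline window and lists which signals cross the thresholds in the table. The `Window` class, the function name, and the `traffic_dropped` flag are hypothetical; the values would come from your own query results.

```python
from dataclasses import dataclass


@dataclass
class Window:
    p95_ns: float          # P95 response time in nanoseconds
    error_rate_pct: float  # error rate in percent (0-100)
    throughput: float      # requests per interval


def regression_signals(baseline: Window, current: Window,
                       traffic_dropped: bool = False) -> list[str]:
    """Return the names of the signals that meet a regression threshold."""
    signals = []
    # P95: increased by more than 20% OR above 2 s in absolute terms.
    if current.p95_ns > baseline.p95_ns * 1.20 or current.p95_ns > 2_000_000_000:
        signals.append("p95_response_time")
    # Error rate: increased by more than 1 percentage point.
    if current.error_rate_pct - baseline.error_rate_pct > 1.0:
        signals.append("error_rate")
    # Throughput: dropped by more than 20% without a matching traffic drop.
    if not traffic_dropped and current.throughput < baseline.throughput * 0.80:
        signals.append("throughput")
    return signals


if __name__ == "__main__":
    baseline = Window(p95_ns=1_000_000_000, error_rate_pct=0.5, throughput=1000)
    current = Window(p95_ns=1_300_000_000, error_rate_pct=0.9, throughput=950)
    print(regression_signals(baseline, current))
```

An empty list means none of the three thresholds was crossed for that pair of windows.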
Key Rules
⚠️ ALWAYS start with Davis problems; never do broad log searches. Use the Problems app first, then scope all queries to the problem's timeframe and affected entities.
- Never query logs without context: broad log searches hit scan limits and return 0 results
- Scope queries: from: problemStart - 5min, to: problemEnd + 5min
- Look for trace_id in logs, then correlate logs with traces to find the root cause
- HTTP error rate: use countIf(http.response.status_code >= 500); span.status == "ERROR" is unreliable for HTTP services
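Two of these rules are simple arithmetic, sketched below in Python under stated assumptions: the function names are hypothetical, the problem start and end are assumed to be timezone-aware datetimes taken from the problem record, and the status codes are assumed to be a plain list of integers already extracted from span data.

```python
from datetime import datetime, timedelta, timezone


def scoped_window(problem_start: datetime, problem_end: datetime,
                  padding: timedelta = timedelta(minutes=5)):
    """Pad the problem timeframe by 5 minutes on each side."""
    return problem_start - padding, problem_end + padding


def http_error_rate_pct(status_codes: list[int]) -> float:
    """Share of responses with status >= 500, in percent (0.0 for no data)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return 100.0 * errors / len(status_codes)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
    query_from, query_to = scoped_window(start, end)
    print(query_from.isoformat(), query_to.isoformat())
    print(http_error_rate_pct([200, 200, 500, 503]))
```

Counting by status code mirrors the countIf rule above: 4xx responses are not counted as server errors.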