Capacity Planning

Use historical trends to predict when you'll run out of resources, before it becomes an incident.

CPU Trend (7-day)

timeseries avg(dt.host.cpu.usage), by:{dt.entity.host}, from:now()-7d

Is the average increasing day over day? A steady rise across the week means the host is trending toward its capacity limit, even if no alert has fired yet.
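To make "increasing day over day" measurable, one option is to fit a straight line through the daily averages and read off its slope. The sketch below assumes you have exported one average per day from the query above; the `daily_slope` helper and the sample values are hypothetical and not part of Dynatrace.

```python
from statistics import mean


def daily_slope(daily_averages):
    """Return the least-squares slope of the values, in units per day."""
    n = len(daily_averages)
    if n < 2:
        raise ValueError("need at least two daily values")
    days = range(n)
    x_mean = mean(days)
    y_mean = mean(daily_averages)
    numerator = sum((x - x_mean) * (y - y_mean)
                    for x, y in zip(days, daily_averages))
    denominator = sum((x - x_mean) ** 2 for x in days)
    return numerator / denominator


# Hypothetical daily CPU averages (%) for one host over 7 days.
cpu_daily = [52.0, 53.5, 55.0, 56.5, 58.0, 59.5, 61.0]
slope = daily_slope(cpu_daily)
print(f"CPU trend: {slope:+.2f} percentage points per day")
```

A slope that stays positive across several weeks is a stronger signal than a single busy day, because the fit smooths out one-off spikes.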

Memory Trend (7-day)

timeseries avg(dt.host.memory.usage), by:{dt.entity.host}, from:now()-7d

Disk Growth Rate

timeseries avg(dt.host.disk.used.percent), by:{dt.entity.host}, from:now()-7d

If disk usage grows 1 percentage point per day and you're at 80%, you have about 20 days before it's full: (100 - 80) / 1 = 20.
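The same runway arithmetic can be wrapped in a small helper. This is a minimal sketch that assumes growth is linear; `days_until_full` and its parameters are illustrative names, not a Dynatrace function.

```python
def days_until_full(current_percent, growth_per_day, limit_percent=100.0):
    """Estimate days until usage reaches the limit, assuming linear growth.

    Returns None when usage is flat or shrinking, because no limit is reached.
    """
    if growth_per_day <= 0:
        return None
    remaining = max(limit_percent - current_percent, 0.0)
    return remaining / growth_per_day


# The example from the text: 80% used, growing 1 percentage point per day.
print(days_until_full(80.0, 1.0))  # 20.0
```

Passing a lower `limit_percent`, such as the 80% line used in the checklist, gives the days left until the warning threshold rather than until the disk is completely full.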

Service Throughput Growth

timeseries sum(dt.service.request.count), by:{dt.entity.service}, from:now()-7d

If throughput keeps growing while infrastructure stays flat, expect performance to degrade.

Week-over-Week Comparison

// This week's average CPU
timeseries this_week = avg(dt.host.cpu.usage, scalar:true), from:now()-7d, to:now()

// Previous week's average CPU: run this as a separate query, then compare the two values manually
timeseries last_week = avg(dt.host.cpu.usage, scalar:true), from:now()-14d, to:now()-7d
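Once you have the two scalar values, the comparison is a percentage change. The sketch below assumes you copy the numbers out of the two query results by hand; the function name and the sample values are hypothetical.

```python
def week_over_week_change(this_week, last_week):
    """Return the percentage change from last week to this week."""
    if last_week == 0:
        raise ValueError("last week's value is zero; percentage change is undefined")
    return (this_week - last_week) / last_week * 100.0


# Hypothetical scalar results from the two queries above (average CPU %).
change = week_over_week_change(this_week=66.0, last_week=60.0)
print(f"Week-over-week CPU change: {change:+.1f}%")
```

A positive result in a single week can be noise. The signal worth acting on is the same sign repeating across several consecutive weeks.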

Capacity Planning Checklist

Metric                  Warning Zone    Action
──────────────────────  ──────────────  ──────────────────────────
CPU avg > 70%           Approaching     Plan scaling or optimization
Memory avg > 80%        Approaching     Check for leaks, plan upgrade
Disk > 80%              Critical        Clean up or expand storage
Throughput growing 10%+ Weekly          Plan horizontal scaling
Error rate trending up  Degrading       Investigate before it breaks
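The checklist above can also be expressed as a simple rule check, for example as one step in a weekly report. This is only a sketch: the thresholds come from the table, while the function, its parameters, and the choice to treat any positive error-rate slope as "trending up" are assumptions.

```python
def capacity_actions(cpu_avg, memory_avg, disk_used,
                     throughput_growth_pct, error_rate_slope):
    """Return the checklist actions triggered by the given weekly values.

    Percent inputs are 0-100. throughput_growth_pct is the week-over-week
    change. error_rate_slope is the error-rate trend (assumed: > 0 means rising).
    """
    actions = []
    if cpu_avg > 70:
        actions.append("Plan scaling or optimization")
    if memory_avg > 80:
        actions.append("Check for leaks, plan upgrade")
    if disk_used > 80:
        actions.append("Clean up or expand storage")
    if throughput_growth_pct >= 10:
        actions.append("Plan horizontal scaling")
    if error_rate_slope > 0:
        actions.append("Investigate before it breaks")
    return actions


# Hypothetical weekly values for one host and its service.
for action in capacity_actions(cpu_avg=75, memory_avg=60, disk_used=85,
                               throughput_growth_pct=5, error_rate_slope=0):
    print(action)
```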

💡 Set up a weekly trend report workflow that compares this week vs last week. If any metric is consistently growing, it's a capacity signal: act before it becomes an incident.

🛠 Try it: Open a Notebook → run timeseries cpu=avg(dt.host.cpu.usage), by:{dt.entity.host}, from:now()-30d → switch to "Line chart" visualization. Look for upward trends; those hosts need capacity attention before they alert.