Grail Architecture Deep Dive
Grail: Where All Data Lives After Migration
After migration, all observability data lives in Grail, Dynatrace's unified data lakehouse. In Gen2, metrics, logs, and traces were kept in separate stores. In Gen3, everything goes into Grail and is queryable with DQL. Understanding Grail is essential before migrating dashboards and alerts.
Buckets: How Data is Organized
Grail organizes data into buckets. Each bucket has its own retention policy and access controls.
Bucket             What's In It               Default Retention
-----------------  -------------------------  -------------------------------------------
default_logs       Application & system logs  35 days (configurable, 1 day to 10 years)
default_metrics    All metric data            15 months (1-min granularity)
default_events     Davis events, problems     35 days
default_bizevents  Business events            35 days
default_spans      Distributed traces         10 days (configurable, 10 days to 10 years)
💡 You can create custom buckets with different retention. Example: a compliance_logs bucket with 5-year retention for audit logs, while regular logs keep the default 35 days.
OpenPipeline: The Processing Layer
Before data reaches Grail, it passes through OpenPipeline. This is where you:
- Parse: extract structured fields from raw log lines
- Filter: drop noisy or irrelevant data before storage
- Route: send data to different buckets based on rules
- Extract metrics: create metrics from log patterns
- Transform: enrich data with additional fields
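The five stages listed above can be pictured with a small Python sketch. This is only an illustration of the idea, not OpenPipeline's configuration or API: the log format, the `audit` service name, the `environment` field, and every function name here are assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical log format "<LEVEL> <service> <message>", used only for this sketch.
LOG_PATTERN = re.compile(r"^(?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*)$")

def parse(line):
    """Parse: extract structured fields from a raw log line."""
    match = LOG_PATTERN.match(line)
    if match:
        return match.groupdict()
    return {"level": "UNKNOWN", "service": "unknown", "message": line}

def keep(record):
    """Filter: drop noisy records (here, DEBUG lines) before storage."""
    return record["level"] != "DEBUG"

def route(record):
    """Route: pick a target bucket based on a rule (assumed rule, for illustration)."""
    return "compliance_logs" if record["service"] == "audit" else "default_logs"

def transform(record):
    """Transform: enrich the record with an additional (hypothetical) field."""
    return {**record, "environment": "production"}

def run_pipeline(lines):
    buckets = {}
    error_count = Counter()  # Extract metrics: number of ERROR lines per service
    for line in lines:
        record = parse(line)
        if not keep(record):
            continue
        if record["level"] == "ERROR":
            error_count[record["service"]] += 1
        record = transform(record)
        buckets.setdefault(route(record), []).append(record)
    return buckets, error_count

raw = [
    "INFO checkout order placed",
    "DEBUG checkout cache miss",
    "ERROR checkout payment failed",
    "INFO audit user login",
]
buckets, error_count = run_pipeline(raw)
```

The point of the sketch is the ordering: records are shaped, dropped, counted, and assigned a bucket before anything is stored.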
Try it: In your Dynatrace environment, go to Settings → Log Monitoring → Log processing (or search "OpenPipeline" in the app launcher). You'll see the default pipeline with its processing stages.
Retention: Configurable Per Bucket
Data Type             Default Retention  Configurable Range
--------------------  -----------------  -----------------------------
Logs                  35 days            1 day to 10 years (+1 week)
Spans (traces)        10 days            10 days to 10 years (+1 week)
Metrics               15 months          Fixed (1-min granularity)
Events                35 days            Configurable
Business events       35 days            Configurable
Security (built-in)   3 years            Fixed
Security (3rd party)  1 year             Fixed
You configure retention per bucket. This means you can keep security logs for years while keeping debug logs for only a few days, which reduces cost without losing compliance data.
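To make per-bucket retention concrete, here is a minimal Python sketch of the rule "a record is kept while its age is within the bucket's retention". The default values come from the table above; `compliance_logs` and `debug_logs` are hypothetical custom buckets, and `is_retained` is an illustrative helper, not a Dynatrace function.

```python
from datetime import datetime, timedelta, timezone

# Defaults from the retention table; the last two buckets are hypothetical.
RETENTION_DAYS = {
    "default_logs": 35,
    "default_spans": 10,
    "compliance_logs": 5 * 365,
    "debug_logs": 7,
}

def is_retained(bucket, record_time, now):
    """Return True while the record's age is within the bucket's retention window."""
    return now - record_time <= timedelta(days=RETENTION_DAYS[bucket])

now = datetime(2025, 1, 31, tzinfo=timezone.utc)
record_time = now - timedelta(days=20)  # a record that is 20 days old

status = {bucket: is_retained(bucket, record_time, now) for bucket in RETENTION_DAYS}
```

The same 20-day-old record would still be queryable in `default_logs` and `compliance_logs` but would already have aged out of `default_spans` and `debug_logs`, which is why the bucket a record is routed to matters.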
Access Control at the Data Level
Grail supports fine-grained access control:
Level   What It Controls               Example
------  -----------------------------  -----------------------------------
Bucket  Who can query which bucket     Team A → only compliance_logs
Table   Who can query which data type  Devs → logs + traces, not billing
Record  Filter rows by attribute       Only see data where team == "yours"
Field   Mask or hide specific columns  Hide PII fields from non-admins
⚠️ This is a major upgrade from Gen2 management zones, which could only filter by entity. Grail permissions filter by data attributes, which is much more powerful.
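The four levels in the table above can be thought of as successive checks applied to every record a query touches. The Python sketch below models that idea only; the `Policy` shape, the `query` function, and the sample records are all hypothetical and do not reflect Dynatrace's actual policy syntax.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """Hypothetical policy shape for illustration, not the Dynatrace policy syntax."""
    buckets: set                                         # bucket level
    tables: set                                          # table level
    record_filter: dict = field(default_factory=dict)    # record level: attribute -> required value
    masked_fields: set = field(default_factory=set)      # field level

def query(records, policy):
    """Return only the records, and only the field values, the policy allows."""
    visible = []
    for record in records:
        if record["bucket"] not in policy.buckets:
            continue
        if record["table"] not in policy.tables:
            continue
        if any(record.get(key) != value for key, value in policy.record_filter.items()):
            continue
        visible.append(
            {key: ("***" if key in policy.masked_fields else value) for key, value in record.items()}
        )
    return visible

records = [
    {"bucket": "default_logs", "table": "logs", "team": "payments", "email": "a@example.com", "content": "order placed"},
    {"bucket": "default_logs", "table": "logs", "team": "search", "email": "b@example.com", "content": "query served"},
    {"bucket": "compliance_logs", "table": "logs", "team": "payments", "email": "c@example.com", "content": "audit entry"},
]
policy = Policy(
    buckets={"default_logs"},
    tables={"logs"},
    record_filter={"team": "payments"},
    masked_fields={"email"},
)
result = query(records, policy)
```

Note that the record-level check compares a data attribute (`team`) rather than an entity, which is the difference from management zones that the callout describes.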
Key Takeaways
- Grail = one store for all observability data (metrics, logs, traces, events, entities)
- OpenPipeline = processing layer between ingestion and storage
- Buckets = data containers with per-bucket retention and access control
- DQL = the single query language that works across all data in Grail
Under the Hood: Why Grail is Different
Schema-on-Read (No Upfront Schema)
Traditional data warehouses require you to define schemas before storing data. Grail uses schema-on-read: data is stored as-is, and you define the structure when you query it. This means:
- No schema migrations when data formats change
- No data loss from schema mismatches
- Full flexibility to explore any data at any time
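Schema-on-read is easiest to see with records whose shapes differ. The following Python sketch stores raw JSON lines untouched and applies structure only when reading; it illustrates the general technique, not Grail's storage engine, and the field names are made up for the example.

```python
import json

# Records are stored as-is; the shapes differ on purpose.
stored = [
    '{"service": "checkout", "duration_ms": 120}',
    '{"service": "checkout", "duration_ms": 80, "region": "eu"}',
    '{"service": "search", "latency": 45}',
]

def read(raw_lines, fields):
    """Apply structure at query time: project the requested fields, None if absent."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}

# Two different "schemas" applied to the same stored data, with no migration step.
by_region = list(read(stored, ["service", "region"]))
durations = [
    row["duration_ms"]
    for row in read(stored, ["duration_ms"])
    if row["duration_ms"] is not None
]
```

A field that appears later (`region`) or is missing from some records (`duration_ms`) does not require rewriting what is already stored; the query decides which fields to look at.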
Datawarping (Patented Technology)
Datawarping is Dynatrace's patented storage and retrieval technology. Instead of building indexes to find data (which consume 90-99% overhead), Datawarping inverts the problem: it identifies where data is not stored, which is 250x faster than traditional indexing. This eliminates:
- Manual index management
- Hot/cold storage tiers and rehydration
- The trade-off between query speed and storage cost
Massively Parallel Processing (MPP)
Every DQL query is automatically split across thousands of parallel nodes. Each node processes its data segment independently, then results are merged. This means:
- Query performance scales linearly with data volume
- No single point of failure (fault-tolerant by design)
- Data never leaves the environment scope (security isolation)
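The split, process, and merge pattern described above can be sketched in a few lines of Python. Threads stand in for nodes here, and the segment layout and function names are assumptions for illustration; this shows the shape of the computation, not Grail's scheduler or its performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split(records, parts):
    """Divide records into roughly equal segments, one per worker."""
    return [records[i::parts] for i in range(parts)]

def count_by_level(segment):
    """Each worker aggregates only its own segment, independently of the others."""
    return Counter(record["level"] for record in segment)

def parallel_count(records, workers=4):
    """Run the per-segment aggregation in parallel, then merge the partial results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_by_level, split(records, workers)):
            total.update(partial)  # merge step
    return total

records = [{"level": "ERROR" if i % 5 == 0 else "INFO"} for i in range(1000)]
totals = parallel_count(records)
```

Because each partial count depends only on its own segment, adding workers changes how the work is divided but not the merged result.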
💡 Grail combines the cost efficiency of a data lake with the query performance of a data warehouse, without indexes, manual tiering, or schema management. That's why it's called a "data lakehouse".
Built-in Security Buckets
Bucket                          What's In It                         Retention
------------------------------  -----------------------------------  ---------
default_securityevents_builtin  Dynatrace-generated security events  3 years
default_securityevents          Third-party security events          1 year
💡 Discover all buckets: fetch dt.system.buckets | fields name, table, retentionDays
Custom Buckets
Create custom buckets for different retention or compliance needs:
- Compliance bucket: 7 years retention for audit logs
- Short-term bucket: 7 days for debug logs (save cost)
- Team bucket: separate storage per team with Grail permissions
Route data to custom buckets using OpenPipeline, Dynatrace's data processing engine that handles ingestion, transformation, and routing at scale.
🔧 Migration Step: Plan Your Data Retention
Before migrating, map your current retention to Grail buckets:
Step  Action                                Command / UI
----  ------------------------------------  ---------------------------------------------------
1     Check current retention settings      Settings → Data privacy → Retention
2     Identify compliance requirements      Which data needs 1yr+ retention?
3     Plan custom buckets                   compliance_logs (5yr), debug_logs (7d)
4     Create buckets in Grail               Settings → Grail → Buckets → + Bucket
5     Set up OpenPipeline routing           Settings → Log Monitoring → OpenPipeline
6     Verify data flows to correct buckets  fetch dt.system.buckets | fields name, retentionDays
💡 Cost optimization: In Gen2, all logs had the same retention. In Gen3, you can route debug logs to a 7-day bucket and audit logs to a 5-year bucket. This alone can reduce storage costs by 30-50%.
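When planning buckets, it can help to estimate how much data each plan keeps at steady state. The Python sketch below does that arithmetic under loud assumptions: the daily ingest volumes are invented, retained volume is used as a rough stand-in for storage cost, and `retained_gb` is a hypothetical helper. Substitute your own volumes before drawing conclusions.

```python
# Hypothetical daily ingest (GB/day); not measured values.
DAILY_GB = {"debug": 50, "application": 30}

def retained_gb(retention_days):
    """Steady-state volume held: daily ingest times retention, summed over categories."""
    return sum(DAILY_GB[name] * days for name, days in retention_days.items())

# Plan A: one retention for all logs.
uniform = retained_gb({"debug": 35, "application": 35})
# Plan B: debug logs routed to a 7-day bucket, application logs unchanged.
split = retained_gb({"debug": 7, "application": 35})

saving = 1 - split / uniform
print(f"uniform={uniform} GB, split={split} GB, saving={saving:.0%}")
```

The result depends entirely on how much of your volume is short-lived data; long-retention buckets such as a 5-year audit bucket add retained volume and should be included in the same calculation.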