📡 SNMP Extensions — Module 3

Groups & Feature Sets

Hands-on

Why Multiple Groups Matter

In Module 2, we put everything in one group. That works, but it has a critical flaw: if one subgroup's walk hangs or fails, ALL metrics in the group stop.

This is a real production bug. Here's what happened with a Cisco ACI spine switch:

Device: 10.250.11.51 (spine switch)
Problem: SNMP agent bug on ifIndex 402718780
Effect: ifDescr GETBULK walk hangs for 180 seconds (1 attempt + 5 retries = 6 × 30s timeout)
Result: ALL table metrics blocked — CPU, memory, PSU, temperature = ZERO DATA

Meanwhile, the two leaf switches (.59, .60) with no buggy interfaces
reported everything perfectly.

The fix? Separate SNMP groups. Each group polls independently. If the interface walk hangs, CPU/memory/PSU/temperature still get collected.

Group Architecture

snmp:
  # Group 1: Device health (CPU, memory, PSU, temp)
  - group: Device Default
    interval:
      minutes: 1
    dimensions: [...]
    metrics: [...]        # Scalar metrics (sysUpTime)
    subgroups:
      - subgroup: CPU and Memory
      - subgroup: Power Supply
      - subgroup: Temperature

  # Group 2: Interfaces (separate polling — fault isolated)
  - group: Interfaces
    interval:
      minutes: 1
    dimensions: [...]
    subgroups:
      - subgroup: Interface Status
      - subgroup: Interface Counters

Now if the interface walk hangs on a buggy device, Group 1 still polls successfully. You get CPU, memory, PSU, and temperature data even when interfaces are broken.

Feature Sets

Feature sets let users toggle groups of metrics on/off in the monitoring configuration UI.

subgroups:
  - subgroup: CPU and Memory
    featureSet: CPU and Memory    # ← User can enable/disable this
    table: true
    metrics:
      - key: my_device.cpu_usage
        value: oid:1.3.6.1.4.1.9.9.109.1.1.1.1.8
        type: gauge

  - subgroup: Temperature
    featureSet: Temperature       # ← Separate toggle
    table: true
    metrics:
      - key: my_device.sensor_value
        value: oid:1.3.6.1.4.1.9.9.91.1.1.1.1.4
        type: gauge

In the monitoring configuration JSON:

{
  "featureSets": ["CPU and Memory", "Temperature"]
}

Or use "featureSets": ["all"] to enable everything.

Rules:

  • Metrics NOT in any featureSet are always reported (default metrics)
  • A metric inherits the featureSet of its subgroup, which inherits from its group
  • Metric-level featureSet overrides subgroup-level, which overrides group-level
  • Max 10 groups per extension, 10 subgroups per group
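Here is a sketch of all three levels together, written to follow the rules above. It assumes featureSet is accepted at group and metric level exactly as those rules describe; the group name Device Health and the feature set names Device Health and Memory are hypothetical.

```yaml
snmp:
  - group: Device Health                  # hypothetical group name
    featureSet: Device Health             # group level: default for everything below
    interval:
      minutes: 1
    metrics:
      - key: my_device.sysuptime          # no own featureSet: inherits "Device Health"
        value: oid:1.3.6.1.2.1.1.3.0
        type: gauge
    subgroups:
      - subgroup: CPU and Memory
        featureSet: CPU and Memory        # subgroup level: overrides "Device Health"
        table: true
        dimensions:
          - key: cpu.index
            value: oid:1.3.6.1.4.1.9.9.109.1.1.1.1.1
        metrics:
          - key: my_device.cpu_usage      # no own featureSet: inherits "CPU and Memory"
            value: oid:1.3.6.1.4.1.9.9.109.1.1.1.1.8
            type: gauge
          - key: my_device.memory_used
            featureSet: Memory            # metric level (hypothetical set): overrides "CPU and Memory"
            value: oid:1.3.6.1.4.1.9.9.109.1.1.1.1.12
            type: gauge
```

Under these rules, enabling only "featureSets": ["Memory"] would report my_device.memory_used, while my_device.cpu_usage and my_device.sysuptime stay off. Note that sysuptime is no longer a default metric in this layout, because it inherits the group-level featureSet.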

Complete Multi-Group Example

Here's a production-quality extension structure with fault isolation:

name: custom:com.dynatrace.extension.my-network-device
version: 0.0.2
minDynatraceVersion: "1.318.0"
author:
  name: Student

metrics:
  - key: my_device.sysuptime
    metadata:
      displayName: System Uptime
      unit: Count
  - key: my_device.cpu_usage
    metadata:
      displayName: CPU Usage
      unit: Percent
  - key: my_device.memory_used
    metadata:
      displayName: Memory Used
      unit: Byte
  - key: my_device.psu.status
    metadata:
      displayName: PSU Status
      unit: Count
  - key: my_device.sensor_value
    metadata:
      displayName: Sensor Temperature
      unit: Celsius
  - key: my_device.if.speed
    metadata:
      displayName: Interface Speed
      unit: MegaBitPerSecond
  - key: my_device.if.in.octets.count
    metadata:
      displayName: Bytes In
      unit: Byte
  - key: my_device.if.out.octets.count
    metadata:
      displayName: Bytes Out
      unit: Byte

snmp:
  # ─── GROUP 1: Device Health (fault-isolated from interfaces) ───
  - group: Device Default
    interval:
      minutes: 1
    dimensions:
      - key: device.address
        value: this:device.address
      - key: device.port
        value: this:device.port
      - key: sys.name
        value: oid:1.3.6.1.2.1.1.5.0
      - key: sys.description
        value: oid:1.3.6.1.2.1.1.1.0
      - key: device.type
        value: const:Network Device
    metrics:
      - key: my_device.sysuptime
        value: oid:1.3.6.1.2.1.1.3.0
        type: gauge
    subgroups:
      - subgroup: CPU and Memory
        featureSet: CPU and Memory
        table: true
        dimensions:
          - key: cpu.index
            value: oid:1.3.6.1.4.1.9.9.109.1.1.1.1.1
        metrics:
          - key: my_device.cpu_usage
            value: oid:1.3.6.1.4.1.9.9.109.1.1.1.1.8
            type: gauge
          - key: my_device.memory_used
            value: oid:1.3.6.1.4.1.9.9.109.1.1.1.1.12
            type: gauge

      - subgroup: Power Supply
        featureSet: Power Supply
        table: true
        dimensions:
          - key: psu.descr
            value: oid:1.3.6.1.2.1.47.1.1.1.1.2
        metrics:
          - key: my_device.psu.status
            value: oid:1.3.6.1.4.1.9.9.117.1.1.2.1.2
            type: gauge

      - subgroup: Temperature
        featureSet: Temperature
        table: true
        dimensions:
          - key: sensor.descr
            value: oid:1.3.6.1.2.1.47.1.1.1.1.2
        metrics:
          - key: my_device.sensor_value
            value: oid:1.3.6.1.4.1.9.9.91.1.1.1.1.4
            type: gauge

  # ─── GROUP 2: Interfaces (polls independently) ───
  - group: Interfaces
    interval:
      minutes: 1
    dimensions:
      - key: device.address
        value: this:device.address
      - key: sys.name
        value: oid:1.3.6.1.2.1.1.5.0
    subgroups:
      - subgroup: Interface Status
        featureSet: Interfaces
        table: true
        dimensions:
          - key: if.name
            value: oid:1.3.6.1.2.1.31.1.1.1.1
          - key: if.alias
            value: oid:1.3.6.1.2.1.31.1.1.1.18
        metrics:
          - key: my_device.if.speed
            value: oid:1.3.6.1.2.1.31.1.1.1.15
            type: gauge
          - key: my_device.if.in.octets.count
            value: oid:1.3.6.1.2.1.31.1.1.1.6
            type: count
          - key: my_device.if.out.octets.count
            value: oid:1.3.6.1.2.1.31.1.1.1.10
            type: count

Cross-Table Bug (DED018)

This is the most common and dangerous bug in SNMP extensions. It happens when you put OIDs from different SNMP tables in the same table: true subgroup.

# ✗ WRONG — mixing ifTable OIDs with ipAdEntTable OIDs
- subgroup: Interface Info
  table: true
  dimensions:
    - key: if.name
      value: oid:1.3.6.1.2.1.31.1.1.1.1     # ifXTable (indexed by ifIndex)
  metrics:
    - key: my_device.if.speed
      value: oid:1.3.6.1.2.1.31.1.1.1.15     # ifXTable ✓
    - key: my_device.if.ipaddr
      value: oid:1.3.6.1.2.1.4.20.1.1        # ipAdEntTable ✗ DIFFERENT TABLE!

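The problem: ifXTable is indexed by ifIndex (1, 2, 3...), but ipAdEntTable is indexed by IP address (10.0.0.1, 10.0.0.2...). GETBULK walks them together, but the index suffixes of the two tables never match, so the rows don't align and the IP address values can't be attributed to the right interface.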

# ✓ CORRECT — separate subgroups for different tables
- subgroup: Interface Status
  table: true
  dimensions:
    - key: if.name
      value: oid:1.3.6.1.2.1.31.1.1.1.1
  metrics:
    - key: my_device.if.speed
      value: oid:1.3.6.1.2.1.31.1.1.1.15
      type: gauge

- subgroup: IP Addresses
  table: true
  dimensions:
    - key: if.ipaddr
      value: oid:1.3.6.1.2.1.4.20.1.1
  metrics:
    - key: my_device.if.ipadentifindex
      value: oid:1.3.6.1.2.1.4.20.1.2
      type: gauge

Compatible table combinations (same index space, safe to mix):

  • ifTable + ifXTable — both indexed by ifIndex
  • entPhysicalTable + entSensorValueTable — both indexed by entPhysicalIndex
  • entPhysicalTable + cefcFRUPowerStatusTable — both indexed by entPhysicalIndex
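For example, one subgroup can safely mix ifTable and ifXTable columns, because ifXTable augments ifTable and both return one row per ifIndex. This is a sketch: the if.descr dimension key and the my_device.if.oper_status metric key are hypothetical, and following the complete example above you would also declare the new metric in the top-level metrics: list.

```yaml
- subgroup: Interface Status
  featureSet: Interfaces
  table: true
  dimensions:
    - key: if.descr                           # hypothetical dimension key
      value: oid:1.3.6.1.2.1.2.2.1.2          # ifDescr (ifTable, indexed by ifIndex)
    - key: if.name
      value: oid:1.3.6.1.2.1.31.1.1.1.1       # ifName (ifXTable, indexed by ifIndex)
  metrics:
    - key: my_device.if.oper_status           # hypothetical metric key
      value: oid:1.3.6.1.2.1.2.2.1.8          # ifOperStatus (ifTable)
      type: gauge
    - key: my_device.if.in.octets.count
      value: oid:1.3.6.1.2.1.31.1.1.1.6       # ifHCInOctets (ifXTable)
      type: count
```

Every OID here ends in the same ifIndex suffix for a given interface, which is what makes the rows line up.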

What's Next

In Module 4, we'll dive deeper into advanced SNMP patterns — MIB bundling, $networkFormat for address translation, variables for user-configurable filtering, and debugging SNMP polling issues from ActiveGate logs.

🛠 Hands-On Exercise

Edit the YAML in the editor, then click "Check My Work" to validate.

Fault Isolation with Groups

This extension has the same flaw as the ACI spine switch: all metrics are in one group, so a hanging interface walk blocks CPU/memory data.

  • Split into two groups: Device Default (CPU, memory, sysUpTime) and Interfaces (interface metrics)
  • Add featureSet: labels to each subgroup so they can be toggled independently
  • Make sure the Interfaces group has its own interval and dimensions

This is the exact fix we shipped in ACI v0.0.5.
