The Problem: Node‑Level Health Isn’t Enough
For modular Cisco platforms—Catalyst chassis, Nexus spines and leafs, big ISR/ASR routers—most of the real risk lives at the FRU (Field Replaceable Unit) level:
- Power supplies going in and out of service
- Fan trays limping along in warning state
- Line cards bouncing or failing
- Modules powered down by the chassis due to power or temperature
What’s New in Observability Self-Hosted 2026.2
In Observability Self-Hosted 2026.2, Hardware Health has been enhanced to:
- Read Cisco’s FRU MIB (CISCO-ENTITY-FRU-CONTROL-MIB) to provide module‑level hardware health for supported Cisco devices.
- Extend from “box‑level” health to individual modules, including:
- Line cards
- Power supplies (PSUs)
- Fan trays
- Supervisor / processor / interface modules
This builds on the hardware health data that is already polled:
- We still use Cisco’s environmental and entity MIBs.
- We now add more detail for Cisco platforms that expose module‑level data.
Key point: You do not need custom pollers, MIB imports, or manual OID mapping. If the device supports it and you’re on Observability Self-Hosted 2026.2+, Observability Self-Hosted does the heavy lifting.
Which Cisco Devices Benefit?
Any Cisco device that exposes its module‑level health via the Cisco FRU MIB over SNMP can take advantage of this.
Typical examples include:
- Cisco Catalyst 9300 / 9400 / 9500 / 9600 series and similar chassis‑based models
- Cisco Nexus 7000 / 9000 series platforms that report FRU/module information
- Cisco ASR / ISR routers, where the platform implements FRU/module monitoring
What You Get: Module‑Level Signal in Plain Language
For each module (line card, supervisor, power module, etc.), Observability Self-Hosted now tracks:
- Admin state vs. real state
- Why the module last reset
- How long it’s been stable
For power supplies and other power‑managed FRUs, Observability Self-Hosted tracks:
- Admin vs. actual power state
- Whether that PSU is really contributing to the chassis
- Current draw where the device reports it
For fan trays, Observability Self-Hosted tells you if they’re:
- Up and running
- Running in a warning state
- Down (time to panic, or at least open a ticket)
All of this rolls into Hardware Health using the same Up / Warning / Critical / Unknown colors you already rely on—just at a much finer granularity.
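To make that roll-up concrete, here is a minimal sketch of how raw CISCO-ENTITY-FRU-CONTROL-MIB status integers might collapse into the four Hardware Health states. The integer codes follow the cefcModuleOperStatus and cefcFanTrayOperStatus enumerations from that MIB, but the grouping into Up / Warning / Critical / Unknown is an illustrative assumption, not Observability Self-Hosted’s actual logic.

```python
# Illustrative mapping from CISCO-ENTITY-FRU-CONTROL-MIB status codes to the
# four Hardware Health states. A sketch only -- not the product's roll-up logic.

# cefcFanTrayOperStatus: unknown(1), up(2), down(3), warning(4)
FAN_TRAY_STATUS = {1: "Unknown", 2: "Up", 3: "Critical", 4: "Warning"}

def module_health(oper_status: int) -> str:
    """Collapse a cefcModuleOperStatus value into Up/Warning/Critical/Unknown."""
    if oper_status == 2:                # ok
        return "Up"
    if oper_status in (4, 19, 20):      # okButDiagFailed, okButPowerOverWarning,
        return "Warning"                # okButPowerOverCritical
    if oper_status in (7, 14, 15, 17):  # failed, outOfServiceEnvTemp,
        return "Critical"               # poweredDown, powerDenied
    return "Unknown"

print(module_health(2))    # Up
print(module_health(7))    # Critical
print(FAN_TRAY_STATUS[4])  # Warning
```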
Where to See This in Observability Self-Hosted
Node Details → Hardware Health
On a supported Cisco device monitored by Observability Self-Hosted 2026.2+:
- Open the node in Observability Self‑Hosted.
- Navigate to the Hardware Health widget.
You’ll now see:
- Additional sensors for line cards, power supplies, fan trays, and supervisor modules
- Status colors per module:
- Green (Up) – module is healthy.
- Yellow (Warning) – degraded but still running (for example, “On but fan failed”).
- Red (Critical) – module has failed or has been powered down by the chassis due to power or temperature conditions.
- Gray (Unknown) – Observability Self-Hosted can’t determine the state from the device.
Manage Hardware Sensors
From the same node:
- Open Manage Hardware Sensors.
- FRU/module sensors appear alongside your existing sensors.
- The Last Value column lets you confirm the module‑level view is being polled and updated.
This is the quickest way to confirm that your chassis is sharing module-level information with Observability Self-Hosted.
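If you want to cross-check outside the UI, you can walk the module status column yourself with snmpwalk and eyeball the raw values. The sketch below parses the numeric (-On) output of such a walk into an index-to-status map; the OID shown is cefcModuleOperStatus from CISCO-ENTITY-FRU-CONTROL-MIB (verify it against your own MIB files), and the sample lines and indexes are invented for illustration.

```python
import re

# Sample lines in the shape produced by:
#   snmpwalk -v2c -c <community> -On <host> 1.3.6.1.4.1.9.9.117.1.2.1.1.2
# (cefcModuleOperStatus; indexes/values here are invented for illustration)
sample = """\
.1.3.6.1.4.1.9.9.117.1.2.1.1.2.1000 = INTEGER: 2
.1.3.6.1.4.1.9.9.117.1.2.1.1.2.2000 = INTEGER: 7
"""

def parse_oper_status(walk_output: str) -> dict[int, int]:
    """Map entPhysicalIndex -> cefcModuleOperStatus from snmpwalk -On output."""
    statuses = {}
    for line in walk_output.splitlines():
        m = re.match(r"\.(?:\d+\.)+(\d+) = INTEGER: (\d+)", line)
        if m:
            statuses[int(m.group(1))] = int(m.group(2))
    return statuses

print(parse_oper_status(sample))  # {1000: 2, 2000: 7}
```

If the walk returns rows at all, the chassis is exposing module-level data, and the values should line up with what Manage Hardware Sensors shows.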
Technical Details (For Power Users)
Going one level deeper, here’s how the main Cisco FRU structures line up in Observability Self-Hosted.
Cisco FRU and Entity Structures
| Area | MIB / Structure | What it represents |
|---|---|---|
| Module status | cefcModuleStatusTable | Per-module admin/oper state, reset reason, uptime |
| FRU power status | cefcFRUPowerStatusTable | Per-FRU power admin/oper state, current draw |
| Fan tray status | cefcFanTrayStatusTable | Per-fan-tray operational state (up/warning/down/unknown) |
| FRU notifications | cefcMIBNotificationEnables | Global enable flags for FRU/module/PSU notifications |
| Physical inventory | entPhysicalTable (ENTITY-MIB) | Chassis/modules/PSUs/fans with human-readable names |
Observability Self-Hosted uses ENTITY-MIB for the physical hierarchy (chassis, slots, modules, PSUs, fans) and Cisco FRU tables to attach live status and metrics to those components.
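Conceptually, that join keys the Cisco FRU status rows to the ENTITY-MIB inventory by entPhysicalIndex. The sketch below shows the idea with plain dictionaries standing in for SNMP poll results; the indexes, names, and the two-value status map are invented for illustration (2 = ok and 7 = failed per cefcModuleOperStatus).

```python
# Conceptual join of ENTITY-MIB names with FRU status by entPhysicalIndex.
# Dictionaries stand in for SNMP poll results; all values are illustrative.

# entPhysicalName, keyed by entPhysicalIndex (ENTITY-MIB)
ent_physical_name = {
    1000: "Switch 1 - Power Supply A",
    2000: "Linecard-1 module",
}

# cefcModuleOperStatus-style values, indexed by the same entPhysicalIndex
fru_oper_status = {1000: 2, 2000: 7}  # 2 = ok, 7 = failed

STATUS_LABEL = {2: "Up", 7: "Critical"}

def hardware_rows(names, statuses):
    """Yield (component name, health label) pairs joined on entPhysicalIndex."""
    for idx, name in sorted(names.items()):
        code = statuses.get(idx)
        yield name, STATUS_LABEL.get(code, "Unknown")

for name, health in hardware_rows(ent_physical_name, fru_oper_status):
    print(f"{name}: {health}")
```

Each resulting pair is, in effect, one Hardware Health row: a human-readable component name with a live status attached.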
Metrics Collected Per Area
| Component | Key metrics collected | Used for |
|---|---|---|
| Modules | Admin status, oper status, last reset reason, last status change time (seconds), module uptime (seconds) | Module health state + Message field |
| PSUs | Power admin status, power oper status, current draw (where exposed) | PSU health + capacity/usage context |
| Fans | Oper status (up/warning/down/unknown) | Fan health status |
| Global | FRU notification enablement (status change, PSU output change, etc.) | Validating Cisco SNMP trap config |
Where This Shows Up in the Observability Self-Hosted UI
| Data | UI widget / location | Notes |
|---|---|---|
| Module status (Up/Warning/Critical/Unknown) | Node Details → Hardware Health | Each module/FRU appears as its own Hardware Health row |
| PSU & fan tray state / current draw | Node Details → Hardware Health | Listed under the same Hardware Health widget |
| All module/FRU entries + last polled value | Node → Manage Hardware Sensors | Per-sensor row with latest status/value |
| Reset reason / last change / uptime (text) | Hardware Items → Message field (alerts/reports/SWQL) | Used directly in custom alerts and SWQL queries |