Following on from the Observability Lens, I wanted to take one part of the evidence layer and explore it in more detail:
Alert behaviour.
Most alert dashboards answer fairly simple questions:
- How many alerts are currently active?
- Which alerts triggered most recently?
- Which objects are generating the most noise?
- How has alert volume changed over time?
All of that is useful, but alert volume on its own does not always tell the full story.
A high number of alerts may represent a genuine incident.
It may also represent repeated alert signatures, stale backlog, poor reset logic, monitoring noise, or a small number of entities creating a disproportionate amount of operational pressure.
That became the idea behind the Alert Pressure Skyline.
From alert volume to alert posture
The aim is not simply to visualise how many alerts have triggered.
The aim is to interpret the behaviour of the alerting layer and present it as an operational posture:
Calm · Building · Pressured · Critical · Unknown
The dashboard uses live Orion evidence to assess:
- alert-trigger volume over the last 24 hours
- repeat-signature pressure
- active-alert backlog
- alerts active for longer than 24 hours
- seven-day movement
- classification coverage
- structured evidence markers
- concentration across operational domains
The result is a single alert-pressure score, shown alongside the evidence that led to the selected state.
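As a rough illustration of how several evidence factors could be folded into one score and one posture, here is a minimal Python sketch. The weights, the thresholds, and the function names are all assumptions made for this example; they are not the values or logic the dashboard itself uses, and each factor is assumed to be normalised to the range 0 to 1 beforehand.

```python
# Hypothetical weights: how much each normalised factor contributes.
# These are illustrative only, not the dashboard's real weighting.
WEIGHTS = {
    "trigger_volume": 0.25,
    "repeat_pressure": 0.25,
    "active_backlog": 0.20,
    "aged_backlog": 0.20,
    "seven_day_movement": 0.10,
}

# Hypothetical lower bounds for each state on a 0-100 scale, highest first.
STATE_THRESHOLDS = [
    (70, "Critical"),
    (45, "Pressured"),
    (20, "Building"),
    (0, "Calm"),
]


def pressure_score(factors: dict) -> int:
    """Combine factors (each expected in 0..1) into a 0-100 score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Missing factors count as 0; out-of-range values are clamped.
        value = min(max(factors.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(total * 100)


def posture(score: int) -> str:
    """Map a score to the first state whose lower bound it reaches."""
    for lower_bound, state in STATE_THRESHOLDS:
        if score >= lower_bound:
            return state
    return "Unknown"


if __name__ == "__main__":
    evidence = {
        "trigger_volume": 0.5,
        "repeat_pressure": 0.6,
        "active_backlog": 0.8,
        "aged_backlog": 0.96,
        "seven_day_movement": 0.4,
    }
    score = pressure_score(evidence)
    print(f"{posture(score).upper()} · {score}/100")
```

Keeping the weights and thresholds in plain tables makes it easy to show the contributing evidence next to the score, which is the point of the exercise.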
The skyline
Each tower represents one completed day of alert-trigger evidence.
The visual language is intentionally simple:
- tower height represents trigger volume
- amber side towers represent repeat pressure
- signal masts represent structured evidence markers
- tower colour represents the interpreted daily state
- the telemetry ribbon shows movement across the completed-day evidence
- the peak marker highlights the highest-pressure day
The aim is to make changes in alert behaviour recognisable before someone starts reading the detailed figures.
A skyline that remains stable is very different from one showing sustained growth, repeated peaks, or persistent concentration.
Evidence drill layer
The lower section adds the supporting evidence needed to explain the posture.
It includes:
- a 14-day pressure trend
- the highest repeating alert signatures
- domain pressure across infrastructure, network, applications, virtualisation, capacity, polling hygiene, and unclassified evidence
- a compact state-interpretation panel
- the recommended next action
This is the part that turns the dashboard from a visual into an operational tool.
The dashboard does not just say:
CRITICAL · 70/100
It also explains why:
AGED ALERT BACKLOG 96%
and what should happen next:
REDUCE AGED BACKLOG AND ISOLATE THE HIGHEST-PRESSURE SIGNATURES
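An evidence line such as the aged-backlog percentage above could be derived in a few lines. The sketch below assumes the figure means the share of currently active alerts that have been active for longer than 24 hours; that interpretation, the function name, and the input shape (a list of trigger timestamps) are assumptions for illustration rather than the dashboard's actual query.

```python
from datetime import datetime, timedelta, timezone


def aged_backlog_share(triggered_at, now, max_age=timedelta(hours=24)):
    """Return the fraction of active alerts whose age exceeds max_age.

    triggered_at: trigger timestamps of the currently active alerts
                  (hypothetical input; one datetime per active alert).
    """
    if not triggered_at:
        return 0.0  # no active alerts, so no aged backlog
    aged = sum(1 for started in triggered_at if now - started > max_age)
    return aged / len(triggered_at)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    active = [
        now - timedelta(hours=1),
        now - timedelta(hours=25),
        now - timedelta(hours=30),
        now - timedelta(hours=72),
    ]
    share = aged_backlog_share(active, now)
    print(f"AGED ALERT BACKLOG {share:.0%}")
```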
Why repeat pressure matters
Repeated alerts are not always separate operational problems.
Sometimes they are multiple pieces of evidence pointing back to the same underlying issue.
When the same alert signature repeatedly triggers, clears, and reappears, it creates operational load without necessarily improving visibility.
That is where alerting starts to become noise.
The skyline makes that pressure visible by separating raw trigger volume from repeated evidence.
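One simple way to separate raw trigger volume from repeated evidence is to count triggers per signature and treat everything after the first occurrence as a repeat. This is a minimal sketch under that assumption; the signature format (alert name plus entity) and the field names in the result are hypothetical, not taken from the dashboard.

```python
from collections import Counter


def repeat_pressure(trigger_signatures):
    """Split raw trigger volume into distinct and repeated triggers.

    trigger_signatures: one signature string per trigger event,
    e.g. "High CPU|node-a" (hypothetical format for this example).
    """
    counts = Counter(trigger_signatures)
    total = len(trigger_signatures)
    distinct = len(counts)
    repeats = total - distinct  # every trigger beyond a signature's first
    return {
        "total": total,
        "distinct": distinct,
        "repeats": repeats,
        "repeat_ratio": repeats / total if total else 0.0,
        "top_signatures": counts.most_common(3),
    }


if __name__ == "__main__":
    events = (
        ["High CPU|node-a"] * 4
        + ["Disk full|node-b"]
        + ["Interface down|sw-1"] * 2
    )
    print(repeat_pressure(events))
```

In this example seven triggers come from only three signatures, so more than half of the volume is repetition rather than new information.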
Signal trust still matters
The dashboard also shows how much of the alert evidence can be classified.
That distinction is important.
A high-pressure state with strong classification coverage is actionable.
A high-pressure state with poor classification coverage is less trustworthy because the alerting layer cannot clearly explain where the pressure is coming from.
In that situation, the correct posture is not necessarily Calm or Critical.
It may be:
UNKNOWN
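The trust check can be sketched as a gate applied after the pressure state has been worked out. The 60% coverage floor, the set of states treated as high pressure, and the function name are assumptions chosen for this example; the dashboard's own rule may differ.

```python
# States treated as "high pressure" for the purposes of this sketch.
HIGH_PRESSURE = {"Pressured", "Critical"}


def trusted_posture(state, classified, total, min_coverage=0.6):
    """Downgrade a high-pressure state to Unknown when coverage is poor.

    classified / total is the share of alert evidence that could be
    assigned to a domain. min_coverage is a hypothetical threshold.
    Returns the (possibly adjusted) state and the coverage ratio.
    """
    coverage = classified / total if total else 0.0
    if state in HIGH_PRESSURE and coverage < min_coverage:
        return "Unknown", coverage
    return state, coverage


if __name__ == "__main__":
    for classified in (90, 30):
        state, coverage = trusted_posture("Critical", classified, 100)
        print(f"{state.upper()} (classification coverage {coverage:.0%})")
```

The gate only touches high-pressure states here, because that is the case where an unexplained score is most likely to send an engineer in the wrong direction.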
The wider idea
Alert dashboards should not only show a list of active alerts.
They should help answer:
- Is the alerting layer behaving normally?
- Is pressure building?
- Is the backlog becoming stale?
- Are a small number of signatures driving the noise?
- Can the evidence be trusted?
- What should the engineer investigate first?
That is the direction this dashboard takes:
Evidence → State → Pressure → Action
✌️
Update: Part 4 is now available: Platform Observability Reactor - Monitoring the Monitoring
This time, the lens turns back onto SolarWinds itself. The Reactor uses live platform evidence to show whether the monitoring system can be trusted before it reports on everything else.