I recently created a modern dashboard for UPS monitoring, showing some key metrics (battery capacity, input status, runtime remaining, etc.) using the Eaton Powerware UPS Monitoring template I downloaded from Thwack (thanks so much dcharville). I put this in place knowing that some UPS devices had either a warning or critical status based on runtime thresholds.
This was all great, but as the dashboard refreshed I noticed that, at times, the status of every device would suddenly go green (Up). A few minutes later it would drop back to the mix of statuses (Up, Warning, Critical) I was expecting. And I was pretty sure the runtime on my UPS devices wasn't somehow changing during these intervals.
As this seemed odd, I checked the Node Child Status Participation settings while the nodes were all showing as Up, and discovered that all of the SAM components were missing. A few minutes later, when the statuses were back to normal, those components were back!
[Screenshots: Node Status Contributors with SAM components vs. Node Status Contributors without SAM components]
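In case anyone wants to catch the flip in the act, here's a minimal sketch (using the Python orionsdk package; the server name, credentials, and the Vendor filter are placeholders for my environment) that polls node statuses over SWIS and logs whenever the overall status mix changes:

```python
# Sketch: poll node statuses via SWIS and log whenever the mix changes.
# Assumes the Python 'orionsdk' package; server, credentials, and the
# Vendor filter below are placeholders for my environment.
import time
from collections import Counter
from orionsdk import SwisClient

swis = SwisClient("orion.example.com", "monitor_user", "password")

previous = None
while True:
    rows = swis.query(
        "SELECT Status FROM Orion.Nodes WHERE Vendor = 'Eaton'"
    )["results"]
    # Common Orion status values: 1 = Up, 2 = Down, 3 = Warning, 14 = Critical
    mix = Counter(r["Status"] for r in rows)
    if mix != previous:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), dict(mix))
        previous = mix
    time.sleep(60)
```

When the mix collapses to all status 1 (Up) for a few minutes and then reverts, that lines up with what I was seeing on the dashboard.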
This seemed odd enough that I opened a ticket on the issue (Support Ticket #01327434) and, naturally, received the usual "it's the database" response. Unfortunately, while I was working through some remediation, the VM hosting our main poller was corrupted and rolled back to a (much, much) older state, at which point I had to rebuild the server. The database, fortunately, was unaffected, and once I had the server running everything came back quite well.
Unfortunately, this particular issue ended up being worse in the end: I have now permanently lost the SAM components from the status roll-up. While SAM is still functioning and I can see the status of individual applications and components, those statuses no longer roll up to the nodes themselves and therefore don't trigger node-level alerts. Alerts based directly on application and component status continue to work fine. I re-opened the case (it had been closed while I attempted to recover the VM), but so far I've only gotten the usual "we ran your original diagnostics package through the 'best practices' analyzer, and here's what it spit out" response from support. This, unfortunately, seems to be the new normal.
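To make the roll-up gap concrete, a query along these lines (same orionsdk assumptions and placeholder credentials as the sketch above) should list nodes that report Up while hosting SAM applications that aren't Up, which is exactly the mismatch I'm describing:

```python
# Sketch: list nodes reporting Up while hosting SAM applications that are
# not Up, the mismatch that a working status roll-up should prevent.
# Same assumptions as above: 'orionsdk' package, placeholder credentials.
from orionsdk import SwisClient

swis = SwisClient("orion.example.com", "monitor_user", "password")

rows = swis.query(
    "SELECT n.Caption, a.Name AS Application, a.Status AS AppStatus "
    "FROM Orion.APM.Application a "
    "INNER JOIN Orion.Nodes n ON n.NodeID = a.NodeID "
    "WHERE n.Status = 1 AND a.Status <> 1"  # node Up, application not Up
)["results"]

for r in rows:
    print(f"{r['Caption']}: {r['Application']} has status {r['AppStatus']}")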
So, if anyone has seen something like this before, or has a suggestion, I'd be happy to hear it. We are currently running v2022.4.1 with a selection of modules (NPM, SAM, NCM, UDT, IPAM, VMAN, VNQM). While I am anxious to move to 2023.2, with all its bug fixes, I'd like to get this resolved before we move forward with an upgrade.
Thanks.