False reboot alerts due to SNMP service restart & Proper uptime information

Reboot detection is tied to the SNMP service/daemon, and it should not work like this. The SNMP service/daemon can crash or be restarted for many reasons, and that is what has happened most of the time a reboot is reported, so it is not a reliable source for tracking reboots. We need another solution for this out of the box, perhaps a different OID. There are plenty of threads about this issue; some of them are listed below.

linux snmpd restart during logrotate triggers false reboot alert weekly

Better Method of Calculating Uptime

Re: Misleading SNMP Uptime information
Re: events with "...nodename REBOOTED AT...date time"
Uptime SNMP and reboot messages

is this normal? restart SNMP SERVICE and get a REBOOT event

False Uptime/Reboot readings

39 Comments
Level 10

I agree. Having a restart notification tied to an SNMP value changing, or to the service being stopped or started on a Windows server, is an issue.

Level 14

Not only Windows. Our Cisco ASR routers show false reboots a lot because of SNMP service restarts. Device uptime is 8 weeks, yet it shows 3 reboots yesterday. Unix/Linux boxes are the same.

Level 16

I'm not sure there is much that can be done by the network management vendor.

Not every hardware vendor provides a custom MIB variable that could be polled, and then we're asking for custom device support, which we could get today by using a universal device poller.

Level 8

This issue is causing a significant amount of problems in our server farm. We run 350+ RHEL6 servers, a significant number of which false alert on snmp daemon restarts.

We have $100k+ invested in our SW build (NPM, SAM, NCM) and this problem is significant enough to have management looking at other NMS solutions. Given the help desk/support resources involved when a false reboot alert is investigated, it wouldn't surprise me if they trashed SW and/or replaced it with Solutions Manager, etc.

You guys need to get on the ball and fix this issue ASAP.

Level 16

Solarwinds Orion NPM can do this for you today, with only a little effort:

If you have RHEL6 servers and have configured Net-SNMP to give access to the Host Resources MIB, then the following universal device poller will pull hrSystemUptime for the devices, and the attached alert definition uses it to determine whether the node has rebooted in the past 10 minutes (customize it to send email, SMS, page, etc.).

http://thwack.solarwinds.com/docs/DOC-170682

http://thwack.solarwinds.com/docs/DOC-170681

To disable the 'node rebooted' alert for SNMP agent restarts, define a custom property on your RHEL6 hosts and suppress the node rebooted alerts if the property is set.
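As a rough illustration of the check the alert performs, here is a minimal Python sketch. It assumes the hrSystemUptime value has already been polled and is a TimeTicks counter (hundredths of a second). The function name `rebooted_recently` and the 600-second default are my own choices for illustration, not anything taken from the attached alert definition.

```python
# HOST-RESOURCES-MIB::hrSystemUptime.0, reported in TimeTicks (1/100 s).
HR_SYSTEM_UPTIME_OID = "1.3.6.1.2.1.25.1.1.0"
TICKS_PER_SECOND = 100


def rebooted_recently(hr_system_uptime_ticks, window_seconds=600):
    """Return True if the host uptime is shorter than the alert window.

    hr_system_uptime_ticks: value polled from hrSystemUptime (hypothetical
    input; how it is polled is outside this sketch).
    window_seconds: alert window, 600 s to mirror the "past 10 minutes" rule.
    """
    uptime_seconds = hr_system_uptime_ticks / TICKS_PER_SECOND
    return uptime_seconds < window_seconds
```

Because hrSystemUptime comes from the host rather than from the SNMP agent's own start time, restarting snmpd should not make this check fire.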

Level 14

Thank you for the suggestion, but IMHO it should come out of the box.

Level 10

I have a big problem with SolarWinds reporting reboots for servers when in fact all that happened was that the SNMP agent was cycled on the server. Many of our users are confused about this. Please either change the text to state what really happened (SNMP agent cycled) or stop reporting it as a reboot when it is not. We have implemented the hrSystemUptime device poller method, which is far more likely to reflect an actual reboot of a server.

Any alert we propagate out based upon an SNMP agent uptime counter being reset has the verbiage "Server xx has possibly rebooted".
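To make the distinction concrete, here is a small Python sketch of how the two counters could be compared between consecutive polls so that the message says what really happened. The function name and the returned strings are made up for illustration. It also assumes neither counter has wrapped, which a 32-bit TimeTicks value does after roughly 497 days.

```python
def classify_uptime_change(prev_sys, curr_sys, prev_hr, curr_hr):
    """Classify what happened between two consecutive polls.

    prev_sys / curr_sys: sysUpTime (agent uptime) in TimeTicks.
    prev_hr / curr_hr: hrSystemUptime (host uptime) in TimeTicks.
    Assumption: no 32-bit counter wrap occurred between the polls.
    """
    agent_reset = curr_sys < prev_sys
    host_reset = curr_hr < prev_hr
    if host_reset:
        # The host counter went backwards, so the machine itself restarted.
        return "host rebooted"
    if agent_reset:
        # Only the agent counter went backwards: the SNMP agent was cycled.
        return "SNMP agent restarted"
    return "no change"
```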

In a previous life we watched for a specific event in the Windows system event log.

On Unix, you can run a command-line command to get the uptime, which makes for a nifty ssh command.
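For example, on Linux specifically, the contents of `/proc/uptime` can be fetched over ssh and parsed. The sketch below covers only the parsing step. The helper name `parse_proc_uptime` is hypothetical and the sample string is invented. Other Unix flavors would need a different command.

```python
def parse_proc_uptime(text):
    """Return host uptime in seconds from the contents of /proc/uptime.

    The file holds two numbers: seconds since boot, then idle seconds.
    Only the first one is needed here.
    """
    uptime_field = text.split()[0]
    return float(uptime_field)


# Invented sample of what `cat /proc/uptime` might return on a Linux host.
sample = "350735.47 234388.90\n"
uptime_days = parse_proc_uptime(sample) / 86400
```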

Level 11

One of the new features in the 10.7 RC could address this: being able to set the OID for the custom MIB in the Manage Pollers feature would be great. We have a number of systems that generate these false positives, to the extent that we have stopped sending node reboot notifications to the ticket system.

Level 18

While you certainly can implement your own OID for hrSystemUptime instead of the default, and while you can now (in 10.7) replace one with the other...

I also have never understood why SolarWinds hasn't changed this one. It seems like a no-brainer. Does anybody actually WANT the agent restart as the source for uptime? I would think you could remove it and nobody would shed a tear.