Level 9

Misleading SNMP Uptime information


Hello,

I have an issue with NPM incorrectly reporting reboots on Linux servers.

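The problem is how snmpd on Linux reports uptime for the sysUpTime OID (.1.3.6.1.2.1.1.3.0): it measures how long the snmpd process has been running rather than how long the OS has been up. (Strictly speaking, this is how sysUpTime is defined: the time since the network management portion of the system was last re-initialized. A daemon restart therefore resets it.)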

# Running snmpwalk against a Linux server that hasn't been touched in a while
snmpwalk -v 2c -c public $LINUX_SERVER .1.3.6.1.2.1.1.3.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1195874669) 138 days, 9:52:26.69

# Restart the SNMP daemon
snmpwalk -v 2c -c public $LINUX_SERVER .1.3.6.1.2.1.1.3.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (373) 0:00:03.73
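For reference, Timeticks are hundredths of a second, and snmpwalk renders them as days plus h:mm:ss.hh. Here is a small Python sketch that reproduces the rendering seen in the output above, which is handy for sanity-checking raw tick values. The function name is made up for illustration; it is not a net-snmp or NPM API.

```python
def format_timeticks(ticks: int) -> str:
    """Render a Timeticks value (hundredths of a second) in the
    'N days, H:MM:SS.hh' style shown in the snmpwalk output above."""
    days, rem = divmod(ticks, 100 * 86400)      # 8,640,000 ticks per day
    hours, rem = divmod(rem, 100 * 3600)
    minutes, rem = divmod(rem, 100 * 60)
    seconds, hundredths = divmod(rem, 100)
    clock = f"{hours}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"
    if days == 0:
        return clock                            # e.g. right after a restart
    return f"{days} day{'s' if days != 1 else ''}, {clock}"


print(format_timeticks(1195874669))
print(format_timeticks(373))
```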

So although the server hasn't rebooted, I'm getting alerted to a reboot.

There is a better OID that represents the actual uptime of the system, hrSystemUptime from HOST-RESOURCES-MIB (.1.3.6.1.2.1.25.1.1.0):

snmpwalk -v 2c -c public $LINUX_SERVER host.hrSystem.hrSystemUptime.0
HOST-RESOURCES-MIB::hrSystemUptime.0 = Timeticks: (1195876954) 138 days, 9:52:49.54

But this counter seems to give weird results for Windows servers. This server has been up for about a month:

snmpwalk -v 2c -c public $WINDOWS_SERVER host.hrSystem.hrSystemUptime.0
HOST-RESOURCES-MIB::hrSystemUptime.0 = Timeticks: (3254887187) 376 days, 17:21:11.87

snmpwalk -v 2c -c public $WINDOWS_SERVER .1.3.6.1.2.1.1.3.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (325537591) 37 days, 16:16:15.91
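The two Windows readings are almost exactly a factor of ten apart. The arithmetic below uses only the numbers posted above. Reading hrSystemUptime as milliseconds instead of hundredths is purely a hypothesis that happens to fit "about a month"; nothing in this thread confirms it.

```python
HR_SYSTEM_UPTIME = 3254887187  # hrSystemUptime.0 from the Windows server above
SYS_UPTIME = 325537591         # sysUpTimeInstance from the same server


def ticks_to_days(ticks: int, ticks_per_second: int = 100) -> float:
    """Convert a tick count to days; Timeticks are defined as 1/100 s."""
    return ticks / ticks_per_second / 86400


ratio = HR_SYSTEM_UPTIME / SYS_UPTIME              # very close to 10
face_value_days = ticks_to_days(HR_SYSTEM_UPTIME)  # roughly 376.7 days
# Hypothesis only: treat the hrSystemUptime value as milliseconds.
if_milliseconds_days = ticks_to_days(HR_SYSTEM_UPTIME, ticks_per_second=1000)

print(f"ratio={ratio:.3f} face={face_value_days:.1f}d ms={if_milliseconds_days:.1f}d")
```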

Does anyone have any opinions about the apparent quirkiness of uptime monitoring with NPM (or maybe the quirky nature of SNMP installations)?

Thanks

~sm

Level 12

We have the same problem. We've had to work around it by wrapping a process around the alert: the service desk has to check both system uptime AND last boot. If BOTH values imply a reboot, they investigate accordingly; if they don't agree, we close the alert.

The problem exists on Windows, Unix and Linux servers. To get system uptime, we have a custom poller that displays on the node details page.

It's a pain, so I'm hoping someone can give some insight.
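The "both signals must agree" triage described above could be automated roughly as follows. This is a Python sketch. The `UptimeSample` fields and the 120-second tolerance are assumptions for illustration, not NPM fields or defaults.

```python
from dataclasses import dataclass


@dataclass
class UptimeSample:
    """One poll of a node (hypothetical structure, not NPM's schema)."""
    uptime_seconds: float   # e.g. from sysUpTime or hrSystemUptime
    last_boot_epoch: float  # last boot time reported by the node, Unix seconds


def reboot_confirmed(prev: UptimeSample, curr: UptimeSample,
                     boot_tolerance_s: float = 120.0) -> bool:
    """Count it as a reboot only if BOTH signals agree: uptime dropped
    AND the reported last-boot time moved forward by more than the tolerance."""
    uptime_reset = curr.uptime_seconds < prev.uptime_seconds
    boot_moved = curr.last_boot_epoch - prev.last_boot_epoch > boot_tolerance_s
    return uptime_reset and boot_moved
```

An snmpd restart resets uptime but leaves last boot unchanged, so it is filtered out. A real reboot moves both values.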


Level 14

Was this ever addressed, or was it acknowledged by SolarWinds? I'm having the same problem on a Linux box using the latest/greatest NPM (v10.0).

I'd like to build a custom poller and/or configure an SNMP trap for an actual reboot, but I'd still get erroneous messages in the events screen.

Any suggestions would be helpful, or an acknowledgement from SW that this is a known issue and is being investigated.

Level 7

I've been struggling with this problem as well.  We have a mix of Cisco devices where the LastBoot trigger condition in Advanced Alerts works just fine.  However, when used with servers, I get hundreds of false triggers per day.  It doesn't matter which OS they are running.

I've tried setting up a UnDP that polls OID 1.3.6.1.2.1.1.3 or 1.3.6.1.2.1.25.1.1, but this does not work very well. I've tried setting up a transform, but there is no conditional functionality. I've also tried configuring the Advanced Alert to trigger when the response (in hundredths of a second) is less than 6000 (the first minute), but this isn't working either.
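One likely reason the "less than 6000" check misses reboots is timing. Unless the node is polled at least once a minute, uptime is usually past 60 seconds by the time it is read. Below is a Python sketch that scales the threshold to the polling interval instead. The 10-minute interval and the function name are illustrative assumptions, not NPM settings.

```python
def rebooted_since_last_poll(uptime_ticks: int, poll_interval_s: int) -> bool:
    """Return True if the uptime reading is younger than one polling interval.

    uptime_ticks is a Timeticks value (hundredths of a second). A fixed
    threshold of 6000 ticks only matches if the node happens to be polled
    during its first minute of uptime; scaling the threshold to the polling
    interval catches a restart anywhere between two polls.
    """
    return uptime_ticks < poll_interval_s * 100


# Hypothetical 10-minute polling interval, node first polled 4 minutes after boot.
POLL_INTERVAL_S = 600
reading = 4 * 60 * 100  # 24000 ticks

fixed_threshold_hit = reading < 6000  # the 60-second rule misses this reboot
interval_threshold_hit = rebooted_since_last_poll(reading, POLL_INTERVAL_S)
```

If you feed it sysUpTime, it will still trip on a plain snmpd restart. It only fixes the timing half of the problem.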

What is the best method to determine that a server has REALLY rebooted?  We use a mix of Linux, Solaris, Windows and even a few NetWare OS's. 

Level 13

Hi all,

I searched the support logs, and support recommends updating to the latest net-snmp for Linux to see if this improves the situation. Have you tried that?

Also, is everyone polling sysUpTime?

M

Level 7

Hi Marie,

How old was that support note? Which version of net-snmp? This would only fix the issue for Linux boxes, though.

In other monitoring packages I have seen reboot/restart detection that triggers an event when uptime goes backwards (the new value is less than the previously read value). Is there such a check in SolarWinds?
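Here is a Python sketch of the "uptime went backwards" check. Timeticks is an unsigned 32-bit counter, so it also rolls over to zero after about 497 days (2^32 hundredths of a second). The sketch tries not to mistake that rollover for a restart. The function name, the elapsed-time input and the 60-second slack are illustrative assumptions; this is not an existing SolarWinds feature.

```python
TIMETICKS_MODULUS = 2 ** 32  # Timeticks is an unsigned 32-bit counter


def uptime_went_backwards(prev_ticks: int, curr_ticks: int,
                          elapsed_s: float, slack_s: float = 60.0) -> bool:
    """Return True if the counter restarted between two polls taken
    elapsed_s seconds apart. A drop that lines up with a 32-bit rollover
    (roughly 497 days of hundredths) is not treated as a restart."""
    if curr_ticks >= prev_ticks:
        return False  # counter moved forward: no restart
    # Where the counter should be if it simply wrapped past 2**32.
    expected_after_wrap = (prev_ticks + int(elapsed_s * 100)) % TIMETICKS_MODULUS
    wrapped = abs(curr_ticks - expected_after_wrap) <= slack_s * 100
    return not wrapped
```

If you feed it sysUpTime, this still fires when snmpd restarts, as in the first post. With hrSystemUptime it follows the OS instead.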

Level 14

adauria - I believe what you are describing is summed up here as a feature request.  Please add your vote to the request thread if this is what you are looking for.

Level 13

Thanks John!

adauria, you are right: the net-snmp fix is for Linux boxes only. Thanks for catching that.

M

Level 9

Thought I'd share this. For those who are also using APM, you can monitor the System: System Up Time perfmon counter to get the uptime in seconds. This is what I'm doing now; I just have to find a way to base our reboot alerts on that instead of on the SNMP service's uptime.
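The counter reports seconds, so one way to base reboot alerts on it is to convert each reading into a boot time and alert only when that boot time moves. This is a Python sketch. The two-minute tolerance and the sample dates are made up, and it does not cover how APM exposes the value to alerts.

```python
from datetime import datetime, timedelta, timezone


def last_boot_from_uptime(uptime_seconds: float, polled_at: datetime) -> datetime:
    """Back out the boot time from an uptime reading in seconds
    (for example the System Up Time perfmon counter)."""
    return polled_at - timedelta(seconds=uptime_seconds)


def boot_time_changed(prev_boot: datetime, curr_boot: datetime,
                      tolerance: timedelta = timedelta(minutes=2)) -> bool:
    """Derived boot times wobble by a few seconds because of poll timing,
    so only a shift larger than the tolerance counts as a new boot."""
    return abs(curr_boot - prev_boot) > tolerance


# Illustrative polls only.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
boot_a = last_boot_from_uptime(3600.0, t0)
boot_b = last_boot_from_uptime(3900.5, t0 + timedelta(seconds=300))  # same boot, jitter
boot_c = last_boot_from_uptime(20.0, t0 + timedelta(seconds=600))    # genuine reboot
```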

Level 14

We only alert on our Cisco boxes, and we even have these problems there.
