Level 14

Better Method of Calculating Uptime

So we just made the mistake of updating (adding) our SNMP community strings on our servers and restarting the SNMP agents to get the settings to take.  What we failed to realize was that Orion reads the uptime of the SNMP agent as the uptime of the server.  BAD BAD BAD.  Lots of false reboot messages to lots of people.  Needless to say, we are now scrambling to provide a REAL uptime calculation.



So we are requesting two things.  First, that server uptime be read from hrSystemUptime (OID 1.3.6.1.2.1.25.1.1), which is the actual server uptime, instead of sysUpTime (OID 1.3.6.1.2.1.1.3), which is the uptime of the SNMP agent.  Second, that some additional checking occur when Orion sees the counter reset.  On Windows this could be a check of the event log for a Shutdown Event / Reason (this would allow Orion to flag the reboot as intentional vs. unintentional).  There has to be some similar log on Linux.
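
To make the first request concrete, here is a rough Python sketch of how two consecutive polls could be compared so that an agent restart is not mistaken for a reboot. It is only an illustration under assumptions: the function name and the dict-of-OIDs input shape are hypothetical, no SNMP library is called, and this is not how Orion works internally.

```python
# Both counters are SNMP TimeTicks, i.e. hundredths of a second.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"           # sysUpTime: time since the SNMP agent started
HR_SYSTEM_UPTIME_OID = "1.3.6.1.2.1.25.1.1.0"  # hrSystemUptime: time since the host booted


def classify_reset(prev, cur):
    """Compare two consecutive polls of one node.

    prev and cur are dicts mapping OID -> TimeTicks value
    (a hypothetical shape chosen for this sketch).
    """
    agent_reset = cur[SYS_UPTIME_OID] < prev[SYS_UPTIME_OID]
    host_reset = cur[HR_SYSTEM_UPTIME_OID] < prev[HR_SYSTEM_UPTIME_OID]
    if host_reset:
        # The host counter went backwards. Note that a 32-bit TimeTicks
        # counter also wraps after roughly 497 days, which looks the same.
        return "host reboot"
    if agent_reset:
        # Only the agent counter went backwards: the SNMP service restarted.
        return "agent restart only"
    return "no reset"
```

With a previous poll of sysUpTime 500000 / hrSystemUptime 9000000 and a current poll of 1200 / 9030000, this returns "agent restart only", which is exactly the case that produced the false reboot messages.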


Please chime in if you would like to see this sooner rather than later.



Edit - To further support my cause, other posts on the topic are below:





29 Replies
Level 13

Re: Better Method of Calculating Uptime

Wow John--great post. Will make sure PM sees it.

M

Level 9

Re: Better Method of Calculating Uptime

Adding my name to the petition.

Level 7

Re: Better Method of Calculating Uptime

Add my vote as well.

As a potential solution / work-around, adding a "time goes backwards" conditional test to the advanced trigger options might help.

This test would compare the current value read from an OID against the previous value and fire the trigger if the current value is lower.

 

just an idea.
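
A minimal Python sketch of that "time goes backwards" test, for illustration only. The class name and the per-node bookkeeping are hypothetical; Orion does not expose anything like this as far as this thread shows.

```python
class BackwardsTrigger:
    """Fires when a polled value is lower than the previous value for the same node."""

    def __init__(self):
        # Last value seen per node name.
        self._last = {}

    def check(self, node, value):
        """Record the new value and return True if it went backwards."""
        previous = self._last.get(node)
        self._last[node] = value
        # The first poll has nothing to compare against, so it never fires.
        return previous is not None and value < previous
```

Polling a node with values 100, 200, 50 would fire only on the third poll. Pointing this at the agent's uptime would still fire on an agent restart, so it helps most when the OID being polled is the host uptime.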

Level 9

Re: Better Method of Calculating Uptime

Sooner.

Starting to monitor more Linux hosts, and I'm tired of seeing incorrect uptime and of seeing it show up in the daily Event Summary as a "Node Rebooted".

Level 14

Re: Better Method of Calculating Uptime

Missed this recent request for the same thing:

And a similar request for APM:

If we keep raking in the votes, we are gonna be moving on up, to a deluxe apartment in the sky.

Level 9

Re: Better Method of Calculating Uptime

Just an FYI - I'm using the performance (perfmon) counter System Up Time under 'System'. It returns a value in seconds that stays accurate whether or not SNMP restarts, the event log service restarts, etc.

My alert is based on this value being less than 5 minutes. The idea being that if the server reboots, within the first polling cycle, Orion should detect this value is less than 5 minutes so it must have rebooted.

Plus there's the added bonus of converting the seconds into various periods. I have a report that shows the value in days, minutes, seconds, and the last date it rebooted (today's date minus the seconds).
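
The arithmetic behind that alert and report could look roughly like the Python sketch below. It assumes the uptime is already available as a number of seconds; the function names are made up for illustration, and nothing here reads the perfmon counter itself.

```python
from datetime import datetime, timedelta

REBOOT_THRESHOLD_SECONDS = 5 * 60  # the "less than 5 minutes" rule


def recently_rebooted(uptime_seconds):
    """True if the uptime counter is below the 5-minute threshold."""
    return uptime_seconds < REBOOT_THRESHOLD_SECONDS


def split_uptime(uptime_seconds):
    """Break an uptime in seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(int(uptime_seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


def last_boot(uptime_seconds, now):
    """Last boot time: the current time minus the uptime."""
    return now - timedelta(seconds=uptime_seconds)
```

For example, `split_uptime(93784)` gives `(1, 2, 3, 4)`. One thing to keep in mind: the 5-minute test only catches a reboot if the counter is actually polled within five minutes of the server coming back up.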

Thoughts?

Level 8

Re: Better Method of Calculating Uptime

Add me to the list here as well. This is something I brought up and requested a few years ago, and it still hasn't been integrated into the product.

Currently I poll my NetSNMP and any Linux/Unix based nodes every 10 minutes for hrSystemUptime.  I then trigger an alert if the value of hrSystemUptime is 'less than or equal to' 60000 (the value is in hundredths of a second, so 60000 equals 10 minutes).
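
For reference, the 60000 figure comes from hrSystemUptime being reported in TimeTicks (hundredths of a second). A small Python sketch of the same check, with hypothetical function names and the polling interval as a parameter:

```python
TICKS_PER_SECOND = 100  # SNMP TimeTicks are hundredths of a second


def threshold_ticks(poll_interval_minutes):
    """Alert threshold in TimeTicks that matches the polling interval."""
    return poll_interval_minutes * 60 * TICKS_PER_SECOND


def rebooted_since_last_poll(hr_system_uptime_ticks, poll_interval_minutes=10):
    """True if the host has been up no longer than one polling interval."""
    return hr_system_uptime_ticks <= threshold_ticks(poll_interval_minutes)
```

`threshold_ticks(10)` works out to 60000, matching the alert above. Tying the threshold to the polling interval means that changing one without the other is harder to get wrong.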


This is just too much work on my end for something that should be built correctly into the product.  I also have not found an equivalent accurate OID to poll for Cisco devices so I still have this issue with the majority of my nodes.

Level 9

Re: Better Method of Calculating Uptime

How do you do that for Linux devices? I've not done that before. We have them, but I don't know them very well, yet I'm supposed to monitor them. Thanks!

Level 11

Re: Better Method of Calculating Uptime

Adding my vote.

1. Reporting SERVER Uptime based on SNMP Uptime is a LIE! I cannot accurately tell my company the uptime of ANY device we have.

2. Multiple times we have experienced issues with the SNMP service on Windows machines, or the snmpd process on Linux machines, and have had to restart the service/process. Every time this happens, I have to ReplyAll to the alert email that gets sent out, saying "I'm sorry, SolarWinds Orion is reporting a false alarm. This was not a reboot, rather this was me resolving an SNMP issue."
