12 Replies Latest reply on Oct 29, 2015 2:11 PM by fe_iara

    Nodes failing SNMP polling

    mdriskell

      Ok I know there are posts regarding this out there but the site is still having searching issues and every time I search I get 0 results.

      I need to create a report or query that lists any nodes configured for SNMP that are now failing.  We did a group policy push to all servers recently, and on several of them it appears SNMP didn't restart.  I am finding a handful today that haven't collected data in the 2 weeks since the push.  The CPU and Memory resources don't have any data in them since 10/16.

      I will get into how ticked I am that an SNMP monitoring system doesn't alert out of the box when SNMP is failing later but right now I need to get some kind of grasp of how many of my nodes are affected.

        • Re: Nodes failing SNMP polling
          netlogix

          I set up 3 alerts that tell me if SNMP isn't working... careful though, when Orion is the issue, you will get an email for every node, volume, and interface (1800 for me).

          Here it is.  Tell me if you can get it, I don't know if that works for sharing.

           

          *edit* oh, it will also alert if an interface or volume disappears too.

            • Re: Nodes failing SNMP polling
              mdriskell

              Ok so I pulled the queries out of the alerts you provided (thanks) to run directly in my SQL console.  If I run the volumes query I get several results showing that volumes are not polling.  However if I run the Nodes query I get nothing back and I know for a fact that I have servers not getting SNMP data (CPU & Memory) back.

              I pulled up a server that I know is bad directly in the DB and I see that my CPULoad and PercentMemoryUsed are -2, so the DB isn't storing them as NULL.

              Here is the exact query stripped from the Nodes alert.

              SELECT DISTINCT Nodes.NodeID AS NetObjectID, Nodes.Caption AS Name
              FROM Nodes
              WHERE
              (
                (Nodes.Status = '1') AND
                (
                 (Nodes.PercentMemoryUsed IS NULL) OR
                 (Nodes.CPULoad IS NULL) OR
                 (DATEDIFF(ss, Nodes.LastSync, getdate())/Nodes.PollInterval >= 10))
              )

                • Re: Nodes failing SNMP polling
                  mdriskell

                  That's the query (the forum mangled the formatting when I posted it).  Any thoughts on alterations to catch what I'm looking for?

                  I have changed the Null to be -2 and now I show 215 results...I am testing now to see if SNMP is truly down on any of these.

                    • Re: Nodes failing SNMP polling
                      netlogix

                      The -2 should be ICMP-only nodes.

                      When you use this, how does it look?

                      SELECT DISTINCT Nodes.NodeID AS NetObjectID, Nodes.Caption AS Name, Nodes.CPULoad, Nodes.PercentMemoryUsed, DATEDIFF(ss, Nodes.LastSync, getdate())/Nodes.PollInterval As MissedPolls
                      FROM Nodes
                      WHERE Nodes.Status = '1'
                        AND (Nodes.PercentMemoryUsed IS NULL
                             OR Nodes.CPULoad IS NULL
                             OR (DATEDIFF(ss, Nodes.LastSync, getdate())/Nodes.PollInterval >= 1)
                            )
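
                      A minimal sketch of one way to account for the -2 values reported earlier in the thread, rather than testing only for NULL.  It assumes that any negative CPULoad or PercentMemoryUsed means "no data collected", and that the Nodes table has an ObjectSubType column whose value is 'ICMP' for ICMP-only nodes.  Both are assumptions to check against your own Orion database before relying on the result.

                      ```sql
                      -- Sketch only: assumes negative values (such as -2) mean "no data"
                      -- and that ObjectSubType = 'ICMP' marks ICMP-only nodes.
                      SELECT Nodes.NodeID AS NetObjectID,
                             Nodes.Caption AS Name,
                             Nodes.CPULoad,
                             Nodes.PercentMemoryUsed
                      FROM Nodes
                      WHERE Nodes.Status = '1'              -- node is up
                        AND Nodes.ObjectSubType <> 'ICMP'   -- skip ICMP-only nodes
                        AND (Nodes.CPULoad IS NULL OR Nodes.CPULoad < 0
                             OR Nodes.PercentMemoryUsed IS NULL OR Nodes.PercentMemoryUsed < 0)
                      ORDER BY Nodes.Caption;
                      ```

                      SNMP nodes that never had CPU or memory pollers assigned may also show up here, so the output is a list to spot-check, not proof that SNMP is down.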

                        • Re: Nodes failing SNMP polling
                          mdriskell

                          I return over 1500 results with the greatest number of missed polls being only 2

                            • Re: Nodes failing SNMP polling
                              netlogix

                              Hmmm... that shouldn't be much of an issue (although you might want to throw a little more power at your SQL server and/or pollers).  Well, they are being polled then, just nothing is collected(?).  I don't get it.  What about the volumes/interfaces?  When they are not reporting in is usually when I blame SNMP.

                              How does this one look?

                              SELECT DISTINCT Nodes.Caption AS Name
                              FROM Nodes INNER JOIN Volumes ON (Nodes.NodeID = Volumes.NodeID)
                              WHERE
                              (
                                (Nodes.Status = '1') AND
                                (
                                 (Volumes.VolumeResponding = 'N') OR
                                 (Volumes.Status = 0) OR
                                 (Volumes.VolumePercentUsed IS NULL) OR
                                 (DATEDIFF(ss,  Volumes.LastSync, getdate())/Volumes.PollInterval >= 10))
                              )

                                • Re: Nodes failing SNMP polling
                                  mdriskell

                                  Good news is that looks better: I spot-checked a few and SNMP is failing.  Bad news is that it returned 130 rows.

                                  Let's see who's on call for the wintel group :)

                                    • Re: Nodes failing SNMP polling
                                      netlogix

                                      You could be nice and give him a batch file containing the results of this:

                                      SELECT DISTINCT 'sc \\'+Nodes.Caption+' stop snmp' AS command
                                      FROM Nodes INNER JOIN Volumes ON (Nodes.NodeID = Volumes.NodeID)
                                      WHERE
                                      (
                                        (Nodes.Status = '1') AND
                                        (
                                         (Volumes.VolumeResponding = 'N') OR
                                         (Volumes.Status = 0) OR
                                         (Volumes.VolumePercentUsed IS NULL) OR
                                         (DATEDIFF(ss,  Volumes.LastSync, getdate())/Volumes.PollInterval >= 10))
                                      )
                                      SELECT DISTINCT 'sc \\'+Nodes.Caption+' start snmp' AS command
                                      FROM Nodes INNER JOIN Volumes ON (Nodes.NodeID = Volumes.NodeID)
                                      WHERE
                                      (
                                        (Nodes.Status = '1') AND
                                        (
                                         (Volumes.VolumeResponding = 'N') OR
                                         (Volumes.Status = 0) OR
                                         (Volumes.VolumePercentUsed IS NULL) OR
                                         (DATEDIFF(ss,  Volumes.LastSync, getdate())/Volumes.PollInterval >= 10))
                                      )
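
                                      The two queries above can also be folded into one that emits the stop and start commands together, ordered per node.  This is a sketch using the same filter; the StepOrder and Verb names are made up for illustration, and the VALUES derived table assumes SQL Server 2008 or later.

                                      ```sql
                                      -- Sketch only: same filter as above, one stop row and one start row per node.
                                      SELECT DISTINCT n.Caption, s.StepOrder,
                                             'sc \\' + n.Caption + ' ' + s.Verb + ' snmp' AS command
                                      FROM Nodes n
                                      INNER JOIN Volumes v ON n.NodeID = v.NodeID
                                      CROSS JOIN (VALUES (1, 'stop'), (2, 'start')) AS s(StepOrder, Verb)
                                      WHERE n.Status = '1'
                                        AND (v.VolumeResponding = 'N'
                                             OR v.Status = 0
                                             OR v.VolumePercentUsed IS NULL
                                             OR DATEDIFF(ss, v.LastSync, getdate()) / v.PollInterval >= 10)
                                      ORDER BY n.Caption, s.StepOrder;
                                      ```

                                      Only the command column goes into the batch file.  The start may need a retry if the service has not finished stopping when it runs.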

                                        • Re: Nodes failing SNMP polling
                                          mdriskell

                                          Actually, if I can get approval I can run that myself using PsExec.  Thanks man.

                                          Now back to my original complaint...monitoring software that doesn't alert when a key monitoring component (SNMP) isn't functioning.

                                            • Re: Nodes failing SNMP polling
                                              netlogix

                                              I think the biggest drawback of Orion is also its advantage.  Orion can do almost anything and comes out of the box with very little setup, but it is very basic until you get it fully set up and customized.  Although, it could be overwhelming if you had to set it all up right when you got it.  But it is a little annoying when you have to set it up to find issues only after they have happened to you once.  So it's difficult to say which way is better.

                                              Set up those alerts I gave you and you'll get an email alert when SNMP stops, but I have had times when something goes funky with the Orion server and I get a ton of emails.

                                                • Re: Nodes failing SNMP polling
                                                  mdriskell

                                                  Although, it could be overwhelming if you had to set it all up right when you got it.

                                                  Hmm...like replacing an existing OpenView installation by the end of November....

                                                   

                                                  As for the volume alert I'm finding that most of the volumes that are failing are ones that no longer exist and I simply need to remove them.  Once again if this were to happen with an interface it would show unknown.

                                                  I wasn't aware of this particular issue before because I managed primarily networking gear....if SNMP failed the interfaces went unknown and it was very easy to see you had a problem.  Now at my new employer I am managing over 2000 servers and not knowing SNMP is failing could be catastrophic.

                              • Re: Nodes failing SNMP polling
                                fe_iara

                                Hello everyone, I saw in another post that the LastSystemUpTimePollUtc field can indicate SNMP state, so I created this query that checks it and excludes nodes that are down or in maintenance.

                                I created a custom SQL alert using this:

                                 

                                SELECT Nodes.NodeID, Nodes.Caption FROM Nodes

                                where nodeid in (select top 1 nodeid  from Nodes where ObjectSubType != 'ICMP' and UnManaged != 1 and Status != 2 and   (DATEDIFF(MINUTE, dateadd(hh, -2, LastSystemUpTimePollUtc), getdate()) > 60) order by LastSystemUpTimePollUtc)

                                 

                                It works for me.
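
                                A hedged variant of the same idea, for anyone who wants the full list instead of a single row.  The query above uses TOP 1, so it returns only the node with the oldest poll, and it shifts the UTC timestamp by a fixed 2 hours to compare it with local getdate().  The sketch below compares UTC with UTC via GETUTCDATE() and drops the TOP 1.  Column names are taken from the query above; the 60-minute threshold is kept as-is.

                                ```sql
                                -- Sketch only: every SNMP node whose last uptime poll is over 60 minutes old.
                                SELECT Nodes.NodeID, Nodes.Caption, Nodes.LastSystemUpTimePollUtc
                                FROM Nodes
                                WHERE Nodes.ObjectSubType <> 'ICMP'   -- skip ICMP-only nodes
                                  AND Nodes.UnManaged <> 1            -- skip unmanaged (maintenance) nodes
                                  AND Nodes.Status <> 2               -- skip nodes that are down
                                  AND DATEDIFF(MINUTE, Nodes.LastSystemUpTimePollUtc, GETUTCDATE()) > 60
                                ORDER BY Nodes.LastSystemUpTimePollUtc;
                                ```

                                Nodes whose LastSystemUpTimePollUtc is NULL are not returned by this filter, so add an IS NULL check if never-polled nodes should be included.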