11 Replies Latest reply on Aug 2, 2019 11:46 AM by mwb

    Node not polled in last 10 minutes alerts receiving frequently

    rawat1991

      Hi,

       

      I have the "Node not polled in last 10 minutes" alert enabled in our environment. For the last few days I have been receiving around 700 alerts every minute, because one polling engine is showing as down and its last database sync is also having issues.

       

      Can you please tell me how this can be fixed?

       

      Thanks,

      Ankit

        • Re: Node not polled in last 10 minutes alerts receiving frequently
          adam.beedell

          I've been meaning to fix this in my own environment for a while but haven't done it yet. There are a couple of paths open:

           

          1. Include a check in the alert that the polling engine is connected to the DB and polling normally.
          2. Create a new alert that auto-fixes SLW when the database sync drops out (perhaps by restarting the Collector service and a couple of others).
          3. Create an alert that updates a custom property when the database sync drops out, and add a check to the alert that that custom property isn't filled in.
          4. Create an alert that pauses actions or alerting when SLW is broken.

           

          Maybe all of the above
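As a rough illustration of options 1, 3 and 4 combined, here is a minimal Python sketch of the decision logic. It is not an Orion feature. A stale node only raises its own alert if its polling engine looks healthy, and an unhealthy engine raises one alert instead of hundreds. The `Engine`/`Node` records, the `suppress_alerts` flag (standing in for the custom property from option 3) and the 5-minute engine sync window are all hypothetical. In Orion you would express the same conditions in the alert trigger itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Engine:
    # Hypothetical record: one polling engine and when it last synced to the DB.
    engine_id: int
    last_sync_utc: datetime
    suppress_alerts: bool = False  # hypothetical stand-in for the option 3 custom property

@dataclass
class Node:
    # Hypothetical record: one monitored node and when it was last polled.
    caption: str
    engine_id: int
    last_poll_utc: datetime

def triage(nodes, engines, now,
           node_window=timedelta(minutes=10),
           engine_window=timedelta(minutes=5)):
    """Return (node captions to alert on, sorted engine IDs to alert on)."""
    # An engine counts as unhealthy if it is flagged or has not synced recently.
    unhealthy = {
        e.engine_id for e in engines
        if e.suppress_alerts or now - e.last_sync_utc > engine_window
    }
    # Alert only on stale nodes whose engine is healthy (option 1);
    # nodes on a broken engine are covered by the single engine alert (option 4).
    node_alerts = [
        n.caption for n in nodes
        if now - n.last_poll_utc > node_window and n.engine_id not in unhealthy
    ]
    return node_alerts, sorted(unhealthy)

now = datetime(2019, 8, 1, 12, 0, tzinfo=timezone.utc)
engines = [
    Engine(1, now - timedelta(minutes=1)),   # healthy
    Engine(2, now - timedelta(minutes=30)),  # sync has dropped out
]
nodes = [
    Node("web01", 1, now - timedelta(minutes=2)),
    Node("db01", 1, now - timedelta(minutes=15)),
    Node("sw01", 2, now - timedelta(minutes=40)),
    Node("sw02", 2, now - timedelta(minutes=40)),
]
print(triage(nodes, engines, now))  # (['db01'], [2])
```

Splitting it this way means a dead APE produces one actionable alert for the engine instead of one alert per node.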

           

          You might also want to look out for when the clocks go back: depending on how you phrase your alert, the time between the last poll and now will be more than 10 minutes for everything.
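To show why, here is a small Python illustration. It assumes the server runs on Europe/London time and needs Python 3.9+ with time-zone data available. Nothing in it is Orion-specific. On 27 October 2019, UK clocks went back from 02:00 BST to 01:00 GMT, so local times from 01:00 to 01:59 happened twice. Subtracting wall-clock times across the change is off by an hour. An alert that looks at the size of the gap between the last poll and now would see 55 minutes instead of 5. Converting both timestamps to UTC first gives the true gap.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")  # assumption: example server time zone

def minutes_between_wall_clock(last_poll, now):
    # When both aware datetimes share the same tzinfo, Python subtracts
    # their wall-clock fields and ignores the UTC offset change.
    return (now - last_poll) / timedelta(minutes=1)

def minutes_between_utc(last_poll, now):
    # Converting both to UTC first gives the real elapsed time.
    return (now.astimezone(timezone.utc)
            - last_poll.astimezone(timezone.utc)) / timedelta(minutes=1)

# fold=0 is the first 01:58 (BST, 00:58 UTC); fold=1 is the second 01:03 (GMT, 01:03 UTC).
last_poll = datetime(2019, 10, 27, 1, 58, tzinfo=LONDON, fold=0)
now = datetime(2019, 10, 27, 1, 3, tzinfo=LONDON, fold=1)

print(minutes_between_wall_clock(last_poll, now))  # -55.0
print(minutes_between_utc(last_poll, now))         # 5.0
```

Storing and comparing poll times in UTC (for example a `...Utc` column against `GETUTCDATE()` in SWQL) avoids this class of problem.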

           

          If you work out anything clever, let me know

          • Re: Node not polled in last 10 minutes alerts receiving frequently
            monitoringlife

            I wrote a few dashboards to keep an eye on nodes/components/elements that stop getting polled.

             

            While it's not an alert, it might inspire someone.

            "LastSystemUpTimePollUtc" is the key I look for to know when they stop collecting SNMP/WMI metrics. Status is the ICMP status indicator.

             

            SWQL:

            SELECT TOP 1000 NodeID, Caption, ObjectSubType, Status, ChildStatus, IPAddress,
                   E0.LastSystemUpTimePollUtc, ToLocal(LastSync) AS LastSync, PollInterval, EngineID,
                   ToLocal(NextPoll) AS NextPoll, ToLocal(NextRediscovery) AS NextRediscovery,
                   SkippedPollingCycles, MinutesSinceLastSync,
                   E0.Engine.DisplayName AS PollingEngine, E0.DetailsUrl
            FROM Orion.Nodes AS E0
            WHERE
                E0.UnManaged = 0 --node has not been unmanaged
                AND E0.LastSystemUpTimePollUtc < ADDDATE('Minute', -30, GETUTCDATE())
            ORDER BY E0.LastSystemUpTimePollUtc
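If you export this query's results (for example through the SWIS API), grouping the rows by PollingEngine quickly shows whether the stale nodes are scattered or all sit behind one engine. The second case points at the engine rather than the nodes. Below is a minimal Python sketch under some assumptions. Each row is a dict keyed by the column names in the query. You also have a separate per-engine count of managed nodes, which is a hypothetical input the query does not return. The engine names and the 90% threshold are made up for illustration.

```python
from collections import Counter

def stale_by_engine(rows, totals):
    """Summarise stale-node rows per polling engine.

    rows:   query results, each a dict with at least a "PollingEngine" key
    totals: hypothetical mapping of engine name -> number of managed nodes
    Returns (engine, stale_count, stale_share) tuples, most stale first.
    """
    stale = Counter(row["PollingEngine"] for row in rows)
    report = []
    for engine, count in stale.most_common():
        total = totals.get(engine, 0)
        report.append((engine, count, count / total if total else None))
    return report

def suspect_engines(report, threshold=0.9):
    # If nearly every node on an engine is stale, the engine itself is the likely problem.
    return [engine for engine, _, share in report
            if share is not None and share >= threshold]

rows = [
    {"Caption": "sw01", "PollingEngine": "APE-02"},
    {"Caption": "sw02", "PollingEngine": "APE-02"},
    {"Caption": "sw03", "PollingEngine": "APE-02"},
    {"Caption": "printer7", "PollingEngine": "Main"},
]
totals = {"APE-02": 3, "Main": 200}
report = stale_by_engine(rows, totals)
print(report)                   # [('APE-02', 3, 1.0), ('Main', 1, 0.005)]
print(suspect_engines(report))  # ['APE-02']
```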

            • Re: Node not polled in last 10 minutes alerts receiving frequently
              David Smith

              The problem isn't with the alert; it's with your polling engine. The alert is correctly telling you that the nodes on that polling engine are not being polled correctly, because the polled data isn't being written back into the database.

               

              You need to repair the APE (Additional Polling Engine) or move the nodes to a polling engine that is working.

               

              As adam.beedell suggested above, using custom properties and alerting will be the best way to manage the situation.