13 Replies Latest reply on Oct 26, 2011 2:07 PM by lbyoung

    Orion Polling Not Working Correctly

    lbyoung

      Yesterday, it seemed like the Orion alerts were delayed by several hours. We saw this before and opened ticket 263939. I completed the steps again and now it looks like the ICMP scans are no longer working.  When I log into the Polling Engines, it shows ICMP is always 100% completed (8553 out of 8553) while the SNMP increments to 100% as expected.  The steps for ticket 263939 were:

      1) Go to Windows Services and make sure Message Queuing is started.
      2) Stop all Orion Services.
      3) Go to C:\Documents and Settings\All Users\Application Data\SolarWinds\Collector\Data and rename JobsTracker.sdf to JobsTracker.OLD and PollingController.sdf to PollingController.OLD
      4) Go to C:\Documents and Settings\All Users\Application Data\SolarWinds\JobEngine.v2\Data and rename JobEngine35.sdf to JobEngine35.OLD
      5) Go to C:\Documents and Settings\All Users\Application Data\SolarWinds\Installers
      6) Run JobEngine.v2.msi and choose the remove option to uninstall.
      7) Run CollectorInstaller.msi and choose the remove option to uninstall.
      8) Run JobEngine.v2.msi and install this service.
      9) Run CollectorInstaller.msi and install this service.
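For anyone repeating steps 3 and 4, the renames could be scripted roughly as below. This is only a sketch: the function name `retire_database_files` is made up for illustration, it assumes the Orion services are already stopped (step 2), and it does not touch the installers in steps 5 to 9.

```python
from pathlib import Path


def retire_database_files(data_dir, file_names):
    """Rename each listed file to the same base name with an .OLD suffix.

    Hypothetical helper, not part of any SolarWinds tooling.
    Returns the new file names of everything that was renamed.
    """
    renamed = []
    for name in file_names:
        source = Path(data_dir) / name
        if not source.exists():
            continue  # file is not there, so there is nothing to rename
        target = source.with_suffix(".OLD")
        if target.exists():
            # A leftover .OLD from an earlier attempt; stop instead of overwriting it.
            raise FileExistsError(f"{target} already exists")
        source.rename(target)
        renamed.append(target.name)
    return renamed
```

It would be called once per folder, for example with the Collector\Data path from step 3 and the list `["JobsTracker.sdf", "PollingController.sdf"]`. Because the steps were run a second time here, a leftover .OLD file from the first attempt is a real possibility, which is why the sketch raises an error rather than replacing it.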

      Today, Orion is reporting that random nodes are down now, when they were actually down earlier.  We had a site with a circuit issue, so we lost connection to all of their devices (~30) from around 9AM to 9:30AM.  Orion is reporting that 2 of the devices went down at 11AM and that the others never went down at all.

      Any help would be appreciated!

        • Re: Orion Polling Not Working Correctly
          mdriskell

          I have had problems with this in the past due to an overwhelmed SQL server.  TIP:  Last Poll Time in the Past

          I would check your message queues to see if they are backed up.  It could be that SolarWinds is polling but is waiting on the DB to write the data.  Look under Message Queuing in Computer Management (right-click My Computer and choose Manage) on your SolarWinds server.  The SolarWinds queues are under the Private Queues section.

            • Re: Orion Polling Not Working Correctly
              lbyoung

              Thanks for the info.  It looks like the ICMP polling is not working.  Every time I check the Polling Status, the ICMP values are the same:

              ICMP Status Polling Index: 8553 out of 8553

              ICMP Status Polls per second: 0

              ICMP Outstanding: 0

              ICMP Statistic Polling Index: 15406 out of 15406

              ICMP Statistic Polls per second: 0 

              The SNMP increments up to 100% then restarts, but the ICMP always stays the same.

                • Re: Orion Polling Not Working Correctly
                  mdriskell

                  If ICMP wasn't working, you wouldn't get any node down/up events (regardless of whether they are in the past).

                  Try this: pick a node and go to the node details page.

                  Look specifically at the polling details resource.  What is the next poll time and what is the last database update?  

                    • Re: Orion Polling Not Working Correctly
                      lbyoung

                      It is currently 1:55PM and it shows:

                      Poll Interval 600 secs

                      Next Poll 10:08AM

                      Rediscovery Interval: 60 Min

                      Next Rediscovery 10:55AM

                      Last Database Update: Today at 1:39PM
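To put a number on how far behind these readings are, the gap between the current time and Next Poll can be computed as in the rough sketch below. The function names are made up for illustration, and the calendar date is a placeholder, since only the times of day are given above.

```python
from datetime import datetime, timedelta


def poll_lag(now, next_poll):
    """Return how far next_poll lies behind now (zero if it is still in the future)."""
    return max(now - next_poll, timedelta(0))


def is_stale(now, next_poll, poll_interval_seconds):
    """Treat Next Poll as stale once it is more than one poll interval in the past."""
    return poll_lag(now, next_poll) > timedelta(seconds=poll_interval_seconds)


# Times of day from the readings above; the date itself is an assumption.
now = datetime(2011, 10, 25, 13, 55)
next_poll = datetime(2011, 10, 25, 10, 8)

print(poll_lag(now, next_poll))       # 3:47:00
print(is_stale(now, next_poll, 600))  # True
```

With a 600-second interval, a healthy Next Poll should never be more than about 10 minutes behind the clock, so a lag of nearly four hours points to scheduling or write-back falling behind rather than a display glitch.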

                       

                      I have performed the steps in the following post, replacing the 3 files and restarting the server afterward:

                      Last Database Update is in the past, Next Poll... More past'er'er

                       

                      Thanks again!!

                        • Re: Orion Polling Not Working Correctly
                          ET

                          Hi,

                          Is Next Poll changing, or does it get stuck at 10:08? If it's moving, you probably have an overfilled message queue, which is where we keep polled data.

                          Try to open message queue and check how many messages are there.

                           

                          run compmgmt.msc

                          go to services and applications -> Message Queuing -> Private Queues

                          check queue "solarwinds/collector/processingqueue"

                           

                          If there are a lot of messages, you can try purging it.

                          thanks

                          ET;

                            • Re: Orion Polling Not Working Correctly
                              lbyoung

                              The next Poll is changing.  I went ahead and Purged the queue (There were probably 100+ messages) and rebooted the server.  I waited about 10-15 minutes and logged back in and looked at the polling settings for a node (The current time is 8:08AM on 10/26):

                                Polling Interval 600 seconds 
                                Next Poll 10/25/2011 09:49 PM 
                               
                                Statistics Collection 10 minutes 
                                Enable 64 bit Counters No 
                               
                                Rediscovery Interval 60 minutes 
                                Next Rediscovery 10/25/2011 09:53 PM 
                               
                                Last Database Update 10/26/2011 07:30 AM 

                              Any ideas?  I certainly appreciate the help!

                                • Re: Orion Polling Not Working Correctly
                                  ET

                                  100 messages is really nothing; I expected something like thousands. Your poll interval is 10 hours, which is a lot, but overall Next Poll should not be in the past.

                                  You can invoke Poll Now on this node and see if Next Poll changes to the correct value. Is this node currently Up or Down (real status)?

                                   

                                  Do you have the same time on the polling engine? Next Poll is calculated from the time on the polling server.

                                   

                                  thanks

                                    • Re: Orion Polling Not Working Correctly
                                      lbyoung

                                      Current Time 11:04AM 


                                      The poll interval should be 10 min unless I'm reading something wrong.

                                      This is what I see for the node and it does not change if I do a Poll Now or Rediscover (both successful):

                                      Polling Interval 600 seconds 
                                        Next Poll 08:22 AM 
                                       
                                        Statistics Collection 10 minutes 
                                        Enable 64 bit Counters No 
                                       
                                        Rediscovery Interval 60 minutes 
                                        Next Rediscovery 08:53 AM 
                                       
                                        Last Database Update 10/26/2011 10:34 AM 

                                      The Polling engine is on the same server as the database and the time on the webpage homescreen is accurate.  I'm not sure where to locate the time for the polling engine specifically.

                                      Thanks again!

                                        • Re: Orion Polling Not Working Correctly
                                          mdriskell

                                          What does your disk IO look like?  Go to perfmon and look at your average disk queue length for your disks.  Look individually not total.

                                          The polling engine time is the system time of the polling engine.  I would suspect you are having performance issues with both the SolarWinds server and the DB running on the same box.  How many elements are you monitoring?

                                            • Re: Orion Polling Not Working Correctly
                                              lbyoung

                                              Checking into the IO right now.  We are running the polling engine and database on the same server, but it is Virtual Windows 2008 with 10GB Memory and 3 Processors.  Orion is the only application running on this server.  We currently have 8553 elements and that number has been pretty consistent for the past 6 months.

                                                • Re: Orion Polling Not Working Correctly
                                                  mdriskell

                                                  8500 elements is right on the edge of the limit for one polling engine alone; having the SQL server on that same server would definitely be pushing it.  Disk I/O is a likely bottleneck.

                                                  • Re: Orion Polling Not Working Correctly
                                                    byrona


                                                    If your system is virtual, is it using local storage or is it on some sort of storage system such as NetApp?  This will make a big difference in how you look at disk I/O.  Also, for monitoring Windows disk I/O (if you aren't already doing so), you should check out my post How To Monitor Windows Disk I/O.

                                                    For a deployment that large you really should separate your polling engine out from your database.  With a very well configured and optimized virtual environment you can run your database as a VM; otherwise you should run it as a physical system.

                                                    For polling performance indicators, you should check the Polling Engines page in the WebUI and see how many outstanding polls your system has.  Also, in your database you can check the PollingCompletion column of the Engines table.  The number should be in the 99% range; if it is not, that suggests a problem.
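Once the PollingCompletion values have been read out of the Engines table, the check itself is simple. The sketch below assumes the rows have already been fetched into dictionaries; the keys `name` and `polling_completion`, the 99.0 cutoff, and the sample values are all illustrative, not taken from the Orion schema or from this system.

```python
def engines_below_threshold(rows, threshold=99.0):
    """Return the names of engines whose polling completion is under the threshold.

    rows: iterable of dicts with hypothetical keys "name" and "polling_completion".
    """
    return [row["name"] for row in rows if row["polling_completion"] < threshold]


# Made-up sample data, purely to show the shape of the check.
sample = [
    {"name": "engine-a", "polling_completion": 99.7},
    {"name": "engine-b", "polling_completion": 84.2},
]
print(engines_below_threshold(sample))  # ['engine-b']
```

Any engine that shows up in the result is worth a closer look alongside the outstanding poll counts on the Polling Engines page.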

                                                    Hope some of this helps!

                                                      • Re: Orion Polling Not Working Correctly
                                                        lbyoung

                                                        Thank you for the information.  We are looking into separating the database and polling engine onto two servers, but I do not believe that is the issue.  We have not added any nodes in the past 3-4 weeks, and this just started over the past weekend.  I was told by Orion support that as long as the polling indexes complete without overlap, we were safe to keep the two services together.  What I do find unusual, as I discussed earlier in the thread, is that the ICMP polling never moves (it stays 100% complete) and only the SNMP polling increments to 100% complete and then restarts.  I am not sure if this is a reporting error or an actual problem with the ICMP polling.