8 Replies Latest reply on Oct 28, 2010 8:46 AM by bwiechman

    Node status not accurate on home page

    bwiechman

      I have several nodes that are showing up in a warning state in the device listing on the home page. All report 100% packet loss:

      However pinging the device from the Orion box works without issues and the node details page does not show any current packet loss:

       

      I have cleared all events/alerts and the status remains the same.

      This issue is present for several nodes. Same symptoms for all.

        • Re: Node status not accurate on home page
          ET

          Have you tried [PollNow]? Does it correct node status?

          thanks

            • Re: Node status not accurate on home page
              bwiechman

              Yes I did. Same status displayed.

               

              I was digging through and I have child status rollup configured to "Show Worst Status (Interface Only)". However there are no open events/alerts on the device's interfaces either. I did clear two high bandwidth alerts that were triggered on the interfaces for the device this morning. They were triggered before I installed 10.1 RC, but would have been in the event list until earlier this morning when I cleared them. Could it be that the child interface status is not trickling up properly, or would those events not have changed the node status to a warning state in either case?

                • Re: Node status not accurate on home page
                  ET

                  Please open a ticket and attach the collected diagnostics. We would like to investigate it.

                  Thanks a lot.

                    • Re: Node status not accurate on home page
                      bwiechman

                      Ticket 194151. Gathering and uploading diagnostics now.

                        • Re: Node status not accurate on home page

                          Hi Ben--

                          I see on NetSuite that development is looking into this now. Can you post the results when you get them, for the benefit of the community?

                          Thanks,

                          M

                            • Re: Node status not accurate on home page
                              bwiechman

                              Development is still working through this fully, but this is what appears to happen from what I have seen.

                               

                              A small spike in packet loss causes the node to revert to a warning status for some reason. I configured the packet loss thresholds higher than any recorded packet loss; however, the nodes in this warning state did not return to a normal state at that point. After a certain period of time (greater than 24 hours?) the node returns to a normal state. This behavior is actually not a bad thing as far as I am concerned, since it highlights nodes that experienced a temporary issue; however, how and why it happens is not explained anywhere as far as I can see. The packet loss the node experiences never triggers an event, never triggers an alert, and does not appear to be affected by modifying the global packet loss threshold.

                               

                              The small spike in packet loss also appears to cause the node details popup on the home page to report 100% packet loss, and also causes the node to be listed with 100% packet loss in the Top 10 type listings. Packet loss is reported at 100% even though the maximum packet loss I saw in the hourly/daily packet loss graphs for any of the affected nodes was 20%. I presume that at least one ICMP response was not received during the graphed interval, so technically I suppose there was 100% packet loss at some point. I don't know how Orion averages out a lost ICMP response to arrive at the X% packet loss value that is graphed. Also, the nodes always showed the actual current latency to the node in the same popup on the home page.
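
                              To illustrate the arithmetic I am guessing at, here is a minimal sketch. It is not Orion's actual calculation, which I do not know; the poll history, the one-probe-per-poll assumption, and the `loss_percent` helper are all made up for illustration. It only shows how a single poll can read 100% loss while the average over a longer window reads 20%.

```python
def loss_percent(replies):
    """Percent of probes in `replies` that went unanswered (False = lost)."""
    if not replies:
        return 0.0
    lost = sum(1 for ok in replies if not ok)
    return 100.0 * lost / len(replies)

# Hypothetical history: ten polls, one ICMP probe each, two unanswered.
history = [True, True, False, True, True, True, False, True, True, True]

# Looked at one poll at a time, each sample is either 0% or 100% loss.
per_poll = [loss_percent([ok]) for ok in history]

# Averaged over the whole window, the same data gives a much lower figure.
window_avg = loss_percent(history)

print(max(per_poll))  # worst single-poll reading
print(window_avg)     # loss averaged across the ten polls
```

                              If the popup and Top 10 listings show the last single-poll reading while the graphs show a windowed average, that would be consistent with what I saw, but that is only a guess on my part.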

                               

                              As a side note, some nodes eventually returned to a normal status, but still reported 100% packet loss in the Top 10 listings and in the home page popup.