12 Replies Latest reply on Sep 17, 2014 9:06 AM by rharland2012

    Dependency checking

    Andy McBride

      Hi all,

       

      I understand that Orion checks for the status of the parent when a child entity goes down. What I don't know is how it checks. Does it look at the status in the database or does it ping the parent to see if it is up?

       

      Thanks!

        • Re: Dependency checking
          Leon Adato

          While this may have changed slightly in recent versions, what I understood it to do was:

          1. Child node went down
          2. SolarWinds notes that it is the child of *something* and injects a delay of 1.5 polling cycles before doing anything
          3. After the delay, SolarWinds checks the child node's current state in the database (up/down/etc), and checks the parent node's status in the database (up/down/etc)
          4. SolarWinds then updates the child node's status (down/unreachable)

           

          What SolarWinds does NOT do is initiate an additional poll to check the parent at the time the child node is seen to be down.

           

          Again, my information on this may be out of date. I would love anyone's comments to the contrary.

          • Re: Dependency checking
            Andy McBride

            Checking the DB to determine if a parent is down seems like a bad solution. If this is the case, I would expect the "suppression" function to work randomly.

            • Re: Dependency checking

              Up/Down status is reported via ICMP (ping). If it gets marked as down, then it will be listed in the database as Down in the Nodes Table.

                • Re: Dependency checking
                  Andy McBride

                  Yes, ICMP status is marked in the database. My question is: when a child entity is down, does Orion ping the parent for status, or does it look at the last status record in the database? These are two different things.

                    • Re: Dependency checking
                      Leon Adato

                      Andy: I know exactly what you are saying, and it was my criticism as well. A "better" (i.e., more accurate) method would be to say: "Oh, ServerA is down? Let me ping its parent right now. Nope, it's down too. OK, mark ServerA as 'unreachable'. Now, let's go see about ServerA's parent's parent..."

                       

                      However, SW took another tack, which was to insert an extra polling cycle delay (i.e., 2 more minutes). That gives the regular polling cycle time to evaluate the parent node and get its status before tagging the status of the child.

                       

                      I have to consistently remind my colleagues (the network and server teams) that SolarWinds is not SmokePing. That said, its upstream analysis is not as robust as I would like.

                       

                      Once again - my information is about 18 months old. SW could have changed the process since then. I'm hoping someone "official" will comment.

                        • Re: Dependency checking
                          Andy McBride

                          That makes sense. Not that I like it a bunch.

                          • Re: Dependency checking
                            rharland2012

                            I know this is an old thread, but is this still the mechanism for dependency checking?

                            Thanks for any info.

                              • Re: Dependency checking
                                cobrien

                                I believe what Leon wrote is still accurate.  I'll confirm with dev and get back to you.

                                  • Re: Dependency checking
                                    rharland2012

                                    Thank you very much!

                                      • Re: Dependency checking
                                        cobrien

                                        One of our QA guys explained this to me.  It doesn't work as Leon described but instead works like so:

                                         

                                        • NodeA has a dependency on a parent object (node, interface, or group).
                                        • NodeA didn’t respond to the latest ICMP/SNMP status poll.
                                        • The Orion poller starts fast polling: the node is marked WARNING, and during the Warning interval Orion polls the node every 10 sec.
                                        • If the node still doesn’t respond, Orion checks whether the node is a CHILD in any dependency and, if so, whether all its parents are DOWN (or UNREACHABLE).
                                        • If all its parents are DOWN, the node is given the UNREACHABLE status.
                                        • If not, the node gets the DOWN status.
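The decision made at the end of the Warning interval, as listed above, can be sketched roughly as follows. This is only an illustration of the steps in this post, not Orion's actual code: the `Status` enum and the `resolve_child_status` function are hypothetical names, and treating a node with no parents as plain DOWN is an assumption.

```python
from enum import Enum


class Status(Enum):
    UP = "Up"
    WARNING = "Warning"
    DOWN = "Down"
    UNREACHABLE = "Unreachable"


def resolve_child_status(responded: bool, parent_statuses: list[Status]) -> Status:
    """Pick a node's status once its Warning (fast-poll) interval has ended.

    Hypothetical helper: `responded` says whether any fast poll got an answer,
    `parent_statuses` holds the last known status of every parent object.
    """
    if responded:
        return Status.UP
    # Assumption: a node with no parents is not a child in any dependency,
    # so it can only be DOWN, never UNREACHABLE.
    is_child = len(parent_statuses) > 0
    all_parents_down = all(
        s in (Status.DOWN, Status.UNREACHABLE) for s in parent_statuses
    )
    if is_child and all_parents_down:
        return Status.UNREACHABLE
    return Status.DOWN
```

For example, a silent node whose only parent is DOWN resolves to UNREACHABLE, while a silent node with one parent DOWN and another UP resolves to DOWN.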

                                         

                                        In general, as far as I know, the dependency check (is it a child? is it a parent?) is always done, regardless of whether the node is responding or not; it’s just built into the Nodes.Status update.

                                        • Re: Dependency checking
                                          cobrien

                                          In addition to the above, there is a nuance when the parent goes into WARNING.  For example:

                                          1. NodeA is a parent of NodeB
                                          2. NodeB stopped responding and is marked WARNING
                                          3. NodeB is polled more often (every 10 sec for 120 sec) and is kept in WARNING all that time
                                          4. NodeA in the meantime is also being polled each 120 sec.
                                          5. If NodeA also stops responding while NodeB is in WARNING, NodeA ALSO will be marked WARNING.
                                          6. So at this moment we have – NodeA (WARNING)->NodeB (WARNING).
                                          7. Until Orion sets NodeA to DOWN, NodeB will NOT be marked DOWN. Even if the 120 sec WARNING interval has passed for NodeB, NodeB keeps the WARNING status. This means the Orion system really is waiting for NodeA’s EXACT status.
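To make the hold-in-WARNING behaviour concrete, here is a small sketch of how NodeB's status might follow NodeA's once NodeB's own Warning interval has expired without a response. The function names and the plain string statuses are made up for illustration, and it assumes a single parent, since the example above only covers that case.

```python
def child_status(parent_status: str) -> str:
    """Status of an unresponsive child whose own Warning interval has expired.

    Hypothetical helper covering a single parent only.
    """
    if parent_status == "WARNING":
        # Parent is still undecided, so the child is held in WARNING.
        return "WARNING"
    if parent_status in ("DOWN", "UNREACHABLE"):
        # Parent is confirmed down: the child is suppressed as UNREACHABLE.
        return "UNREACHABLE"
    # Parent is responding, so the child is genuinely DOWN.
    return "DOWN"


def replay(parent_timeline: list[str]) -> list[str]:
    """Child status at each evaluation, given the parent's status over time."""
    return [child_status(p) for p in parent_timeline]
```

Replaying a parent that goes WARNING, WARNING, DOWN keeps the child in WARNING for the first two evaluations and then marks it UNREACHABLE. If the parent recovers to UP instead, the child is marked DOWN.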