3 Replies Latest reply on Jun 24, 2011 9:36 AM by bshopp

    Monitoring the Monitor - Part 2

    Donald_Francis

      I am posting this here instead of in feature requests because it is more a discussion of what people may want than an actual request.

      I have also posted about this in the past but it has come up for me again.

      I am not sure how many others think it should be this way, but I firmly believe that every portion of Orion should monitor itself.  I think there should be something under the hood that watches all services, pollers, modules, stats, etc., and lets you know when something fails.

      Yes, I know there are a couple of built-in alerts for nodes not polled, but I do not think that is anywhere near enough; at best it is just a stopgap.

       

      So to summarize, I would love to see an additional set of alerts, separate from the normal advanced alerts: a "system health monitor," if you will.  Being able to alert the Orion admin(s) to trouble on the system would be a huge plus.

      What do you guys think?

        • Re: Monitoring the Monitor - Part 2
          byrona

          I 110% agree that something like this is necessary to ensure proper function of the system: a set of internal sanity checks that let me know if Orion, or some part of Orion, has kicked the bucket.

          Since this is a discussion...

          We were able to take advantage of a very good deal that SolarWinds offered on the Failover Engine (FoE) for customers who had previously paid for the Hot Standby system.  I have not yet set up the FoE, but my understanding is that it has some of this functionality.  Does anybody know if this is true?

          I realize and agree that basic sanity checking should be part of the base system, but I wanted to see if there is at least some of that in FoE for the internal Orion stuff that occasionally fails silently.

            • Re: Monitoring the Monitor - Part 2
              netlogix

              I think the FoE has some health-checking features.  FoE is actually another product, called Neverfail, so it may be able to meet that need.

                I think that a second monitoring system is necessary (who guards the guard?).  The first needs to watch the second, and the second needs to watch the first.  It would help if SolarWinds had a secondary application, installed on another system, that could watch all the parts of Orion while using a different SQL server.  Different everything.
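
                To make the "each one watches the other" idea concrete, here is a rough sketch in Python.  It has nothing to do with how Orion or FoE actually work; the Monitor class, the max_silence threshold, and the alert text are all made up for illustration.  Each system records a heartbeat, and its peer raises an alert if that heartbeat goes stale.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Monitor:
    """Hypothetical monitoring system that heartbeats and checks a peer."""
    name: str
    max_silence: float  # seconds of peer silence tolerated before alerting
    last_beat: Optional[float] = None
    alerts: List[str] = field(default_factory=list)

    def beat(self, now: float) -> None:
        # Record that this monitor was alive at time `now`.
        self.last_beat = now

    def check_peer(self, peer: "Monitor", now: float) -> bool:
        # The peer is healthy only if it has a heartbeat that is recent enough.
        healthy = (
            peer.last_beat is not None
            and (now - peer.last_beat) <= self.max_silence
        )
        if not healthy:
            self.alerts.append(f"{self.name}: {peer.name} has gone silent")
        return healthy


def demo() -> List[str]:
    """Simulate the primary failing silently while the secondary keeps running."""
    primary = Monitor("primary", max_silence=60.0)
    secondary = Monitor("secondary", max_silence=60.0)

    # t=0: both systems are alive and each checks the other.
    primary.beat(0.0)
    secondary.beat(0.0)
    primary.check_peer(secondary, 0.0)
    secondary.check_peer(primary, 0.0)

    # t=30: the primary has stopped, but it is still within the tolerance.
    secondary.beat(30.0)
    secondary.check_peer(primary, 30.0)

    # t=90: the primary has been silent for longer than max_silence.
    secondary.beat(90.0)
    secondary.check_peer(primary, 90.0)

    return secondary.alerts


if __name__ == "__main__":
    for alert in demo():
        print(alert)
```

                In a real deployment the two monitors would sit on separate hosts with separate databases, as described above, so that a single failure could not take out both the watcher and the watched.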