7 Replies Latest reply on Jun 16, 2008 8:39 AM by videotech

    What do you do for monitoring & alerting redundancy?

      Have any of you set up a fully redundant monitoring & alerting infrastructure?  I'm not talking about the "hot-standby" setup either.  I'm trying to figure out the best way to set up our Orion environment so that it's fully redundant. This means most likely that we will have a 2nd instance of Orion at a different geographical location, pointing to its own database.  But, the problem with this is you get dual alerts etc. So right now I'm stuck.  I'm trying to get creative to solve this problem, but everything I have come up with so far has some sort of drawback attached to it.  We also own the Orion App monitor, so maybe that could be used to help this process somehow? Maybe have server #2 monitor the Orion services on server #1 and, in the event of any services failing, could start those services on server #2?

       
      Any of you doing anything similar out there?

       Thanks in advance!

      Steve

       

        • Re: What do you do for monitoring & alerting redundancy?

           No one has any suggestions?

            • Re: What do you do for monitoring & alerting redundancy?
              branfarm

              Hmm... that's definitely a tough one.  It seems that to use the 2nd Orion installation to monitor the first, you would need the alerting engine running on the 2nd installation so that it can trigger anything else. The first thing that came to my mind was writing a batch file that starts up the alerting engine, though of course you'd need something to trigger the execution of the batch file.

            • Re: What do you do for monitoring & alerting redundancy?
              qle
              Yeah, I think you're both on the right track here.  As you said below:

              We also own the Orion App monitor, so maybe that could be used to help this process somehow? Maybe have server #2 monitor the Orion services on server #1 and, in the event of any services failing, could start those services on server #2?

              • Just configure the application monitor on server #2 to monitor the Orion services on server #1.
              • Configure an alert that triggers if certain (or all) Orion services are down on server #1.
              • Set the alert action to execute 'net start "SolarWinds Alerting Engine"' on server #2.
              • Similarly, set the reset action to execute 'net stop "SolarWinds Alerting Engine"'.
              Seems pretty simple in theory.
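The steps above can be sketched as a small decision function. This is only an illustration of the logic, not Orion's own mechanism: the function names and the `require_all` switch are hypothetical, and the service name is copied from the steps above, so check it against your installation. It assumes something else (for example the application monitor) supplies the state of each service on server #1.

```python
from typing import Dict, List, Optional

# Service name as written in the steps above; verify it on your own install.
ALERTING_SERVICE = "SolarWinds Alerting Engine"


def primary_is_down(service_states: Dict[str, bool], require_all: bool = False) -> bool:
    """Decide whether server #1 should be treated as failed.

    service_states maps a monitored service name on server #1 to True
    (running) or False (stopped). With require_all=False, any stopped
    service counts as a failure; with require_all=True, every monitored
    service must be stopped.
    """
    if not service_states:
        # Assumption: with no data at all, do not fail over.
        return False
    stopped = [not running for running in service_states.values()]
    return all(stopped) if require_all else any(stopped)


def failover_command(primary_down: bool, standby_alerting_running: bool) -> Optional[List[str]]:
    """Pick the command server #2 should run, or None if nothing changes."""
    if primary_down and not standby_alerting_running:
        return ["net", "start", ALERTING_SERVICE]  # alert trigger action
    if not primary_down and standby_alerting_running:
        return ["net", "stop", ALERTING_SERVICE]   # alert reset action
    return None
```

On server #2 the returned list could be handed to `subprocess.run(cmd, check=True)`. Returning `None` when the state already matches keeps repeated checks from issuing redundant start or stop commands.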
              • Re: What do you do for monitoring & alerting redundancy?
                qle
                Really? If that's the case, why is the following mentioned?

                Maybe have server #2 monitor the Orion services on server #1 and, in the event of any services failing, could start those services on server #2?

                It doesn't sound like the services on server #2 are running at the same time as those on server #1, unless I'm seriously mistaken here (which certainly wouldn't be the first/last time).
                  • Re: What do you do for monitoring & alerting redundancy?
                    branfarm

                    Well, in his first post he mentions having a second instance of Orion running in a separate location, and how that would result in duplicate alerts.

                    This means most likely that we will have a 2nd instance of Orion at a different geographical location, pointing to its own database.  But, the problem with this is you get dual alerts etc.

                    The solution seems easy with Orion's ability to launch actions based on alerts, but I guess the question is how do you get the second instance to monitor the first, without sending out its own alerts until the 1st instance is down?
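One way to picture the "stay quiet until the 1st instance is down" requirement is a small gate that only lets standby alerts through after the primary has missed several health checks in a row. This is a sketch of the idea, not an Orion feature: the class name, the method names, and the threshold of 3 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StandbyAlertGate:
    """Suppress standby alerts until the primary fails several checks in a row."""

    failures_needed: int = 3        # assumed threshold, tune for your polling interval
    consecutive_failures: int = 0

    def record_check(self, primary_ok: bool) -> None:
        """Feed in the result of one health check against the primary."""
        if primary_ok:
            self.consecutive_failures = 0   # any success resets the count
        else:
            self.consecutive_failures += 1

    def should_send(self) -> bool:
        """True once the primary has failed enough consecutive checks."""
        return self.consecutive_failures >= self.failures_needed
```

Requiring consecutive failures rather than a single one is a design choice: it trades a slightly slower takeover for fewer duplicate alerts when the link between the two sites drops a single poll.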

                      • Re: What do you do for monitoring & alerting redundancy?

                        Hmm, any joy with this? I am trying to implement something similar: "site" redundancy, so that if our UK site went down, our US site would take over.


                        There are various options for this, but one requirement would be that it would have to be automatic!


                        The problem is that there are 2 main applications here: SQL and Orion. The Orion part can be taken care of with hot-standby servers, but the SQL part is where my problem lies. I looked at clustering, mirroring, and replication.


                        I looked specifically at mirroring with a witness server. The witness server is used for automatic failover. Mirroring also effectively replicates the data, so that both databases are up to date.


                        So the idea is that if the "UK" site goes down, then the hot-standby servers would take over, and the mirrored database would be used instead of the "UK" database.


                        Another issue would be: how do the standby servers know which database to point to? Will they still point to the "UK" database or to the "US" database? I suppose one potential solution would be to use a virtual IP address for the SQL database, so that whichever one is "up" holds the virtual IP address. If the "UK" database ever went down, the "US" database would take over that IP address; that way the hot-standby pollers would use the virtual IP address and point to the "US" database. The whole process would be "transparent".
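As a rough client-side counterpart to the virtual IP idea, the selection rule "use the UK database if it is up, otherwise the US one" could look like the sketch below. The host names in the usage note are hypothetical, and the probe is only a TCP check on port 1433 (the default SQL Server port); it shows that something is listening, not that the server is the current mirroring principal, so treat it as an illustration of the ordering logic only.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_probe(host: str, port: int = 1433, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def pick_database(hosts: Sequence[str],
                  is_up: Callable[[str], bool] = tcp_probe) -> Optional[str]:
    """Return the first reachable host in preference order, or None if all are down."""
    for host in hosts:
        if is_up(host):
            return host
    return None
```

For example, `pick_database(["uk-sql", "us-sql"])` would return `"uk-sql"` while it answers and fall back to `"us-sql"` otherwise. With a real virtual IP the pollers would not need this logic at all, since the address itself moves to the surviving server.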


                        Also, what are the advantages/disadvantages of:


                        Having separate pollers adding data to two separate databases?


                        Having a database mirrored so that we have "database consistency"?


                        What do you guys think?