17 Replies Latest reply on Oct 2, 2014 2:37 AM by robertcbrowning

    pause or stop Solarwinds while in maintenance?

    akhasheni

      A quick search didn't turn up an answer: if I plan to apply firmware, driver, and Windows updates that will require a restart or two on the SolarWinds server, how do I pause it, i.e. stop SolarWinds until I am done and then resume?

       

      Just applying the driver updates (no restart yet) has already triggered a few bogus alerts.

       

      Thanks!

        • Re: pause or stop Solarwinds while in maintenance?
          akhasheni

          Orion Service Manager - "Shutdown Everything"?

           


          • Re: pause or stop Solarwinds while in maintenance?
            aaron.damyen

            Try unmanaging the device during your outage window.  This should prevent alerts from triggering against that device.

             

            For me, when upgrading the SolarWinds platform itself, I unmanage all the devices I monitor for the entire outage window. Well before the outage, I notify all IT staff that Orion will be upgraded and that all monitoring will be halted. My users don't receive false alerts, and I can reboot as necessary without problems or pollers running in the background (they have nothing to do). The risk is that no alert is sent during this window. However, with the systems I monitor, we usually only have to wait 2 minutes for a user to call when something goes wrong (poor man's alerting).

              • Re: pause or stop Solarwinds while in maintenance?
                aaron.damyen

                Ignore my solution. It actually causes data loss during the outage window and can leave gaps in some of the graphs. adatole's solution is much more elegant. Hopefully SolarWinds will implement a maintenance feature, and it will probably work similarly.

                 

                I intend to implement this soon and create a sample PowerShell script that my server administrators can use for mass maintenance-mode changes. That way they can still manage the maintenance setting themselves, with a quick-running script that touches only the devices they need. I also intend to reuse that script to automate maintenance from our Service Desk platform when pre-defined outages are scheduled.

              • Re: pause or stop Solarwinds while in maintenance?
                Leon Adato

                There are a few aspects to your question:

                 

                1) To gracefully "pause" SolarWinds, you can use the Service Manager utility (Start > SolarWinds Orion > Advanced Features > SolarWinds Service Manager) and choose "Shutdown Everything". Once the updates are done, choose "Start Everything" (unless you have rebooted, since a reboot starts the poller services automatically).

                 

                But it seems like your point is more about not re-triggering alerts than about just stopping and starting SolarWinds.

                 

                At one end of the spectrum, you would have a separate event correlation engine that takes in events from SolarWinds and forwards them to the alerting engine (or ticket system). The event engine would ensure that it didn't trigger the same alert for the same device if a ticket was already open. But not everyone has that. In fact, almost NOBODY has that.
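To make the correlation-engine idea concrete, here is a minimal Python sketch. It is an illustrative model, not a SolarWinds component or API: events are assumed to be (node, alert name) pairs, and a hypothetical in-memory ticket store drops repeat events while a ticket for the same pair is still open.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical event key: (node name, alert name).
EventKey = Tuple[str, str]


@dataclass
class Correlator:
    """Toy correlation layer between SolarWinds events and a ticket system."""

    open_tickets: Set[EventKey] = field(default_factory=set)  # pairs with an open ticket
    created: List[EventKey] = field(default_factory=list)     # stands in for real ticket creation

    def handle_event(self, node: str, alert_name: str) -> bool:
        """Open a ticket unless one is already open for this node/alert pair.

        Returns True if a new ticket was created, False if the event was dropped.
        """
        key = (node, alert_name)
        if key in self.open_tickets:
            return False  # duplicate while the ticket is still open: drop it
        self.open_tickets.add(key)
        self.created.append(key)
        return True

    def close_ticket(self, node: str, alert_name: str) -> None:
        """Once the ticket is closed, the next event for this pair opens a new one."""
        self.open_tickets.discard((node, alert_name))


if __name__ == "__main__":
    c = Correlator()
    print(c.handle_event("db01", "Node Down"))  # True: first event opens a ticket
    print(c.handle_event("db01", "Node Down"))  # False: repeat is dropped
    c.close_ticket("db01", "Node Down")
    print(c.handle_event("db01", "Node Down"))  # True: new incident after close
```

The key choice is keying on the (node, alert) pair, so a storm of identical events during maintenance produces one ticket, while a different alert on the same node still gets through.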

                 

                A second option is to set up a custom property, "SWMaint". Then select a significant node: maybe your database server, or even the polling engine if you choose to monitor your polling engine WITH your polling engine (don't laugh, there are a couple of good reasons to do it!).

                 

                NOW... you have to update all of your alerts with a suppression rule.

                     I'll wait for you to pick yourself up off the floor, because I am famous for saying NEVER to use the suppression tab. But in this case, it will actually do what you want.

                Every alert you create has to have its suppression tab set so that both of these conditions are true:

                Node = <my important node>

                SWMaint = Yes

                 

                Finally, set the SWMaint property to "yes" for your important node.

                 

                At this point, no alert will fire since the suppression conditions are true.

                 

                Do your update, wait for the dust to settle, and then set the SWMaint property to "NO". Alerts will now run as usual.
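Leon's recipe boils down to one boolean rule per alert: fire only when the trigger condition is true and the suppression condition is false. Because every alert's suppression condition points at the same important node, flipping one property on that node mutes everything. The Python below is a minimal model of that logic, not the Orion alert engine; the node names and the inventory dictionary are hypothetical.

```python
from typing import Dict

# Hypothetical inventory: node name -> custom properties ("SWMaint" = "Yes"/"No").
Inventory = Dict[str, Dict[str, str]]

IMPORTANT_NODE = "orion-db"  # assumption: whichever node you picked as the switch


def suppression_active(inventory: Inventory) -> bool:
    """Suppression condition on every alert: Node = <important node> AND SWMaint = Yes."""
    return inventory[IMPORTANT_NODE].get("SWMaint") == "Yes"


def should_fire(trigger_true: bool, inventory: Inventory) -> bool:
    """An alert fires only when its trigger is true and suppression is not."""
    return trigger_true and not suppression_active(inventory)


if __name__ == "__main__":
    nodes: Inventory = {"orion-db": {"SWMaint": "No"}, "web01": {"SWMaint": "No"}}
    print(should_fire(True, nodes))            # True: normal operation
    nodes[IMPORTANT_NODE]["SWMaint"] = "Yes"   # start maintenance
    print(should_fire(True, nodes))            # False: every alert is suppressed
    nodes[IMPORTANT_NODE]["SWMaint"] = "No"    # maintenance over
    print(should_fire(True, nodes))            # True: alerts run as usual
```

Note that setting SWMaint on any other node has no effect in this model; only the designated node acts as the global switch.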

                  • Re: pause or stop Solarwinds while in maintenance?
                    akhasheni

                    > Finally, set the SWMaint property to "yes" for your important node.
                    >
                    > At this point, no alert will fire since the suppression conditions are true.
                    >
                    > Do your update, wait for the dust to settle, and then set the SWMaint property to "NO". Alerts will now run as usual.

                    Thanks, Leon. How is this technique different from just unmanaging select nodes?

                      • Re: pause or stop Solarwinds while in maintenance?
                        aaron.damyen

                        akhasheni, unmanaging a node prevents the poller from even attempting to contact the node. Thus, no new information comes in about response time, application status, or NetFlow details. This causes gaps in your graphs.

                         

                        adatole's solution only disables alert checking. All the data is still collected; the alerts just won't fire. Using unmanage is like shutting down the power plant because you want to turn off the light in your house. Suppressing the alert still lets all the data come in and be analysed by the other processes, whereas unmanaging the node disables all of that other functionality too.
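The difference can be illustrated with a toy poll loop in Python: an unmanaged node is skipped entirely, so its history has gaps, while a suppressed node is still polled and stored and only the notification step is skipped. The Node class, the poll_cycle function, and the 500 ms threshold are invented for illustration; Orion's real poller is far more involved.

```python
from dataclasses import dataclass, field
from typing import List, Optional

THRESHOLD_MS = 500.0  # hypothetical response-time alert threshold


@dataclass
class Node:
    name: str
    unmanaged: bool = False   # poller skips the node entirely
    suppressed: bool = False  # poller still runs; only alerts are muted
    history: List[Optional[float]] = field(default_factory=list)  # None marks a gap
    alerts_sent: int = 0


def poll_cycle(node: Node, response_ms: float) -> None:
    """One polling interval for one node (toy model, not Orion's poller)."""
    if node.unmanaged:
        node.history.append(None)  # nothing collected: a gap in the graph
        return
    node.history.append(response_ms)  # data is collected whether or not alerts are muted
    if response_ms > THRESHOLD_MS and not node.suppressed:
        node.alerts_sent += 1  # suppression only affects this step


if __name__ == "__main__":
    normal = Node("normal")
    unmanaged = Node("unmanaged", unmanaged=True)
    suppressed = Node("suppressed", suppressed=True)
    for ms in [120.0, 900.0, 130.0]:
        for n in (normal, unmanaged, suppressed):
            poll_cycle(n, ms)
    for n in (normal, unmanaged, suppressed):
        print(n.name, n.history, n.alerts_sent)
    # normal [120.0, 900.0, 130.0] 1
    # unmanaged [None, None, None] 0
    # suppressed [120.0, 900.0, 130.0] 0
```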

                         

                        I'd recommend attending the new Orion 102 class on Reporting and Alerting. Understanding how alerting (and suppression) works will highlight the difference between the two methods. You'll find the classes available for registration in your customer portal.

                          • Re: pause or stop Solarwinds while in maintenance?
                            akhasheni

                            > akhasheni, unmanaging a node prevents the poller from even attempting to contact the node. Thus, no new information comes in about response time, application status, or NetFlow details. This causes gaps in your graphs.

                            Not sure where this came from. What graphs would I want to keep while the SolarWinds server is in maintenance?

                            > I'd recommend attending the new Orion 102 class on Reporting and Alerting. Understanding how alerting (and suppression) works will highlight the difference between the two methods. You'll find the classes available for registration in your customer portal.

                            The original question was about maintenance of the SolarWinds server, not of managed devices. Leon Adato's excellent suggestion applies to the latter, i.e. it solves a different problem than mine. In my situation, unmanaging the server is quite a bit simpler and achieves the same goal.

                             

                            (Thanks for the suggestion to get the 102 training: already on my "to do" list.)

                        • Re: pause or stop Solarwinds while in maintenance?
                          cmatrask

                          Hi Leon,

                          Can I ask you a question? I've been told that if I take my main poller down for maintenance, I should first stop the SolarWinds services on the additional pollers, and only then take down the main poller. This includes disabling alerting in the Alert Manager.

                          I was told I should do this especially if I am also upgrading the pollers.

                           

                          Is this still true?

                           

                          If I don't have to take the other pollers down, this would be less work! LOL!!!

                          Thanks much!
                          Cheryl


                          • Re: pause or stop Solarwinds while in maintenance?
                            robertcbrowning

                            One point to remember, whichever option you choose: when the services eventually restart (via Start Everything or a server reboot) AND the suppressions, disables, etc. are removed, the system will check whether any alerts should be triggered, and it will trigger them. If the action is to send an email AND the trigger condition was already true before the outage, these emails will duplicate the ones already sent.
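One possible way to avoid those duplicate emails, sketched in Python under the assumption that you can snapshot which alerts were active (and already notified) just before the outage, is to diff the post-restart active set against that snapshot and notify only the new ones. This is a hypothetical workaround, not an Orion setting, and the (node, alert name) keys are illustrative.

```python
from typing import Iterable, Set, Tuple

AlertKey = Tuple[str, str]  # hypothetical key: (node name, alert name)


def alerts_to_notify(active_before: Iterable[AlertKey],
                     active_after: Iterable[AlertKey]) -> Set[AlertKey]:
    """Return only alerts that are active now but were not active before the outage.

    Alerts already active (and emailed) before the outage are skipped, so
    re-evaluation on restart does not resend them.
    """
    return set(active_after) - set(active_before)


if __name__ == "__main__":
    before = [("db01", "High CPU")]
    after = [("db01", "High CPU"), ("web01", "Node Down")]
    print(alerts_to_notify(before, after))  # {('web01', 'Node Down')}
```

The trade-off: an alert that cleared and re-triggered during the outage looks unchanged to a simple set difference, so it would not be re-notified.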

                             

                            FYI, there's also a great thread on muting alerts on specific target nodes; see:

                            Re: TIPS & TRICKS: Stop the madness! Avoiding alerts but continuing to pull statistics.