16 Replies Latest reply on Aug 20, 2009 5:22 PM by RhodanNZ

    Alert suppression and dependencies

      I am busy setting up ipMonitor v9 to monitor our environment. I need to know the correct way to suppress alerts.

      For example, I have a 2003 Server, where I am monitoring disk space, Windows services and ping. When the server is rebooted or goes offline, I only want the ping alert to come through once when it goes down, and once when it is back up.

      How do I achieve this?


      Thanks

        • Re: Alert suppression and dependencies

          We are using WMI for the monitors. Would this affect how the dependency / alert suppression functions?

          The reason I ask is because, say we have the ping monitor set up as a dependency. We also have another monitor for a Windows service. The monitor for the Windows service checks in and cannot reach the server, so it fails and sends an alert because the ping monitor has not yet checked in and failed.

          How can I solve this?

            • Re: Alert suppression and dependencies
              Peter.Cooper

              Here's a document that covers the entire subject (it's for a previous version, but the content is quite useful). Have a quick read of the section "Effective Timing and Notification Control Parameters". It discusses making the ping monitor test and fail faster than the other monitors (by polling at half the interval of your service monitor).

              ipMonitor, Configure Groups and Dependencies Tutorial

                • Re: Alert suppression and dependencies
                  simonpt

                  Hi Peter

                  Do you recommend the values given in that document?  They're quite a bit smaller than the default values, aren't they?

                  Rgds, Simon

                    • Re: Alert suppression and dependencies

                      Thanks for the reply Peter. I have it working as per the document.

                      One question I have now is related to the Maximum Test Duration. What is the recommended setting for that? Does it affect the test times of up/warn/down etc. if it's set too long / too short?

                        • Re: Alert suppression and dependencies
                          Peter.Cooper

                          Hello Rhodan,

                          What is the recommended setting for that?

                          Personally, I will only change Max Duration on monitors I have alerts on (and I care about). SNMP will often be 30 seconds, http-head 30 seconds, html pages around 120. Ping never needs to be touched. I leave user experience monitors as-is (because I want to give the system the best chance to prove it is doing its job).

                          If you find the notifications are not coming in at the intervals you expect, set Max Test Duration to the number of seconds you consider the underlying service "dead" at.

                          Some people interpret max duration as a tool to help couple performance with pass/fail. This is valid in some cases (such as a web service that some other application is consuming and that fails in x seconds). For example, Windows Communication Foundation based software often fails at the 2 minute mark.

                          I prefer to just look at the reports than be woken up / interrupted while eating! ;)

                          Does it affect the test times of up/warn/down etc. if it's set too long / too short?

                          Maximum Test Duration will often change timing characteristics during a "fail" scenario. If a device is very slow to respond (or doesn't respond at all), the polling interval will stretch to (Max Test Duration + Delay between tests while Warn).
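
                          As a rough illustration of that formula, here is a minimal Python sketch. The function names are hypothetical, and the time-to-down calculation assumes every test runs for the full Max Test Duration before failing, which is a simplification and not a description of ipMonitor's internals.

```python
def retest_interval_while_failing(max_test_duration_s: float,
                                  delay_while_warn_s: float) -> float:
    """Seconds between the start of two consecutive tests when each test
    times out: Max Test Duration + Delay between tests while Warn."""
    return max_test_duration_s + delay_while_warn_s


def seconds_until_down(max_test_duration_s: float,
                       delay_while_warn_s: float,
                       failures_before_down: int) -> float:
    """Time from the start of the first timed-out test until the monitor
    has accumulated enough failures to go down (assumed model)."""
    if failures_before_down < 1:
        raise ValueError("failures_before_down must be at least 1")
    # Each failure costs one full timeout; a warn delay sits between tests.
    return (failures_before_down * max_test_duration_s
            + (failures_before_down - 1) * delay_while_warn_s)


if __name__ == "__main__":
    # Compare a 30 second and a 120 second Max Test Duration,
    # with a 60 second warn delay and 2 failures before "down".
    for max_duration in (30, 120):
        interval = retest_interval_while_failing(max_duration, 60)
        total = seconds_until_down(max_duration, 60, 2)
        print(f"max={max_duration}s interval={interval}s down after {total}s")
```

                          Under these assumptions, raising Max Test Duration from 30 to 120 seconds stretches the retest interval from 90 to 180 seconds and the time to "down" from 120 to 300 seconds.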

                          We included a "downtime simulator" that will help give a better picture. You can find this inside the Edit page for a Monitor. There is a setting for "Average test duration"... Changing that value will give you a decent picture of the behavior ipMonitor would exhibit in this scenario.

                        • Re: Alert suppression and dependencies
                          Peter.Cooper

                          Hello Simon,

                          Do you recommend the values given in that document? 

                          Even though this document was created a while ago, I believe the recommended values in the document are perfect for lightweight dependency monitors. Any monitor that's responsible for alert suppression should reach "down" well before all the other monitors. The goal is to have dependency monitors reach "down" in half the time it takes the other monitors. This will eliminate race conditions. (Also, you're going to get a few minutes warning before the board starts showing lots of red.)
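
                          To check the "half the time" rule on paper, a sketch like the following may help. The `Timing` class, its field names, and the example values are illustrative assumptions (they are not taken from the tutorial), and the model ignores how long each test itself takes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    """Hypothetical timing settings for one monitor, in seconds."""
    delay_while_up_s: int      # polling interval while the monitor is up
    delay_while_warn_s: int    # retest interval after the first failure
    failures_before_down: int  # failures needed to reach "down"

    def worst_case_seconds_to_down(self) -> int:
        # Worst case: the outage starts just after a successful poll,
        # so a full "up" interval passes before the first failure.
        return (self.delay_while_up_s
                + (self.failures_before_down - 1) * self.delay_while_warn_s)


def dependency_is_fast_enough(dependency: Timing, dependent: Timing) -> bool:
    """True if the dependency reaches "down" in at most half the time
    the dependent monitor needs."""
    return (2 * dependency.worst_case_seconds_to_down()
            <= dependent.worst_case_seconds_to_down())


if __name__ == "__main__":
    service = Timing(delay_while_up_s=300, delay_while_warn_s=60,
                     failures_before_down=2)
    ping = Timing(delay_while_up_s=60, delay_while_warn_s=30,
                  failures_before_down=3)
    print(ping.worst_case_seconds_to_down(),
          service.worst_case_seconds_to_down(),
          dependency_is_fast_enough(ping, service))
```

                          With these example values the ping dependency needs at most 120 seconds to reach "down" and the service monitor 360 seconds, so the dependency comfortably wins the race.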

                          I don't think these values in the document are good for just any monitor though. The defaults are good the way they are, especially now that we do "unlimited" monitors. If a WMI or RPC monitor is going to be set as a dependency, start a new thread so we can discuss it on the forum... fast polling RPC & WMI should be avoided for a couple reasons.

                          I'll fill you in a little more on the context around the document I referenced (which is getting outdated).

                          It was originally commissioned after reviewing a ticket where the ipMonitor host's switch was accidentally unplugged, went unnoticed, and caused a few hundred sms/gsm alerts to get sent out in a hurry. We spent some time helping the customer configure dependencies, and identified / configured a dependency on anything (IP based) that ipMonitor was depending on to function. We thought the subject was really worth spending time on, so we beefed up our documentation and hoped that the document format would lend itself to more self-help.

                          A while later, with v9, we did upgrade our wizards to suggest / automate adding ping dependency monitors... reducing the need for the document. It still comes in handy.

                        • Re: Alert suppression and dependencies

                          I need some further advice / suggestions.

                          My monitor polls every 5min. When it fails, it changes to warn, and retests 60 seconds later. If it fails again at that point, it changes to down, and sends an email alert.

                          I want the monitor to keep polling every 60 seconds, but to only continue alerting every 15min (not every 60 seconds).

                          Is that possible?

                            • Re: Alert suppression and dependencies
                              Peter.Cooper

                              Rhodan,

                              It sounds like you have 1 failure per notification. If this is true, create a specific alert for this style of monitor... and change the alert sequence on the emails to:

                              1,15,30,45
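
                              To see roughly when those emails would go out, here is a small Python sketch. It assumes each number in the sequence is a failure count, that the monitor retests every 60 seconds while down, and that failure 1 is counted at the moment the monitor goes down. The function names are hypothetical; the downtime simulator remains the authoritative check.

```python
from typing import List, Set


def parse_alert_sequence(text: str) -> Set[int]:
    """Turn a comma-separated sequence such as "1,15,30,45" into a set
    of failure counts."""
    return {int(part) for part in text.split(",") if part.strip()}


def alert_minutes(sequence: Set[int],
                  retest_interval_s: int = 60,
                  total_failures: int = 45) -> List[int]:
    """Minutes after the first counted failure at which an alert fires,
    assuming one failure is counted per retest."""
    minutes = []
    for failure_number in range(1, total_failures + 1):
        if failure_number in sequence:
            elapsed_s = (failure_number - 1) * retest_interval_s
            minutes.append(elapsed_s // 60)
    return minutes


if __name__ == "__main__":
    print(alert_minutes(parse_alert_sequence("1,15,30,45")))
```

                              Under this model the emails land at minutes 0, 14, 29 and 44, so the monitor keeps testing every minute while the emails arrive about 15 minutes apart. No further email is sent after the 45th failure unless the sequence is extended.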

                                • Re: Alert suppression and dependencies

                                  Yes, our ping monitor is on 1 failure. So I would need an alert for all our ping monitors? What if two servers went down, would there be a possibility of alerts being missed from one of them if they are using the same alert? Would I need an alert per monitor?

                                    • Re: Alert suppression and dependencies
                                      Peter.Cooper

                                      Would I need an alert per monitor?

                                      Just one alert for the ping monitors that have this timing behavior. Once you have the alert range set (and tested) the way you want it, remove the ping monitors from your other alerts. If you doubt your changes, use the downtime simulator to verify it's what you expect.

                                        • Re: Alert suppression and dependencies

                                          Thanks for the prompt reply.

                                          So the way I understand it:

                                          1. One alert setup for all ping monitors?

                                          2. For monitoring services, we'd use the same timings and accumulation (probably 2), could I use one alert for this as well?

                                          Thanks again

                                            • Re: Alert suppression and dependencies
                                              Peter.Cooper

                                              2. For monitoring services, we'd use the same timings and accumulation (probably 2), could I use one alert for this as well?

                                              I really don't recommend 1 failure per notification unless it's something like a humidity or temperature sensor (watching for hardware damage). Hiccups in the environment will cause the monitor to flap up/warn/down, generally annoying you. If you decide to put up with flapping, your team may get desensitized to notifications... and not hustle to act on a frozen air conditioner hose!
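
                                              The flapping effect can be illustrated with a small Python sketch. It assumes a simplified model in which an alert fires once whenever the count of consecutive failures reaches the configured threshold, and a passing test resets the count. The function name and the sample results are made up for illustration.

```python
from typing import Sequence


def count_down_alerts(results: Sequence[bool], failures_required: int) -> int:
    """Count how many times a monitor would alert for a series of test
    results (True = pass, False = fail) under the simplified model."""
    alerts = 0
    consecutive_failures = 0
    for passed in results:
        if passed:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            # Alert only at the moment the threshold is reached.
            if consecutive_failures == failures_required:
                alerts += 1
    return alerts


if __name__ == "__main__":
    # Two single-test hiccups plus one real three-test outage.
    results = [True, False, True, True, False, False, False,
               True, False, True]
    print(count_down_alerts(results, failures_required=1))
    print(count_down_alerts(results, failures_required=2))
```

                                              With 1 failure per notification, the two hiccups and the real outage each raise an alert (3 in total); requiring 2 failures leaves only the alert for the real outage.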

                                              1. One alert setup for all ping monitors?

                                              Just the ping monitors that have 1 failure per notification, polling at 5 min while up (1 minute otherwise).