NPM individual overrides not taking effect

Hi, I'm new to THWACK, NPM, and SolarWinds products in general.

We've got basic monitoring set up for the devices and ports we want to monitor for uptime, utilization, etc., but several devices have higher-than-average response times, and I'm getting tired of the email alerts telling me they are "exceptionally high" when they are historically normal for those devices.

One is a small remote Juniper SSG 5 NetScreen firewall behind a T1 circuit that has been just shy of flaky from day one. It constantly has response times from ~250ms to ~1275ms, and we've never been able to nail down where the problem is, so we've just lived with it for 15 years.

Another is on site, where NPM is running, alongside our monitor/threat prevention appliance, which watches all traffic for all VLANs and runs threat assessment against everything. Its external switch interfaces usually show response times of ~500-1750ms. Not what we'd like to see, but the vendor tells us it has no impact on the actual performance of the unit. We're more interested in getting notifications of failures than of response times: the appliance is configured to fail open, so a failure is not the end of our world, but we do need to see a notification.

I've selected 'Override Orion General Thresholds' and set the Warning & Critical values to what I want, but it continues to blast me with emails for these nodes. Is there a setting somewhere I have to check to allow overrides at all?

Thanks :)

  • I'm not following. Do you mean go to Alerts > Manage Alerts and what, turn it off in Action Manager?

    I don't want to turn off the alert; I still want it to alert me for all my polled devices. I want my override to kick in.

  • I suggest getting a baseline of the response times in NPM for those specific device types, then creating a copy of your current alert and modifying it to include only devices of that type or from a certain vendor, and having it alert on a range based around those ICMP response times. Once this is working, you could modify the original alert to exclude the same devices. There may also be an option to set warning and critical levels based on this info and the baselines, but I'm not sure whether that applies to response time.

  • Also, one other thing to be aware of: there may be a bug in the version of NPM and Orion Core that you are running. I ran into a similar issue with thresholds falsely alerting in a previous discussion, which I've linked below.

    Critical Value Reached (Percent Loss) Triggering Falsely

    If you are still running into problems after all that, I suggest opening a ticket with the support team. They are all pretty awesome and helpful.

  • I told you to check the values set in the alerts.

  • Thanks for the reply. I'm waiting on my boss for our SW Customer ID so I can register and open a case ;)

    We're running 11.5.25300.0 and the alerts are showing the correct details; the issue is that we're not able to override on the individual nodes.

    -----Original Message-----
    From: solarwindsNPM@noreply.nr [mailto:solarwindsNPM@noreply.nr]
    Sent: Thursday, November 12, 2015 10:25 AM
    To: Me
    Subject: ALERT: High Response Time for SSG5

    ALERT: SSG5 has exceptionaly high response time. Average Response Time is 203 ms and is varying from 20 ms to 372 ms.

    http://solarwindsNPM:8080/Orion/View.aspx?NetObject=N:12


    http://solarwindsNPM:8080/Orion/Netperfmon/AckAlert.aspx?AlertDefID=11


    -----Original Message-----
    From: solarwindsNPM@noreply.nr [mailto:solarwindsNPM@noreply.nr]
    Sent: Thursday, November 12, 2015 10:11 AM
    To: Me
    Subject: RESET: High Response Time for SSG5

    RESET: Average Response Time of SSG5 is 92 ms and is varying from 20 ms to 259 ms.

    http://solarwindsNPM:8080/Orion/View.aspx?NetObject=N:12

  • The alert is canned. It's the out-of-the-box alert for "High response time":

    Description: This alert will write to the SolarWinds event log when the average response time for a node goes above 200ms and when the average response time drops back down below 100ms after being above 200ms.

    Trigger: Node | Average Response Time | is greater than | 200 ms

    Reset: Node | Average Response Time | is less than | 100 ms

    I don't have a problem with the alert's values and I want to leave them as they are. What I want to do is override 3 of our nodes specifically, without having to create all-new alerts and do exclusions/inclusions. I am assuming this is what the "override" does.

    [Attached screenshot: 11-12-2015 4-55-59 PM.png]

  • My guess is that you can't override the canned alerts, which is why I suggested making a copy and disabling the original.

  • Did you ever get this resolved? I'm trying to do the same thing. I told the node to use the baseline for response time, but the alert I have for response time doesn't appear to be abiding by it.

  • I have the same question, if anyone got to the bottom of this.
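
To make the behavior discussed in this thread concrete, here is a minimal Python sketch of a trigger/reset alert using the 200 ms trigger and 100 ms reset values quoted above for the canned "High response time" alert. It is not SolarWinds code. The names `Node`, `critical_override_ms`, `evaluate_fixed`, and `evaluate_with_override` are hypothetical, and the idea that the canned alert compares against fixed values and ignores the per-node override is only the guess made in the replies, not confirmed product behavior. The 2:1 trigger-to-reset ratio in the override case is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Values quoted in the thread for the canned "High response time" alert.
TRIGGER_MS = 200
RESET_MS = 100


@dataclass
class Node:
    name: str
    # Hypothetical per-node override; None means "no override set".
    critical_override_ms: Optional[int] = None
    alert_active: bool = False


def _step(node: Node, avg_ms: float, trigger: float, reset: float) -> Optional[str]:
    """Apply one polled average to the trigger/reset state machine."""
    if not node.alert_active and avg_ms > trigger:
        node.alert_active = True
        return "ALERT"
    if node.alert_active and avg_ms < reset:
        node.alert_active = False
        return "RESET"
    return None  # no state change, so no email


def evaluate_fixed(node: Node, avg_ms: float) -> Optional[str]:
    """Alert condition with hard-coded values: the node's override is never read."""
    return _step(node, avg_ms, TRIGGER_MS, RESET_MS)


def evaluate_with_override(node: Node, avg_ms: float) -> Optional[str]:
    """Alert condition that honors the per-node override (assumed design)."""
    if node.critical_override_ms is not None:
        trigger = node.critical_override_ms
    else:
        trigger = TRIGGER_MS
    reset = trigger / 2  # assumption: keep the canned 2:1 trigger-to-reset ratio
    return _step(node, avg_ms, trigger, reset)


if __name__ == "__main__":
    samples = (203, 150, 92)  # 203 and 92 are the averages in the quoted emails
    fixed = Node("SSG5", critical_override_ms=1500)
    print([evaluate_fixed(fixed, ms) for ms in samples])
    honored = Node("SSG5", critical_override_ms=1500)
    print([evaluate_with_override(honored, ms) for ms in samples])
```

Under these assumptions, the fixed-value condition fires at 203 ms and resets at 92 ms no matter what override is set on the node, which matches the emails quoted above. The second function shows what the original poster expected the override to do: with a 1500 ms override, the same samples produce no alert at all.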