5 Replies Latest reply on Jan 27, 2011 10:32 AM by Andy McBride

    Fighting against high latency alerts

      Hi all,

      I've run into an issue here lately with the NPM alerts. We have "High response time monitoring" activated, and some devices intermittently trigger the related alert. After trying to isolate the possible causes, it turned out that very specific device types were slow to send the echo reply back to Orion: the Cisco Catalyst 2940/2950 series. Even a local 2940's latency can spike up to 380 ms with only two switch hops in between (one 4507R + HP blade 3120 Catalyst), when the average is <1 ms.

      That narrows it down, but now I'm trying to understand why. A first test is to suppress the alert trigger unless the condition persists for a certain time. That raises the question: what duration should I select? My latency test runs every 60 seconds. If I set a value of less than 60 seconds, will Orion relaunch the test when the timer expires, or keep the configured interval?
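
      To make the question concrete, here is a minimal Python sketch of what I mean by "alert only if the condition persists". It is not Orion's actual logic, just my assumption of how a sustained-condition check could work; the function name `should_alert` and its parameters are made up for illustration. It assumes the check only looks at samples already collected and never triggers an extra test.

```python
def should_alert(samples, threshold_ms, sustain_s):
    """Return True if latency has stayed above threshold_ms for at least
    sustain_s seconds, judged at the time of the most recent sample.

    samples: list of (timestamp_s, latency_ms) tuples in time order.
    Hypothetical helper for illustration, not an Orion API.
    """
    breach_start = None  # timestamp of the first sample in the current breach
    for ts, latency in samples:
        if latency > threshold_ms:
            if breach_start is None:
                breach_start = ts
        else:
            # Any good sample resets the sustained-condition timer.
            breach_start = None
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= sustain_s


# One isolated 380 ms spike between two normal 60-second samples.
single_spike = [(0, 1), (60, 380), (120, 1)]
# Three consecutive slow samples, 60 seconds apart.
sustained = [(0, 380), (60, 380), (120, 380)]

print(should_alert(single_spike, threshold_ms=100, sustain_s=120))
print(should_alert(sustained, threshold_ms=100, sustain_s=120))
```

      Under these assumptions, a window shorter than the 60-second test interval can never span two samples, so it would either never be satisfied or only be satisfied when set to zero, which is the same as no delay at all. A window of two or more intervals is what would filter out a single spike.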

      Another question is about Orion's internal clock for triggering events. I suspect these alerts happen when a rediscovery or polling cycle runs at the same time as the latency test. If my latency test runs every minute, polling every 5 minutes, and rediscovery every hour, how likely is it that these events happen at the same time? Is the same clock used for all of these functions? Since the 2940/2950 have very little spare CPU, that might explain why I see intermittent latency spikes exclusively on these devices.
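
      As a rough way to reason about this, the sketch below computes how often the jobs would line up if (and this is purely my assumption) they all started at the same instant and ran off one shared clock with no jitter. The helper `coincidence_period` is hypothetical; the intervals are the ones from my setup.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def coincidence_period(*intervals_s):
    """Seconds between moments when all the given periodic jobs fire together,
    assuming they share a clock and started at the same instant."""
    return reduce(lcm, intervals_s)


LATENCY_TEST_S = 60    # latency test every minute
POLLING_S = 300        # polling every 5 minutes
REDISCOVERY_S = 3600   # rediscovery every hour

test_and_poll = coincidence_period(LATENCY_TEST_S, POLLING_S)
all_three = coincidence_period(LATENCY_TEST_S, POLLING_S, REDISCOVERY_S)

print(test_and_poll)  # seconds between latency test + polling overlaps
print(all_three)      # seconds between overlaps of all three jobs
```

      With aligned timers, every fifth latency test would coincide with a poll and one test per hour would coincide with all three jobs. If the timers are independent or offset from each other, the overlap could be much rarer or never exact, which is why the shared-clock question matters.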

      All input is very welcome here.