Maybe this will help.
Ok, this is great information.
However, in our situation, how do we get just one agent-offline alert within a specified timeframe, say only one alert in a 12-hour window?
The situation we had with the flaky VPN was that the agent went offline, then back online, about every 10 minutes. In the video, the correlation's reset event would be the agent coming online, and in this case it would reset constantly and therefore still send out emails every 10 minutes.
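For reference, the behavior being asked for here amounts to per-agent alert suppression. The following is a minimal Python sketch, not a LEM rule: it assumes a hypothetical `AlertSuppressor` that remembers when it last alerted for each agent and drops further offline alerts for that agent until 12 hours have passed. The agent name `PC-042` and the timestamps are made up for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta


class AlertSuppressor:
    """Hypothetical per-agent suppression: at most one alert per agent per window."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        self._last_alert: dict[str, datetime] = {}  # agent -> time of last alert

    def should_alert(self, agent: str, when: datetime) -> bool:
        last = self._last_alert.get(agent)
        if last is not None and when - last < self.window:
            return False  # still inside this agent's quiet window
        self._last_alert[agent] = when
        return True


# Simulate a flapping agent that goes offline every 10 minutes for 24 hours.
start = datetime(2024, 1, 1, 0, 0)
suppressor = AlertSuppressor(window=timedelta(hours=12))
offline_times = [start + timedelta(minutes=10 * i) for i in range(24 * 6)]
alerts = [t for t in offline_times if suppressor.should_alert("PC-042", t)]
print(len(alerts))              # 2
print(alerts[1] - alerts[0])    # 12:00:00
```

With the agent flapping every 10 minutes for a full day, this yields two alerts 12 hours apart instead of one per cycle. Keying the state by agent means a second machine going offline still alerts immediately.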
Hmm... interesting case. I'm waffling between whether this is a bug or expected behavior; NOT EXISTS rules always get me confused.
One thing you could try is adding
InternalAgentOffline.DetectionTime < InternalAgentOnline.DetectionTime
which will at least specify that the offline should come before the online.
I think a new correlation instance is being created in memory for each Offline event per machine, which may not be exactly what we expect.
Our thresholded rules do have the concept of "time over threshold" (something like "fire this rule if you see 10 of these in 30 seconds, then tell me again if the condition is still valid after 5 minutes"), but not for a single event.
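To illustrate the "time over threshold" idea, here's a minimal Python sketch of how such a rule might behave. `ThresholdRule` and its parameters are hypothetical names, not LEM's internals: it counts events in a sliding window and, once the threshold is met, holds off on re-notifying until the re-notify interval has passed and the condition still holds.

```python
from __future__ import annotations

from collections import deque


class ThresholdRule:
    """Hypothetical model of "N events in W seconds, re-notify after R seconds"."""

    def __init__(self, count: int, window: float, renotify: float) -> None:
        self.count = count
        self.window = window
        self.renotify = renotify
        self._times: deque[float] = deque()   # event times inside the window
        self._last_fired: float | None = None

    def observe(self, t: float) -> bool:
        """Record an event at time t (seconds); return True if the rule fires."""
        self._times.append(t)
        # Drop events that have slid out of the window.
        while self._times and t - self._times[0] > self.window:
            self._times.popleft()
        if len(self._times) < self.count:
            return False  # threshold not met right now
        if self._last_fired is not None and t - self._last_fired < self.renotify:
            return False  # condition holds, but we notified too recently
        self._last_fired = t
        return True


# 10 events in 30 seconds, re-notify after 5 minutes; one event per second for 10 minutes.
rule = ThresholdRule(count=10, window=30.0, renotify=300.0)
fired = [t for t in range(600) if rule.observe(float(t))]
print(fired)  # [9, 309]
```

A single event on its own never reaches a count of 10, so nothing fires for it, which is the gap being described for the one-agent-offline case.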
Maybe I am not explaining the event very well, as I don't think InternalAgentOffline.DetectionTime < InternalAgentOnline.DetectionTime would help, since the offline was already preceding the online.
Think of it like this: a PC has an agent, and the network connection is unplugged, which triggers an offline event because there is no communication from the manager to the PC. A few minutes later the connection is plugged back in, bringing the agent back online, which resets the conditions. About 10 minutes later the whole process repeats, triggering a new alert every 10 minutes or so.
Our understanding of the rule as it is set up is that the correlation says: if you see an agent go offline, trigger an alert after 60 minutes, but if it comes back online within those 60 minutes, do not trigger an alert. Therefore we should only see the alert once every hour if it keeps cycling up and down.
I agree, it won't eliminate them, but once you're IN the chain of events there's a small chance it might reduce them.
LEM isn't deterministic about event order unless you tell it to be, so unless you specify the EventA.DetectionTime < EventB.DetectionTime condition, the events could be matched in any order. You could actually have an online that came in BEFORE the offline and cancels it out (or doesn't). That 60 minutes is a sliding window before and after the first event that starts the clock.
What you described is definitely the ideal case, and I'm still duking it out with the development team over whether it's a bug or just an artifact of how the rule behaves. I've found a couple of other NOT EXISTS bugs myself that don't work the way I'd expect, so there's still a pretty solid chance this isn't working as intended either.
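Here is a simplified Python model of why the ordering condition matters. It is a sketch under stated assumptions, not LEM's actual correlation engine: it assumes the 60-minute window is checked symmetrically around the Offline event, as described above, and the function name `should_alert` is hypothetical.

```python
from __future__ import annotations


def should_alert(offline: int, onlines: list[int], window: int, ordered: bool) -> bool:
    """Return True if an Offline at time `offline` is NOT cancelled by any Online.

    Simplified model (times in minutes), not LEM's actual engine:
    - ordered=False: any Online within +/- `window` of the Offline cancels it.
    - ordered=True:  only an Online strictly after the Offline (and within
      `window`) cancels it, like adding Offline.DetectionTime < Online.DetectionTime.
    """
    for on in onlines:
        if ordered:
            cancels = offline < on <= offline + window
        else:
            cancels = abs(on - offline) <= window
        if cancels:
            return False
    return True


# Agent reported Online at minute 0, went Offline at minute 30, and stayed down.
print(should_alert(30, [0], 60, ordered=False))  # False: the earlier Online cancels it
print(should_alert(30, [0], 60, ordered=True))   # True: the alert fires as intended
```

In the unordered case, an Online that arrived before the Offline still falls inside the window and suppresses an alert that should have fired; the ordering condition rules that match out.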