Sorry to bump my own post.
While the event log I posted does not show this, I can see in the groups that this node (and others) are reported as becoming unknown.
So what appears to be happening is that many of our nodes that are down are flipping between "unknown" and "down". Each flip back to "down" triggers a new email alert, and it seems to happen on exactly the 15-minute mark. Why?
We could change all of our alerts to say "not up", but these alerts have worked for years and only now have things gone strange. We did start building dependencies recently, so could that have broken things?
Can you bump up the logging on the alert manager to see what it is saying?
I think you are right.
There is probably some kind of implicit/explicit circular dependency at work here: an interface alert is cleared because a node is down, and a node-down alert is reset because an interface is down.
That is, the implicit dependency of an interface on a node (which changes an interface from down to unknown) is conflicting with the explicit dependency you have created (which changes a node from down to unknown).
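To illustrate the kind of loop I mean, here is a toy Python model. It is only a sketch of my guess, not how the product actually evaluates dependencies: the `step` and `simulate` functions, the rule that both statuses are recomputed together on each pass, and the assumption that the node is the child of its own interface are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    DOWN = "down"
    UNKNOWN = "unknown"


def step(node, iface):
    """One evaluation pass; both objects are assumed to poll as down."""
    # Explicit dependency (hypothetical): the node is a child of the
    # interface, so a down parent masks the node as unknown.
    next_node = Status.UNKNOWN if iface is Status.DOWN else Status.DOWN
    # Implicit dependency: an interface on a down node shows as unknown.
    next_iface = Status.UNKNOWN if node is Status.DOWN else Status.DOWN
    return next_node, next_iface


def simulate(passes):
    """Run several passes and count node transitions back to down."""
    node, iface = Status.DOWN, Status.DOWN
    history = [(node, iface)]
    alerts = 0
    for _ in range(passes):
        new_node, new_iface = step(node, iface)
        # A node-down alert fires each time the node re-enters "down".
        if new_node is Status.DOWN and node is not Status.DOWN:
            alerts += 1
        node, iface = new_node, new_iface
        history.append((node, iface))
    return history, alerts


if __name__ == "__main__":
    history, alerts = simulate(4)
    for n, i in history:
        print(f"node={n.value:8} interface={i.value}")
    print("node-down alerts re-fired:", alerts)
```

In this model the node never settles: each rule undoes the condition the other rule relies on, so the node alternates between down and unknown on every pass and the alert re-fires on every second pass. If something like this is happening, the fixed interval you see would simply be whatever schedule the statuses are recalculated on.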
Thanks - and to make it more interesting, those nodes in question are all being restored at the moment so my ability to re-create this may be challenging.
There is definitely something going on relating to dependencies, in my opinion: the email alerts keep showing the same devices going from unknown to down and then back to unknown (and repeat). The interval between these status changes is exactly 15 minutes.