I have been having nodes go down occasionally, but my "node down" alert never gets triggered.
Maybe it wasn't triggered because the polls weren't timing out, but were coming back as "TTL expired" instead?
Example: http://grab.by/3Wud
tylerlucas: Did you ever get a response, resolution, or explanation as to why? I am taking a much closer look at some alerts I received early in the morning for our network, and I see exactly the same thing. I never received an alert for Node1, which is the parent in a dependency for the downstream Node2. However, I did receive an alert for Node2, which clearly shows a dependency on the upstream Node1. When I look at Node1 in the ORION Events, the parent Node1 shows that ORION never received a response and that the node is "red", which to me means DOWN, but the event says the TTL expired in transit.
It is interesting that the parent Node1 never alerts, while the downstream device does, stating that it can't be reached. This suggests that the system does not consider Node1 to be truly "Down"; otherwise I would have received only one alert, for Node1 (the parent), and none for the child Node2.
Respectfully,
NetEng33
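
To make the reasoning above concrete, here is a minimal Python sketch of how dependency-based alert suppression could behave. It is an illustration only, not ORION's actual logic or API: the `Node` class, the status strings ("up", "down", "unknown", "unreachable"), and both functions are hypothetical names made up for this example. The assumption is that a child whose parent is Down gets marked unreachable and is suppressed, and that only nodes with an effective status of "down" fire the alert.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Status as seen by the poller. The values are hypothetical:
    # "up", "down", or "unknown" (e.g. a reply such as "TTL expired"
    # that is assumed here NOT to count as a plain timeout).
    polled_status: str
    parent: Optional["Node"] = None


def effective_status(node: Node) -> str:
    """Apply the (assumed) dependency rule to a node's polled status."""
    if node.polled_status != "down":
        return node.polled_status
    # A down child behind a down (or unreachable) parent is only "unreachable".
    if node.parent is not None and effective_status(node.parent) in ("down", "unreachable"):
        return "unreachable"
    return "down"


def nodes_that_alert(nodes: List[Node]) -> List[str]:
    """Return the names of nodes that would fire a 'node down' alert."""
    return [n.name for n in nodes if effective_status(n) == "down"]


# Case A: the parent is really considered down, so only the parent alerts.
node1 = Node("Node1", "down")
node2 = Node("Node2", "down", parent=node1)
print(nodes_that_alert([node1, node2]))

# Case B: the parent is not classified as down, so only the child alerts.
node1 = Node("Node1", "unknown")
node2 = Node("Node2", "down", parent=node1)
print(nodes_that_alert([node1, node2]))
```

Under these assumptions, Case B matches what is described in this thread: the child alerts and the parent stays silent, which is what you would expect if the parent's status was never set to Down.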