Node has stopped responding but does not go down

Hi all, a few days ago one of my nodes stopped responding (request timed out), but the node did not go down (in the event log, I can't see any event saying the node is down). As a result, my alert for a node being down was not triggered, since the node was never marked down in the event log. After that I tried adding my laptop as a node and disconnecting it; my alert worked well, and the event log also recorded the node as down after it stopped responding. Below are my screenshots of the event log for the first device, the one that will not go down.


My question is: why does my first server stop responding but not change to down? Is there an issue here? I hope you guys can understand my poor English, thanks.

  • Do you have any parent defined for your first server? If so, and this parent is (or was) down, then your node goes to the UNREACHABLE state instead, because of this dependency.

  • True. I would check the group dependencies to see if the server is a child. Also see if the server has dual connections to the network, and check which interface the node is defined to 'ping'. Sometimes you may think it's polling one interface, whereas it's looking at a different interface on the same node.

  • jiliewch2003,

    Is it just this one node or do you have this problem with all of your nodes? Did you change any of the standard polling intervals, thresholds (Settings - Orion Thresholds), or warning level (Settings - Polling Settings - Node Warning Level)? This is probably not the problem, but just curious.

    No reason to apologize for your English, it is great!


  • Hi all, sorry for the late reply. FYI, the node does not have any dependencies. It is standalone. I tried using my laptop as a server to test the alert that I wanted to trigger, and it worked fine.

    Mav, I did not change any of the settings you mentioned; they all remain at their defaults. It is kind of weird that the server does not go down. Is there any way to troubleshoot this issue? Thanks in advance.

  • Well, investigating after the fact is very hard. But from the events you posted I can see that the node went Down.

    Node has stopped responding (reason)

    This event is fired when the status of the node is Down. Is the problem that the node was not Down, or that the alert wasn't triggered?

  • Hi, but the event log does not show that the node went down; it only mentions that the node stopped responding. My problem is that the alert only triggers when the node goes down, but according to the events the node never did. I need to know how the node can go down and trigger the alert.

  • Hi ET, as you can see, the node stopped responding and was supposed to go down next. However, there is no event showing that the node went down. My first concern is why the node did not go down, because other devices do go down once they stop responding. Is there some issue that prevents this node from going down? (Since the events only show that it stopped responding, I assume it never went down.) My second concern is that if the node does not go down, it will not trigger my alert for when a node goes down.

    Thanks a lot for all of your input.

  • I see what you are trying to show now. You are missing the event "Node ... is Down." This event is suppressed when the previous status of the node was "Unknown". That's the only situation.

    Are you able to reproduce this issue when the node was "Up" previously? If so, I would open a support ticket so we can look closer into this.

    Alerts must work in any situation and I understand your concern here.
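
To make the rule described above concrete, here is a minimal Python sketch of the event behaviour as stated in this thread. It is a model for discussion only, not Orion code: the `Status` enum, the `events_for_transition` function, the exact event strings, and the order of the events are all assumptions, and the "no events when the node was already Down" branch is an assumption as well.

```python
from enum import Enum


class Status(Enum):
    UP = "Up"
    DOWN = "Down"
    UNKNOWN = "Unknown"


def events_for_transition(previous, current, name):
    """Return the events a poll would log, per the rule stated in the thread.

    Hypothetical model: names and event texts are illustrative only.
    """
    # Assumption: nothing new is logged unless the node has just become Down.
    if current is not Status.DOWN or previous is Status.DOWN:
        return []
    # "Node has stopped responding" is fired when the status is Down.
    events = [f"Node {name} has stopped responding"]
    # "Node ... is Down" is suppressed when the previous status was Unknown.
    if previous is not Status.UNKNOWN:
        events.append(f"Node {name} is Down")
    return events


if __name__ == "__main__":
    for prev in (Status.UP, Status.UNKNOWN):
        print(prev.value, "->", events_for_transition(prev, Status.DOWN, "server1"))
```

In this model, a node that was Unknown (for example, just switched from unmanaged to managed with no first poll yet) logs only the "stopped responding" event, which matches the event log described in the original question.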

  • Hi ET, thanks for your understanding. The node that has this issue cannot be used for testing, since it is a main server for the company. You say that the event is suppressed when the status was Unknown previously, but as far as I know, Unknown only happens for nodes that have dependencies, and this node is not configured with any. Is it possible that the server hung and that is why the node did not go down?

    FYI, this was one of my projects and it has already been handed over to the customer, so I might not be able to test the issue. But I will try to ask my customer to test it and will update this thread.



  • Yes, Unknown comes from dependencies, but a node is also in the Unknown state when it has gone from unmanaged to managed and we don't have the first poll yet.

    You can simulate a down node with Network Emulator Toolkit. You can install it on your Orion server and simulate a node that doesn't respond on ICMP, SNMP, DNS, and so on.
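
While running such a simulation, it can help to confirm independently that the node really has stopped answering ICMP from the polling server. The following is a minimal Python sketch that wraps the operating system's `ping` command; the function names `ping_command` and `responds_to_ping` are hypothetical, and the address in the usage line is a placeholder. Note that on Windows, `ping` can exit with code 0 even when the reply is "Destination host unreachable", so treat the result as a rough check only.

```python
import platform
import subprocess


def ping_command(host, count=1, system=None):
    """Build the ping command line for the current (or given) OS."""
    system = system or platform.system()
    # Windows uses -n for the echo count; Linux and macOS use -c.
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def responds_to_ping(host, count=1, run=subprocess.run):
    """Return True if the ping command exits with code 0."""
    result = run(
        ping_command(host, count),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only placeholder; use your node's address.
    print(responds_to_ping("192.0.2.10"))
```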