Agent Node Displayed in a Down Status When Up & Collecting Data

I am running the Orion Agent on a significant number of Windows systems at this point, and one thing I notice on occasion is that a node will show as Down in Orion when it's actually up and all data collection is functioning properly. So far, the only way I have found to resolve this is to restart the agent on the node.

I am curious if other people are seeing this or if it's a known issue?

  • Hey byrona​, did you ever find a solution for this? I've got the same issue. Of about 80 agents deployed, 22 say the node is down but I'm still getting statistics...

    • Agents deployed on various servers (Windows 2008 R2, Windows 2012, Windows 2012 R2) - agent-initiated mode
    • Using "Agent status" for node status instead of ICMP
    • The system is collecting data from these servers (I can see volume, interface, CPU/memory statistics coming in at the correct intervals, no data gaps)
    • The availability of the system over a day is 0% because the Node Status is Down, but you can see the day's statistics on the same page
    • The agent shows as Connected on the Manage Agents page but Agent status is Unknown
    • Seems to be random in which servers are affected; different subnets, different OSes, firewalls on/off
    • Removed and re-added the nodes, no help
  • No, I haven't found what I would consider a solution. When this happens, I start by restarting the agent service on the node to see if that fixes the problem. If that doesn't work, I go into Control Panel, find the Agent (make sure you view by icons), and re-enter the Orion server information, which seems to re-sync it. If that doesn't work, the last option is to uninstall and reinstall the agent. Like I said, I certainly don't consider any of these solutions, just workarounds.
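
For anyone who wants to spot affected nodes without clicking through each one, the symptom described above (node marked Down while statistics keep arriving) can be checked mechanically. The following is only a rough sketch: the record layout (`name`, `status`, `last_stat`), the function name, and the 20-minute freshness window are all assumptions for illustration, not anything Orion exposes under those names. You would need to fill the list from whatever export or query you have available.

```python
from datetime import datetime, timedelta

def find_false_down(nodes, now, max_age=timedelta(minutes=20)):
    """Return names of nodes marked Down whose statistics are still fresh.

    `nodes` is a list of dicts with hypothetical keys:
      name      - node caption
      status    - "Up" or "Down" as shown in the console
      last_stat - datetime of the most recent collected statistic
    """
    return [
        n["name"]
        for n in nodes
        if n["status"] == "Down" and now - n["last_stat"] <= max_age
    ]

# Made-up sample data, purely for illustration.
now = datetime(2020, 1, 1, 12, 0)
nodes = [
    {"name": "web01", "status": "Down", "last_stat": now - timedelta(minutes=5)},
    {"name": "db01", "status": "Down", "last_stat": now - timedelta(hours=3)},
    {"name": "app01", "status": "Up", "last_stat": now - timedelta(minutes=2)},
]
print(find_false_down(nodes, now))  # ['web01']
```

Nodes returned by this check are the candidates for the restart / re-sync / reinstall workarounds; a node that is Down with stale statistics (like `db01` here) is more likely to be genuinely down.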

  • Thanks byrona. These agents/servers haven't been cloned either. It's a brand new SolarWinds install and the agents have been newly deployed.

    Well, an update of sorts.

    I tried restarting the agent on an endpoint server and that looked to have resolved the node status issue. I advised the server guys they'd need to restart the agent on the remaining servers. Before that happened though, we applied the latest NPM hotfix (which includes Orion HF2) and restarted the server. The good news? The agent nodes now all report with up status (without restarting those agents). The bad news? Every agent now reports a stupidly high response time (even the ones that were working fine before). My top 10 response time resource is full of agents with 28,000-30,000 ms latency.

    Edit: Scratch that. Most of them have come back to normal. However, a set of agents now show Node Status as Up but a Response Time of -1 ms. So they fill the top 10 lists for packet loss and response time but are still green/up.
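
If the -1 ms entries are cluttering a report you build yourself, one option is to drop them before ranking. This is a minimal sketch that assumes -1 is a "no measurement" placeholder, which nothing in this thread confirms; the `response_ms` field and the function name are hypothetical and the sample values are made up.

```python
def top_response_times(nodes, limit=10):
    """Rank nodes by response time, slowest first, skipping negative values.

    Negative response times are treated here as "no measurement"
    (an assumption) and are excluded from the ranking.
    """
    valid = [n for n in nodes if n["response_ms"] >= 0]
    return sorted(valid, key=lambda n: n["response_ms"], reverse=True)[:limit]

# Made-up sample data, purely for illustration.
nodes = [
    {"name": "web01", "response_ms": 12},
    {"name": "db01", "response_ms": -1},
    {"name": "app01", "response_ms": 340},
]
print([n["name"] for n in top_response_times(nodes)])  # ['app01', 'web01']
```

This only hides the placeholder values from a custom list; it does nothing about why the agents report -1 ms in the first place.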