Level 21

Agent Node Displayed in a Down Status When Up & Collecting Data

I am using the Orion Agent on a significant number of Windows systems at this point and one thing I notice on occasion is that the node will show up in a down status in Orion when it's actually up and all data-collections are functioning properly.  The only way to resolve this issue is to restart the agent on the node.

I am curious if other people are seeing this or if it's a known issue?

21 Replies
Level 11

Bump!

Same issue/symptoms occur on our side.

Thanks!

0 Kudos

What version are you currently running?

Hi aLTeReGo,

Version: NPM 12.2 (2017.3.5), NCM 7.7, SAM 6.6.0, SRM 6.6.0, VMAN 8.2.0.

Also submitted case with Agent diagnostics and Server Diagnostics: 00413386

Thanks!

0 Kudos

Without looking through your diagnostics I can't say definitively, but I am reasonably confident that the issue you are seeing was resolved in the 2018.4 release. Note that 2019.4 is now available, so it has been more than two years since you've upgraded.

I just had a similar situation where after we applied Microsoft patches last Thursday night the system showed down, but was still collecting data. I went in and made sure the Windows Firewall rule was set to Any on the protocol for the agent executable and then restarted the agent. Cleared it up. Since the system was rebooted on Friday a restart of the agent alone did not clear the problem.

0 Kudos
Level 14

I have a ticket open on this.

The Windows agent is junk. Let's just CTJ and admit it.

0 Kudos
Level 21

I now have SW Case #1179502 open on this issue.  I uploaded diagnostics from my primary polling engine as well as the APE that the nodes in question were attached to.  I was also able to get one of the agents set to debug mode while it was in the broken state and get some logs from it; those have also been uploaded.

Good luck. The agents here seem to have settled themselves after a while but it's definitely been a headache. Keeping an eye on it and will raise a support case to send logs through if they go haywire again.

0 Kudos

I look forward to you coming up here to finish resolving this on our instance, shuth.

You know you love the large agent instances..... Agent initiated over a NAT especially

I had an issue yesterday where I couldn't get an alert to notice a custom property. It kept saying it's not in the format necessary. I double- and triple-checked everything and it was driving me nuts. I ended up restarting the Orion module engine on the primary and the APE. That resolved the issue -- bizarre! That, though, caused most of my agent-managed nodes to report as down briefly. I listed the resources and changed status and response time to ICMP, since they all allow ICMP. At some point I'll probably restart that service again to test whether the nodes still falsely report as down.

Good luck with support, byrona.

0 Kudos
Level 21

I just had a bunch of agents do this last night, all at the same time.  Calling SW support now to open a ticket on this; the amount of time I am spending restarting agents is getting out of hand.

0 Kudos

I've seen this happen a few times, and quickly worked around it by either restarting the agent service or initiating a manual poll of the node. I only have a handful of agents, however, all agent-initiated. It does seem like it was related to reboots, reconfigurations, etc., though that's just a hunch, like you said. I'll keep an eye out in case it happens again, as I'm now on the latest versions with the most recent hotfixes.

0 Kudos
MVP

Hey byrona​, did you ever find a solution for this? I've got the same issue. Of about 80 agents deployed, 22 say the node is down but I'm still getting statistics...

  • Agents deployed on various servers (Windows 2008 R2, Windows 2012, Windows 2012 R2) - agent-initiated mode
  • Using "Agent status" for node status instead of ICMP
  • The system is collecting data from these servers (I can see volume, interface, CPU/memory statistics coming in at the correct intervals, no data gaps)
  • The availability of the system over a day is 0% because the Node Status is Down, but you can see the day's statistics on the same page
  • The agent shows as Connected on the Manage Agents page but Agent status is Unknown
  • Seems to be random in which servers are affected; different subnets, different OSes, firewalls on/off
  • Removed and readded the nodes, no help

This condition can occur as a result of duplicate agents, e.g. when the machine where the agent is installed is cloned and the clones have the same SID/hostname. As a result, two agents have the same GUID and both try to connect to AMS.
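To rule the duplicate-agent case in or out, something like the following could be run against an exported list of agents. This is only a sketch under assumptions: the record fields (`guid`, `hostname`, `ip`) and the function name are hypothetical, and how you obtain the list is up to you.

```python
from collections import defaultdict


def find_duplicate_agent_guids(agents):
    """Group agent records by GUID and return only GUIDs seen more than once.

    `agents` is a list of dicts with hypothetical keys: guid, hostname, ip.
    """
    by_guid = defaultdict(list)
    for agent in agents:
        # Compare GUIDs case-insensitively so "ABC" and "abc" count as one.
        key = agent["guid"].lower()
        by_guid[key].append((agent["hostname"], agent["ip"]))
    return {guid: hosts for guid, hosts in by_guid.items() if len(hosts) > 1}
```

An empty result suggests cloning is not the cause, which matches what some posters in this thread report.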

Not sure if it's worth noting or not, but we do have over 300 agents deployed.

0 Kudos

What version of the Agent do you use? Does it happen when the poller computer or the AMS service is restarted?

0 Kudos

While I can't say with 100% certainty, it does often seem related to a node rebooting or a poller service restart.

Thanks for that information, aLTeReGo. In the many cases where I have seen this happen, it's definitely not a case of cloning and duplicate SIDs/hostnames.  The nodes are running fine, then something happens (not sure what), and the next thing I know I have a node that is still getting data-collections but is being reported as down.  I think I may have even opened a case with SW support on this at one point, but they just helped me resolve that specific case and provided no explanation as to why it keeps happening, so at this point I just fix it myself.

0 Kudos

If you encounter this issue again, please open a case with support. You should also collect diagnostics from the Agent as soon as you notice the problem occurring. These should help us identify the root cause.

0 Kudos

No, I haven't found what I would consider a solution.  When this happens I start by trying to restart the agent service on the node and see if that fixes the problem.  If that doesn't work I go into control panel, find the Agent (make sure you view by icons) and re-enter the Orion server information which seems to re-sync it.  If that doesn't work the last option is to uninstall/re-install the agent.  Like I said, I certainly don't consider any of these solutions, just work-arounds.
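The escalation above (restart the service, then re-enter the server information, then reinstall) can be sketched as a simple ladder that stops at the first step that works. This is a rough illustration, not a tested remediation script: the function name is made up, and the step actions and the health check are hypothetical callables you would have to supply yourself.

```python
def run_workaround_ladder(steps, node_is_healthy):
    """Try each workaround in order and stop at the first one that helps.

    `steps` is a list of (name, action) pairs, where each action is a
    caller-supplied callable (hypothetical here). `node_is_healthy` is a
    callable returning True once the node reports correctly again.
    Returns the name of the step that fixed it, or None if none did.
    """
    for name, action in steps:
        action()
        if node_is_healthy():
            return name
    return None
```

Ordering the steps from least to most disruptive means the uninstall/re-install only runs when the cheaper options have already failed.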

0 Kudos