I have a group of about 50-60 agents that go offline at random. The agent service is still running, but in Agent Management the agent shows communication issues and the node shows as down. It's not a firewall or region issue: thousands of other agents communicate fine. The one thing I have found is that the C:\ProgramData\SolarWinds directory is large (1.5 to 3+ GB) and contains a lot of files, anywhere from 6,000 to 48,000. I'm experimenting with cleaning out old log files, and in many cases I simply set the node to ICMP, uninstall the agent, reinstall it, and switch the node back to Agent. It's too early to say whether this fixes it long-term.
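To compare the folder on a failing agent against a healthy one, a minimal Python sketch like the one below counts the files under a directory, totals their size, and shows which extensions dominate. The default path is the one from this post; the function name `summarize_dir` is just illustrative, and the script only reads the folder, it changes nothing.

```python
import os
import sys
from collections import Counter


def summarize_dir(root):
    """Walk `root` recursively; return (file_count, total_bytes, per-extension counts)."""
    count = 0
    total = 0
    by_ext = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File vanished or is inaccessible between listing and stat; skip it.
                continue
            count += 1
            total += size
            by_ext[os.path.splitext(name)[1].lower() or "<none>"] += 1
    return count, total, by_ext


if __name__ == "__main__":
    # Path from the post; pass a different folder as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else r"C:\ProgramData\SolarWinds"
    count, total, by_ext = summarize_dir(root)
    print(f"{root}: {count} files, {total / 1024**3:.2f} GB")
    for ext, n in by_ext.most_common(10):
        print(f"  {ext}: {n}")
```

Running it on a few of the problem agents and a few healthy ones would show whether file count actually correlates with the dropouts.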
Is it possible that the agent falls over or stops communicating when there are too many files in this folder? For example, if somewhere in the code it used an array dimensioned for, say, 5,000 files, and it then encountered more than 5,000 files, could that cause it to fall over?
Has anybody else seen this behavior? I haven't logged a support call yet because RC2 is out and I've been testing it. I plan to upgrade very soon after the final release, which will obviously bring a new agent. My current agents are version 2.2.860.0.
If I stop and start the service, the agent comes back online, but it goes offline again a day or two later, and it seems to be the same agents each time. So far the only lasting fix seems to be a wipe and reinstall; on some machines I'm instead trying just cleaning out the old log files. I was pretty shocked to see over 48,000 files in this folder on one machine.
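For trying the "clean out old log files" approach on more than a handful of machines, here is a hedged Python sketch that finds files older than a cutoff and, by default, only lists them. Two things are assumptions, not SolarWinds guidance: that the files worth removing end in `.log`, and the 30-day default age. The names `old_log_files` and `clean` are hypothetical. Check what is actually in the folder before running it with `--delete`. On Windows, files the agent currently has open will usually refuse to delete; the sketch reports and skips those rather than failing.

```python
import argparse
import os
import time

# Assumption: only files with these extensions are treated as logs.
LOG_EXTENSIONS = (".log",)


def old_log_files(root, max_age_days, now=None, extensions=LOG_EXTENSIONS):
    """Return paths under `root` with a matching extension whose last-modified
    time is more than `max_age_days` days before `now`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    matches.append(path)
            except OSError:
                continue  # removed or inaccessible mid-scan
    return matches


def clean(paths, dry_run=True):
    """Print each path (dry run) or delete it; return how many were handled."""
    handled = 0
    for path in paths:
        if dry_run:
            print("would delete:", path)
            handled += 1
            continue
        try:
            os.remove(path)
            handled += 1
        except OSError as exc:
            # Typically a file still held open by the running agent.
            print("skipped:", path, exc)
    return handled


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="List (or delete) old log files.")
    parser.add_argument("root", nargs="?", default=r"C:\ProgramData\SolarWinds")
    parser.add_argument("--days", type=int, default=30)  # hypothetical retention
    parser.add_argument("--delete", action="store_true", help="actually delete files")
    args = parser.parse_args()
    n = clean(old_log_files(args.root, args.days), dry_run=not args.delete)
    print(f"{n} file(s) {'deleted' if args.delete else 'would be deleted'}")
```

Keeping the dry run as the default means the first pass on each machine is just a report, which can be compared with the file counts above before anything is removed.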