I put "sending" in quotes because the agents on about 20 domain controllers are all still running. Right after the appliance boots up, all of the Windows domain controllers suddenly begin sending events to SEM again, and all seems well.
However, every few days, I notice that the "node health" pane on the dashboard shows the "Last event" for ALL Windows devices climbing from "a few seconds ago" to minutes, then hours, then days. As soon as I log into the console and reboot the appliance, they all begin functioning again immediately.
I've been fighting this for months with preventive reboots. We are currently on SEM version 2023.2, but this has been a problem on every version since 2022.2 (our initial deployment). Cisco devices do not have this problem.
It appears that SEM may be closing off the agent communication port, perhaps due to memory use? That said, our appliance has plenty of resources (8 GB of memory, 2 TB of storage), and both are still below 50% use. We average about 400 events per second across 30 nodes: 10 Cisco and 20 Windows.
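To test the port theory the next time events stall, something like the following could be run from one of the domain controllers before rebooting the appliance. This is only a minimal sketch: the function names are mine, the hostname is a placeholder, and the port range 37890-37892 is my assumption about the agent ports, so it should be checked against the SEM port requirements.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_ports(host: str, ports: list[int], timeout: float = 3.0) -> dict[int, bool]:
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Example call (placeholder hostname, assumed agent port range):
# check_ports("sem.example.local", [37890, 37891, 37892])
```

If the ports still accept connections while the "Last event" times keep climbing, the port is probably not the issue and the problem is more likely inside the appliance's event processing.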
Any insights or similar experiences?