Hi,
I'm raising this topic because I'm looking for the best way to manage SolarWinds Agent updates, given the changes in version 2023.3.
Hopefully some of you have had similar challenges and have come up with a good way of handling them.
As some of you may be aware, there were some critical bugs in the SolarWinds Agent in the past. The worst ones we came across involved the Agent causing high CPU usage on servers, which killed the applications running on them and caused outages. It was quite difficult to explain to customers that the software that was supposed to improve uptime was actually reducing it.
To mitigate this risk, our process so far has been to update the Orion platform first and then update Agents gradually, starting with dev servers, then doing production a few weeks later, and so on.
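To make that staged process concrete, here is a minimal sketch in Python of the kind of schedule I mean. The ring names and delay values are illustrative assumptions, and it does not call any SolarWinds API; it only works out which group of servers would be eligible for the Agent update on a given date.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Ring:
    """A group of servers that gets the Agent update at the same time."""
    name: str
    delay_days: int  # days to wait after the platform upgrade (assumed value)


# Hypothetical rings: dev immediately, production three weeks later.
RINGS = [Ring("dev", 0), Ring("production", 21)]


def rollout_schedule(platform_upgrade, rings=RINGS):
    """Return (ring name, earliest Agent update date) pairs in rollout order."""
    ordered = sorted(rings, key=lambda r: r.delay_days)
    return [(r.name, platform_upgrade + timedelta(days=r.delay_days)) for r in ordered]


def rings_due(platform_upgrade, today, rings=RINGS):
    """Return the names of rings whose Agent update window has opened by `today`."""
    return [name for name, start in rollout_schedule(platform_upgrade, rings) if start <= today]
```

For example, `rings_due(date(2023, 9, 1), date(2023, 9, 10))` would list only the dev ring, since the assumed three-week soak period for production has not passed yet.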
This worked well for a few years, but with the latest version (2023.3) we found that after the platform upgrade, previous versions of Windows-based Agents would no longer report any performance data (CPU, memory, disk). Everything works again after updating the Agent to the latest version, but considering the risks above, we decided not to do that and rolled back the update.
I have since worked with SolarWinds Support, our Success Manager and a Product team representative, and they all confirmed that upgrading to 2023.3 requires an immediate update of the Agents to make them work. The developers also said that this is the only supported approach in general (even for previous versions, although those worked OK). There is also no guarantee that this is the only time we'll have to update Agents immediately after a platform update; future upgrades could work the same way.
We really think the Agents should be developed to support at least the N-1 version of the Platform, to give people time to update Agents at a slower pace and mitigate any bugs, but it doesn't look like this is the case in the SolarWinds design.
I can also imagine there are companies out there with thousands of Agent-based servers, many of them business-critical, and I find it hard to believe that they would update all those Agents immediately without properly testing them first.
So my question is, considering the above, how do you manage your Agent updates to mitigate any bugs/risks?
Have you got any suggestions for the problem I described?
Thanks a lot, I appreciate any answers.
Kind regards,
Rob