If restarting the services fixes the issue, I would say the problem is with the Job Engine v2 service. I would review whether any applications have credential issues (they are always Down or Unknown) and try to fix them. If there are applications that don't reply to SolarWinds in a timely manner, this creates a snowball effect that will eventually break SolarWinds.
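To illustrate why a few unresponsive applications can snowball, here is a toy capacity model. It is a hypothetical sketch, not how Job Engine v2 actually schedules work. It assumes a fixed pool of polling workers per cycle, that healthy checks finish quickly, and that an unresponsive check ties up a worker until it times out. All numbers (jobs, workers, timeouts) are made up for illustration.

```python
def backlog_after(cycles: int, jobs: int, unresponsive: int,
                  healthy_secs: float, timeout_secs: float,
                  workers: int, cycle_secs: float) -> list[float]:
    """Return leftover work (in worker-seconds) at the end of each cycle.

    Hypothetical model: any work that does not fit into one cycle's
    capacity is carried over into the next cycle.
    """
    capacity = workers * cycle_secs  # worker-seconds available per cycle
    backlog = 0.0
    history = []
    for _ in range(cycles):
        # Healthy checks return fast; unresponsive ones wait for the timeout.
        demand = (jobs - unresponsive) * healthy_secs + unresponsive * timeout_secs
        # Leftover work accumulates; it never drops below zero.
        backlog = max(0.0, backlog + demand - capacity)
        history.append(backlog)
    return history


# All applications healthy: the workers keep up, no backlog.
print(backlog_after(3, jobs=1000, unresponsive=0, healthy_secs=2,
                    timeout_secs=60, workers=10, cycle_secs=300))
# 100 unresponsive applications: the backlog grows every cycle.
print(backlog_after(3, jobs=1000, unresponsive=100, healthy_secs=2,
                    timeout_secs=60, workers=10, cycle_secs=300))
```

In this toy model the healthy case prints `[0.0, 0.0, 0.0]`, while 100 unresponsive checks leave 4,800 worker-seconds over each cycle (`[4800.0, 9600.0, 14400.0]`), so the queue never recovers until the unresponsive applications are fixed or the services are restarted.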
I would also review MSMQ in case a particular queue is causing the issues, and it's always good to check the SDF files and run the Orion Permission Checker.
Thanks, Raul. Most application checks are fine most of the time (except SQL servers, which generally always have issues).
Sometimes MSMQ has a load of orphaned files, so I get them cleared, but I haven't checked which queues are filling up, so I will do that next.
We have replaced the Job Engine/Collector SDF files before, which seems to fix it in the short term, but the issue usually recurs within a few days.
Are you using 'AppInsight for SQL'? Have you ever considered changing the polling interval for a few templates?
We do, yes. We have 92 nodes using that particular template, polling at 5-minute intervals.
I think we increased this from 2 minutes, but that was a long time ago, so I can't say for certain.
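For a rough sense of what that interval change means, here is a small arithmetic sketch. It assumes, for simplicity, that each of the 92 nodes runs one AppInsight for SQL application polled once per interval; the real load also depends on how many component monitors and databases each application covers, so treat these as relative figures only.

```python
def polls_per_minute(nodes: int, interval_minutes: float) -> float:
    """Average application polls per minute, assuming one poll per node per interval."""
    return nodes / interval_minutes


def polls_per_day(nodes: int, interval_minutes: float) -> float:
    """Total application polls per day under the same assumption."""
    return nodes * 1440 / interval_minutes  # 1440 minutes in a day


for interval in (2, 5):
    print(interval, polls_per_minute(92, interval), polls_per_day(92, interval))
```

At a 2-minute interval that is 46 polls per minute (66,240 per day); at 5 minutes it drops to about 18.4 per minute (26,496 per day), a 2.5x reduction in polling for that template alone.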
While everything is normal (I mean right after you restart the services), do you have many applications in an Unknown state? I am talking about genuine unknowns caused by communication issues.
The reason I ask is that when communication issues between SolarWinds SAM and the end node (the node on which application monitoring is enabled) are not remediated, they lead to MSMQ issues, which are also related to the SDF files. I understand these have been cleaned up in your case before, but for this very reason (the communication issues) things keep piling up again after a couple of days.
I might not be totally correct, but you could try a couple of things:
1. Fix/remediate communication issues.
2. Change the polling interval for non-critical business applications.
3. Once the above two have been done, go through the SDF exercise again.
4. Do a clean restart.
Hope it helps, thanks in advance.