Do you have SAM, and if so, have you seen this article?
Yes we do have SAM, and thank you, I'll check it out now!
Alright, had a chance to look through the logs - it doesn't appear to be this, as there are no RPC_E_CALL_CANCELED messages in there. However, I did find a few nodes polling with the wrong credentials, so it was worth a look anyway. Thanks for the suggestion!
This is in case someone has this after me and can't find anything else that's relevant (there's a rather large red herring of terminal server problems, patched January 2017).
We spoke with SW support who kindly pointed me towards some Tcpip parameter tuning - the article https://support.solarwinds.com/Success_Center/Orion_Platform/Knowledgebase_Articles/Tweaking_performance_of_Windows_Server.
This has helped, I guess, but didn't solve things. And it's better to do it anyway, so nothing lost.
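For anyone wanting to see what their server currently has before changing anything, here's a rough Python sketch that reads a couple of values from the Tcpip parameters registry key. The two value names (MaxUserPort, TcpTimedWaitDelay) are my assumption of commonly tuned settings, not a list taken from the article, so check the article for what it actually recommends. The function names are made up for illustration.

```python
import sys

# Standard location of the Windows TCP/IP parameters.
TCPIP_KEY = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Assumption: commonly tuned values, NOT confirmed against the support article.
VALUES_TO_CHECK = ("MaxUserPort", "TcpTimedWaitDelay")


def _registry_query(name):
    """Read one value from the Tcpip key; return None if it is not set."""
    import winreg  # Windows only, so imported lazily

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except FileNotFoundError:
        # Value (or key) is absent, so the OS default applies.
        return None


def read_tcpip_values(names=VALUES_TO_CHECK, query=None):
    """Return {name: value or None} for each requested registry value."""
    if query is None:
        query = _registry_query
    return {name: query(name) for name in names}


if __name__ == "__main__":
    if sys.platform != "win32":
        sys.exit("This check only makes sense on a Windows server.")
    for name, value in read_tcpip_values().items():
        shown = "not set (OS default)" if value is None else value
        print(f"{name}: {shown}")
```

It only reads, it doesn't write anything, so it's safe to run on the poller as a before/after check.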
We found we had a SQL cluster node that was [for lack of a better term] bricked, and there were a lot of RPC calls to it as apparently SolarWinds will, on seeing a down node, attempt to DDoS it in the hopes that it wakes up. We popped the node in maintenance mode and that has, we think, stopped the high number of calls, and it doesn't seem to have done any harm.
The other thing we've done: we had the server on 16-64GB of dynamic memory, with an average usage of about 11GB. I thought that was fine, but since we've recently moved to dedicated hosts for our department it's come off the SCVMM cloud and we went nuts and gave it 64GB of static memory. Maybe dynamic memory wasn't helping either.
So I'm left with an intermittent issue that hasn't happened again (yet) and two support calls that refuse responsibility. Glad we're paying for Premier support.
I'd be interested to know if anyone else gets/has this, and what they've done to mitigate it. I'm just glad it's running as a VM so when I hit reset it comes back in a couple of minutes...