Large number of deadlocks. How can I stop them?

Hi,

We have an issue. SolarWinds keeps deadlocking itself out. Our environment is 4 servers in two HA pools: one is the primary pool and the other is the APE pool. It's all VMs. The SWA keepalive will not update due to these deadlocks, then all of a sudden I'm getting alerts saying that there has been no polling in over 10 minutes or no update to the database in 10 minutes. This requires manual intervention because the service will not auto-retry due to the deadlocks. Checking DPA, it states that the deadlocks come from the administration service, the collector, and the information service. All of these seem to issue queries from all of the servers at once, causing the deadlocks. When one victim is determined, that server tries again, deadlocking a different victim. That one retries and deadlocks another victim. The cycle ends in a fatal error on all servers because they keep deadlocking each other out until everything fails.
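To make the retry cycle concrete: I don't know how the SolarWinds services actually retry, so this is only a minimal Python sketch of the general pattern I would expect to break such a cycle. When every server retries immediately, they collide again; waiting a random, growing interval spreads the retries out. `DeadlockVictim` and `run_with_backoff` are hypothetical names for illustration, not anything from the product.

```python
import random
import time


class DeadlockVictim(Exception):
    """Hypothetical stand-in for SQL Server error 1205 (chosen as deadlock victim)."""


def run_with_backoff(operation, max_attempts=5, base_delay=0.05, max_delay=2.0,
                     sleep=time.sleep, rng=random.random):
    """Call `operation`, retrying with jittered exponential backoff on deadlock.

    `sleep` and `rng` are injectable so the behavior can be checked without
    real waiting or real randomness.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockVictim:
            if attempt == max_attempts:
                # Give up and surface the error after the last attempt.
                raise
            # Upper bound doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": wait a random fraction of the bound so that
            # several servers retrying at once do not collide again.
            sleep(rng() * ceiling)
```

The point of the random wait is that two servers that just deadlocked each other are unlikely to pick the same delay, so one of them normally finishes before the other tries again.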

Is there a bug in 2020.2.4 that causes this? I can't understand it, because it only started after we upgraded from 2019. The system was stable on 2019, but due to the hacks we were forced to upgrade before we could bring our system back up.

It's been nothing but trouble since upgrading. We've made multiple calls to support without much of an answer from them. They find band-aid fixes for us, which only lead to more calls down the line. It looks like we keep scratching the surface while the root of the issue remains.

I've spent over a month going through this pain, with multiple calls, a lot of frustration, and no fix. So many things have been tried, but nothing has even touched this problem.

I've had Wintel involved, I've had storage involved, and our DBAs are constantly reviewing the deadlocks and activity in the SolarWinds DB. The answer is always the same. It's not infrastructure. It's not storage. It's the damn deadlocks.

Our DBAs have confirmed that the deadlocks are SolarWinds deadlocking itself out, because the queries and the stored procedures involved all come from SolarWinds code. I've been attempting to get a developer on the line to review this with them and get answers, but support has been fighting me on it, telling me that it's an infrastructure issue even though I have proved to them that it's not.

They don't understand that even though the system is not built 120% to SolarWinds recommendations, it is working according to expectations and has more than enough power behind it. They keep getting stuck on the infrastructure argument but never provide solid evidence that what they are saying is the actual cause. This is why we hesitate to believe support.

We need solid answers. I've been working 12-hour days reading logs and trying things to the point where everything blurs. It shouldn't be this damn hard to solve a problem or to get help. So I'm turning to THWACK because I'm grasping at straws. I don't know what else I can do if this is in the code.

And lastly, will 2020.2.5 fix the above? The release notes seem to hint at that but don't spell it out.

Can you please help me, or add someone who could give me some insight? I need help from someone who understands SQL and can tell me why this behavior is happening and what we can do to stop it. After a month of trying so many things, with nothing stopping the stupid deadlocks, I'm convinced it's a bug in the 2020.2.4 code, but I need some confirmation from SolarWinds.

The problems only really happen during database maintenance. While we have a lot of deadlocks during the day, those seem to work themselves out without anything failing.

Thanks!