Up until recently I had severe disk IO issues on our SQL server. About two weeks ago we upgraded the RAID; our performance-related issues went away and disk IO is now well below normal thresholds. While we had the disk IO issues, I constantly saw entries in the SolarWinds.Net log file showing the following message for either SWService or AlertingEngine:
Event 4001
Service was unable to open new database connection when requested.
Exception Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
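To illustrate what this message describes, here is a minimal Python sketch of pool exhaustion. It is not SolarWinds or ADO.NET code: `TinyPool`, `run_service`, and all the numbers are hypothetical, and it assumes each service process keeps its own bounded pool of connections. Under that assumption, a single service that holds connections too long can hit this timeout while the database and every other service look healthy.

```python
import queue
import threading
import time


class PoolTimeout(Exception):
    """Raised when no pooled connection frees up within the timeout."""


class TinyPool:
    """Hypothetical stand-in for one process's bounded connection pool."""

    def __init__(self, max_size):
        self._free = queue.Queue()
        for n in range(max_size):
            self._free.put(f"conn-{n}")

    def acquire(self, timeout):
        # Wait up to `timeout` seconds for a free connection.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolTimeout("timeout expired before a pooled connection was free")

    def release(self, conn):
        self._free.put(conn)


def run_service(pool, workers, hold_seconds, timeout):
    """Start `workers` threads that each hold one connection for `hold_seconds`.

    Returns how many workers gave up waiting on the pool.
    """
    failures = []

    def work():
        try:
            conn = pool.acquire(timeout)
        except PoolTimeout:
            failures.append(1)
            return
        try:
            time.sleep(hold_seconds)  # pretend to run a query
        finally:
            pool.release(conn)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(failures)


if __name__ == "__main__":
    # Four workers compete for two connections with a short timeout.
    print(run_service(TinyPool(2), workers=4, hold_seconds=0.3, timeout=0.1))
    # Same load with a longer timeout: the waiters get a connection eventually.
    print(run_service(TinyPool(2), workers=4, hold_seconds=0.3, timeout=1.0))
    # A separate, lightly loaded pool is unaffected by the busy one.
    print(run_service(TinyPool(2), workers=2, hold_seconds=0.3, timeout=0.1))
```

In this toy model a longer timeout hides the symptom by letting callers wait longer, while a larger pool or shorter hold time removes the contention. Whether either maps onto the actual alerting engine is something only support can confirm.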
I always dismissed these events as a symptom of the IO constraints and expected them to go away once IO was no longer an issue. Once the new RAID was installed, the messages stopped on my additional polling engines. On my primary polling engine I no longer get the event for SWService, but I still get it several times a day for the alerting engine.
I opened a case with SW, and one of the preliminary suggestions was to modify the SWNetPerfMon.db file to increase the connection timeout. I have no issue with taking this step, but I do have a few concerns.
Why would the alerting engine alone be having timeouts when no other part of the system is? If my DB were having timeout issues, I would expect to see them across the board on all my polling engines (not the Alerting Engine error specifically, since alerting only runs on the primary poller). I would also expect to see problems on my primary poller in components other than the alerting engine.
If I make these changes, should I make them on all three pollers?
Will these changes persist through upgrades, patches, hotfixes, etc., or will I always have to reapply the settings?
I have asked support these questions and am awaiting a response, but I thought I'd throw them out to the community for further input.