SW Case #406873: Windows Event 4001

mdriskell

Until recently, I had severe disk IO issues on our SQL server. About 2 weeks ago we upgraded the RAID; all of our performance-related issues went away, and disk IO is now well below normal thresholds. While we had the disk IO issues, I would constantly see entries in the SolarWinds.Net log file showing the following message for either SWService or AlertingEngine:

Event 4001

Service was unable to open new database connection when requested.

Exception Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

I always dismissed it as a symptom of the IO constraints and expected it to go away once IO was no longer an issue. Once the RAID was installed, these messages stopped on my additional polling engines. On my primary polling engine I no longer get the event for SWService, but I still get it several times a day for the alerting engine.

I opened a case with SW, and one of the preliminary suggestions was to modify the SWNetPerfMon.db file to increase the connection timeout. I have no issue with taking this step, but I do have a few concerns.

Why would the alerting engine alone be timing out when no other part of the system is? If my DB were having timeout issues, I would expect to see them across the board on all my polling engines (not the AlertingEngine error specifically, since alerting resides only on the primary poller). I would also expect to see problems on the primary poller beyond just the alerting engine.
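A minimal sketch of one possible explanation, assuming the Orion services use standard .NET SqlClient connection pooling (which the wording of the 4001 exception suggests). In SqlClient, pools are kept per process and per connection string, so one service could exhaust its own pool while another service talking to the same database never waits. This is a toy Python model, not Orion code; the `ConnectionPool` class, the pool sizes, and the timeouts are all hypothetical.

```python
import threading


class PoolTimeoutError(Exception):
    """Raised when no pooled connection frees up within the wait timeout."""


class ConnectionPool:
    """Toy bounded pool: at most max_size connections can be checked out at once."""

    def __init__(self, name, max_size, timeout_s):
        self.name = name
        self.timeout_s = timeout_s
        self._slots = threading.BoundedSemaphore(max_size)

    def acquire(self):
        # Wait up to timeout_s for a free slot (the analogue of the wait
        # that times out in the 4001 message).
        if not self._slots.acquire(timeout=self.timeout_s):
            raise PoolTimeoutError(
                f"{self.name}: timeout expired before a pooled connection was free"
            )
        return object()  # stand-in for a real database connection

    def release(self, conn):
        # Return the slot so another caller can use it.
        self._slots.release()


# Hypothetical: each service process owns a separate pool against the same database.
swservice = ConnectionPool("SWService", max_size=3, timeout_s=0.05)
alerting = ConnectionPool("AlertingEngine", max_size=3, timeout_s=0.05)

# The alerting engine checks out every connection and never returns them.
held = [alerting.acquire() for _ in range(3)]

# SWService borrows and returns a connection normally; its pool is unaffected.
conn = swservice.acquire()
swservice.release(conn)

try:
    alerting.acquire()
    alerting_ok = True
except PoolTimeoutError as exc:
    alerting_ok = False
    print(exc)
```

In this model only the AlertingEngine pool raises, while SWService keeps working against the same database. That matches the pattern described, but the model cannot say why the alerting engine would be holding its connections.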

If I make these changes, should I make them on all three pollers?

Will these changes persist through upgrades, patches, hotfixes, etc., or will I have to reapply the settings each time?

I have asked support these questions and am waiting for a response, but I thought I'd throw it out to the community for further input.


mdriskell

To clarify, here is my environment:

Three polling engines; the SQL DB is on a separate server.

NPM 10.3.1

SAM 5.2.0 SP1

NCM 7.0.2

IPAM 3.0

SEUM 1.5.1

UDT 1.0.1 (installed but not currently used)

IPSLAMGR 3.5.1 (installed but not currently used)

mdriskell

I still can't get SW to identify why this is only affecting the alerting engine.

mdriskell

As suggested, I increased the timeout to 600, which seems insanely high, and I'm still getting the same error. This leads me to believe the problem isn't the timeout itself but the second part of the error: the max pool size has been reached. Alerts are now failing to trigger at all, so this has gone from a mild annoyance to an actual problem.
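For context: in .NET SqlClient pooling, a request that finds the pool at Max Pool Size (default 100) is queued until a connection is returned or the connection timeout (default 15 seconds) elapses. Assuming that is the mechanism behind this 4001 event, a longer timeout only helps when connections are eventually returned. If they are held indefinitely, every timeout will eventually expire. The toy Python model below contrasts the two cases, with a semaphore standing in for the pool. `MAX_POOL_SIZE`, the `exhausted_pool` helper, and the millisecond timings are hypothetical, not Orion settings.

```python
import threading

MAX_POOL_SIZE = 2  # hypothetical; not Orion's actual setting


def exhausted_pool():
    """Return a semaphore-backed pool with every slot already checked out."""
    pool = threading.BoundedSemaphore(MAX_POOL_SIZE)
    for _ in range(MAX_POOL_SIZE):
        pool.acquire()
    return pool


# Case 1: slow but healthy work. One connection is returned after 100 ms.
slow = exhausted_pool()
threading.Timer(0.1, slow.release).start()
slow_short = slow.acquire(timeout=0.02)  # gives up before the connection comes back
slow_long = slow.acquire(timeout=0.5)    # waits long enough to get it

# Case 2: leaked connections. Nothing is ever returned to the pool.
leaked = exhausted_pool()
leaked_short = leaked.acquire(timeout=0.02)
leaked_long = leaked.acquire(timeout=0.2)  # a longer wait still fails

print(slow_short, slow_long, leaked_short, leaked_long)
```

Under this model, a very large timeout that still fails points toward connections being held rather than merely slow, which fits the max-pool-size reading of the error.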

I have a message in to support, but I'm afraid the case may need to be escalated to someone with more knowledge of the SQL connections the servers are making.

orionshark

I am seeing the same 4001 event in my environment. Have you found a solution yet?