Over the last week, my NPM 10.2 installation has started logging a couple of errors every morning between 3 and 4 am.
The first error is:
Event ID 4001 - Solarwinds.Orion.IpSla.BusinessLayer
Service was unable to open new database connection when requested.
Exception Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
The second error, which recurs roughly every 25 minutes after the first one, is:
Event ID 7031 - Service Control Manager
SolarWinds Orion Module Engine service terminated unexpectedly
Fault bucket , type 0
Event Name: CLR20r3
Response: Not available
Cab Id: 0
Problem signature:
P1: solarwinds.businesslayerhost.exe
P2: 2011.2.21.23
P3: 4f31f27e
P4: SolarWinds.Orion.Common
P5: 2.0.0.0
P6: 4c7c214f
P7: e4
P8: 13e
P9: System.InvalidOperationException
P10:
Once this condition occurs, NPM continues to run but I can no longer log in to the web portal. Neither group nor individual accounts work; the login page just hangs indefinitely after I enter credentials. If I already have a browser session open, I can continue to access the portal. Data collection and email alerting also appear to be unaffected, so the connection to the database still seems to be in place. I have also set up a script that runs sqlcmd every 10 seconds to test connectivity to the database, and this test never fails. The database server is a separate box running SQL Server 2008 Enterprise, and my DBA sees no issues with it.
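For reference, the connectivity check works along the lines of the sketch below. This is an illustration under assumptions, not the exact script: the server name `SQLHOST` and database name `NPM_DB` are placeholders, and `build_sqlcmd`, `probe_once` and `run_probe` are hypothetical helper names. Note that each sqlcmd run is a separate process that opens its own fresh connection, so a passing check shows the SQL Server is reachable but says nothing about the state of the connection pool inside the SolarWinds service processes.

```python
import subprocess
import time


def build_sqlcmd(server, database, login_timeout=5):
    """Build a sqlcmd command line that runs a trivial query and exits."""
    return [
        "sqlcmd",
        "-S", server,               # target SQL Server instance
        "-d", database,             # database to connect to
        "-E",                       # Windows (trusted) authentication
        "-l", str(login_timeout),   # login timeout in seconds
        "-b",                       # exit with a non-zero code on error
        "-Q", "SELECT 1",           # run the query, then exit
    ]


def probe_once(server, database, runner=subprocess.run):
    """Return True if sqlcmd connected and ran the query successfully."""
    try:
        result = runner(build_sqlcmd(server, database),
                        capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False  # sqlcmd not found, or the attempt hung
    return result.returncode == 0


def run_probe(server, database, interval=10, iterations=None,
              runner=subprocess.run, sleep=time.sleep):
    """Probe every `interval` seconds and return the number of failures.

    With iterations=None the loop runs until interrupted.
    """
    failures = 0
    count = 0
    while iterations is None or count < iterations:
        ok = probe_once(server, database, runner=runner)
        if not ok:
            failures += 1
        print(time.strftime("%Y-%m-%d %H:%M:%S"), "OK" if ok else "FAILED")
        count += 1
        if iterations is None or count < iterations:
            sleep(interval)
    return failures
```

Calling `run_probe("SQLHOST", "NPM_DB")` with the real names substituted loops until interrupted and prints one timestamped line per attempt.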
To recover from this state, I have to reboot the NPM server. Restarting the SolarWinds services and IIS is not enough to fix the issue. Does anyone know what may be causing this?
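Since the first error mentions the max pool size being reached, one thing that could be captured while the problem is occurring is how many sessions the NPM server holds open on the SQL Server, grouped by program. The sketch below is one possible way to do that, not something from SolarWinds: `SQLHOST` and `NPMHOST` would be placeholders for the SQL Server and NPM server names, the function names are hypothetical, and the login running it needs VIEW SERVER STATE permission to see other sessions in `sys.dm_exec_sessions`. For comparison, the ADO.NET default for Max Pool Size is 100 per pool; whether Orion overrides that is unknown to me.

```python
import subprocess


def build_session_query(host_name):
    """T-SQL counting user sessions per program for one client host."""
    safe_host = host_name.replace("'", "''")  # escape quotes in the literal
    return (
        "SET NOCOUNT ON; "
        "SELECT program_name, COUNT(*) "
        "FROM sys.dm_exec_sessions "
        "WHERE is_user_process = 1 AND host_name = '" + safe_host + "' "
        "GROUP BY program_name;"
    )


def parse_session_counts(output):
    """Parse 'program|count' lines from sqlcmd output into a dict."""
    counts = {}
    for line in output.splitlines():
        # Split on the last separator so a '|' in the program name survives.
        program, sep, count = line.rpartition("|")
        if sep and count.strip().isdigit():
            counts[program.strip()] = int(count)
    return counts


def session_counts(server, host_name, runner=subprocess.run):
    """Run the query through sqlcmd and return {program_name: sessions}."""
    cmd = [
        "sqlcmd", "-S", server, "-E", "-b",
        "-h", "-1",   # no column headers
        "-W",         # trim trailing spaces
        "-s", "|",    # column separator
        "-Q", build_session_query(host_name),
    ]
    result = runner(cmd, capture_output=True, text=True, timeout=30)
    return parse_session_counts(result.stdout)
```

Running `session_counts("SQLHOST", "NPMHOST")` before and during the failure window would show whether any one program's session count climbs toward the pool limit.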