Response time stats stop recording.

iunderwo

Crew,

I've been seeing an odd problem. Every now and again, response time / packet loss statistics stop being recorded to the database. Restarting the NetPerfMon service cures it, but I'm not sure what's causing the problem.

I'm running v7.1 with Enterprise SQL on another machine. Both have 2.5 GB RAM and are located near each other (separated by a redundant firewall/switch network).

// Ian Underwood - Service Management
// Level 3 Communications

Network_Guru

I haven't seen this myself (although I do see multiple devices with "no response" as per Jonchill's post),
but usually there will be an indication of what's wrong in the task manager. Either a memory or CPU hog
will be running. I also check the event log on the server for anything meaningful.

There are also some new log files in the ...\Program Files\SolarWinds\Network Performance Monitor V7
directory, such as PollDebug.txt.

-=Cheers=-
NG

iunderwo

I don't have that debug file in my directory.

// Ian Underwood - Service Management
// Level 3 Communications

iunderwo,

You may want to check out the "Interact with Desktop" setting on the NPM Service!

Go to Start, Programs, Administrative Tools, Services.
Right-click the SolarWinds NPM service and bring up Properties.
Click the Log On tab.
Make sure "Allow service to interact with desktop" is UNCHECKED!
If it is checked, uncheck the box and restart the NPM service.
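
If you'd rather check this from a script than click through the dialog, here is a rough Python sketch. It relies on the fact that Windows stores a service's type as a bitmask in the `Type` value under `HKLM\SYSTEM\CurrentControlSet\Services\<service>`, and that the 0x100 bit (SERVICE_INTERACTIVE_PROCESS) corresponds to the "interact with desktop" checkbox. The service key name you pass in is an assumption on my part; look up the real one in the service's properties.

```python
# Bit in the Windows service type mask that means
# "Allow service to interact with desktop".
SERVICE_INTERACTIVE_PROCESS = 0x100


def is_interactive(service_type: int) -> bool:
    """Return True if the interactive-process bit is set in the type mask."""
    return bool(service_type & SERVICE_INTERACTIVE_PROCESS)


def read_service_type(service_name: str) -> int:
    """Read the raw Type value for a service from the registry (Windows only).

    service_name is the registry key name of the service; the name to use
    for the NPM service is not given in this thread, so treat it as a
    placeholder.
    """
    import winreg  # imported here so the helper above also works off Windows

    key_path = rf"SYSTEM\CurrentControlSet\Services\{service_name}"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_kind = winreg.QueryValueEx(key, "Type")
    return value
```

For example, a type of 0x110 (own process plus interactive) would be flagged, while 0x10 (own process only) would not.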

There was a flaw in the Wizard in previous versions that would install this service with Interact checked. With this checked, closing the System Manager leaked a cancel signal to the service, and under heavy load the service would suspend polling.

Supposedly this was fixed in 7.1GD, but the box could have been left checked as part of an upgrade or a beta test you might have been running.

BK
Nobody Special

iunderwo

BK,

I looked through services and I had interact unchecked. <shrug> Looking back at the statistics, it looks like they came to a stop after the daily maintenance was completed.

// Ian Underwood - Service Management
// Level 3 Communications

iunderwo

I noticed the same problem again this morning, except that all stats collections stopped. This is really starting to get me agitated.

I think SW needs a watchdog function for this so that it can either restart itself or page out in the event of this kind of failure so that troubleshooting information can be gathered.
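
Until something like that exists in the product, a homegrown watchdog might look like the Python sketch below. This is only an illustration under assumptions: how you obtain the timestamp of the newest response time sample (for instance, a query against your own database) is left out because the table layout isn't described here, the 15-minute threshold is arbitrary, and the service name passed to `restart_service` is a placeholder. The restart uses the standard Windows `net stop` / `net start` commands.

```python
from datetime import datetime, timedelta


def polling_stalled(last_sample: datetime, now: datetime,
                    max_age: timedelta = timedelta(minutes=15)) -> bool:
    """Return True if the newest recorded sample is older than max_age."""
    return now - last_sample > max_age


def restart_service(service_name: str) -> None:
    """Stop and start a Windows service with the built-in 'net' command.

    service_name is a placeholder; use the actual name of the polling
    service on your server.
    """
    import subprocess

    subprocess.run(["net", "stop", service_name], check=True)
    subprocess.run(["net", "start", service_name], check=True)
```

Run from a scheduled task, the idea would be to fetch the latest sample time, call `polling_stalled`, and either restart the service or send a page so someone can grab troubleshooting data first.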

// Ian Underwood - Service Management
// Level 3 Communications

Network_Guru

Have you checked your database nightly maintenance logs?
Perhaps the polling service stops when the database maintenance fails and hangs or locks
the SQL database. You might try re-indexing each table as well, just to clean up the
database.

-=Cheers=-
NG

rwbuckn

Are you still having this problem? I was having it with 7.1 but it cleared up after upgrading to an early 7.2 beta. It's back now that I've upgraded to the production version of 7.2. I've been working with support but so far nothing. Was wondering if you found anything?

Network_Guru

Check the event viewer on the SQL server.
I just had this same issue this past week. Here is the warning message from my server:

Error: 1105, Severity: 17, State: 2
Could not allocate space for object 'InterfaceErrors' in database 'NetPerfMon' because the 'PRIMARY' filegroup is full.

Error: 9002, Severity: 17, State: 6
The log file for database 'NetPerfMon' is full. Back up the transaction log for the database to free up some log space.

Error: 1105, Severity: 17, State: 2
Could not allocate space for object 'InterfaceTraffic' in database 'NetPerfMon' because the 'PRIMARY' filegroup is full.

We have disabled transaction logging in the database and increased the primary NetPerfMon database file size to
eliminate this issue. I lost a few days of data due to this (the ICMP polling data was still being collected).
Our database is backed up every night, so the transaction logging is really not too critical: the most data we would
lose if the database went south is 24 hours.

-=Cheers=-
NG
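
For anyone who wants to scan an exported SQL Server log for these two conditions instead of reading it by eye, here is a small Python sketch. It assumes the log text contains lines in the same "Error: N, Severity: N, State: N" form as the messages quoted above; how you export the log is up to you, and the short descriptions in the table are my own wording.

```python
import re

# Matches lines such as "Error: 1105, Severity: 17, State: 2".
ERROR_LINE = re.compile(
    r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)"
)

# The two out-of-space errors quoted in this thread.
SPACE_ERRORS = {
    1105: "filegroup full (could not allocate space for an object)",
    9002: "transaction log full",
}


def find_space_errors(log_text: str) -> list[tuple[int, str]]:
    """Return (error number, description) for each out-of-space error found."""
    hits = []
    for match in ERROR_LINE.finditer(log_text):
        number = int(match.group(1))
        if number in SPACE_ERRORS:
            hits.append((number, SPACE_ERRORS[number]))
    return hits
```

Any hit from `find_space_errors` suggests checking the data and log file sizes before blaming the poller.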

iunderwo

Still happens, though I'm starting to reload the process daily at this point.

It kills me that this is the only part that stops...I still get my other system stats recorded, but the response time just shuts off. I've requested some kind of watchdogging from support@, mostly because there's no real way to set an alert to tell me a restart is required.

I suspect it's due to server load; mine would be bursting at the seams if that were possible (I'm trying to get another poller to lighten the load). I've got over 12k elements.

// Ian Underwood - Service Management
// Level 3 Communications

Rob

I have had a similar issue where my graphs would have gaps in them and restarting the server would fix the problem. Unfortunately, a couple of days later exactly the same thing would happen again. Eventually we found that the SQL server (Enterprise on a separate server) was running out of space because the DB grow function was not working correctly. To make matters worse, if you increase the DB size, the shrink job run weekly by the DB maintenance script resets it back down to a minimum again. We have had to set a job to run after the DB maintenance to increase the size of the DB until we can sort out the root cause. SolarWinds are working on this. See Microsoft Knowledge Base Article 305635.

Hope this helps - Rob.
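
A job like the one described above could boil down to a single `ALTER DATABASE ... MODIFY FILE` statement. The Python sketch below only builds that statement as a string; it is not the actual job from this post. The logical file name and target size in the example are made up, so substitute your own, and note that SQL Server expects the new size to be larger than the file's current size.

```python
def quote_identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_grow_statement(database: str, logical_file: str, size_mb: int) -> str:
    """Build a T-SQL statement that sets a data file to size_mb megabytes."""
    if size_mb <= 0:
        raise ValueError("size_mb must be a positive number of megabytes")
    return (
        f"ALTER DATABASE {quote_identifier(database)} "
        f"MODIFY FILE (NAME = {quote_identifier(logical_file)}, "
        f"SIZE = {size_mb}MB);"
    )


# Example with a hypothetical logical file name and size:
statement = build_grow_statement("NetPerfMon", "NetPerfMon_Data", 20480)
```

The resulting string can be pasted into a scheduled SQL job step that runs after the weekly maintenance.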

iunderwo

I am happy to say that the server has been behaving more acceptably over the last few weeks, after I removed about 1,000 nodes we didn't actually use from the monitor.

There is plenty of room on the drives for this since I have the DB and transaction log on separate drives...but I'll run the debug for a more extended period of time when this happens again and let you know what I've got. Hopefully SW/MS come out with a solution before the next release.

// Ian Underwood - Service Management
// Level 3 Communications