I've been having sporadic issues with one of my polling engines. I'm running NPM 10.2.2 and on my primary engine polling seems to just stop even though all services are operational. I have gone through these steps from this KB but the issue keeps occurring.
http://knowledgebase.solarwinds.com/kb/questions/2517/Collector+Data+Processor+and+Collector+Polling+Controller+start+and+stop+intermittently
I'm not seeing any services stop but the poller just ceases collecting data. I'm opening a case with support and will reference this thread.
Can you post here your case number?
How big were the collector SDF files before you reset them?
Thanks
I did it yesterday and don't remember the sizes.
As of right now the polling controller is around 200 MB, the job tracker 75 MB, and the Job Engine v2 61 MB
Case # 350334
Running diags now for support... as soon as I'm done I'm going to replace all the SDF files, as this seems to get the system running again.
We've seen a similar issue on one of our pollers as well (NPM 10.2.2, Core 2011.2.2). We rebuilt the box thinking the OS was corrupted, but the issue did not disappear with the rebuild.
A consistent symptom is that we can try to stop all services but the Job Engine v2 never finishes stopping. Reboot of the server is required.
One thing we've noticed is that the issue seems to occur anytime that OS patches are applied to the server but the server isn't rebooted.
We haven't opened a ticket on this. We figured we'd wait until we migrate to NPM 10.3 so we can skip the answers of "it's fixed in the next version".
JobEngine V2 will not stop using the service manager for me either. I had to kill the process using task manager before doing my repairs.
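If you'd rather not click through Task Manager every time this happens, here's a minimal Python sketch of that same force-kill step using the standard Windows taskkill command. The process image name shown in the example comment is an assumption, not confirmed by anyone in this thread; check Task Manager on your own poller for the exact executable name before using anything like this.

```python
import subprocess

def kill_command(process_name: str) -> list[str]:
    """Build the taskkill invocation for a hung Windows process.
    /IM selects the process by image name, /F forces termination,
    and /T takes any child processes down with it."""
    return ["taskkill", "/IM", process_name, "/F", "/T"]

def force_kill(process_name: str) -> bool:
    """Run taskkill and report whether it claimed success."""
    result = subprocess.run(kill_command(process_name),
                            capture_output=True, text=True)
    return result.returncode == 0

# Example (the image name here is a guess -- verify it in Task
# Manager on your poller first):
# force_kill("SWJobEngineWorker2.exe")
```

Obviously force-killing a service process is a last resort; it is only worth scripting because, as noted above, the service manager leaves Job Engine v2 stuck in "stopping" otherwise.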
Really hate that the system censors my use of the word K .I .L. L
Hah! I think it made the post funnier when I inserted my favorite 4-letter word while reading.
Apparently not a unix-oriented vocabulary in the filters.
Pardon my interjection, but I would appreciate it if SolarWinds would produce an official white paper on this issue. The tendency of these SDF files to require periodic regeneration has been around a long time. I realize that there is a KB article on the topic, but a white paper could also shed light on why this happens, and even how to predict it. Could those of us with SAM benefit from using File Size monitoring to keep an eye on how the SDF files grow and alert us when they reach a dangerous size?
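For what it's worth, here's a minimal Python sketch of the kind of check such a File Size monitor would be doing, assuming the collector SDF files all sit in one directory. Both the directory path and the 500 MB threshold are assumptions (the 500 MB figure is the point at which a reply elsewhere in this thread suggests collecting diagnostics and opening a case).

```python
from pathlib import Path

# Assumed threshold: elsewhere in this thread, 500 MB is suggested
# as the point to collect diagnostics and open a case.
THRESHOLD_BYTES = 500 * 1024 * 1024

def oversized_sdfs(directory: str,
                   threshold: int = THRESHOLD_BYTES) -> list[tuple[str, int]]:
    """Return (filename, size_in_bytes) for every .sdf file in the
    given directory whose size is at or above the threshold."""
    hits = []
    for sdf in Path(directory).glob("*.sdf"):
        size = sdf.stat().st_size
        if size >= threshold:
            hits.append((sdf.name, size))
    return hits
```

In SAM itself you'd express the same idea with the built-in File Size monitor and an alert on the returned statistic; the sketch is just the logic it would be evaluating.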
So to make matters even worse, the repair of the Orion Core services has truncated my trap rules again. This marks the third time this has happened (it also did it on my last two NPM upgrades). Specifically, the Trap Detail column of the TrapRules table gets truncated down to 30 characters, which breaks most of the rules we have in place. I'm beyond frustrated right now, as I have team after team calling to find out why they are getting traps that should be caught by the filters we have written. Working with my DBA to restore that table.
Agreed. In fact, I'm working on a script to do it for me when the need arises.
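In case it helps anyone else, here's a minimal Python sketch of the detection half of such a script, assuming you can pull rule names and their Trap Detail text out of the TrapRules table (the actual SQL/database access is left out, and how you'd wire that up is your call). The 30-character cutoff is the one observed above.

```python
# Observed in this thread: the repair truncates Trap Detail to 30 chars.
TRUNCATION_LENGTH = 30

def suspect_truncated(rules: dict[str, str]) -> list[str]:
    """Given {rule_name: trap_detail} pulled from the TrapRules table,
    return the names whose detail text sits exactly at the truncation
    cutoff -- a strong hint that a repair/upgrade clipped them."""
    return [name for name, detail in rules.items()
            if len(detail) == TRUNCATION_LENGTH]
```

A rule whose detail was legitimately exactly 30 characters would be a false positive, so this only flags candidates for review against your DBA's backup rather than restoring anything automatically.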
Hi,
There was a leak in the Collector SDFs in some special situations, but this was fixed in 10.2.2. Maybe (just maybe) your upgrade didn't finish cleanly, and that's why you can still see this type of issue. If you are on 10.2.2 and your SDF is over 500 MB, please collect diagnostics plus the SDF and open a case. I would love to look at it.
Otherwise it's caused by something else, and I can only suggest opening a support ticket.
I also have this issue and have been trying to figure out how to fix it. I haven't opened a support ticket yet because I need to get it working right away, so I reboot, and the problem with that is it then becomes harder to troubleshoot. This started happening regularly (2-3 times a month) within the last 3-4 months. I am in the process of rebuilding the server OS and splitting things out to see if it is load based (moving syslog/traps to Kiwi, NCM to its own server, adding the additional website, etc. I wish I could separate SAM from NPM). After I am done with that I was going to open a support case. Now I am also going to watch this thread to see what else I can steal from you guys.
We have killed the filter so that it no longer censors the word "kill".