This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.

Anyone ever have trouble with NPM not monitoring itself well?

xpowels over 8 years ago

Here are a couple of examples. Recently I found this when looking at a UPS:

Um, it's not September anymore. So what is happening is that Orion is no longer getting data back when polling this node. Am I getting an error? Does the Node have a red flashing box? Am I getting a daily report that something is amiss? No on all counts. This is frustrating beyond my ability to express it.

So, how many nodes have this issue? According to tech support, there is no way to run a report to find out. I suppose I could just pull up all 2000 of my nodes one at a time to check. Seriously.

Last night I had another issue that is in the same vein but somewhat different. There was some amount of service stopping and starting on my main server, and several nodes just stopped being polled. How many? Again, no way to know. They just appear green, but there are no statistics being collected. No Application stats in SAM. No CPU. No Memory. No Disk. No network stats including latency, which leads me to believe no polling (not even ping) was occurring. Error? Down or unknown node? Nope. All happy green. Nothing wrong here. *SIGH* Rebooting an additional poller fixed the issue, but I was just lucky to stumble upon it before the Thanksgiving holiday.

So there's the rant. Now let's talk solutions:

For UnDP issues, Orion should 1) Change the Node graphic to show a flashing red box, as it does when an interface is down (or make it a global check box option), and 2) Create a report to show the Top 10/All nodes that have UnDP pollers that have not updated in 12 hours or some other arbitrary value.

For the times when Orion mysteriously stops polling, I'm open to suggestions. Maybe each poller should have a separate process that checks all nodes on all pollers to be sure data is populating. Run it every hour, once a day, whatever. It's not complicated, just check and see if there is SOMETHING from a node in the last hour or so.
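The "is there SOMETHING from this node in the last hour" check can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Orion ships: the function name `find_silent_nodes`, the `last_seen` mapping, and the sample node names and timestamps are all hypothetical, and how you would actually export the newest sample time per node from Orion is left open.

```python
from datetime import datetime, timedelta


def find_silent_nodes(last_seen, now, max_age=timedelta(hours=1)):
    """Return the nodes whose newest sample is missing or older than max_age.

    last_seen maps a node name to the timestamp of its most recent
    collected statistic, or None if nothing was ever collected.
    """
    silent = []
    for node, stamp in last_seen.items():
        if stamp is None or now - stamp > max_age:
            silent.append(node)
    return sorted(silent)


# Hypothetical sample data, for illustration only.
now = datetime(2024, 1, 2, 12, 0)
last_seen = {
    "ups-01": datetime(2023, 9, 14, 8, 30),      # stopped reporting long ago
    "core-sw-01": datetime(2024, 1, 2, 11, 55),  # fresh data
    "file-srv-02": None,                         # never reported
}

print(find_silent_nodes(last_seen, now))
```

The point of the sketch is that the check ignores node status entirely and looks only at the age of the data, which is what would catch a node that shows green while nothing is being collected.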

Am I alone in seeing these issues?

Top Replies

0 lynchnigel over 8 years ago

So the issue is on the server you have assessed? Are the device settings still set up OK?
What's the polling method: SNMP, WMI, or ping?
Are all the services on Orion OK? Have you checked that they are all running using the Orion Service Manager?

0 xpowels over 8 years ago in reply to lynchnigel

In the first case, yes, everything is running perfectly. The problem is that I moved from a physical to a virtual platform and changed the IP of the server. The devices that are failing to be polled need their SNMP settings changed to allow requests from the new IPs. Easy change. But how do I determine which systems I need to fix? Orion gives me no way to know which UnDP pollers are failing without manually looking at the last successful poll of each system.
As for the other case, I'm not sure of the root cause of the polling failure. When I find that polling has been down for hours, I try to get it fixed quickly rather than sit on the phone for hours with support. A reboot got the polling going again, so that's done. The Windows server team were the ones who shut down the services in the first place to stop getting alerts during maintenance. I have corrected them by letting them know how to disable the alerts, but this isn't the only instance of the polling starting to fail.

0 lynchnigel over 8 years ago in reply to xpowels

If you go into the node and edit its settings, you can test polling for the device from there, which should help you determine whether it is working.

0 silverbacksays over 8 years ago in reply to xpowels

What you need to do is analyse which nodes have not responded to SNMP since you changed the IP address. If you have the web reports in your version, constructing this report will give you the information you need to pinpoint which devices need their SNMP modified:
1. Click 'Reports' under the Home tab (default menu configuration assumed).
2. Click 'Manage Reports'
3. Create a new report.
4. In the first window, choose the advanced selector, and select 'nodes' as what you are reporting on. Then, in the 'where' clause, search for the field 'Last Database Sync', choose 'is less than', enter today's date, and click 'add to layout'.
5. Add in a custom table, edit it, selecting at least 'caption' and 'ip address' as the fields you want to list in the table.
6. Select the other report options as required.
When you run this report, it'll show you only the nodes which have not responded via SNMP today.
If you have issues with the report, PM me with your email address and I'll send you a template you can import into your environment.

0 xpowels over 8 years ago in reply to lynchnigel

For 2000 nodes? Surely, you jest......

0 xpowels over 8 years ago in reply to silverbacksays

Well, when I tried it, I only got back nodes that were unmanaged. I guess since ping is still working, the database sync field must be getting updated. Great idea though!

0 silverbacksays over 8 years ago in reply to xpowels

Hmm... that should have worked for SNMP, not just ping updates.
Last time I did something similar was in NPM 10.x. It's possible they've altered the field names in 11.x. I'll have another think!

0 lynchnigel over 8 years ago in reply to xpowels

Create a report first and check; then, depending on how many there are, adjust your strategy to suit.

0 lynchnigel over 8 years ago in reply to silverbacksays

There's a table called Orion.NPM.CustomPollerStatus with a DateTime field.
This is the select statement for it:
SELECT CustomPollerAssignmentID, DateTime, Rate, Total, RawStatus, Status, RowID, Description
FROM Orion.NPM.CustomPollerStatus

0 lynchnigel over 8 years ago in reply to lynchnigel

Obviously you would have to marry that up with the custom poller name, which I think this query will help with:
SELECT CustomPollerID, UniqueName, Description, OID, MIB, SNMPGetType, NetObjectPrefix, GroupName, PollerType, CustomPollerParserID, Format, Enabled, IncludeHistoricStatistics, Unit, TimeUnitID, TimeUnitQuantity, DefaultDisplayTimeUnitID, LastChange, PollInterval, ColumnNumber
FROM Orion.NPM.CustomPollers
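As a rough sketch of "marrying up" the two result sets, the Python below takes rows shaped like the output of the two queries above and lists the custom pollers whose newest DateTime is older than a cutoff (12 hours here, echoing the original post). Only the field names CustomPollerAssignmentID, DateTime, CustomPollerID and UniqueName come from the queries in this thread. The `assignment_to_poller` mapping is an assumption: the thread does not show how an assignment ID links to a CustomPollerID, so that lookup has to come from somewhere else in your schema. The function name and all sample values are hypothetical.

```python
from datetime import datetime, timedelta


def stale_custom_pollers(status_rows, poller_rows, assignment_to_poller,
                         now, max_age=timedelta(hours=12)):
    """List (assignment ID, poller name, newest DateTime) for stale pollers."""
    # Poller ID -> name, from rows shaped like the CustomPollers query.
    names = {p["CustomPollerID"]: p["UniqueName"] for p in poller_rows}

    # Keep only the newest DateTime seen for each assignment.
    newest = {}
    for row in status_rows:
        aid = row["CustomPollerAssignmentID"]
        if aid not in newest or row["DateTime"] > newest[aid]:
            newest[aid] = row["DateTime"]

    stale = []
    for aid, stamp in newest.items():
        if now - stamp > max_age:
            # assignment_to_poller is an assumed lookup, not from the thread.
            poller_id = assignment_to_poller.get(aid)
            stale.append((aid, names.get(poller_id, "unknown"), stamp))
    return sorted(stale, key=lambda item: item[2])  # oldest first


# Hypothetical sample rows, for illustration only.
now = datetime(2024, 1, 2, 12, 0)
status_rows = [
    {"CustomPollerAssignmentID": "a1", "DateTime": datetime(2024, 1, 1, 6, 0)},
    {"CustomPollerAssignmentID": "a1", "DateTime": datetime(2024, 1, 1, 9, 0)},
    {"CustomPollerAssignmentID": "a2", "DateTime": datetime(2024, 1, 2, 11, 0)},
]
poller_rows = [
    {"CustomPollerID": "p1", "UniqueName": "upsBatteryStatus"},
    {"CustomPollerID": "p2", "UniqueName": "upsOutputLoad"},
]
assignment_to_poller = {"a1": "p1", "a2": "p2"}

for row in stale_custom_pollers(status_rows, poller_rows,
                                assignment_to_poller, now):
    print(row)
```

Doing the join outside the database keeps the sketch independent of any particular query syntax; if the same link is available inside Orion, a single joined query would likely be the tidier option.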
