There are a variety of possible explanations. I'll touch on the ones that are easiest to identify, verify, and possibly correct:
- The SNMP strings in NPM, which are used by NCM's jobs, may not match those on the devices. Open one of the affected nodes in NPM, edit it, then test the SNMP communication. If the test fails, you've found the problem. Your resolution options are to either update NPM's SNMP strings for the node(s) or configure the matching SNMP strings on the nodes themselves.
- ACLs may have been applied to the nodes that do not allow them to be managed by SNMP, or do not allow them to be managed by your Poller(s), or a combination of the two.
- Firewall rules may have been applied/added, or changed, and could be responsible for the failure in inventory/snmp traffic.
- Bandwidth congestion between the Poller(s) and the problem nodes might be preventing successful inventory completion. Look at your NTA information or the uplink ports between Poller(s) and nodes for the time frame in which the Inventory Job runs. Verify there is sufficient unused bandwidth between the devices.
- Server Administrators may have created or modified jobs that impact the resources on your Poller(s). I found this happening when a well-meaning SAN/Backup Administrator moved a server backup job's timing to overlap with my config backup and inventory jobs, resulting in a number of nodes not completing their Inventory. Check with them and adjust your job timing or their job timing so there's no overlap.
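For the job-timing check in the last bullet, a quick way to confirm whether two schedules collide is to compare their time windows. The following is only a minimal sketch: the start times and durations are made-up examples, not values taken from NPM or NCM, so substitute what your own job schedules and the server team's schedules actually show.

```python
from datetime import datetime, timedelta


def job_window(start, duration_minutes):
    """Return a (start, end) tuple for a job that runs for duration_minutes."""
    return start, start + timedelta(minutes=duration_minutes)


def windows_overlap(a, b):
    """True if two (start, end) windows share any time.

    Windows are treated as half-open, so a job that ends at 03:00
    does not overlap one that starts at 03:00.
    """
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical schedules, for illustration only.
inventory = job_window(datetime(2024, 1, 1, 1, 0), 120)      # 01:00 to 03:00
server_backup = job_window(datetime(2024, 1, 1, 2, 30), 90)  # 02:30 to 04:00
moved_backup = job_window(datetime(2024, 1, 1, 3, 0), 90)    # 03:00 to 04:30

print(windows_overlap(inventory, server_backup))  # True
print(windows_overlap(inventory, moved_backup))   # False
```

Using full datetimes rather than bare clock times means a window that crosses midnight still compares correctly.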
Because some of these causes are outside my control (and the changes were made without my knowledge), I examine the Hardware Health Overview graphic on the front of NPM every day. When there's an SNMP mismatch, or when a firewall rule prevents SNMP from working, nodes show up as "Undefined".
I keep the Inventory graphic at the top of the main NPM page, too, and I click on any yellow "slice" of the green pie to learn which devices have not successfully been inventoried every night. In some cases they are devices which have not been correctly configured for snmp inventory. In some cases the nodes in this slice are Unmanaged intentionally. And some are devices that were unavailable during the Inventory Job's window, perhaps due to a WAN provider maintenance window or unscheduled outage.
In each case, start by looking carefully at the SNMP strings configured on the individual nodes and at their SNMP settings in NPM. If the SNMP test on the node's Edit page in NPM succeeds, you can eliminate a lot of possibilities: no firewall rules are blocking the flow, and the SNMP strings in NPM match those on the node.
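If you want a second opinion from outside NPM, the same kind of check can be run from the poller's command line. The sketch below assumes the Net-SNMP `snmpget` tool is installed on the poller, which may not be the case on your server, and it assumes SHA/AES with the authPriv level for SNMPv3; adjust those to whatever your devices are really configured for. The function names, host, and credentials are placeholders I made up.

```python
import subprocess

SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"  # sysDescr.0, present on nearly every SNMP agent


def build_snmpget_args(host, community=None, v3_user=None,
                       auth_pass=None, priv_pass=None, timeout=2, retries=1):
    """Build the argument list for a single Net-SNMP snmpget of sysDescr.0."""
    args = ["snmpget", "-t", str(timeout), "-r", str(retries)]
    if v3_user is not None:
        # Assumption: authPriv with SHA and AES; change to match the device.
        args += ["-v", "3", "-l", "authPriv", "-u", v3_user,
                 "-a", "SHA", "-A", auth_pass, "-x", "AES", "-X", priv_pass]
    elif community is not None:
        args += ["-v", "2c", "-c", community]
    else:
        raise ValueError("supply either a community string or a v3 user")
    return args + [host, SYS_DESCR_OID]


def snmp_responds(args):
    """Run snmpget and report whether it exited successfully."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode == 0


# Placeholder values, for illustration only.
print(build_snmpget_args("192.0.2.10", community="example-ro"))
```

If this succeeds from the poller but the node still fails inventory, the credentials and the network path are probably fine, and job timing becomes the more likely suspect.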
At that point you've made progress troubleshooting the issue. Next might come checking the timing of your jobs and whether they overlap with jobs scheduled by outside server administrators.
Once you discover the problems (and there may be several different causes) please share what you've learned here, so others may look to you as a leader in this area and benefit from your expertise.
Thank you Rick,
I'll keep this thread updated as I troubleshoot per your advice.
I found the answers.
In "My Dashboards/Configs/Configuration Management," the Inventory Status tab may show a history of manual jobs that were performed previously. To run a manual inventory job, go into the Config Management tab, click the checkbox of the node, and click "update inventory" (circled in red below). This takes a little time and brings you back to the Inventory Status tab until the update is done, either by reaching 100% or by coming up red with a red note.
This manual job is completely separate from the scheduled "Nightly Network Inventory" job. The results of the scheduled job can be viewed under "My Dashboards/Configs/Reports": I group by Category, click on Node Details, choose the check box for "Last Inventory of each Device," then click "View Report." On the next page there should be a column with dates showing when each node was last inventoried, which should be the night before according to the job.
If a node was not inventoried the night before, there is a log that will show why. It is located at "My Dashboards/Configs/Jobs." Find your "Nightly Network Inventory" job, and on the right-hand side there should be a "History" column with a magnifying glass (see the Nightly Inventory Job picture below). Click on the magnifying glass to view or save the log file. In my case I needed to save the log file since it was large, and then I opened it by dragging and dropping it into the Firefox web browser. Scroll down to, or search for, the date of the night before and look for the node that was not inventoried.
I found that my node was not inventoried because the log showed "ERROR: SNMPv3 - Unknown UserName." I basically had to go into my device and make sure the username matched the one I had entered into SolarWinds for SNMP polling. Hope this helps anyone else who may have been confused about inventory.
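For a large saved log, scanning it with a short script may be easier than scrolling in a browser. This is only a sketch: the one thing it relies on from the log is the "ERROR:" marker quoted above. The sample lines are placeholders I made up to exercise the function, not real NCM output, and the function name and the IP address are hypothetical.

```python
def find_error_lines(lines, needle=None):
    """Yield (line_number, text) for each line containing 'ERROR:'.

    If needle is given (for example a node name or IP address),
    only error lines that also contain it are returned.
    """
    for number, line in enumerate(lines, start=1):
        if "ERROR:" not in line:
            continue
        if needle is not None and needle not in line:
            continue
        yield number, line.rstrip("\n")


# Made-up placeholder lines; the real log layout will differ.
sample = [
    "placeholder line for a node that inventoried fine\n",
    "placeholder prefix 10.0.0.5 ERROR: SNMPv3 - Unknown UserName\n",
]

for number, text in find_error_lines(sample, needle="10.0.0.5"):
    print(number, text)
```

To run it against the saved file, pass an open file object instead of `sample`, for instance `open(path, errors="replace")`, where `path` is wherever you saved the job log.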
Nightly Inventory Job
Good work troubleshooting this. Congratulations on another successful piece of detective work!