
Monitoring devices with a lot of interfaces, unknown interfaces

I recently upgraded to NPM 9.1 SP5. I have one device that has more than 850 monitored interfaces. Since the upgrade, NPM has had a difficult time monitoring this device. Whenever I add a new interface to monitor, the interface shows up in an 'unknown' state for several days and then finally clears up. I have removed the device from NPM and re-added it, but the problem persists. I monitor about 70 other devices that have far fewer interfaces, and they do not have this problem. I did not experience this problem before the upgrade. Are there any limits on how many interfaces per device can be monitored? Are there any strategies for monitoring devices with a large number of interfaces?

  • My biggest interface load on a device is around 400 interfaces, and I've never had this type of problem after updates, but your 850 is certainly a lot! You might just win a prize :)

    The first thing I suspect is how your database is performing. How is your Orion database set up? Is it on another machine on RAID 1-0, or local along with Orion on RAID 5?

    In the System Manager drop-down, under Network Performance Monitoring Settings, did you change any of the interface defaults on the POLLING and STATISTICS tabs? Maybe these need bumping up a little higher.

    In Polling Status - NPM ENGINE STATUS, do you have a lot of 'outstanding' polls hanging around? As you watch the poll numbers change, do they change over quickly or lag?



  • My biggest interface load on a device is around 400 interfaces, and I've never had this type of problem after updates, but your 850 is certainly a lot! You might just win a prize :)



    If we're giving out prizes here, I have one router that has 1100 interfaces on it!  Thanks to Juniper and the physical interface/logical interface separation, each interface has at least two ifIndexes associated with it.

    The first thing I suspect is how your database is performing. How is your Orion database set up? Is it on another machine on RAID 1-0, or local along with Orion on RAID 5?

    In the System Manager drop-down, under Network Performance Monitoring Settings, did you change any of the interface defaults on the POLLING and STATISTICS tabs? Maybe these need bumping up a little higher.

    In Polling Status - NPM ENGINE STATUS, do you have a lot of 'outstanding' polls hanging around? As you watch the poll numbers change, do they change over quickly or lag?



    I would agree with the assessment that it's probably your poller settings.  Use the "Polls Per Second Tuning" tool located in the Advanced Features of your Orion start menu group.  This will show you the recommended settings - I typically bump it up slightly higher than the recommended settings by about 10 or so in order to account for the future addition of monitored elements.  This way, I don't have to restart Orion each time I add a few devices.
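As a rough illustration of why interface count drives the polls-per-second requirement, here is a minimal Python sketch. It assumes, hypothetically, that every node and interface is polled once per status interval and once per statistics interval; this is a simplification, not the formula the Polls Per Second Tuning tool uses, and the inputs below are made-up illustrative values, not Orion defaults. The `headroom` of 10 mirrors the "about 10 or so" margin suggested in the post.

```python
import math
from dataclasses import dataclass


@dataclass
class PollingLoad:
    """Hypothetical description of what one poller has to cover."""
    nodes: int
    interfaces: int
    status_interval_s: int      # seconds between status polls of each element
    statistics_interval_s: int  # seconds between statistics polls of each element


def estimated_polls_per_second(load: PollingLoad) -> float:
    """Average poll rate, assuming one poll per element per interval (a simplification)."""
    elements = load.nodes + load.interfaces
    return elements / load.status_interval_s + elements / load.statistics_interval_s


def suggested_setting(load: PollingLoad, headroom: int = 10) -> int:
    """Round the estimate up, then add headroom for elements added later."""
    return math.ceil(estimated_polls_per_second(load)) + headroom


# Illustrative numbers only, not Orion defaults.
load = PollingLoad(nodes=71, interfaces=2000, status_interval_s=120, statistics_interval_s=540)
print(f"{estimated_polls_per_second(load):.1f} polls/s, suggested setting {suggested_setting(load)}")
# prints "21.1 polls/s, suggested setting 32"
```

Under this model the estimate scales linearly with the number of monitored elements, so adding one very large node can push a poller past a setting that was adequate before.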



  • The first thing I suspect is how your database is performing. How is your Orion database set up? Is it on another machine on RAID 1-0, or local along with Orion on RAID 5?



    I'm running the DB on a separate server, RAID 5.



    In Polling Status - NPM ENGINE STATUS, do you have a lot of 'outstanding' polls hanging around? As you watch the poll numbers change, do they change over quickly or lag?



    I watched the status page for several minutes. The polls seem to change over within 1-5 seconds.



    If we're giving out prizes here, I have one router that has 1100 interfaces on it!  Thanks to Juniper and the physical interface/logical interface separation, each interface has at least two ifIndexes associated with it.



    That is awesome!  Off the subject, do you have an easy way to locate and select new interfaces on that router?  I have found it very tedious to scroll through a list of 800+ interfaces to find and select the one I want to monitor. 



    I would agree with the assessment that it's probably your poller settings.  Use the "Polls Per Second Tuning" tool located in the Advanced Features of your Orion start menu group.  This will show you the recommended settings - I typically bump it up slightly higher than the recommended settings by about 10 or so in order to account for the future addition of monitored elements.  This way, I don't have to restart Orion each time I add a few devices.



    I'll try that. It looks like my settings were about 10 below the recommended settings. Thank you.

  • Wow! If no one beats 1100 interfaces on one device, then I think SolarWinds should send you a T-shirt or something!



  • That is awesome!  Off the subject, do you have an easy way to locate and select new interfaces on that router?  I have found it very tedious to scroll through a list of 800+ interfaces to find and select the one I want to monitor. 



    I wish I did. I've requested a filtering option during node addition so that nodes with large numbers of interfaces can be dealt with more easily. As it is, if we add more MLPPP interfaces on our router, I have to sit and wait for a few minutes (don't even bother with the console System Manager here) while it polls the entire router for all interfaces.

  • I'm running the DB on a separate server, RAID 5.

    We generally recommend using a RAID configuration that maximizes write capability, since Orion writes a LOT of data to the database. Separate server, though, is good! Especially if it's dedicated to the Orion DB.
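To put rough numbers on the write-capability point, here is a minimal Python sketch using the commonly cited rule-of-thumb write penalties (2 back-end I/Os per front-end write for RAID 1/10, 4 for RAID 5). The disk count, per-disk IOPS, and 80% write mix are hypothetical, and arrays with a write-back cache will behave differently.

```python
# Commonly cited rule-of-thumb write penalties (back-end I/Os per front-end write).
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2, "RAID 5": 4}


def effective_iops(disks: int, iops_per_disk: int, write_pct: int, level: str) -> float:
    """Front-end IOPS the array can sustain for a given read/write mix.

    Each front-end read costs one back-end I/O; each write costs `penalty`
    back-end I/Os, so the raw budget is divided by the weighted cost per I/O.
    """
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    cost_per_100_ios = (100 - write_pct) + write_pct * penalty
    return raw * 100 / cost_per_100_ios


# Hypothetical array: 6 disks at 150 IOPS each, 80% writes.
for level in ("RAID 10", "RAID 5"):
    print(level, round(effective_iops(6, 150, 80, level)))
# prints "RAID 10 500" then "RAID 5 265"
```

With the same disks and a write-heavy mix, the RAID 5 array sustains roughly half the front-end IOPS of RAID 10 under this model.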

  • I have a DSL termination router that has well over 10,000 interfaces, of which I monitor ~4500. Each DSL line has an ATM and an AAL5 interface, and I only care about the AAL5 one. I also have this problem, where an interface displays as unknown for days. Our solution was to place this router, and two smaller ones (~1500 interfaces each), on a poller of their own. We use extremely lax polling intervals and rediscover every 1440 minutes (24 hours).

    For the issue of finding new interfaces, my job is actually easy. I run a network discovery on the device daily and choose to import only AAL5 interfaces. If an interface already exists, it is skipped; if it is new, it is added. Problem solved.
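The daily "import only AAL5, skip what already exists" workflow can be modeled as a small idempotent filter. This is a minimal Python sketch only: it does not call any Orion or Network Discovery API, the Cisco-style interface names are hypothetical, and matching on "aal5" in the name is an assumed rule (the real import may select by interface type instead).

```python
def is_aal5(name: str) -> bool:
    """Hypothetical naming rule: treat any interface whose name mentions AAL5 as wanted."""
    return "aal5" in name.lower()


def plan_import(discovered: list[str], monitored: set[str]) -> tuple[list[str], list[str]]:
    """Split discovered AAL5 interfaces into (new ones to add, ones already monitored)."""
    wanted = [name for name in discovered if is_aal5(name)]
    to_add = [name for name in wanted if name not in monitored]
    skipped = [name for name in wanted if name in monitored]
    return to_add, skipped


# Hypothetical discovery result: each DSL line has an ATM and an AAL5 interface.
discovered = [
    "ATM1/0.101-atm subif", "ATM1/0.101-aal5 layer",
    "ATM1/0.102-atm subif", "ATM1/0.102-aal5 layer",
]
monitored = {"ATM1/0.101-aal5 layer"}
to_add, skipped = plan_import(discovered, monitored)
print("add:", to_add)      # add: ['ATM1/0.102-aal5 layer']
print("skip:", skipped)    # skip: ['ATM1/0.101-aal5 layer']
```

Because already-monitored interfaces are skipped, running the same plan again after the import adds nothing, which is what makes a daily rediscovery safe to repeat.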

     

    Our database is running well, but not optimally, as it is on RAID 5. We have low disk queues, so I assume the DB is not the issue, but rather the device being overloaded with polls. We have several other routers of this exact model around the network, with 400-600 interfaces each, that do not have this issue.
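If the device itself is the bottleneck, one rough comparison is the request rate each router's SNMP agent has to answer. This minimal Python sketch assumes one statistics poll per monitored interface per interval (real pollers can bundle several OIDs into one request, so actual packet counts will be lower); the interface counts come from the post, while the 10-minute interval is hypothetical.

```python
def per_device_rate(interfaces: int, interval_min: float) -> float:
    """Statistics polls per second one device must answer, assuming one poll per interface per interval."""
    return interfaces / (interval_min * 60)


def matching_interval(interfaces: int, reference_interfaces: int, reference_interval_min: float) -> float:
    """Interval (minutes) that gives the big device the same per-device rate as the reference device."""
    return reference_interval_min * interfaces / reference_interfaces


big, typical = 4500, 600   # monitored interfaces on the big router vs. a typical one
interval = 10              # hypothetical statistics interval, in minutes
print(f"{per_device_rate(big, interval):.2f} vs {per_device_rate(typical, interval):.2f} polls/s")
# prints "7.50 vs 1.00 polls/s"
print(f"interval for equal load: {matching_interval(big, typical, interval):.0f} min")
# prints "interval for equal load: 75 min"
```

Under these assumptions the big router answers 7.5 times as many polls at the same interval, which is consistent with giving it its own poller and much laxer intervals.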

     

    I opened a support ticket with SW a long time ago, and was never given a real answer.

  • I don't think the bug mentioned by "am3e2445" has anything to do with the number of interfaces. I've seen this bug a few times on Orion 9.1 SP5, even on devices with only 50 interfaces, and I've also found that the only thing that corrects it is patience (not several days, but overnight seems to cover it).

    I actually have this bug with a node right now: the unit was rebooted due to a power outage and came back up with all interfaces listed as status "unknown" :|

  • I would agree with the assessment that it's probably your poller settings.  Use the "Polls Per Second Tuning" tool located in the Advanced Features of your Orion start menu group.  This will show you the recommended settings - I typically bump it up slightly higher than the recommended settings by about 10 or so in order to account for the future addition of monitored elements.  This way, I don't have to restart Orion each time I add a few devices.

    So far, things are looking very promising since I started using the "Polls Per Second Tuning" tool. I made the adjustments yesterday morning and have added a few new interfaces. They immediately showed up in the correct state instead of the "Unknown" state. I think this may have resolved the problem, but I'll give it a few more days to be certain.