I have had this issue for the last couple of weeks, and in fact I have had it before that as well. In my case it started when I upgraded to v10.4.2, which introduced Hardware Health Monitoring (HHM). Because I monitor both servers and network devices, the load across my pollers was no longer balanced. Before HHM, you balanced pollers by the sheer number of monitored elements. Now that HHM is here, you have to spread your monitored network nodes across your pollers, because of all the additional SNMP traffic. In the console, under Manage, group by Polling Engine. Choose a network device, go to its summary, and look at the Polling Details resource to make sure the poller is talking to the DB server as expected, especially if you have breaks in your graphs. You can also check whether the message queue (MSMQ) is processing messages. I found that restarting the Orion services would fix the issue for a short time, but it didn't solve the underlying problem.
In my case, I had 2 pollers on VMs with only 1 core each. I increased them to 2 cores, my resource issue went away, and the pollers began talking to the DB server properly.
My users were complaining about too many alerts, saying their servers had not gone down although Orion said they had, and I was seeing packet loss all over the place. The reason: whenever a poller could not report to the DB server, the entire period without contact was treated as packet loss, so Orion reported the device as down.
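One rough way to confirm the breaks in the graphs is to export a node's poll timestamps and flag any stretch that is much longer than the polling interval. A minimal sketch in Python, assuming a 120-second interval and made-up timestamps (neither value comes from Orion itself, and `find_poll_gaps` is a hypothetical helper name):

```python
from datetime import datetime, timedelta


def find_poll_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where two consecutive polls are further
    apart than expected_interval * tolerance."""
    limit = expected_interval * tolerance
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > limit:
            gaps.append((prev, cur))
    return gaps


# Hypothetical poll times: regular 120 s polls with one long break.
base = datetime(2024, 1, 1, 9, 0, 0)
polls = [base + timedelta(seconds=s) for s in (0, 120, 240, 900, 1020)]

gaps = find_poll_gaps(polls, timedelta(seconds=120))
for start, end in gaps:
    print(f"no data from {start:%H:%M:%S} to {end:%H:%M:%S}")
```

If the flagged windows line up with the times the false "node down" alerts fired, that points at the poller rather than the devices.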
Hope this helps
Hope you all are doing well.
Many thanks for your swift responses.
The issue started on six nodes that all belong to the same branch. We have nearly forty branches connected to our HQ through MPLS.
Of those six down nodes, one is a Cisco 1841 router and the other five are Cisco Catalyst switches connected to this router.
SolarWinds showed all six nodes as down, but when I tried pinging them from my PC, only the Cisco router was down; the other five switches responded successfully. After one hour the issue cleared, and it has not recurred so far.
My guess is that SolarWinds has to go through the router to reach the switches, so while the router was unreachable it may have shown the other devices as down as well.
However, do you have any other ideas why the other nodes were pingable from my PC while SolarWinds showed them as down?
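The guess above, that devices behind an unreachable router get marked down along with it, can be illustrated with a small parent/child check. This is only a sketch in Python with hypothetical node names; it is not how SolarWinds itself decides node status.

```python
def classify(nodes, parent_of, responds):
    """Label each node 'up', 'down', or 'unreachable'.

    A node that does not respond is labelled 'unreachable' instead of
    'down' when a device on its path back to the poller is also silent.
    """
    status = {}
    for node in nodes:
        if responds[node]:
            status[node] = "up"
            continue
        # Walk up the chain of parents looking for a silent device.
        parent = parent_of.get(node)
        blocked = False
        while parent is not None:
            if not responds[parent]:
                blocked = True
                break
            parent = parent_of.get(parent)
        status[node] = "unreachable" if blocked else "down"
    return status


# Hypothetical branch: one router with five switches behind it.
switches = [f"branch-sw{i}" for i in range(1, 6)]
nodes = ["branch-rtr"] + switches
parent_of = {sw: "branch-rtr" for sw in switches}

# What the poller saw: nothing at the branch answered.
seen_by_poller = {name: False for name in nodes}

status = classify(nodes, parent_of, seen_by_poller)
print(status)
```

With a dependency like this in place, only the router would count as down, and the switches would be flagged as unreachable through it.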
I can see what you're saying. I would be interested to know how the networks behind the 1841 get redistributed back into the MPLS core. I am guessing here, but I assume your network may be set up something like the below, with these switches residing on potentially three different networks?
If this is the case, I would be inclined to check whether you lost any Layer 3 protocols that may be used for redistribution back into the MPLS core. This could be isolated to the router on site. Are there any logs on the device to show whether it lost visibility of the core? It may be worth testing this using IP SLA on the router to continuously ping another router in the MPLS core, so that when the problem happens again you can see whether that stopped working as well.
Let me know if this doesn’t make any sense, as I am not the best at explaining my thoughts.
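If IP SLA on the router is not an option, a similar continuous check can be run from any host that has a path towards the MPLS core. The sketch below is a Python stand-in for that idea, not IP SLA itself: it calls the system `ping` command and logs only the moments the target changes state. The address in the usage comment is a placeholder, and `ping_once` and `watch` are hypothetical helper names.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host):
    """Send one echo request with the system ping command.

    Exit code 0 is treated as a reply; on some platforms that can be
    optimistic, so treat the result as a hint rather than proof.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(host, probe=ping_once, interval=30, rounds=None,
          sleep=time.sleep, log=print):
    """Probe host repeatedly and log each change of state with a timestamp.

    rounds=None keeps going until interrupted. Returns the list of states
    that were logged.
    """
    last = None
    changes = []
    done = 0
    while rounds is None or done < rounds:
        ok = probe(host)
        if ok != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {host} {'reachable' if ok else 'UNREACHABLE'}")
            changes.append(ok)
            last = ok
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
    return changes


# Usage (placeholder address for a router in the core):
# watch("192.0.2.1", interval=30)
```

The logged timestamps can then be compared against the times the monitoring system reported the branch nodes as down.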
All the best mate
Thanks for the more detailed information. I don't have access to some of the network devices,
so I am also waiting for a proper answer on the issue from my next-level engineers.
I hope the information you gave will make it easier to track down the issue. As soon as I have an answer, I will get back to you.