Hi All,
I've searched for an answer but haven't found anything that matches my scenario.
We're running Orion NPM 500 (version 10.3). Periodically, usually overnight, one or more of my nodes trigger an alert I've set up that fires when a node is no longer being monitored via SNMP.
The alert is based on the LastSystemUptimePollUtc column in the database, and when I check the underlying table, the alert correctly reflects the value stored there.
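For anyone trying to reproduce the condition outside Orion, the check the alert performs can be approximated as "the last successful uptime poll is older than some threshold". This is a minimal sketch under stated assumptions: the 30-minute threshold, the function name `is_snmp_stale` and the treatment of a missing value are all hypothetical and not taken from Orion; the real trigger condition is whatever the alert definition specifies.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold, NOT an Orion default: how long a node may go
# without a successful SNMP uptime poll before we treat it as unmonitored.
STALE_AFTER = timedelta(minutes=30)


def is_snmp_stale(last_uptime_poll_utc, now=None, stale_after=STALE_AFTER):
    """Return True if the node's last uptime poll is older than stale_after.

    last_uptime_poll_utc: the LastSystemUptimePollUtc value for the node,
    as a naive datetime in UTC, or None if it has never been polled
    (treating None as stale is an assumption of this sketch).
    """
    if last_uptime_poll_utc is None:
        return True
    if now is None:
        # Current time in UTC, made naive to compare with the stored value.
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    return now - last_uptime_poll_utc > stale_after
```

With `now = datetime(2024, 1, 1, 8, 0)`, a last poll at 07:50 is not stale, while a last poll at 02:00 is, which matches the overnight pattern described here.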
When this happens, the only fix seems to be restarting all Orion services or rebooting the polling server; restarting the affected nodes makes no difference.
Whilst the alert is active, I can still open the node, run List Resources and get the correct resources back, so the polling server can clearly reach the node via SNMP. Clicking Poll Now or Rediscover doesn't clear the alert.
The affected nodes are mostly at two sites, and they aren't all the same kind of device. One site, which we reach over a VPN through its ASA, has a Windows 2008 R2 server, a Cisco 3750 switch and an ASA5505. The other site has two Windows 2008 servers behind a DrayTek 2820, with no VPN.
It's almost as if there's a break in internet connectivity, Orion gets confused somehow, and it then refuses to recognise that the nodes are reachable via SNMP.
Has anybody seen this before, or does anyone have an idea how to fix it?
Cheers,
Alex