You might want to try upgrading to SP5 and see if the behavior continues.
How many elements are you polling? I have seen this on larger implementations where we were pushing 10-12k elements on a polling engine. Under "optimal" conditions everything ran fine, with 99% poll completion and outstanding polls never topping 500. But during an incident Orion seemed to get choked up waiting on polls and retries to time out, and outstanding polls went through the roof. The problem is that after the incident was resolved, Orion was still hung. No devices changed status, and there was nothing to indicate a problem other than CPU utilization sitting at 100% of a single core (~12% on an 8-core box). The memory usage was frozen as well, and of course if I drilled into the interface details I had no graphs.
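For what it's worth, those two symptoms (CPU flat at one core's share, memory not moving) can be checked mechanically. Here is a rough Python sketch, assuming you already collect whole-box CPU % and the polling service's memory use from somewhere. The `Sample` type, `looks_hung`, and the thresholds are made up for illustration; none of this is something Orion provides.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    cpu_percent: float   # whole-box CPU utilization, 0-100 (assumed input)
    memory_bytes: int    # memory used by the polling service (assumed input)


def single_core_share(cores: int) -> float:
    """Whole-box CPU % that corresponds to one fully busy core."""
    return 100.0 / cores


def looks_hung(samples: Sequence[Sample], cores: int,
               tolerance: float = 1.5, min_samples: int = 5) -> bool:
    """Heuristic: CPU pinned near one core's share AND memory unchanged
    across the last `min_samples` samples. Thresholds are guesses."""
    if len(samples) < min_samples:
        return False
    recent = samples[-min_samples:]
    target = single_core_share(cores)
    cpu_pinned = all(abs(s.cpu_percent - target) <= tolerance for s in recent)
    memory_frozen = len({s.memory_bytes for s in recent}) == 1
    return cpu_pinned and memory_frozen
```

On an 8-core box `single_core_share(8)` is 12.5, which lines up with the ~12% figure. A healthy but busy poller should fail at least one of the two checks, since its memory use normally moves between samples.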
All of a sudden I am having the issue described above. We are running ORION 9.1 SP4. What I know is that the server starts generating Event ID 1 over and over until the service is restarted, and it quits detecting node status changes. This morning we had 4 servers down, which ORION was reporting as up.
This has now happened twice within the last week, and I've had to reboot the server to rectify the issue. I am going to apply SP5 today and see if that helps; otherwise I will have to engage support. Anyone have any other ideas? Below is a screenshot of the error, and what I am monitoring. This service is running on a fairly new and powerful blade server with 4 CPUs and 8 GB of RAM. I am not certain this is a performance-related issue, but I wouldn't rule it out. It is also using a teamed network connection on a 10 Gb backbone.
Network Elements: 5064
Microsoft Windows Updates have a knack for causing "all of a sudden" issues...
In general, running a repair on the installation and reapplying the latest SP will rectify the issue.
However, if this doesn't work, please don't hesitate to contact Support.
We also suspected our TSM backups may be causing this. We had our Tivoli admin scale the backup down to just the essential files on that server, and so far it has been running like a champ. We also upgraded to SP5. The only files being backed up now are the Cisco configs stored by Network Configuration Manager.