5 Replies. Latest reply on Apr 15, 2009 7:31 AM by sfdurham

    NetPerfMon hangs

      I've seen this issue addressed in a previous thread, but I'm not sure what came of it.

      The NetPerfMon process on one of our Orion machines hangs at random times, roughly once every 3-5 days. Often the Polling Status icon stays green, and there's little indication that it has stopped working, at least from the web interface. But running a sniffer clearly shows it's no longer sending ICMP, etc.

      In System Manager, under Polling Status, the Packet Queue's outstanding ICMP and SNMP counts climb to large numbers and then seem to freeze; nothing changes except the clock timer.

      Stopping and restarting NetPerfMon fixes it, but then Orion wants to go out and run its baseline again.

      We're running Orion v9.1 SP3.

        • Re: NetPerfMon hangs
          bshopp

          You might want to try upgrading to SP5 and see if the behavior continues.

          • Re: NetPerfMon hangs
            ceige

            How many elements are you polling? I have seen this on larger implementations where we were pushing 10-12k elements on a single polling engine. Under "optimal" conditions everything ran fine, with 99% poll completion, and outstanding polls never topped 500. But during an incident, Orion seemed to get choked up waiting for polls and retries to time out, and outstanding polls went through the roof. The problem is that even after the incident was resolved, Orion was still hanging. No devices changed status, and nothing indicated a problem other than CPU utilization pinned at 100% of a single core (~12% on an 8-core box) and memory usage frozen as well. And of course, when I drilled into the interface details, there were no graphs.

              • Re: NetPerfMon hangs

                All of a sudden I am having the issue described above. We are running Orion 9.1 SP4. What I know is that the server starts generating Event ID 1 over and over until the service is restarted, and it stops detecting node status changes. This morning we had 4 servers down that Orion was reporting as up.

                This has now happened twice within the last week, and I've had to reboot the server to fix it. I am going to apply SP5 today and see if that helps; otherwise I will have to engage support. Does anyone have any other ideas? Below is a screenshot of the error and what I am monitoring. This service is running on a fairly new and powerful blade server with 4 CPUs and 8 GB of RAM. I am not certain this is a performance-related issue, but I wouldn't rule it out. The server also uses a teamed network connection on a 10 Gb backbone.

                Network Elements: 5064
                Nodes: 618
                Interfaces: 3407
                Volumes: 1039

                  • Re: NetPerfMon hangs
                    davbyrd

                    Microsoft Windows Updates have a knack for causing these "all of a sudden" issues...

                    In general, running a repair on the installation and reapplying the latest SP will rectify the issue.

                    However, if this doesn't work, please don't hesitate to contact Support.

                    HTH,

                    David

                      • Re: NetPerfMon hangs

                        We also suspected our TSM backups might be causing this. We had our Tivoli admin scale the backup down to just the essential files on that server, and so far it has been running like a champ. We also upgraded to SP5. The only files now being backed up are the Cisco configs stored by Network Configuration Manager.