28 Replies Latest reply on Jun 9, 2009 10:14 AM by jeff.stewart

    Orion v9 Website Timeouts and NetPerfMonService Service Hangs

    aLTeReGo

      Is anyone else experiencing website performance timeouts with Orion v9? I'm seeing this happen in the strangest places, such as the node details page, or volume details chart. I have no idea what would be causing the issue.


      I'm noticing issues where the NetPerfMonService service hangs and won't respond to a service stop command. Killing NetPerfMonService.exe in task manager and restarting the service seems to be the only way to bring it back. One symptom that the service is hung is System Manager won't open.

        • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
          aLTeReGo

          I should point out that the website timeouts don't affect all nodes or volumes, but the nodes and volumes that fail do so consistently. The strange thing is that the web interface otherwise seems pretty peppy; I just can't get to some pages, for seemingly no reason at all. The website error is fairly generic:

          Orion Website Error

          An error has occurred with the Orion website.

          Additional Information

          System.Web.HttpException: Request timed out.
           
          As for the service hang, that's a completely unrelated issue.
            • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
              tdanner

              Let's start with the website timeout issues. In \Inetpub\Solarwinds\web.config, find the <threshold value="WARN"/> line and change the value to DEBUG. Save it. Click around the website until you hit a timeout or two. That will capture more information about what the website is busy doing when it gets bogged down. Capture diagnostics and put web.config back the way it was.

              If you already have a ticket open about this, just attach these diagnostics and let the rep know to get me involved. If not, opening one is probably the easiest way to get the big file to me.

                  • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                    tdanner

                    I have the logs - I'm looking at this now.

                      • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                        tdanner

                        I found two website timeouts in the log during the period when the log level was turned up. In both cases, it was a couple of minutes into the execution of the same query:

                        Select * From CPULoad  Where NodeID=49 AND DateTime >= '07/01/2008 00:00:00' AND DateTime <= '07/01/2008 16:06:53' Order By DateTime

                        That's not a query that should normally be taking a long time to execute. The diagnostics are showing a healthy (not excessive) amount of data in the CPU history tables. Could you run that query using SQL Management Studio or the Orion Database Manager? Time how long it takes, roughly.
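
                        One way to get a rough timing in SQL Management Studio is sketched below. It assumes nothing beyond the query from the log; SET STATISTICS TIME is a standard T-SQL setting that prints elapsed and CPU time to the Messages tab.

                        ```sql
                        -- Print parse/compile and execution times to the Messages tab.
                        SET STATISTICS TIME ON;

                        -- The query captured in the website log (NodeID 49 is the node from this thread).
                        SELECT *
                        FROM CPULoad
                        WHERE NodeID = 49
                          AND DateTime >= '07/01/2008 00:00:00'
                          AND DateTime <= '07/01/2008 16:06:53'
                        ORDER BY DateTime;

                        -- Turn the timing output back off for this session.
                        SET STATISTICS TIME OFF;
                        ```

                        A result in the range of a second or less would suggest the database itself is not the bottleneck.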

                          • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                            aLTeReGo

                            That query returned results almost instantly. Maybe 1 second?

                              • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                                tdanner

                                Ah, I misread the log. It wasn't a couple of minutes into executing the query - it was a couple of minutes into what comes after that query: setting up the chart axes. The algorithm for auto-scaling the chart is getting confused and going into an infinite loop.

                                I have scheduled this bug for SP1.

                                As a workaround you could use List Resources in Web Node Management or System Manager to unassign the CPU&Memory poller from that node (since we apparently can't pull that info from that node anyway). Then run "DELETE FROM CPULoad WHERE NodeID=49" to get rid of the data.
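
                                A more cautious way to run that DELETE might look like the sketch below. The row-count check and the transaction wrapper are additions for safety, not part of the original workaround; NodeID 49 is the node from this thread, so substitute your own, and back up the database first.

                                ```sql
                                -- 1. See how many rows would be removed.
                                SELECT COUNT(*) AS RowsToDelete
                                FROM CPULoad
                                WHERE NodeID = 49;

                                -- 2. Run the delete inside a transaction as a dry run.
                                BEGIN TRANSACTION;

                                DELETE FROM CPULoad
                                WHERE NodeID = 49;

                                -- @@ROWCOUNT is the number of rows affected by the DELETE just above.
                                SELECT @@ROWCOUNT AS RowsDeleted;

                                -- 3. As written this undoes the delete. Once RowsDeleted matches
                                --    RowsToDelete, rerun with COMMIT TRANSACTION in place of ROLLBACK.
                                ROLLBACK TRANSACTION;
                                ```

                                Ending with ROLLBACK by default means an accidental run changes nothing and no transaction is left open holding locks on the table.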

                                  • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                                    aLTeReGo

                                    Wow that worked like a charm! Thanks for the help. I look forward to SP1.

                                      • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                                        aLTeReGo

                                        Not to sound ungrateful but how would you like me to troubleshoot the NetPerfMonService service hangs?

                                          • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                                            tdanner

                                            Oops - I forgot there were two problems in this thread!

                                            To debug the hang, I'm going to need a memory dump. You can create that using Microsoft's "ADPlus" tool, which is part of Debugging Tools for Windows. You can download this from http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx

                                            Once that's installed, wait for NetPerfMonService to get stuck. Then open a command window in C:\Program Files\Debugging Tools for Windows (x86) and run this:

                                            ADPlus -hang -pn NetPerfMonService.exe

                                            That will probably warn you about a bunch of stuff like your default vbscript interpreter (doesn't matter) and not having debug symbols configured (also doesn't matter - I can add them after the fact). Just ignore all these warnings and it will create the memory dump directory. That directory will have a few smallish files and one .dmp file whose size will be equal to the memory size of the NetPerfMonService process.

                                            Zip up the Hang_Mode__Date_... directory and send it through support.

                                            If you want, you can uninstall Debugging Tools for Windows at this point.

                                              • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                                                richard.potts

                                                Hi, we have a very similar issue with the node details hanging, however we have identified a consistency - that being the Nortel Contivities. We have in excess of 2000 nodes, and all the Cisco/Nortel Baystacks/servers/Fortigates/etc work fine, but try to drill down into a Contivity and we see the same issue:-


                                                "System.Web.HttpException: Request timed out."


                                                Does the workaround referred to above consist of some SQL activity? (namely the part where you mention: "...Then run "DELETE FROM CPULoad WHERE NodeID=49" to get rid of the data.").


                                                We have 200+ of these devices, so hopefully that will help you help us!


                                                 


                                                Thanks



                                                 

                                                  • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                                                    tdanner

                                                    "DELETE FROM CPULoad WHERE NodeID=49" worked for alterego because he was only having this problem with one node (the one with id 49) and because he first disabled CPU polling for that node (which would have just added more of the problem data).

                                                    Since you have 200+ devices with this issue, fixing it that way would be really tedious. You would have to identify the NodeID values for each one, use List Resources to remove the CPU poller, and then delete the CPULoad data for that node.
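
                                                    If you do go that route, the first step (finding the NodeID values) could be sketched as a query like the one below. This is an assumption-heavy example: it assumes a Nodes table with NodeID, Caption and MachineType columns, and the '%Contivity%' filter is hypothetical. Check the actual column names and the device strings in your own database before relying on it.

                                                    ```sql
                                                    -- Assumption: Nodes has NodeID, Caption and MachineType columns.
                                                    -- The LIKE pattern is a hypothetical filter; adjust it to match your devices.
                                                    SELECT n.NodeID,
                                                           n.Caption,
                                                           COUNT(c.NodeID) AS CPULoadRows   -- 0 when the node has no CPU history
                                                    FROM Nodes AS n
                                                    LEFT JOIN CPULoad AS c
                                                           ON c.NodeID = n.NodeID
                                                    WHERE n.MachineType LIKE '%Contivity%'
                                                    GROUP BY n.NodeID, n.Caption
                                                    ORDER BY n.Caption;
                                                    ```

                                                    The query only reads data. Nodes that show a non-zero CPULoadRows count are the ones that would still need the CPU poller removed and their CPULoad data deleted.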

                                                    If you can live with it a little longer, you might be best off waiting for the service pack to fix this properly.

                                                      • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs

                                                        Hi folks,

                                                        Was having a read through this post and we have just upgraded from Orion NPM 8.5 to version 9.

                                                        I also applied the new SP1 update, but from the Web Console we cannot access any of the NODE MANAGEMENT facilities. The request times out after a minute and we get the following error:

                                                        Orion Website Error

                                                        An error has occurred with the Orion website.

                                                        Additional Information

                                                        System.Web.HttpException: Request timed out.

                                                         

                                                        I then applied the DEBUG and got the following:

                                                        Orion Website Error

                                                        An error has occurred with the Orion website.

                                                        Additional Information

                                                        System.TimeoutException: The open operation did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The socket transfer timed out after 00:01:00. You have exceeded the timeout set on your binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

                                                        at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)

                                                        at System.ServiceModel.Channels.SocketConnection.ReadCore(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, Boolean closing)

                                                        --- End of inner exception stack trace ---

                                                        at System.ServiceModel.Channels.SocketConnection.ReadCore(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, Boolean closing)

                                                        at System.ServiceModel.Channels.SocketConnection.Read(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)

                                                        at System.ServiceModel.Channels.DelegatingConnection.Read(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)

                                                        at System.ServiceModel.Channels.ConnectionUpgradeHelper.InitiateUpgrade(StreamUpgradeInitiator upgradeInitiator, IConnection& connection, ClientFramingDecoder decoder, IDefaultCommunicationTimeouts defaultTimeouts, TimeoutHelper& timeoutHelper)

                                                        at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.SendPreamble(IConnection connection, ArraySegment`1 preamble, TimeoutHelper& timeoutHelper)

                                                        at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.DuplexConnectionPoolHelper.AcceptPooledConnection(IConnection connection, TimeoutHelper& timeoutHelper)

                                                        at System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnection(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout)

                                                        --- End of inner exception stack trace ---

                                                        Server stack trace:

                                                        at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)

                                                        Exception rethrown at [0]:

                                                        at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)

                                                        at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)

                                                        at System.ServiceModel.ICommunicationObject.Open(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.LayeredChannel`1.OnOpen(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.SecurityChannelFactory`1.ClientSecurityChannel`1.OnOpen(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)

                                                        at System.ServiceModel.Security.SecuritySessionSecurityTokenProvider.DoOperation(SecuritySessionOperation operation, EndpointAddress target, Uri via, SecurityToken currentToken, TimeSpan timeout)

                                                        at System.ServiceModel.Security.SecuritySessionSecurityTokenProvider.GetTokenCore(TimeSpan timeout)

                                                        at System.IdentityModel.Selectors.SecurityTokenProvider.GetToken(TimeSpan timeout)

                                                        at System.ServiceModel.Security.SecuritySessionClientSettings`1.ClientSecuritySessionChannel.OnOpen(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)

                                                        at System.ServiceModel.Channels.CommunicationObject.Open()

                                                        Exception rethrown at [1]:

                                                        at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)

                                                        at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)

                                                        at System.ServiceModel.ICommunicationObject.Open()

                                                        at SolarWinds.NPM.Common.NPMBusinessLayerProxy.Connect()

                                                        at SolarWinds.NPM.Common.NPMBusinessLayerProxy..ctor(String host, String port, HandleBusinessLayerException exceptionDelegate)

                                                        at SolarWinds.NPM.Common.NPMBusinessLayerProxy.TryToConnect(String host, String port, HandleBusinessLayerException exceptionHandler)

                                                        at SolarWinds.NPM.Common.NPMBusinessLayerProxy.CreateNPMBusinessLayerProxy(HandleBusinessLayerException exceptionHandler)

                                                        at Orion_Nodes_Controls_NodeGrouping.Reload()

                                                        at Orion_Nodes_Controls_NodeGrouping.Page_Load(Object sender, EventArgs e)

                                                        at System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr fp, Object o, Object t, EventArgs e)

                                                        at System.Web.Util.CalliEventHandlerDelegateProxy.Callback(Object sender, EventArgs e)

                                                        at System.Web.UI.Control.OnLoad(EventArgs e)

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Control.LoadRecursive()

                                                        at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)

                                                         

                                                        Any suggestions or comments would be great.

                                                        Many thanks

                                                    • Re: Orion v9 Website Timeouts and NetPerfMonService Service Hangs
                                                      aLTeReGo

                                                      I opened support ticket case #51976 - "Solarwinds NetPerfMonService.exe Hangs" and uploaded the requested debug information via LeapFile. If there is anything else you need please don't hesitate to ask.