5 Replies Latest reply on Feb 13, 2013 11:43 AM by Network_Guru

    Hardware Health Monitoring - Issues

    jonchill

      I've just upgraded both of our NPM servers to 10.4, and the first thing I did was add the hardware monitoring resources to our custom views, but I seem to be getting mixed results.

       

      Switches

       

      Cisco 6509
          Hardware Details - Power Supply V: fine.
          Hardware Details - Temp chart: error "Index was outside the bounds of the array."
          Hardware Health - Power Supply V: error "Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'."
          Hardware Health - Temp chart: error "Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'."

      Cisco 2950, 2960 and some 3750s: nothing displayed at all, even though the Event Summary shows 2 SAM Hardware Type up, 2 SAM Hardware Sensor up and 1 SAM Hardware up.

      Cisco 3750X
          Hardware Details - Temp graph: fine.
          Hardware Health - Temp graph: error "Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'."

       

      None of my switches show the Hardware Details like I'm now seeing for the servers.

       

      Servers

       

      As far as I can see, none of the servers had Hardware Health for Servers ticked in List Resources, unlike the switches, which I presume were ticked automatically as part of the upgrade. Is there any way of automating the Hardware Health tick in List Resources without going to each server individually?

       

      Once the resource was ticked, this now displays on my servers, but I don't have any fan, power or temp graphs, even though the Event Summary now shows 33 SAM Hardware Sensor up, 7 SAM Hardware Type up and 1 SAM Hardware up.

       

      Hardware Details

       

      Hardware Status: Up
      Manufacturer: Dell Inc.
      Model: PowerEdge M610
      Service Tag: 27RPF4J
      Last Poll Time: 11/22/2012 10:12:06 AM

       

       

      Just to add: all the hardware health monitoring resources are selected in each of the custom views I have for our switches and servers.

       

      Can someone help me resolve the above? It's a good feature to have once it works correctly.

       

      Thanks

       

      Jon

        • Re: Hardware Health Monitoring - Issues
          jonchill

          Update: I've now managed to get some devices reporting correctly. I changed my Cisco switch Node Details view to use tabs, with all the hardware health charts on a separate tab, and that seems to have resolved the issue for the majority of devices. A few are still not displaying anything.

          • Re: Hardware Health Monitoring - Issues
            Network_Guru

            I upgraded to Orion 10.4.1 two weekends ago and found out this past weekend that Hardware Health monitoring was spiking the CPU on a pair of fully loaded Cisco 6513 chassis to 95% every 600 minutes (my node rediscovery polling interval). This had the effect of bringing down WCCP and bypassing our WAN-accelerated Riverbed sessions.

             

            Based on this, I suggest adding a note to the release notes that upgrading to this version can impact CPU performance on your network devices when polling for (rediscovering) Hardware Health monitors. There should also be an option to disable this globally, rather than having to go through hundreds of nodes and disable it one at a time.

             

            Additionally, it seems the node rediscovery polling interval can only be set to a maximum of 999 minutes.

            For static environments like Data Centers, this is too aggressive. Can we not have the ability to set this in Hours or Days?

            I would run this only once a week at most, and possibly only once a month.
            Even better would be the option to disable it completely and only run it manually when required.
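            For scale, here is a quick conversion of the schedules mentioned above into minutes, checked against the 999-minute maximum reported in this post. This is only a hypothetical helper for illustrating the arithmetic (the names MAX_REDISCOVERY_MINUTES, to_minutes and fits_current_limit are made up for this sketch); it does not talk to Orion or change any setting.

```python
# Hypothetical helper: convert a desired rediscovery period into minutes and
# check it against the 999-minute maximum reported in this thread.
MAX_REDISCOVERY_MINUTES = 999  # limit as described in the post, not read from Orion


def to_minutes(hours: float = 0, days: float = 0) -> int:
    """Convert hours and/or days into whole minutes."""
    return int(round(hours * 60 + days * 24 * 60))


def fits_current_limit(minutes: int) -> bool:
    """True if the interval could be entered in a minutes-only field capped at 999."""
    return 0 < minutes <= MAX_REDISCOVERY_MINUTES


if __name__ == "__main__":
    schedules = [
        ("current setting", 600),
        ("daily", to_minutes(days=1)),
        ("weekly", to_minutes(days=7)),
        ("monthly (30 days)", to_minutes(days=30)),
    ]
    for label, minutes in schedules:
        print(f"{label}: {minutes} min, fits 999-min limit: {fits_current_limit(minutes)}")
```

            Only the current 600-minute setting fits; daily (1,440), weekly (10,080) and 30-day (43,200) intervals all exceed the cap, and 999 minutes works out to under 17 hours.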