
Hardware Health Monitoring - Issues

I've just upgraded both of our NPM servers to 10.4, and the first thing I did was add the hardware monitoring resources to our custom views, but I seem to be getting mixed results.

Switches

Cisco 6509:
    Hardware Details - Power Supply V: fine.
    Hardware Details - Temp chart: error "Index was outside the bounds of the array."
    Hardware Health - Power Supply V: error "Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'."
    Hardware Health - Temp chart: error "Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'."

Cisco 2950, 2960 and some 3750s: nothing at all. The Event Summary shows 2 SAM Hardware Type up, 2 SAM Hardware Sensor up and 1 SAM Hardware up, but nothing is being displayed.

Cisco 3750x:
    Hardware Details - Temp graph: fine.
    Hardware Health - Temp graph: error "Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'."

None of my switches show the Hardware Details like I'm now seeing for the servers.

Servers

As far as I can see, none of the servers had Hardware Health for Servers ticked in List Resources, unlike the switches, which I presume were ticked automatically as part of the upgrade. Is there any way of automatically ticking Hardware Health in List Resources without going to each server individually?

Once the resource was ticked, Hardware Details now displays on my servers, but I don't have any fan, power or temperature graphs, even though the Event Summary now shows 33 SAM Hardware Sensor up, 7 SAM Hardware Type up and 1 SAM Hardware up.

Hardware Details

Hardware Status: Up
Manufacturer: Dell Inc.
Model: PowerEdge M610
Network device
Service Tag: 27RPF4J
Last Poll Time: 11/22/2012 10:12:06 AM

Just to add: all the hardware health monitoring resources are selected in each of the custom views I have for our switches and servers.

Can someone help me resolve the above? It's a good feature to have once it works correctly.

Thanks

Jon

  • Update: I've now managed to get some devices reporting correctly. I changed my Cisco switch Node Details view to tabs, with all the hardware health charts on a separate tab, and that seems to have resolved the issue for the majority of devices. There are still a few that are not displaying anything.

  • Hi,

    Please can you create a support ticket for this issue?

    Thanks

    Dalibor

  • I upgraded to Orion 10.4.1 two weekends ago and found out this past weekend that Hardware Health monitoring was spiking the CPU on a pair of fully loaded Cisco 6513 chassis to 95% every 600 minutes (my node rediscovery polling interval). This had the effect of bringing down WCCP and bypassing our WAN-accelerated Riverbed sessions.

    Based on this, I suggest adding a note to the release notes that upgrading to this version can impact CPU performance on your network devices when polling for (rediscovering) Hardware Health monitors. There should also be an option to disable this globally, rather than having to go through hundreds of nodes to disable it one at a time.

    Additionally, it seems the node rediscovery polling interval can only be set to a maximum of 999 minutes.

    For static environments like data centers, this is too aggressive. Could we have the ability to set this in hours or days?

    I would run this only once a week at most, and possibly only once a month.
    Even better would be an option to disable it completely and only run it manually when required.

  • Here are two examples of CPU utilization on the Cisco 6500 platform (upgraded to 10.4.1 on Jan 26/27):

    [Figure CPU2.jpg: CPU utilization, last 30 days, every hour, fully loaded 6513 chassis, L2/L3]

    [Figure CPU1.jpg: CPU utilization, last 30 days, every hour, 6509, L2 only]
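As a rough illustration of why the 999-minute cap cannot express the weekly or monthly rediscovery cadence requested above, here is a short Python calculation. It uses only the figures from this thread (the 600-minute interval in use and the 999-minute maximum); the 30-day month and the helper names are assumptions for illustration, and nothing here touches the Orion/SolarWinds API.

```python
# Rediscovery cadence arithmetic using the figures quoted in this thread.
MAX_INTERVAL_MIN = 999      # maximum node rediscovery interval reported above
CURRENT_INTERVAL_MIN = 600  # interval in use when the CPU spikes were seen

MINUTES_PER_DAY = 24 * 60
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY
MINUTES_PER_MONTH = 30 * MINUTES_PER_DAY  # assumption: a 30-day month


def runs_per_period(interval_min: int, period_min: int) -> float:
    """Hypothetical helper: rediscovery runs that fit in a period at a fixed interval."""
    return period_min / interval_min


def fits_under_cap(target_min: int, cap_min: int = MAX_INTERVAL_MIN) -> bool:
    """Hypothetical helper: True if a desired interval can be entered under the cap."""
    return target_min <= cap_min


if __name__ == "__main__":
    # Runs per day/week at the current setting and at the maximum allowed setting.
    for interval in (CURRENT_INTERVAL_MIN, MAX_INTERVAL_MIN):
        print(
            f"{interval} min ({interval / 60:.2f} h): "
            f"{runs_per_period(interval, MINUTES_PER_DAY):.2f} runs/day, "
            f"{runs_per_period(interval, MINUTES_PER_WEEK):.1f} runs/week"
        )
    # Check whether the requested weekly/monthly cadences can be configured.
    for name, target in (("weekly", MINUTES_PER_WEEK), ("monthly", MINUTES_PER_MONTH)):
        verdict = "fits" if fits_under_cap(target) else "exceeds"
        print(f"{name}: {target} min {verdict} the {MAX_INTERVAL_MIN}-min cap")
```

At the 600-minute setting this works out to 2.4 rediscoveries per day (16.8 per week), and even the 999-minute maximum still triggers about 10 runs a week, while a weekly interval (10,080 minutes) or a 30-day interval (43,200 minutes) is far above the cap.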