Level 13

Hardware Health Monitoring - Issues

I've just upgraded both of our NPM servers to 10.4, and the first thing I did was add the hardware monitoring resources to our custom views, but I seem to be getting mixed results.

Switches

Cisco 6509    Hardware Details - Power Supply V fine.   

                    Hardware Details - Temp chart error - Index was outside the bounds of the array.

                    Hardware Health - Power Supply V error -  Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'.    

                    Hardware Health - Temp chart - Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'.  

Cisco 2950, 2960 and some 3750s - Nothing at all. The Event Summary shows 2 SAM Hardware Type up, 2 SAM Hardware Sensor up and 1 SAM Hardware up, but nothing is being displayed.

Cisco 3750x  Hardware Details - Temp graph fine

                    Hardware Health - Temp graph error Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'. 

None of my switches show the Hardware Details like I'm now seeing for the servers.

Servers

As far as I can see, none of the servers had Hardware Health for Servers ticked in List Resources, unlike the switches, which I presume were ticked automatically as part of the upgrade. Is there any way of automating the ticking of Hardware Health in List Resources without going to each server individually?

Once the resource was ticked, I now have the following displaying on my servers, but I don't have any fan, power or temp graphs, although the Event Summary is now showing 33 SAM Hardware Sensor up, 7 SAM Hardware type up and 1 SAM Hardware up.

Hardware Details

Hardware Status: Up
Manufacturer: Dell Inc.
Model: PowerEdge M610
Service Tag: 27RPF4J
Last Poll Time: 11/22/2012 10:12:06 AM

Just to add, all the Health monitoring resources are selected for each of the custom views I have for our switches and servers.

Can someone give me some help to resolve the above? It's a good feature to have once it works correctly.

Thanks

Jon

5 Replies
Level 15

I upgraded to Orion 10.4.1 two weekends ago and found out this past weekend that Hardware Health monitoring was spiking the CPU on a pair of fully loaded Cisco 6513 chassis to 95% every 600 minutes (my node rediscovery polling interval). This had the effect of bringing down WCCP and bypassing our WAN-accelerated Riverbed sessions.

Based on this, I suggest a message be added to the release notes that upgrading to this version can impact CPU performance on your network devices when polling for (rediscovering) Hardware Health monitors. There should also be an option to disable this globally, rather than having to go through hundreds of nodes to disable it one at a time.

Additionally, it seems the node rediscovery polling interval can only be set to a maximum of 999 minutes.

For static environments like data centers, this is too aggressive. Could we have the ability to set this in hours or days?

I would run this only once a week at most, and possibly only once a month.
Even better would be the option to disable it completely and only run it manually when required.
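
To put the intervals mentioned above side by side, here is a small Python sketch of the arithmetic. It only uses the figures from this post (the 600-minute setting, the 999-minute maximum, and the weekly or monthly cadence asked for); the helper name `describe_interval` is made up for illustration and is not part of any SolarWinds tool, and a month is assumed to be 30 days.

```python
# Figures taken from the post above; helper and constant names are hypothetical.
MAX_REDISCOVERY_MINUTES = 999  # maximum the setting appears to accept

MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR


def describe_interval(minutes: int) -> dict:
    """Summarise a rediscovery interval given in minutes."""
    return {
        "minutes": minutes,
        "hours": round(minutes / MINUTES_PER_HOUR, 2),
        "runs_per_week": round(7 * MINUTES_PER_DAY / minutes, 2),
        "fits_under_cap": minutes <= MAX_REDISCOVERY_MINUTES,
    }


candidates = {
    "current setting (600 min)": 600,
    "maximum allowed (999 min)": MAX_REDISCOVERY_MINUTES,
    "once a week": 7 * MINUTES_PER_DAY,
    "once a month (assumed 30 days)": 30 * MINUTES_PER_DAY,
}

for label, minutes in candidates.items():
    print(f"{label}: {describe_interval(minutes)}")
```

The 999-minute maximum works out to about 16.65 hours, so even at the cap a rediscovery still runs roughly ten times a week. A weekly run would need 10,080 minutes, about ten times what the field accepts.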


Here are 2 examples of CPU load on the Cisco 6500 platform (upgraded to 10.4.1 on Jan 26/27):

Last 30 days - every hour, fully loaded 6513 Chassis L2/L3:

CPU2.jpg

Last 30 days - every hour, 6509 L2 only:

CPU1.jpg

Level 13

Update - I've now managed to get some devices reporting correctly. I changed my Cisco switch node details view to tabs, with all the health monitoring charts on a separate tab, and it seems to have resolved the issue for the majority of devices. There are still a few that are not displaying anything.


bump


Hi,

Could you please create a support ticket for this issue?

Thanks

Dalibor
