I've just upgraded both of our NPM servers to 10.4, and the first thing I did was add the hardware monitoring resources to our custom views, but I seem to be getting mixed results.
Cisco 6509:
Hardware Details - Power Supply V: fine.
Hardware Details - Temp chart: error - Index was outside the bounds of the array.
Hardware Health - Power Supply V: error - Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'.
Hardware Health - Temp chart: error - Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'.
Cisco 2950, 2960 and some 3750s: nothing at all is displayed, although the Event Summary shows 2 SAM Hardware Type up, 2 SAM Hardware Sensor up and 1 SAM Hardware up.
Cisco 3750x:
Hardware Details - Temp graph: fine.
Hardware Health - Temp graph: error - Unable to cast object of type 'SolarWinds.APM.Web.Models.HardwareInfoModel' to type 'SolarWinds.HardwareHealth.Web.Models.HardwareInfoModel'.
None of my switches show the Hardware Details like I'm now seeing for the servers.
As far as I can see, none of the servers had Hardware Health for Servers ticked in List Resources, unlike the switches, which I presume were ticked automatically as part of the upgrade. Is there any way of automating the ticking of Hardware Health in List Resources without going to each server individually?
I now have this displaying on my servers once the resource was ticked, but I don't have any fan, power or temp graphs, although the Event Summary is now showing 33 SAM Hardware Sensor up, 7 SAM Hardware Type up and 1 SAM Hardware up.
|Network device||Service Tag||27RPF4J|
|Last Poll Time||11/22/2012 10:12:06 AM|
Just to add, all the Health monitoring resources are selected for each of the custom views I have for our switches and servers.
Can someone give me some help to resolve the above? It's a good feature to have once it works correctly.
I upgraded to Orion 10.4.1 two weekends ago and found out this past weekend that the Hardware Health monitoring was spiking the CPU on a pair of fully loaded Cisco 6513 chassis to 95% every 600 minutes (my node rediscovery polling interval). This had the effect of bringing down WCCP and bypassing our WAN accelerated Riverbed sessions.
Based on this, I suggest a message be added to the release notes warning that upgrading to this version can impact CPU performance on your network devices when polling for (rediscovering) Hardware Health monitors. There should also be an option to disable this globally, rather than having to go through hundreds of nodes to disable it one at a time.
Additionally, it seems the node rediscovery polling interval can only be set to a maximum of 999 minutes.
For static environments like Data Centers, this is too aggressive. Can we not have the ability to set this in Hours or Days?
I would run this only once a week at most, and possibly only once a month.
Even better would be the option to disable it completely and only run it manually when required.
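To put rough numbers on the interval limit described above, here is a small Python sketch of the arithmetic. It does not talk to Orion at all; the constant and function names are made up for illustration, and the only inputs are the 600-minute interval and the 999-minute maximum mentioned in this post.

```python
# Illustrative arithmetic only; nothing here calls an Orion API.
# All names below are hypothetical and exist just for this example.

MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR

MAX_REDISCOVERY_MINUTES = 999    # maximum the setting appears to accept (from the post)
CURRENT_INTERVAL_MINUTES = 600   # the rediscovery interval in use (from the post)

WEEKLY_MINUTES = 7 * MINUTES_PER_DAY     # a once-a-week interval, in minutes
MONTHLY_MINUTES = 30 * MINUTES_PER_DAY   # a once-a-month interval (30-day month assumed)


def rediscoveries_per_week(interval_minutes: int) -> float:
    """Return how many rediscovery runs happen in one week at this interval."""
    return WEEKLY_MINUTES / interval_minutes


def fits_limit(interval_minutes: int) -> bool:
    """Return True if the interval can be entered under the 999-minute cap."""
    return interval_minutes <= MAX_REDISCOVERY_MINUTES


if __name__ == "__main__":
    for label, minutes in [
        ("current", CURRENT_INTERVAL_MINUTES),
        ("maximum", MAX_REDISCOVERY_MINUTES),
        ("weekly", WEEKLY_MINUTES),
        ("monthly", MONTHLY_MINUTES),
    ]:
        print(
            f"{label}: {minutes} min = {minutes / MINUTES_PER_HOUR:.2f} h, "
            f"{rediscoveries_per_week(minutes):.2f} runs/week, "
            f"allowed={fits_limit(minutes)}"
        )
```

Under these assumptions, even the 999-minute maximum still means a rediscovery (and so a CPU spike) roughly every 16.65 hours, while a weekly interval would need 10080 minutes, which is about ten times what the field allows.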
Here are 2 examples of the CPU on a Cisco 6500 platform (Upgrade to 10.4.1 on Jan 26/27th):
Last 30 days - every hour, fully loaded 6513 Chassis L2/L3:
Last 30 days - every hour, 6509 L2 only:
Update - I've now managed to get some devices reporting correctly. I've changed my Cisco switch node details view to tabs, with all the health monitoring charts on a separate tab, and it seems to have resolved the issue for the majority of devices. There are still a few that are not displaying anything.