I wanted to share the fix I found for the problems we had getting hardware monitoring to work on our HP ProLiant servers, which I raised with SolarWinds under case #343875.
We had three different conditions in which hardware monitoring was not working in SolarWinds on our HP ProLiant servers (all of them DL G5/G6/G7/G8 series). Two of these conditions generated an error that you could see from within SolarWinds: "Hardware polling failed: Error 31005" (or Error 31040).
All of the issues come down to the HP Insight Management Agent either not being installed, or not having been upgraded to a version compatible with the version of HP SIM or the WBEM provider installed on the server.
In our case we were on SIM version 7, so we needed to install the HP Insight Management Agent version 188.8.131.52. In some cases we still had version 8.70 installed, and that is why it was not working. Once we upgraded to version 184.108.40.206 and followed the re-add process below, we no longer had a problem. Accept all of the default options; there is no need to change any of them.
In other situations we had version 7 of SIM installed, but no version of the HP Insight Management Agent installed at all. In these cases we installed the HP Insight Management Agent on the server and followed the re-add process below. Accept all of the default options; there is no need to change any of them.
In still other situations we were getting “Hardware polling failed: Error 31002”. In these cases we found it best to uninstall the HP Insight Management Agent, then install the new version from the command line using /s (for silent install) and /f (for force). The command would look like >x64.exe /s /f when installing the x64 edition of the HP Insight Management Agent. You will not have to set any options, because the forced silent install installs the agent with the default options.
After getting the proper version of the HP Insight Management Agent downloaded and installed, you will need to follow the re-add hardware monitoring process below:
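For anyone who wants to script the forced silent reinstall across several servers, here is a minimal sketch. It only builds the command described above. The installer file name `x64.exe` comes from the post; the function names and the `dry_run` switch are hypothetical, and the uninstall step is not covered.

```python
import subprocess


def build_silent_install_command(installer="x64.exe"):
    """Return the argument list for a forced silent install.

    /s = silent install, /f = force, as described in the post.
    """
    return [installer, "/s", "/f"]


def run_silent_install(installer="x64.exe", dry_run=True):
    """Build the command and, unless dry_run is set, execute it.

    Returns the command that was (or would have been) run.
    """
    cmd = build_silent_install_command(installer)
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run only: print the command without touching the system.
    print(" ".join(run_silent_install()))
```

It defaults to a dry run so that the command can be reviewed before anything is installed.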
> Click on the server node
> Click Rediscover
> Click Poll Now
> Click List Resources
> Select the Hardware Monitoring option by placing a check mark in the box and then clicking submit to apply
> Click Poll Now again
You might have to refresh your browser window a few times depending on how long it takes to get to polling the node again, but once it does your hardware monitoring should show up again.
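The cases above can be summarised as a small lookup from the SolarWinds error code to the fix that worked here. This is only a sketch of one reading of this post, not an official SolarWinds or HP mapping: the names `REMEDIATION`, `READD_STEPS` and `suggest_fix` are made up for illustration, and it assumes that 31005 and 31040 are the missing or outdated agent cases.

```python
import re

# Re-add process as listed in the post.
READD_STEPS = [
    "Click on the server node",
    "Click Rediscover",
    "Click Poll Now",
    "Click List Resources",
    "Tick Hardware Monitoring and click Submit",
    "Click Poll Now again",
]

_INSTALL_OR_UPGRADE = (
    "Install or upgrade the HP Insight Management Agent "
    "(default options), then re-add hardware monitoring."
)

# Assumed mapping, based only on the cases described in this thread.
REMEDIATION = {
    31005: _INSTALL_OR_UPGRADE,
    31040: _INSTALL_OR_UPGRADE,
    31002: (
        "Uninstall the HP Insight Management Agent, reinstall it with "
        "/s /f, then re-add hardware monitoring."
    ),
}


def suggest_fix(message):
    """Return the suggested fix for a 'Hardware polling failed' message,
    or None if the message has no known error code."""
    match = re.search(r"Error\s+(\d{5})", message)
    if match is None:
        return None
    return REMEDIATION.get(int(match.group(1)))


if __name__ == "__main__":
    print(suggest_fix("Hardware polling failed: Error 31002"))
    for step in READD_STEPS:
        print(">", step)
```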
Calling HP support was pretty useless in helping me get to the bottom of this issue, but I suspect that others will see this problem and be looking for a way to fix it like I was.
I would be happy to answer what questions I can if you have other issues getting the HP hardware monitoring to work, as I think we have been through most of it over the last couple of days.
It would appear that polling VMware hosts directly is recommended over polling them via vCenter. Once I made that change, it seemed to resolve the issue for me. The delay in getting the information into vCenter does seem to be a lot longer than you would expect. Polling the VMware host directly gets you the most up-to-date information, and it does seem to resolve the PSU failure info.
Here is a better close-up of just the warning and the hardware monitoring on the server:
As you can see here, we are seeing a "Down" condition for the ESX Status and no details on the power supply, as it is still showing green.
In vCenter, the power supply is showing as the issue, and the server isn't down because the power supply is redundant. It simply appears to be an issue with how the polled information is transferred to SolarWinds. It isn't getting the correct information from vCenter.
We have tested this issue on generation 6/7/8 HP ProLiant servers, so it does not appear to be a generational issue.
I continue to work through this with Support, but has anyone pulled the cord out of one of the redundant power supplies on your ProLiant servers and experienced the same issue?
Also, in case anyone is wondering, I get the same results even if I poll the server directly, rather than getting all of my information from vCenter.
Does anyone have additional feedback that might be helpful?
I just got done putting in two ProLiant G6s for our SolarWinds servers. I get the same results as the picture above. I don't know if this is helpful, but I've noticed that if you are building a ProLiant server from scratch, there is a ton of server-specific software to put on (lights-out management, HP-specific hardware monitoring SNMP software, etc.).
I went to HP's website, downloaded all the latest versions for my server and installed them. I'll be honest, I just put on everything I could and somehow magic happened and extra data appeared in SolarWinds.
Actually, if you read the first post in this thread, I think I isolated the software that needs to be loaded in order to get the hardware monitoring working with SolarWinds, but using SmartStart does seem to install all of the correct software as well.
I never tried the web site method, so I don't know if you installed anything extra or beyond what is needed or not. Could you tell us what software you saw on the web site and what you loaded to get it working?
I didn't use the boot-up DVD that came with the server. I installed Server 2008 R2 Enterprise, then loaded the HP drivers and applications after I patched the server up with the latest updates. Here is a screenshot of what I installed on our SAM server; I'll also screenshot what pops up in SolarWinds. It looks like I'm also missing the memory info.
We have HP DL360 G5/G6/G7 servers for the most part. We have a number of other HP models as well, but I do not know the model numbers of these units offhand.
On these I am able to get all hardware info, as you see below:
Yes, you are correct. I never noticed that before. I think... but now am not sure... that we had it at one point.
I am not running the latest build, still on SAM 5.0 with Orion Core 2.2.
I heard that the latest version does correct some problems with the hardware monitoring. Is this true?
I have read the release notes for 5.0.1, and one of the fixes was related to the missing memory info on some servers.
But we are running 5.0.1 and the problem persists.
It looks very strange to have a basic problem like this spread across so many cases.
eagolli and Marlbs, this is a known issue in SAM 5.0.x and has been logged as a bug under FB107943. Even better news is that we believe we have resolved this issue in the SAM 5.2 beta. If both or either of you are willing to sign up and participate in the SAM 5.2 beta to validate this issue is fixed in your environment we'd love to get your feedback.
What are the chances that I can get this approved, then download and install it today?
Also we just received new Proliant Gen8s that I would like to test before this gets out of RC in case it does not work.
It does seem to correct some of the problems, but we discovered that the PSU monitoring does not work with VMware host servers (SolarWinds ticket #378465).
We tried a simple test of unplugging one of the redundant power supplies, and while it believes that there is a problem, it does not fire off an alert.
Tested with HP ProLiant generation 5, 6, 7 and 8 servers, using VMware 4.1 and 5. I have tried polling both from vCenter and from the server directly to get the hardware data, and it seems to make no difference.
Just as a side note, it works correctly on servers running Windows or Linux (RedHat).
Can anyone else confirm that they are seeing this issue as well?
Thanks for sharing this info.
I actually wanted to ask you for some additional info, since you had to deal with several problems.
I have some Proliant HW in my environment and I'm having a consistent problem getting their MEMORY shown in the HW health.
I have checked the collector logs and don't find any of the errors mentioned (31xxx). Instead they only say something like this: "Polling of Memory OID failed. [NodeID = 11, PollingMethod = SnmpHP] (System.Exception: No data)".
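To see which nodes and components are affected, lines like the one quoted above can be pulled out of the collector log with a small parser. This is a rough sketch that assumes every failure line has the same shape as that single quoted example. The function name and field names are made up for illustration, and the real log may contain other variants.

```python
import re

# Pattern modelled on the one log line quoted in this post.
POLL_FAILURE_RE = re.compile(
    r"Polling of (?P<component>\w+) OID failed\. "
    r"\[NodeID = (?P<node_id>\d+), PollingMethod = (?P<method>\w+)\] "
    r"\((?P<exception>[\w.]+): (?P<detail>[^)]*)\)"
)


def parse_poll_failure(line):
    """Return a dict describing a polling failure, or None if the
    line does not match the assumed format."""
    match = POLL_FAILURE_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["node_id"] = int(info["node_id"])
    return info


if __name__ == "__main__":
    sample = (
        "Polling of Memory OID failed. "
        "[NodeID = 11, PollingMethod = SnmpHP] "
        "(System.Exception: No data)"
    )
    print(parse_poll_failure(sample))
```

Grouping the parsed results by `node_id` and `component` would show whether the memory failure is limited to particular models.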
Solarwinds engineers claim that the system is not able to get response from some specific OIDs that make the whole thing work.
Since the problem, as I said, is very consistent and happens on a bunch of DL380 G6 servers and BL460 G6 blades, I think we are facing some strange compatibility issue.
I have been working with one G6 blade and actually installed the latest HP agent but with no luck.
I wanted to check with you about Systems Insight Manager. As I understand it, installing it should not be a must, since installing the SIM agent on every server that you need to monitor with Orion should suffice.
I see from your comments that you seem to have instances of the Insight Manager server installed in your environment. Do you have only one, or several? Do you find it mandatory to have Insight Manager (and not only the agents) on the network?
thanks in advance,