IBM Director Common Agent - SNMP Hardware Status Polling Failing

I have a customer who is running a data center full of IBM servers. They have installed the IBM Director Common Agent (6.3.3) on their servers as per the Troubleshooting Hardware Monitoring guide. Hardware monitoring always starts off fine, but it seems to fail after a few hours. All of the servers are X Series servers (X3550, if I remember correctly).

Has anyone else experienced a similar issue when trying to poll the hardware status from the IBM Common Agent? Reporting from the Common Agent to the IBM Director Console works without issue, and polling from the standard performance MIB does not stop. Only polling of OIDs covered by the IBM-SYSTEM-ASSETID-MIB fails.

In order to return the server to proper monitoring of the IBM hardware (albeit only for a few hours) the client needs to restart the SNMP service on the server (restarting the Common Agent has no impact) and run a List Resources on the node.  I am working to verify the List Resources part as that one confuses me -- so there is a little more testing to come there.
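
One way to confirm the symptom before restarting anything is to compare a poll of a standard OID with a walk of the IBM subtree. The following is only a minimal sketch, not a tested tool. It assumes the Net-SNMP `snmpget` and `snmpwalk` command-line tools are installed on the polling host, that the server answers SNMP v2c with a read community, and that the hardware OIDs sit somewhere beneath IBM's enterprise subtree `1.3.6.1.4.1.2`. The function names and the `runner` parameter are hypothetical.

```python
import subprocess

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0, standard MIB-II system group
IBM_SUBTREE_OID = "1.3.6.1.4.1.2"     # IBM enterprise subtree (assumed parent of the hardware OIDs)

# Messages Net-SNMP prints in-band when an OID or subtree returns nothing.
EMPTY_MARKERS = ("No Such Object", "No Such Instance", "No more variables")


def run_snmp(tool, host, community, oid, timeout=30):
    """Run a Net-SNMP CLI tool and return True if it produced usable data."""
    cmd = [tool, "-v2c", "-c", community, "-t", "2", "-r", "1", host, oid]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    if proc.returncode != 0 or not proc.stdout.strip():
        return False
    return not any(marker in proc.stdout for marker in EMPTY_MARKERS)


def diagnose(host, community, runner=run_snmp):
    """Classify a node as unreachable, missing only the IBM subtree, or healthy."""
    if not runner("snmpget", host, community, SYS_UPTIME_OID):
        return "snmp-unreachable"
    if not runner("snmpwalk", host, community, IBM_SUBTREE_OID):
        return "ibm-subtree-silent"
    return "ok"
```

A result of `"ibm-subtree-silent"` would match the behaviour described here: the standard MIB still answers while the IBM hardware OIDs do not. Walking the whole enterprise subtree can be slow, so a narrower OID from the MIB file is a better choice if one is known.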

I found a few links in IBM's forums that suggest similar issues, but they are all dated -- and with no resolutions posted!

https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014637905

https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014729610

https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014511313&ps=25

Thanks,

Josh

  • Has anyone determined what is happening with the resources disappearing as described in the above post? I have recently been tasked with adding IBM X Series servers that have IBM Director 6.3.3 as well, and I am experiencing the same phenomenon.

    Thanks,

    John

  • This is a known issue with the IBM Common and Platform Windows agents. They stop responding to SNMP queries after some undefined period of time. The only known resolution is to restart the SNMP Service on the monitored server. Alternatively, you can change the polling method for these servers in Orion from SNMP to WMI, which does not suffer from this bug. I recommend contacting IBM support to report this issue. They may be able to provide you with a hotfix that addresses it. I suggest having at least two servers in this broken state before calling, should you need to prove to them that it is indeed broken. This will allow you to still have one in the "broken" state after you've proven to them that restarting the SNMP service resolves the issue on the other.
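
The restart mentioned above can be scripted from an administrative workstation. This is a rough sketch only, assuming Windows targets, the built-in `sc` command, the default service name `SNMP`, and an account with rights to control services on the remote machine. The helper names and the five-second pause are illustrative assumptions, not a recommendation from IBM or SolarWinds.

```python
import subprocess
import time


def restart_commands(server, service="SNMP"):
    """Build the sc.exe stop/start commands for a remote Windows service."""
    target = "\\\\" + server.lstrip("\\")  # sc expects the \\ServerName form
    return [
        ["sc", target, "stop", service],
        ["sc", target, "start", service],
    ]


def restart_snmp(server, run=subprocess.run, pause=time.sleep, wait_seconds=5):
    """Stop the service, give it a moment to finish stopping, then start it."""
    stop_cmd, start_cmd = restart_commands(server)
    run(stop_cmd, check=True)
    # sc returns before the service has fully stopped, so wait briefly;
    # the right delay is a guess and may need tuning.
    pause(wait_seconds)
    run(start_cmd, check=True)
```

Calling `restart_snmp("server01")` (a hypothetical host name) would issue the two commands in order. Leave the second broken server untouched if you still need it as evidence for IBM support.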

  • Although we have not proven it conclusively, we believe that it is a firmware issue with the IBM X Series servers.  We are working with a client to verify with the X3550's.  We have a working server running BIOS GFE143AUS-1.13.  GFE141AUS-1.12 and GFE136AUS-1.09 do not appear to work.

    If I were you, I'd start by checking the firmware on your servers and making sure that they are up to date.  Let me know if that works out for you, jchatmon.

  • I will inform my client and see if they can get the firmware updated.

    Thanks,

    JC

  • I'm having the same issue.  Were you able to get this working?

  • Did you try the firmware upgrades as discussed above?  I believe that they will work, though jchatmon has yet to confirm.

  • I have yet to get the customer to update and verify whether or not the BIOS fix will work. Because these are production servers, they are reluctant to bring them down and are waiting for a scheduled maintenance period.

  • After reviewing the SNMP config file from my UNIX Admin, we realized that the VACM_VIEW and ACCESS were incorrectly configured.  Once I enabled those for the Group, I was able to see the device.

    Thnx
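
A quick check for the misconfiguration described above might look like the following. It is a sketch under stated assumptions: the post does not name the file or the agent, so this assumes a config format with one directive per line and `#` for comments, and it assumes "ACCESS" refers to a `VACM_ACCESS` directive. The function names are hypothetical, and the check only confirms that the directives are present and uncommented, not that their arguments are correct.

```python
from collections import Counter

# Directive names taken from the post; VACM_ACCESS is an assumed expansion of "ACCESS".
REQUIRED = ("VACM_VIEW", "VACM_ACCESS")


def active_directives(config_text):
    """Count directives on lines that are neither blank nor commented out."""
    counts = Counter()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        counts[line.split()[0]] += 1
    return counts


def missing_directives(config_text, required=REQUIRED):
    """Return the required directive names that have no active line."""
    counts = active_directives(config_text)
    return [name for name in required if counts[name] == 0]
```

Passing the contents of the SNMP config file to `missing_directives` returns an empty list when both directives have at least one active line.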

  • I ran across a posting saying that there are issues with SNMP stopping or losing connection, after which SolarWinds drops the Health Monitor. I had my customer's tech set a schedule to stop and start the SNMP services once an hour. That appears to be taking care of our issue with the IBM Health Monitor in SolarWinds dropping off.

  • Yikes!  An hourly SNMP services restart will mimic a server reboot in the node details section.  Check the Last Boot line of the node you are running this service restart on and let me know if my assumptions are correct.
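
The concern about Last Boot can be checked from SNMP data alone. `sysUpTime` (`1.3.6.1.2.1.1.3.0`) counts from the last re-initialization of the SNMP agent, while `hrSystemUptime` (`1.3.6.1.2.1.25.1.1.0`, HOST-RESOURCES-MIB) counts from the last initialization of the host. Both are TimeTicks, in hundredths of a second. The sketch below compares the two. It assumes the agent exposes both objects and that the values have already been fetched as integers. Whether Orion derives Last Boot from `sysUpTime` is the assumption being tested here, not a confirmed fact. The function names and the five-minute tolerance are illustrative.

```python
from datetime import timedelta

TICKS_PER_SECOND = 100  # SNMP TimeTicks are hundredths of a second


def ticks_to_timedelta(ticks):
    """Convert an SNMP TimeTicks value to a timedelta."""
    return timedelta(seconds=ticks / TICKS_PER_SECOND)


def agent_restarted_without_reboot(sys_uptime_ticks, hr_system_uptime_ticks,
                                   tolerance_seconds=300):
    """True when the host has been up noticeably longer than the SNMP agent.

    TimeTicks wrap after roughly 497 days, so very long uptimes can give
    misleading results; this sketch does not try to correct for that.
    """
    gap = ticks_to_timedelta(hr_system_uptime_ticks) - ticks_to_timedelta(sys_uptime_ticks)
    return gap > timedelta(seconds=tolerance_seconds)
```

For example, a host up for ten days (86,400,000 ticks) whose agent reports thirty minutes (180,000 ticks) would return `True`, which suggests the hourly restart, not a reboot, is resetting the value.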