32 Replies Latest reply on Oct 25, 2012 10:31 AM by xpowels

    SAM 5.2.0 disk degraded ESXi 5

    amtate42

      Hi Folks,

       

      I've just upgraded SAM to 5.2.0 and all my ESX hosts are displaying hardware status as critical because of an apparently degraded disk. I've looked at my ESX hosts and it is the local HP Serial Attached SCSI disk on each host. In vCenter there is an alert regarding a change in IO for this disk, which I have looked into, and it is possible there was a bottleneck causing some IO latency. The alert occurred a number of times this morning but hasn't appeared now for around 4 hours. Since I haven't selected any of the ESX volumes to be monitored via SAM, it must be receiving these alerts via SNMP (which I have configured for all my hosts), unless SAM auto-monitors volume status in the new version. Will the hardware status reset itself, or is there a way to reset it manually now that the alerts have stopped in vCenter?

       

      Regards

      Andrew

        • Re: SAM 5.2.0 disk degraded ESXi 5
          amtate42

          It doesn't appear to be traps from the ESX hosts: I checked through them all and they contain no errors. It must be the new version of SAM that is monitoring this disk, although I've listed all the resources on each host and no volumes are selected. So I'm wondering whether there is a way for this alert to reset or for thresholds to be defined, and whether anyone else has seen this. From more looking around in the ESX KBs, the change in IO alert seems to be fairly common, and it looks like I might need to upgrade to 5.1 to be able to configure the IO options for some disks.

           

          Regards

          Andrew 

            • Re: SAM 5.2.0 disk degraded ESXi 5
              aminloopes

              Hi,

              After upgrading SAM to 5.2.0 I have the same problem with ESXi 5.0 and 5.1 hosts: a critical "HP Serial Attached SCSI Disk Degraded" alert.

              With ESXi 4.1 there isn't a problem.

              If you find a solution for this problem, please share it with me.

              Thank you Andrew

              Warm Regards.

              A.Rezaeinezhad.

                • Re: SAM 5.2.0 disk degraded ESXi 5
                  amtate42

                  Hi there,

                   

                  I did not find a solution as such. I have not had any hardware events or alerts from ESX for a week or so, so I removed and re-added my ESX host in SolarWinds, and it hasn't re-detected a degraded disk. To make sure your disks aren't degraded, you can install the HP ESXi offline bundle, which lets you look at the local storage on the HP ESXi hosts, or check iLO to make sure your disks' status is OK. It's a bit odd and I am going to keep an eye on it, so let me know if you find anything else.

                   

                  Regards

                  Andrew 

              • Re: SAM 5.2.0 disk degraded ESXi 5
                mbowers3

                I have the same issue. After upgrading to SAM 5.2, all my Dell PowerEdge M710HD servers show a degraded disk, but logging into the controller does not confirm it.

                • Re: SAM 5.2.0 disk degraded ESXi 5
                  Sohail Bhamani

                  That data is being collected via the VMware API. The Server Health and Hardware monitoring feature is what polls it through the API. In various customer environments I have often noticed that the RAID controllers have log files that need to be cleared before this status will clear. I have usually seen this with Dell PERC cards, but I do not see why it would be much different for any other RAID card.

                   

                  Have you guys checked the RAID cards' logs to make sure they are not full and do not contain any messages?

                   

                  Sohail Bhamani

                  Loop1 Systems

                  http://www.loop1systems.com

                  • Re: SAM 5.2.0 disk degraded ESXi 5
                    aminloopes

                    As Mr. Andrew Tate and Mr. mbowers3 reported,

                    I have the same problem with SAM 5.2 and ESXi 5.0 and 5.1.

                    My host health status in ESXi is OK; there isn't a hardware problem.

                    With ESXi 4.1 everything was fine on SAM 5.2, but with the newer ESXi versions (5.0 and 5.1) the disk degraded alert appears.

                    Please help to resolve this problem.

                    Thank you in Advance

                      • Re: SAM 5.2.0 disk degraded ESXi 5
                        aLTeReGo

                        If you've verified there are no problems through the vCenter client as pictured above then I recommend opening a case with support so we can investigate further.

                          • Re: SAM 5.2.0 disk degraded ESXi 5
                            Marlbs

                            This sounds very similar to the problem I was having with the hardware monitoring of my ESX hosts, Alterego, doesn't it?

                             

                            Are you polling ESX directly rather than polling vCenter?

                             

                            With our HP hardware we found that if we were polling vCenter it would not alert correctly. We had to poll the ESX host directly.

                             

                            I would see if there is any difference if you switch the polling from vCenter to direct.

                              • Re: SAM 5.2.0 disk degraded ESXi 5
                                aLTeReGo

                                Marlbs is correct. I would definitely suggest you try both methods (direct and through vCenter). Most customers manage their hosts through vCenter, but there are known delays in how quickly some information is propagated up from the ESX host to vCenter, which can delay hardware health alerts. If I remember correctly, that was Marlbs' issue. Try polling these servers directly through the Virtualization Settings to see if this resolves the issue.

                                 

                                VMware Poll Directly.png

                                  • Re: SAM 5.2.0 disk degraded ESXi 5
                                    Marlbs

                                    You are dead on Alterego... that was the issue and that is how we addressed it.

                                     

                                    I might also add that HP in many cases does seem to take its sweet time in "making a determination", so to speak, on what the problem is and which alert is the correct one for it.

                                     

                                    In some cases the ProLiant may think it has a problem, but while its software is still evaluating the problem through its internal polling process, it has not yet reported it as a valid issue to the HP software. When polled by vCenter, or directly by SolarWinds, it therefore still looks as if there is no problem until the next poll after the problem has been confirmed. Depending on the length of the polling cycle and how you are configured, it can take some time for the alert to appear in SolarWinds and, once it is resolved, to disappear again.

                                     

                                    That is my two cents worth on the topic.
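The polling-delay point above can be put into rough numbers. This is only a back-of-the-envelope sketch: it assumes each stage in the chain polls the previous one on a fixed, unsynchronized interval, and the interval values are made up for illustration. They are not SolarWinds, VMware, or HP defaults.

```python
def worst_case_delay(intervals_seconds):
    """Upper bound, in seconds, on how long a status change can take to
    reach the last poller in a chain of fixed-interval pollers.

    In the worst case the change happens just after each stage has polled,
    so it waits close to one full interval at every stage.
    """
    return sum(intervals_seconds)


# Hypothetical intervals (assumptions, not product defaults):
# hardware agent -> vCenter -> SolarWinds
via_vcenter = worst_case_delay([60, 300, 600])

# Polling the host directly removes the vCenter stage from the chain.
direct = worst_case_delay([60, 600])

print(f"via vCenter: up to {via_vcenter} s, direct: up to {direct} s")
```

The same bound applies when an alert clears, which would fit with statuses lingering in SolarWinds after the host itself looks healthy again.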

                            • Re: SAM 5.2.0 disk degraded ESXi 5
                              herwig

                              Same here: HP ProLiant blade servers (ProLiantBL460cG7) monitored with SAM 5 via ESXi 5.x hosts show hardware degraded for the internal SCSI disk, but no problem is reported on the ESX console or in HP array manager.

                              By the way we also have ProLiantBL460cG1 Blade hardware, where this degraded condition is not reported by SAM!

                               

                              ProLiantBL460cG7_SCSIDisk_degraded.png

                                • Re: SAM 5.2.0 disk degraded ESXi 5
                                  Marlbs

                                  Are you polling it directly or via vCenter? I noticed some strange behavior with our HP Blade Servers as well with the older version, but the Blade Servers were set to be decommissioned before I could follow up and figure out what was going on. At that time we would have been polling the Blade Center via vCenter, and I would have liked to poll it directly to see whether that resolved the issue.

                                   

                                  I also wonder in this case what would actually happen if you replaced the disk with another?? Would it return to "healthy" in Solarwinds or would it just go "Degraded" again??

                                   

                                  If this were a Windows or Linux box I would suggest that the issue is with the HP hardware monitoring software for the OS, but in this situation I am going to say that if you poll the Blade Center directly, rather than via vCenter, we might be on to something.

                                   

                                  After spending a lot of time on this... I have started to question how good the HP hardware monitoring actually is. It could be that depending on the hardware, it is far more buggy than we realize. (Just a theory here... no proof at this point.)

                                • Re: SAM 5.2.0 disk degraded ESXi 5
                                  jleecong

                                  Same issue as others have mentioned. SolarWinds reports degraded disks (Critical Alert) but ESX/vCenter don't show any issues. This affects both Dell and HP hardware, and both HBA-attached and internal devices.

                                   

                                  Support case opened, Thanks!
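When collecting evidence for a support case like this, it can help to list exactly which sensors disagree between the two views. The sketch below assumes you have already exported the statuses from each side into plain dictionaries by whatever means you have. The function name, sensor names, and status strings are all hypothetical, and nothing here calls a real SolarWinds or VMware API.

```python
def find_mismatches(monitor_status, source_status):
    """Return (sensor, monitor_value, source_value) for every sensor the
    monitoring tool reports differently from the source of truth.

    Sensors the source does not know about are reported as "missing",
    which would also catch stale entries for devices that were removed.
    """
    mismatches = []
    for sensor in sorted(monitor_status):
        reported = monitor_status[sensor]
        actual = source_status.get(sensor, "missing")
        if reported != actual:
            mismatches.append((sensor, reported, actual))
    return mismatches


# Hypothetical exported data, for illustration only.
sam_view = {"disk-local-0": "critical", "disk-san-7": "critical", "fan-1": "ok"}
vcenter_view = {"disk-local-0": "ok", "fan-1": "ok"}

for sensor, reported, actual in find_mismatches(sam_view, vcenter_view):
    print(f"{sensor}: monitor says {reported}, source says {actual}")
```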

                                  • Re: SAM 5.2.0 disk degraded ESXi 5
                                    xpowels

                                    A workaround is mentioned in the thread "problem on sam 5.2 with Esxi 5.1 and Esxi 5.0 host for Hardware Health!!!!"

                                     

                                    Worked for me (though I have opened a ticket as well).

                                    • Re: SAM 5.2.0 disk degraded ESXi 5
                                      tonym1216

                                      I have the same problem. If I change it to vCenter polling, I lose serial numbers.

                                      • Re: SAM 5.2.0 disk degraded ESXi 5
                                        mbowers3

                                        If I change it to poll through vCenter, it doesn't pull memory and CPU stats for each ESXi host. I have a ticket open but am not getting much help.

                                        • Re: SAM 5.2.0 disk degraded ESXi 5
                                          jleecong

                                          I changed the polling method to vCenter, and my HP Blades that were reporting internal SCSI disk issues are now fixed. However, I still have a number of failed Dell/HBA disks in SolarWinds that do not show up in vCenter.

                                           

                                          DiskIssue.jpg

                                           

                                          We did migrate SANs, so at one time these were live disks, but these have since been unmounted and deleted from vCenter.

                                          • Re: SAM 5.2.0 disk degraded ESXi 5
                                            naeemfirdous

                                            I am facing the same issue. After upgrading SAM to the latest version, I am seeing this Dell disk as degraded even though it is normal in VMware. Also, the alert message I received identifies the Dell machine as an HP server (ProLiant DL360 G7).

                                             

                                            status 2.JPG
                                            status 3.JPG
                                            disk status.JPG

                                            The Disk, Dell Serial Attached SCSI Disk (naa.600508e000000000dc436d24e0bcad04) on kwtdevhost005.alshaya.com has a current status of Critical.


                                            Device Details

                                            Manufacturer: HP
                                            Model Number: ProLiant DL360 G7
                                            Serial Number: CZJ121031T

                                            • Re: SAM 5.2.0 disk degraded ESXi 5
                                              DONDERKA

                                              Hi,

                                               

                                              A Service Pack RC1 is already available, and we are sending it to customers who have a support ticket open for this bug. If you didn't receive it, please request it through your support ticket.

                                               

                                              Dalibor