17 Replies Latest reply on Jan 24, 2013 5:41 PM by dsbalcau

    HELP!Cannot maintain connectivity to IBM XIV

    patriot

      We are using STM to monitor an IBM XIV array and cannot maintain agent connectivity to it. IBM recently performed a CIM reset and STM was suddenly able to connect to the root/ibm namespace, but that lasted only about 24 hours. The SMI-S log of the Agent is full of "unable to connect" messages.

       

      Has anyone seen this behavior? This is a critical issue though I suspect that the fault may not lie with STM, since it is able to poll other vendor arrays we have. Any help will be greatly appreciated.

        • Re: HELP!Cannot maintain connectivity to IBM XIV
          cshanep

          Patriot,

           

          I suspect the problem is with the SMI-S provider running on the XIV; it may have problems with garbage collection or something similar. I have seen the same problem with other vendors (HP, HDS).

           

          One way around this is to set the synchronous flag in the SMI-S module's XML file. This tells the agent to connect to the SMI-S provider one thread at a time. The SMI-S module uses three threads (performance, asset, and storage) to gather data, and without that tag set to true all three could hit the provider at the same time.
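          As a rough illustration of what the flag changes, here is a conceptual Python sketch. It is not STM's actual code: FakeProvider, collect, and the timings are made up for the example, and only the three thread names come from the description above. With synchronous set, a shared lock lets one collector thread talk to the provider at a time.

```python
import threading
import time


class FakeProvider:
    """Hypothetical stand-in for an SMI-S provider; tracks peak concurrent requests."""

    def __init__(self):
        self._guard = threading.Lock()
        self.active = 0
        self.peak = 0

    def query(self, name):
        with self._guard:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.05)  # pretend the provider is busy answering
        with self._guard:
            self.active -= 1


def collect(provider, synchronous):
    """Run the three collector threads; return the peak number of simultaneous requests."""
    gate = threading.Lock() if synchronous else None

    def worker(name):
        for _ in range(3):
            if gate is not None:
                with gate:  # only one thread may query at a time
                    provider.query(name)
            else:
                provider.query(name)  # threads may overlap

    threads = [
        threading.Thread(target=worker, args=(name,))
        for name in ("performance", "asset", "storage")
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return provider.peak


if __name__ == "__main__":
    print("synchronous=True  peak:", collect(FakeProvider(), True))
    print("synchronous=False peak:", collect(FakeProvider(), False))
```

          In the synchronous case the peak is always 1. Without it the peak can be anywhere up to 3, depending on how the threads happen to be scheduled.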

           

          1. Go to the agent install directory/systemic/mod.sys.smis.smis_1.0.

          2. Open the mod.sys.smis.xml.

          3. Inside the device tag for that particular array, add this tag: <synchronous>true</synchronous>.

                    To find the deviceId for the array you are working with, do the following:

                         1. Open the web console.

                         2. Go to the Settings > All Resources page (note that it may be called All Devices, depending on the version of STM you are running).

                         3. Hover over the Edit icon for the XIV you are working with.

                                   The deviceId will be between the parentheses in the browser status bar below. It looks something like javascript : editDevice('38').

                 So my device tag will look something like <device38>.

          4. Stop the agent service.

          5. Clear out the agent install directory/systemic/mod.sys.smis.smis_1.0/data directory.

          6. Start the agent.

          7. Have IBM perform another CIM reset.
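
          Steps 2 and 3, together with the deviceId lookup, can be sketched in Python as below. This is a sketch under assumptions, not a supported tool: the function names and the SAMPLE layout are hypothetical, and the real mod.sys.smis.xml will contain more than this. The only things taken from the steps above are the editDevice('38') status-bar text, the <device38> tag name, and the <synchronous>true</synchronous> tag. Work on a backup copy of the file, with the agent stopped.

```python
import re
import xml.etree.ElementTree as ET


def device_id_from_status(text):
    """Pull the number out of text like "javascript : editDevice('38')"."""
    match = re.search(r"editDevice\('(\d+)'\)", text)
    if match is None:
        raise ValueError("no editDevice('<id>') call found in: " + text)
    return match.group(1)


def set_synchronous(xml_text, device_id):
    """Return xml_text with <synchronous>true</synchronous> inside <deviceNN>.

    Assumes the device element is named "device" + id, as in <device38>.
    Note: ElementTree drops XML comments when it re-serializes the document.
    """
    root = ET.fromstring(xml_text)
    tag = "device" + device_id
    device = root if root.tag == tag else root.find(".//" + tag)
    if device is None:
        raise LookupError("no <" + tag + "> element found")
    flag = device.find("synchronous")
    if flag is None:
        flag = ET.SubElement(device, "synchronous")
    flag.text = "true"
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily simplified file content, for illustration only.
SAMPLE = "<module><device38><name>xiv</name></device38></module>"

if __name__ == "__main__":
    dev_id = device_id_from_status("javascript : editDevice('38')")
    print(set_synchronous(SAMPLE, dev_id))
```

          Running it twice on the same text is harmless, since an existing synchronous tag is reused rather than duplicated.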

           

          Let us know if the problem stays fixed for more than 24 hours.

           

          -Shane

          http://www.loop1systems.com

          • Re: HELP!Cannot maintain connectivity to IBM XIV
            bmrad

            Patriot,

             

            Were you able to fix this issue?  Note that the IBM XIV provider had a known stability issue that was fixed in 10.2.4.e for Gen 2 hardware. 

             

            Brian

            • Re: HELP!Cannot maintain connectivity to IBM XIV
              jkmills

              Warning


              We experienced the exact same issues: the monitoring from SolarWinds Storage Manager was causing the cimserver process to fail. This was the case on microcode 10.2.4b and 10.2.4c1. Last Friday IBM Level 2 support logged into our XIV (Gen 2) and managed to restart the cimserver process. The repeated failures and restarting of the agent caused a memory issue on the module, and it failed completely later that same day.

               

              If you are still having issues I would recommend:

              1. Stop monitoring with SolarWinds immediately

              2. Contact your IBM TA regarding upgrading to Microcode 10.2.4e

               

              We will be doing our upgrade in the next 48 hours and I will report back.

              • Re: HELP!Cannot maintain connectivity to IBM XIV
                bmrad

                Note: Storage Manager through 5.2 has performance for the IBM XIV turned off by default. Once you upgrade to 10.2.4.e, make sure you turn performance on by clicking on the wrench.

                 

                Brian

                  • Re: HELP!Cannot maintain connectivity to IBM XIV
                    jkmills

                    We upgraded to 10.2.4.e last night and everything went well with the upgrade. I have started monitoring again today with SW-STM and we have already failed the CIM agent on two out of three interface modules. I am not going to attempt to point it at the last module at this point, but there is still something seriously wrong here.

                      • Re: HELP!Cannot maintain connectivity to IBM XIV
                        bmrad

                        If you turn off performance, you should be able to monitor the XIV without issue. Please contact IBM and report the issue with the provider; I will ping my IBM contacts as well.

                         

                        Brian

                          • Re: HELP!Cannot maintain connectivity to IBM XIV
                            jkmills

                            That's easy to say, but this is turning into a serious problem. We purchased this product to monitor the performance of our arrays and were sold it on the premise that it would work with the code upgrade to 10.2.4.e.

                             

                            Beyond that, the CIM errors have now cost us equipment failures, data corruption, and downtime on production systems. All we get from SW is "Call IBM", and IBM tells us their system works. We didn't pay this kind of money for a system that "kinda" works and monitors sometimes. I am very frustrated at this point.

                      • Re: HELP!Cannot maintain connectivity to IBM XIV
                        ocgov2012

                        Patriot, and all

                         

                        There is a problem in the IBM code as well as the SolarWinds product. I've experienced all the same problems monitoring our Gen2 XIV on 10.2.4e. I've had the problem escalated to the Tucson team as well as the development team in Israel. IBM states they will have a fix to help the problem in their next scheduled release (end of summer) and is working to get something out sooner. The development team is also working with the SolarWinds team to fix their end, as IBM is saying they interpreted the CIM standard differently and that is what causes the process to die.

                         

                        I'm currently trying to get an update from SolarWinds on when we can expect a fix from them as well. I thought I'd give everyone an update and let everyone know it appears to be a bug on both sides. I'll keep you posted as I hear more.

                          • Re: HELP!Cannot maintain connectivity to IBM XIV
                            jkmills

                            I have to say that I am glad to hear that I am not the only one. Now hopefully there can be progress in getting this resolved. From what I have heard (which is only a couple of cases) it seems to be Gen 2 specific.

                             

                            Ocgov2012, did you experience module crashes as a result of either the initial issue or IBM support restarting the CIM process? On two separate occasions we had the CIM process fail, IBM remoted in, and the module failed within 24 hours. The remote sessions were days if not weeks after the CIM process failure, so it seems to be tied more to the restart than to the initial issue.

                             

                            Also, we experienced the same issues under 10.2.4b and 10.2.4c1.

                              • Re: HELP!Cannot maintain connectivity to IBM XIV
                                ocgov2012

                                Jkmills,

                                 

                                                I had more details, but those were the highlights. I did have the same problems you experienced. When IBM initially started working with us, they logged into our XIV and had us turn on some debugging, which caused a two-day outage in which we could not reach the management console through any of the ports. Our CE even came in and couldn't get in via the laptop port. It turns out we created two separate issues once we started debugging the CIM server problems. We finally got to an L3 support person from the Tucson office who was able to dial into the XIV and at least free up the laptop port so the CE could work with him to get the management console working. It turns out that we were wasting time recycling the modules; at the end of the day the Israel team figured out that the network problem was related to the VPN configurations not being set up properly. I thought that was kind of odd, but in the two years that we have had our XIV we never had a management module problem where we needed to see if all three were working. It turns out we only had one port working from the start, and their debug levels caused the crash of the only one that worked. We've now tested, all three ports are working, and we are awaiting patches from IBM/SolarWinds.

                                 

                                                As I mentioned, I've been having trouble getting IBM to give me something before the end of summer. I will push SolarWinds as well. Hope this helps. I feel your frustration, as I needed to escalate this problem because IBM was pushing it off to SolarWinds at first.

                                 

                                 

                                 

                                James McNamara

                            • Re: HELP!Cannot maintain connectivity to IBM XIV
                              ocgov2012

                              All,

                                   I've got an update from SolarWinds, and it looks like IBM has all the problems, as it appears SolarWinds will not be putting out an update. It turns out the bug is in both the Gen2 and Gen3 versions of the XIV code. The following is what I received from SolarWinds development, who are working with IBM:

                               

                              This is what I was told….

                              I don't have anything official from IBM, but here is what I know:

                              ·         IBM provider has a bug, it will be fixed in Q3 for Gen3 hardware, Q4 for Gen2 hardware

                              ·         IBM has not communicated to customers in general

                              ·         IBM SMI-S Dev confirmed today (6/25) no known issues with STM software.  Right now there are no plans to make changes to STM for this issue.

                              Brian

                               

                              Hope this helps. If everyone puts pressure on IBM XIV support, maybe we can get them to release something sooner than the 4th quarter (Gen2 XIV).

                               

                              • Re: HELP!Cannot maintain connectivity to IBM XIV
                                jkmills

                                My IBM rep has given me similar information: the fix is set for release in 10.2.4.f, which is targeted for October.

                                • Re: HELP!Cannot maintain connectivity to IBM XIV
                                  ocgov2012

                                  Just got an official update from my IBM XIV CE, and it appears they are behind in addressing this bug. He stated that it will go to test on 11/29/12 and then be released early next year (it will be in the 10.2.4.f code for Gen2).

                                    • Re: HELP!Cannot maintain connectivity to IBM XIV
                                      dsbalcau

                                      Hi All,

                                       

                                      Great News!!!

                                       

                                      Gen 2 Hardware: No official word from IBM yet, but we did see that IBM has a 10.2.4f version for Gen2 hardware out in the wild. I took a look at the release notes, and they say it contains a fix for a memory leak in the CIM agent, which is what was causing communication between Storage Manager and the XIV to stop working. From what I've heard, 10.2.4f is being shipped to customers by request only, so contact your IBM rep if you want to get ahold of the code.

                                       

                                      Gen 3 Hardware: I have also heard confirmed reports from our customers that the fix for IBM XIV Gen3 hardware that IBM released earlier this year has resolved the Storage Manager/XIV communications issues on Gen3 XIV hardware. If anyone on this thread can confirm, that would also be greatly appreciated.

                                       

                                      Thanks!