Level 10

SRM & Dell Compellent - polling times?

Hey everyone!

We're still working on our implementation of Orion, and SRM seems to be the one throwing us for a loop the most. It took us a while, but we eventually got SRM to read our Dell Compellent environment.

Now that we've done that, we have added our 7 arrays to the Orion environment, which are all being polled through our single Unisphere appliance. We're still working out strange "Access denied" errors, which we have a ticket open with Dell to investigate.

 

My big question is - how long does it take your storage pollers to complete? On just one of our DR arrays, we are seeing the following:
Performance/hardware health polling takes an average of 40 minutes (the default interval is every 15 minutes).
"Default" (which corresponds to capacity polling, as far as I can tell) takes an average of 2 hours 15 minutes (the default interval is every 6 hours).
Topology polling takes about 4-6 minutes (again, the default interval is every 6 hours).
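
For anyone who wants to compare their own numbers, here is a rough Python sketch of the check I have in mind: it divides each job's observed run time by its polling interval, so anything above 1.0 is a job that cannot finish before the next poll is due. The `PollJob` class and function names are just made up for illustration, and the topology figure uses the 6-minute upper bound.

```python
from dataclasses import dataclass


@dataclass
class PollJob:
    name: str
    duration_min: float   # observed average run time, in minutes
    interval_min: float   # configured polling interval, in minutes


def overrun_ratio(job: PollJob) -> float:
    """Run time as a fraction of the interval; above 1.0 the job cannot keep up."""
    return job.duration_min / job.interval_min


def overrunning(jobs: list[PollJob]) -> list[str]:
    """Names of the jobs whose run time exceeds their polling interval."""
    return [j.name for j in jobs if overrun_ratio(j) > 1.0]


# Figures quoted in this post for the single DR array.
jobs = [
    PollJob("performance/hardware health", 40, 15),
    PollJob("capacity (Default)", 135, 360),
    PollJob("topology", 6, 360),
]

for job in jobs:
    print(f"{job.name}: {overrun_ratio(job):.2f} of interval")
print("cannot keep up:", overrunning(jobs))
```

By this measure only the performance/hardware health job overruns its interval; capacity is slow but still fits inside its 6-hour window.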

I have to assume other people aren't seeing their arrays take this long to poll. Heck, I'd be curious how other storage arrays stack up to the Compellent. These are brand new environments, having been set up in January. The Unisphere seems to have no problems talking to it. But our Orion instance just likes to say "array unavailable" all the time.

0 Kudos
12 Replies
Level 10

Yay, month and a half later and still no progress, really!

We've done the following:

  1. Completely rebuilt our DSM from scratch, including a new database.
  2. Removed all the arrays except our biggest one, to use as a test bed.
  3. Upgraded our test environment to 2020.2 to try and fix some MSMQ issues.
  4. Did side-by-side comparisons of Prod and Test, at a time when the array being tested against had no volumes or activity.

Even with all this, our numbers are still seemingly super slow.

array_timestamps.jpg

(Note: The topology data is empty because there are no assigned volumes, and I discovered that after making this screenshot.)

 

Is anyone else out there using the Compellent monitoring in their production environment? If so, how many disks/volumes/arrays are you monitoring, and how long is it taking? At this point, I really need a point of comparison with someone else who has an environment comparable to ours. The testing equipment we're using has 1 array, 2 pools and 298 disks. Our entire environment has around 1300 disks and 7-8 arrays.

Hi @ahbrook,

So good to hear someone with the exact same issue that we have!

We've been working with Dell for quite some time regarding DSM issues involving the tiebreaker for LV's. We've only recently migrated this to the virtual appliance version of the DSM 2019.1.

For SRM polling, our 10 SC arrays constantly time out with inconsistent polling returns (topology, hardware, statistics, etc.). We're currently running Orion Platform 2018.4 HF3 with SRM 6.8.0. Like you, I've also set up a standalone 2020 instance with the latest SRM and experienced the same results (I even installed it on the SRM polling engine to eliminate network latency). The SC SCOS version currently in use is 7.3.11.28.

Your post led me to look into old requests and naturally located this one -> https://support.solarwinds.com/SuccessCenter/s/article/Dell-Compellent-polling-issue?language=en_US However it looks like the new DSM doesn't use 'Pegasus' anymore.

I have noticed that when I poll only 1 array, the results are successful (hardware health, topology, statistics). From a previous case with Solarwinds support, the analysis was that the DSM could not handle the polling requests from SRM.

Please let me know how you get on with further testing or analysis from Dell or Solarwinds, I'd be forever grateful!

Thanks!!

@viveashean 

 

I am so glad I'm not going crazy! 😄 We had asked our account rep to find someone with a similar environment so we could do some comparisons, but so far nothing has come up. 

If you don't mind saying, how many disks are in your environment? We have found that smaller arrays do complete in a reasonable time, but naturally our main arrays are not small. 🙂 

 

Did you ever try adjusting the number of polling threads? We found a setting that controls the number of simultaneous queries, but adjusting it didn't seem to help much. 

Right now, we've been exploring the database looking for expensive queries. However, our DBAs so far have not found anything out of the ordinary. The isolated environment is helping us track down things... and this may provide the hint to go to our Dell representatives with. If the problem is inside the load balancing and how it communicates, that would explain a lot. Though I do wonder why the Microsoft SCVMM doesn't have this same issue. 

Hi @ahbrook 

Definitely not crazy. I'd all but given up on my hopes and dreams of having a consolidated view for performance and capacity management of my storage arrays. However, I recently decided to give it one more shot, and although I've run into the same issues, I think I may have found a workaround.

My environment comprises 20 arrays (a mixed bag of Dell SCs, PowerVaults and some new flashy (no pun, I swear) PureArrays). We have approximately 1280 disks being monitored at present, with a few more arrays yet to be added.

The Dells are the only ones that rely on the DSM for polling. We've never had an issue with the PowerVaults (which actually use the main Orion polling engine as the provider). Pure has SMI-S built into the array itself, and has not missed a beat since we added them.

Like you, I painstakingly studied the SRM logs for hints of where the root cause might be. What I noticed was that after adding SCs and inspecting the corresponding logs, I'd see polling timeouts.

Over the weekend, I had some success. This is the process I followed:

  1. Set up an SRM scalability poller (a polling engine specifically for SRM) with 4 vCPU / 24 GB RAM.
  2. Installed DSM 2019.1/2 with the local db (i.e. 30 days of metrics).
  3. Configured SMI-S on the DSM and set up the account for monitoring.
  4. Added 1 array to the DSM.
  5. Added the array in Orion. During the scanning process, I watched the SRM.Pollers.Jobs log file to confirm successful polling; during the addition process, I watched SRM.Pollers.Queries and SRM.Pollers.StorageArrayJob_xxx_default/topology/performance.hardwarehealth to confirm polling had finished.
  6. Configured the polling frequency on the array: Capacity 720 min / Performance 1440 min / Topology 1440 min.
  7. Added the next array, repeating the process (steps 4-6).
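
If it helps to see the add-one, verify, then move on routine from steps 4-7 written down, here is a minimal Python sketch of it. It is only an outline of the manual process: `add_array` and `polling_ok` are hypothetical callbacks standing in for the manual work (adding the array to the DSM and Orion, then checking the poller logs), not SRM or DSM API calls, and the array names are placeholders.

```python
from typing import Callable, Iterable, Optional


def add_arrays_one_by_one(
    arrays: Iterable[str],
    add_array: Callable[[str], None],
    polling_ok: Callable[[str], bool],
) -> tuple[list[str], Optional[str]]:
    """Add arrays in order and stop at the first one whose polling check fails.

    Returns (arrays added and verified, the array that failed or None).
    """
    verified: list[str] = []
    for name in arrays:
        add_array(name)            # stand-in for steps 4-5: add to the DSM, then to Orion
        if not polling_ok(name):   # stand-in for the log check in step 5
            return verified, name
        verified.append(name)
    return verified, None


# Dry run with stand-in callbacks: pretend the third array times out.
added: list[str] = []
result = add_arrays_one_by_one(
    ["array-a", "array-b", "array-c"],
    add_array=added.append,
    polling_ok=lambda name: name != "array-c",
)
print(result)
```

Stopping at the first failure keeps the last known-good set of arrays obvious, which is the main point of adding them one at a time.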

I'm only really concerned about hardware health and capacity, as I can monitor performance via the alternate DSM (virtual appliance), CloudIQ, etc. My SCs are for backup and file storage (clustered file server roles), which makes performance a lesser priority for me. That is why the polling frequency is set the way it is.

I can confirm my largest array takes about 20min to finish polling for performance/hardware health and all my other arrays are polling fine (no timeouts etc.)

It looks like the bottleneck is definitely the time the DSM takes to poll the monitored arrays themselves, rather than SRM having a resourcing issue. I've had issues with the tiebreaker service (which uses ports 443/3033) being on the same DSM as the SRM-monitored one, hence my decision to add another DSM. Dell has also mentioned that the max latency between the DSM and its arrays should be less than 10 ms.
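
If you want a quick way to check that 10 ms figure, something like the following Python sketch might do, run from the DSM host. It times a few TCP connects and compares the median against the limit. This is an approximation (a TCP connect is not a proper latency measurement), and the host name and port in the usage note are placeholders, not values from Dell.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP connect in milliseconds (a rough stand-in for network latency)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def within_budget(samples_ms: list[float], limit_ms: float = 10.0) -> bool:
    """True when the median sample is under the limit."""
    return statistics.median(samples_ms) < limit_ms
```

Usage would be along the lines of `within_budget([tcp_connect_ms("array-mgmt.example", 443) for _ in range(5)])`, substituting the management address and a port your array actually listens on. Using the median keeps one slow connect from failing the whole check.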

Thanks and good luck!

@viveashean one other thing - you mention adding arrays 1 by 1. Does the SRM "add arrays" page let you do that? When I select and scan the DSM, it lists all the arrays assigned to it, and it gives bars/highlights as if you can choose which arrays to monitor through the DSM... but even when I select a single array, all of them get added. If I want to limit things, I have to add them all, then delete those I don't want, which is a pain as it artificially bumps up our SRM node counter, and I don't know if anything gets loaded in the database or not.

 

0 Kudos

Hi @ahbrook 

You are correct, SRM (well 6.8.0 at least) doesn't let you only select certain arrays for monitoring when it scans a provider. To get around this I only added arrays to the Dell DSM one at a time.

I also found that when using this process, I couldn't use the same provider and 'rescan' for more arrays. I now have duplicate entries of the same provider 😞

The error message when doing this is:

"Required field 'Provider IP Address or Hostname' cannot be empty"

I'm planning the upgrade from 2018.4 HF3 -> 2020.2 next month, hopefully we're still polling after that.

Cheers!

0 Kudos

@jvb Just flagging you for this because it sounds like a lot of new information is coming to light and you were keeping an eye on things. 🙂

0 Kudos
Product Manager

Thanks @ahbrook I am reading through all of it now.

0 Kudos

Hey @viveashean !

I wanted to give you a quick update.

Okay, after a few delays we finally got on a call with Solarwinds support/development and Dell Compellent support.

Long story short: Dell admitted the problem is 100% in their software. They just can't handle the kinds of requests that Orion makes using SMI-S -- at least not without significant restructuring or the like. And because Solarwinds is not officially supported, they aren't going to make those changes.

Both teams had the same suggestion: Assign a single collector to an array. This can be a small instance off to the side, and management can still be done by a large, centralized collector, but any more than 3 arrays on a given collector will cause problems. And there's also the question of the # of disks - if that is too high, we'll see problems too.
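
To plan how many small collectors that means, here is a rough Python sketch that groups arrays onto collectors under a cap on arrays per collector and, optionally, a cap on total disks. The cap of 3 arrays comes from the call; the disk cap is an assumption on my part, since nobody gave us a number, and the per-array disk counts below are invented for illustration (only the 298 figure and the roughly 1300 total match our environment).

```python
from typing import Optional


def assign_collectors(
    disks_per_array: dict[str, int],
    max_arrays: int = 3,
    max_disks: Optional[int] = None,
) -> list[list[str]]:
    """Group arrays onto collectors, largest array first, respecting both limits."""
    groups: list[list[str]] = []
    totals: list[int] = []
    for name, disks in sorted(disks_per_array.items(), key=lambda kv: -kv[1]):
        for i, group in enumerate(groups):
            fits_count = len(group) < max_arrays
            fits_disks = max_disks is None or totals[i] + disks <= max_disks
            if fits_count and fits_disks:
                group.append(name)
                totals[i] += disks
                break
        else:
            # No existing collector has room, so open a new one
            # (this also happens when one array alone exceeds max_disks).
            groups.append([name])
            totals.append(disks)
    return groups


# Invented per-array disk counts adding up to 1300.
disks = {"a1": 298, "a2": 250, "a3": 220, "a4": 200, "a5": 150, "a6": 100, "a7": 82}

print(assign_collectors(disks))                 # cap of 3 arrays per collector
print(assign_collectors(disks, max_arrays=1))   # one collector per array, as suggested
print(assign_collectors(disks, max_disks=500))  # with an assumed 500-disk cap
```

Setting `max_arrays=1` reproduces the one-collector-per-array suggestion exactly; the other settings are only there to see how far things could be consolidated if testing shows it is safe.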

We're going to try experimenting with that and report back to the support ticket. But in the meantime, we're also going to be reaching out to our account executive and other resources to let them know this is an issue, and will continue to be an issue going forward. Our hope is that Solarwinds and Dell can partner together officially to make a better product for everyone involved. But that doesn't help us short term, so we're going to try the small collector idea.

0 Kudos
Level 10

I'm still digging into this...

The more I dig, the more I wonder if our indexes on the Compellent's database aren't optimized. When trying to browse with the CIM browser, we're seeing anywhere from 30 seconds to 10 minutes to open a resource, depending on what I click. I do have a ticket open with Solarwinds support and with Dell support on this...

 

 

0 Kudos
Product Manager

I think you are on the right track having a ticket open on both sides. Seems like something isn't quite right there but not certain exactly where. Feel free to ping the SolarWinds ticket number back here or in PM and I'll keep an eye on it.

Salesforce Ticket # 00575056

JIRA ticket # CUST-68935

0 Kudos