Community
- Command Central
- MVP Program
- Monthly Mission
- Blogs
- Groups
- Events
- Media Vault
Products
- Observability
- Network Management
- Application Management
- IT Security
- IT Service Management
- System Management
- Database Management
Content Exchange
- SolarWinds Platform
- Server & Application Monitor
- Database Performance Analyzer
- Server Configuration Monitor
- Network Performance Monitor
- Network Configuration Manager
- SQL Sentry
- Web Help Desk
Free Tools & Trials
Store

SRM & Dell Compellent - polling times?

Hey everyone!

We're still working on our implementation of Orion, and SRM seems to be the one throwing us for a loop the most. It took us a while, but we eventually got SRM to read our Dell Compellent environment.

Now that we've done that, we have added our 7 arrays to the Orion environment, which are all being polled through our single Unisphere appliance. We're still working out strange "Access denied" errors, which we have a ticket open with Dell to investigate.

My big question is - how long does it take your storage pollers to complete? On just one of our DR arrays, we are seeing the following:
performance/hardware health takes an average of 40 minutes (this usually defaults to polling every 15 minutes)
"Default" (which corresponds to the Capacity polling as far as I can tell) takes an average of 2 hours 15 minutes (default polls every 6 hours)
Topology polling takes about 4-6 minutes (again this one is every 6 hours).

I have to assume other people aren't seeing their arrays take this long to poll. Heck, I'd be curious how other storage arrays stack up to the Compellent. These are brand new environments, having been set up in January. The Unisphere seems to have no problems talking to it. But our Orion instance just likes to say "array unavailable" all the time.

Find more posts tagged with

Accepted answers

All comments

ahbrook

I'm still digging into this...

The more I dig, the more I wonder if our indexes on the Compellent's database aren't optimized. When trying to browse with the CIM browser, we're seeing anywhere from 30 seconds to 10 minutes to open a resource, depending on what I click. I do have a ticket open with Solarwinds support and with Dell support on this...

jvb

I think you are on the right track having a ticket open on both sides. Seems like something isn't quite right there but not certain exactly where. Feel free to ping the SolarWinds ticket number back here or in PM and I'll keep an eye on it.

ahbrook

Yay, month and a half later and still no progress, really!

We've done the following:

Completely rebuilt our DSM from scratch, including a new database.
Removed all the arrays except our biggest one as a testing bed
Upgraded our test environment to 2020.2 to try and fix some MSMQ issues
did side by side comparisons of Prod and Test - at a time when the array being tested against has no volumes or activity.

Even with all this, our numbers are still seemingly super slow.

(Note: The topology data is empty because there are no assigned volumes, and I discovered that after making this screenshot.)

Is anyone else out there using the Compellent monitoring in their production environment? If so, how many disks/volumes/arrays are you monitoring, and how long is it taking? At this point, I really need a point of comparison to someone else who has a comparable environment to ours. The testing equipment we're using has 1 array, 2 pools and 298 disks. Our entire environment has around 1300 disks. and 7-8 arrays.

viveashean

Hi @ahbrook,

So good to hear someone with the exact same issue that we have!

We've been working with Dell for quite some time regarding DSM issues involving the tiebreaker for LV's. We've only recently migrated this to the virtual appliance version of the DSM 2019.1.

For SRM polling, our 10 SC arrays constantly time out with inconsistent polling returns (topology, hardware,statistics etc). Currently running Orion platform 2018.4 HF3 with SRM 6.8.0. Like yourself i've also setup a standalone 2020 instance with the latest SRM and experienced the same results (even installed it on the SRM polling engine to eliminate network latency). SC SCOS version currently used is 7.3.11.28.

Your post led me to look into old requests and naturally located this one -> https://support.solarwinds.com/SuccessCenter/s/article/Dell-Compellent-polling-issue?language=en_US However it looks like the new DSM doesn't use 'Pegasus' anymore.

I have noticed that when I poll only 1 array, the results are successful (hardware health, topology, statistics). From a previous case with Solarwinds support, the analysis was that the DSM could not handle the polling requests from SRM.

Please let me know how you get on with further testing or analysis from Dell or Solarwinds, i'd be forever grateful!

Thanks!!

ahbrook

@viveashean

I am so glad I'm not going crazy! We had asked our account rep to find someone with a similar environment so we could do some comparisons, but so far nothing has come up.

If you don't mind saying, how many disks is your environment?We have found that smaller arrays do complete in reasonable time, but naturally our main arrays are not small.

Did you ever try adjusting the amount of polling threads? We found a setting involved in the # of simultaneous queries, but adjusting it didn't seem to help much.

Right now, we've been exploring the database looking for expensive queries. However, our DBAs so far have not found anything out of the ordinary. The isolated environment is helping us track down things... and this may provide the hint to go to our Dell representatives with. If the problem is inside the load balancing and how it communicates, that would explain a lot. Though I do wonder why the Microsoft SCVMM doesn't have this same issue.

Milton.Harris

Salesforce Ticket # 00575056

JIRA ticket # CUST-68935

viveashean

Hi @ahbrook

Definitely not crazy, i'd all but given up on my hopes and dreams of having a consolidated view for performance & capacity management of my storage arrays. However recently decided to give it one more shot and although I've run into the same issues, I think i may have found a workaround.

My environment is comprised of 20 Arrays (mixed bag of Dell SC's, Powervault's & some new flashy (no pun i swear it) PureArrays). We have approximately 1280 disks being monitored at present with a few more arrays yet to be added.

The Dell's are the only ones which rely on the DSM for polling, we've never had an issue with the Powervaults (which actually use the main Orion polling engine as the provider). Pure has SMI-S built into the array itself, and has not missed a beat since adding them.

I, like you, studied the SRM logs painfully to see glimpses of where my issues root cause could be located, however what i noticed was that when adding SC's and then inspecting the corresponding logs, i'd see polling timeouts.

Over the weekend, i have managed some success, this is the process i followed;

Setup a SRM scalability poller (polling engine specifically for SRM) - 4vCPU/24GB Ram
Installed DSM 2019.1/2 with the local db (i.e. 30 days of metrics)
Configured SMI-S on the DSM and setup the account for monitoring
Added 1 array to the DSM
Added the array in Orion *during the scanning process, watched the SRM.Pollers.Jobs log file to confirm successful polling, during the addition process, watched the SRM.Pollers.Queries & SRM.Pollers.StorageArrayJob_xxx_default/topology/perforance.hardwarehealth to confirm polling had finished
Configured the polling frequency on the array for Capacity - 720min / Performance - 1440min / Topology interval - 1440min
Added the next array repeating the process (steps 4-6)

I'm only really concerned about hardware health and capacity as I can monitor performance via the alternate DSM (virtual appliance) and cloudIQ etc. My SC's are for backup and file storage (clustered file server roles) which makes performance a lesser priority for me. Hence why the polling frequency is set the way it is.

I can confirm my largest array takes about 20min to finish polling for performance/hardware health and all my other arrays are polling fine (no timeouts etc.)

It looks like it's definitely the polling time of DSM from the monitored array's itself, rather than SRM having an issue with resourcing. I've had issues with the tiebreaker service (uses port 443/3033) being on the same DSM as the SRM monitored one, hence my decision to add another DSM. Dell has also mentioned that the max latency between the DSM and it's array's should be less than 10ms.

Thanks and good luck!

ahbrook

@jvbJust flagging you for this because it sounds like a lot of new information is coming to light and you were keeping an eye on things.

jvb

Thanks @ahbrook I am reading through all of it now.

ahbrook

Hey @viveashean !

I wanted to give you a quick update.

Okay, after a bit of delays we finally got on a call with Solarwinds support/development and Dell Compellent support.

Long story short: Dell admitted the problem is 100% in their software. They just can't handle the kinds of requests that Orion makes using SMI-S -- at least not without significant restructuring or the like. And because Solarwinds is not officially supported, they aren't going to make those changes.

Both teams had the same suggestion: Assign a single collector to an array. This can be a small instance off to the side, and management can still be done by a large, centralized collector, but any more than 3 arrays on a given collector will cause problems. And there's also the question of the # of disks - if that is too high, we'll see problems too.

We're going to try experimenting with that and report back to the support ticket. But in the meantime, we're also going to be reaching out to our account executive and other resources to let them know this is an issue, and will continue to be an issue going forward. Our hope is that Solarwinds and Dell can partner together officially to make a better product for everyone involved. But that doesn't help us short term, so we're going to try the small collector idea.