Monitoring NetApp in SAM: PowerShell vs SNMP?

Hello Thwack Community - 

I'm working with my team to set up NetApp monitoring. Before we look into leveraging SRM, we'd like to see how much we can achieve by monitoring NetApp via SAM. 

I've been browsing Thwack and was pleased to discover that NetApp has a PowerShell toolkit that can be leveraged.

Has anyone set up NetApp monitoring in SAM who would know the pros/cons of using PowerShell vs SNMP? Since PowerShell hits the NetApp API, is it safe to assume that PowerShell would pull in "better" data than SNMP traps from NetApp? 

Thank you in advance! 

  • SNMP is a bit of a nightmare without something to process it. You can get a top-level capacity readout, or some traps when issues occur if you're lucky (we're hitting several pitfalls on the NetApp end at the moment, including traps being sent from unexpected IPs). Replicating anything close to SRM's feature set wouldn't be practical with UNDP or SAM SNMP components alone.

    With PowerShell, the template you might find on here is a bit out of date, but it can be adjusted for the newer NetApp module. I've found the module's antivirus command doesn't work properly.

    API polling is alright depending on what exactly you're trying to do. I tried to build one for a NetApp/EMC Unity recently and it requires a cookie that's not well documented, which is very annoying.

    The PowerShell SAM monitor wants you to pull back a small, actionable amount of data, so it's not ideal for things like "list all the CIFS shares" or "give me all the disks," but "count the bad disks and give me their serials" works well.

    You can do SSH through the SAM PowerShell monitor too, which gives you access to the command-line functions.

    You can also use the SolarWinds API/SQL and the NetApp API/module to bulk-pull data and store it in a table for later use.

    It's all doable but can be a pain.
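
    For context, SAM's PowerShell components read Statistic/Message pairs from the script's output and treat the exit code as the status. A minimal sketch of the "count the bad disks and give me their serials" idea, assuming the NetApp PowerShell Toolkit (the DataONTAP module) is installed on the poller and with the controller name and credentials as placeholders:

    ```powershell
    # Sketch only -- assumes the NetApp PowerShell Toolkit (DataONTAP module)
    # and a reachable cDOT controller; name and credentials are placeholders.
    Import-Module DataONTAP
    Connect-NcController -Name 'netapp01.example.com' -Credential (Get-Credential)

    # Pull the disk list and keep only the unhealthy ones.
    $badDisks = Get-NcDisk | Where-Object { $_.DiskRaidInfo.ContainerType -eq 'broken' }

    # SAM parses "Statistic:" and "Message:" lines from stdout; exit 0 = Up.
    Write-Host "Statistic: $($badDisks.Count)"
    Write-Host "Message: Bad disk serials: $(($badDisks | ForEach-Object { $_.DiskInventoryInfo.SerialNumber }) -join ', ')"
    exit 0
    ```

    You'd normally thread the SAM credential macros in rather than prompting with Get-Credential, and alert on the Statistic value crossing 0.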

  • From my own experience, I can say that PowerShell is easier to set up and use.

  • In the past I went pretty heavy into SAM PowerShell monitors and had a model for getting past issues like the one raised above about not being able to "list all the CIFS shares." 

    The way I set things up was to have a template with three components. Component 1 was a script making whatever API call I needed, outputting a statistic of "count of all CIFS shares" and a message containing a comma-separated string of the identifiers of all the children. 

    Component 2 in the template was an example of how I would pull down all the stats I care about for a single item in the CIFS table, with placeholders for the actual name/ID. This component is disabled because we don't actually want it to run as-is.

    Component 3 was a script that queries the Orion API, loops through each value in Component 1's output, clones Component 2, replaces the placeholders, and enables the clone. Depending on the use case I might add cleanup logic here to automatically remove components that stop showing up in Component 1's output so things always stay current, though sometimes you wanted to track/alert when things dropped off the list.

    So the template gets assigned (I almost always had some kind of logic to assign templates automatically as well). Initially only Component 1 runs; during the second polling interval it creates the dynamic number of child components, and by the third polling interval I have an app monitor with however many components I need to monitor every child object's statistics and metrics across the whole array. 

    In a loose sense this is also pretty similar to how the AppInsight templates work: they query the target to inventory what components need to be built, then create whatever components are needed based on that inventory.

    I also had less involved versions of this where I would write a single script that parsed all the outputs of an array inside the script itself, with logic to extract any individual bad records and trigger an alert covering the whole array. This method was better for cases where I didn't really need a lot of detail about the child objects, just a high-level check that "everything in this table is good" or "something in this table is bad." It's far more efficient in terms of writes to the Orion database, and it was ideal for cases where the targeted system already tracked all its own performance metrics in its own DB and I just needed to raise an alarm telling an engineer, "hey, something looks wrong here, log into the array and investigate this item."
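
    A loose sketch of what the Component 1 inventory script might look like, again assuming the NetApp PowerShell Toolkit with a placeholder controller; the Statistic drives the count, and the Message carries the comma-separated identifiers the Component 3 script would later split on to clone Component 2 per share:

    ```powershell
    # Component 1 sketch: inventory the CIFS shares and hand the list to SAM.
    # Assumes the NetApp PowerShell Toolkit; controller/credentials are placeholders.
    Import-Module DataONTAP
    Connect-NcController -Name 'netapp01.example.com' -Credential (Get-Credential)

    $shares = Get-NcCifsShare

    # SAM stores the Statistic and Message, where Component 3 can read them back.
    Write-Host "Statistic: $($shares.Count)"
    Write-Host "Message: $(($shares | ForEach-Object { $_.ShareName }) -join ',')"
    exit 0
    ```

    Component 3 would then read that message back out of Orion (for example with Get-SwisData from the OrionSDK's SwisPowerShell module) and clone/enable the disabled Component 2 once per share name.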

  • Thank you for your helpful responses