Thresholds and Notifications in Storage Manager (Profiler)

As many of you know, Storage Manager (Profiler) collects a plethora of data, maybe even too much - but many of you ask how to set thresholds and alerts so you can be notified when something is amiss. In Profiler, getting alerts involves three steps:

  1. Build a rule, which includes a threshold on the metric of interest
  2. Assign it to a policy (i.e., the set of resources you want to monitor) and push it out
  3. Set up a notification to alert you via email when the trap is received

For the threshold, let's focus on performance metrics for now - although you can set storage and asset change thresholds as well.

Go to Settings > All Rules > Add New Rule. From the list of choices, choose Threshold Rule. You should see the following screen:

greenshot_2011-01-21_02-34-56.jpg

Some quick definitions (with the values used in this example):

  • Section - the scope of resources this rule applies to (Ex: NetApp)
  • Category - the types of metrics applicable to that section (Ex: LUN Performance)
  • Instances (if applicable) - the instances of the metric we are monitoring (Ex: All instances)
  • Condition - the threshold on the metric (Ex: Average Latency (ms) > 20)
  • Duration - how long the condition has to be met before the threshold is triggered (Ex: 0 Min)
  • Choose Action - the one action to take when the threshold is triggered (Ex: Send Trap)

greenshot_2011-01-21_02-36-57.jpg

So what we are telling Profiler in this example is to send us a trap whenever any instance of a NetApp LUN has average latency greater than 20 ms. Before moving to the next step, a few cool things:

  • When you set the rule to Any Instances, new objects are covered automatically. If you create a new LUN, Profiler will automatically apply the rule to that instance.
  • You can pick one or more instances - so you can get very particular if you need to.
  • The duration lets you filter out noise, so you don't get alerted on every little spike.
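
To make the duration idea concrete, here is a minimal Python sketch of how a threshold with a duration filters out short spikes. It is an illustration only, not Profiler's actual implementation: the `ThresholdRule` class, its field names, and the choice to fire once per sustained breach are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Hypothetical model of a threshold rule (not Profiler's real code)."""
    metric: str
    limit: float          # condition is "value > limit"
    duration_min: float   # minutes the condition must hold before firing

    def __post_init__(self) -> None:
        self._breach_start: Optional[float] = None  # minute the breach began
        self._fired = False                         # already fired for this breach?

    def observe(self, minute: float, value: float) -> bool:
        """Return True when this sample should cause a trap to be sent."""
        if value <= self.limit:
            # Condition cleared: reset so a later breach starts a new window.
            self._breach_start = None
            self._fired = False
            return False
        if self._breach_start is None:
            self._breach_start = minute
        held = minute - self._breach_start
        # Assumption: fire once per sustained breach, not on every sample.
        if held >= self.duration_min and not self._fired:
            self._fired = True
            return True
        return False


# Samples are (minute, average latency in ms); the values are made up.
rule = ThresholdRule("Average Latency (ms)", limit=20, duration_min=5)
samples = [(0, 25), (1, 12), (2, 30), (5, 31), (7, 40), (8, 41)]
fired = [minute for minute, value in samples if rule.observe(minute, value)]
print(fired)  # [7]
```

The spike at minute 0 never fires because latency drops back under 20 ms a minute later. Only the breach that starts at minute 2 and is still in effect at minute 7 triggers the trap. With a duration of 0 Min, as in the example rule above, the first sample over the limit would fire immediately.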

So, you have your rule, and now you have to apply it. In Profiler, you do that via policies. A policy is just a collection of resources of the same type that you configure at the same time. Every resource type has a Default Policy, and that is the one we will use today.

Go to Settings > Policies and click the edit icon for Default NetApp Filer Policy (let us stick with NetApp for this example).

Click Rules and you will see a list of rules that are available to be assigned, or already assigned to the policy. Note there are default rules already assigned to identify problems for you. To assign a rule to the policy, click the rule and press the down arrow, and then press the Save button.

greenshot_2011-01-21_02-38-24.jpg

Now the rule is assigned to the policy - but make sure you press the Push button to update the configuration on the agents monitoring the NetApp Filers.

So now, if the condition is met, the agent will send a trap to the Profiler server and you will see the trap in the Event Monitor. You can then manage the event there.

However, if you want to receive an email for that event, you need to turn on notifications. Notifications are turned on per user, so go to Settings > Users and click the edit icon for your login. If you have defined an email address, you will see a "Notifications" section. Click the Add button in this section.

Now you can add a notification for one resource or a group of resources. Choose "Groups" and then choose "All Devices". You can then pick the trap severity you want to be notified about, and one or more email addresses to send the email to.

greenshot_2011-01-21_02-40-45.jpg
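
Conceptually, a notification is a filter that maps an incoming trap to a list of email addresses. The Python sketch below illustrates that idea under stated assumptions: the `Notification` class, the `recipients` function, the severity labels, and the example address are all hypothetical and are not taken from Profiler.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Notification:
    """Hypothetical per-user notification entry."""
    group: str                   # e.g. "All Devices"
    severities: FrozenSet[str]   # assumed severity labels, e.g. {"critical"}
    emails: Tuple[str, ...]      # one or more addresses


def recipients(device_groups: Set[str], severity: str,
               notifications: Iterable[Notification]) -> List[str]:
    """Return the addresses to email for a trap from a device in device_groups."""
    result: List[str] = []
    for n in notifications:
        if n.group in device_groups and severity in n.severities:
            for address in n.emails:
                if address not in result:  # avoid mailing the same address twice
                    result.append(address)
    return result


subs = [Notification("All Devices", frozenset({"critical"}), ("admin@example.com",))]
to_critical = recipients({"All Devices"}, "critical", subs)
to_warning = recipients({"All Devices"}, "warning", subs)
print(to_critical, to_warning)
```

In this sketch a critical trap from any device in "All Devices" reaches the configured address, while a warning trap matches no notification and produces no email.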

Whew! That was a few too many steps (hint, we will make this better in the future) - but now I can safely sleep knowing that I will be notified if I have a problem.

As a bonus, I'll throw in a few notes about managing the events on the Event Monitor:

  • Events that occur over and over again for the same object only notify you on the first occurrence, but maintain a count thereafter (hence the count column in the Event Monitor)
  • You can acknowledge and clear traps through the Event Monitor.
  • Under Settings > Server Setup > Server, you can turn on automatic clearing of events after a certain amount of time.
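
The first bullet above, notify once and then keep a count, can be sketched in a few lines of Python. This is only an illustration of the behavior described; the `EventMonitor` class, its method names, and the assumption that a cleared event notifies again on its next occurrence are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Event:
    resource: str
    rule: str
    count: int = 1
    acknowledged: bool = False


class EventMonitor:
    """Hypothetical model of repeat-event handling (not Profiler's real code)."""

    def __init__(self) -> None:
        self.events: Dict[Tuple[str, str], Event] = {}

    def receive(self, resource: str, rule: str) -> bool:
        """Record a trap; return True only if it should trigger a notification."""
        key = (resource, rule)
        event = self.events.get(key)
        if event is None:
            # First occurrence for this object: store it and notify.
            self.events[key] = Event(resource, rule)
            return True
        # Repeat occurrence: bump the count, stay quiet.
        event.count += 1
        return False

    def acknowledge(self, resource: str, rule: str) -> None:
        self.events[(resource, rule)].acknowledged = True

    def clear(self, resource: str, rule: str) -> None:
        # Assumption: once cleared, the next occurrence counts as new.
        self.events.pop((resource, rule), None)


monitor = EventMonitor()
results = [monitor.receive("lun1", "LUN latency > 20 ms") for _ in range(3)]
count = monitor.events[("lun1", "LUN latency > 20 ms")].count
print(results, count)  # [True, False, False] 3
```

Three identical traps produce one notification and a count of 3, which corresponds to the count column in the Event Monitor.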

Thanks for listening - and as always, if you have thoughts or feedback, we would love to hear it.
