Alerting for CPU & Memory on Individual Switches in a Stack

We recently discovered that when Orion triggers a CPU or Memory alert on a switch stack, it uses the average CPU or Memory across all switches in the stack. So if you have one switch running at 99% and four switches running at 10%, the stack average is only about 28% and you will never be alerted to the issue. When looking at the stack details, Orion is able to see the individual values for CPU and Memory, but there does not seem to be any easy way to create an alert for these. I recently went through some troubleshooting with SolarWinds support and they confirmed this behavior.
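To make the arithmetic concrete, here is a minimal Python sketch of the scenario described above. The member numbers, the readings, and the 80% threshold are illustrative assumptions, not values taken from Orion; the point is only that a check on the stack average stays quiet while a per-member check flags the busy switch.

```python
from statistics import mean

# Hypothetical per-member CPU readings (percent) for a five-switch stack:
# one busy member and four idle ones, as in the example in the post.
member_cpu = {1: 99.0, 2: 10.0, 3: 10.0, 4: 10.0, 5: 10.0}
THRESHOLD = 80.0  # assumed alert threshold, in percent


def stack_average(readings):
    """Average load across all members (what an averaged alert evaluates)."""
    return mean(readings.values())


def members_over(readings, threshold):
    """Member numbers whose individual reading exceeds the threshold."""
    return sorted(m for m, value in readings.items() if value > threshold)


avg = stack_average(member_cpu)
print(f"stack average: {avg:.1f}%")
print("average-based alert fires:", avg > THRESHOLD)
print("members over threshold:", members_over(member_cpu, THRESHOLD))
```

With these numbers the average-based check does not fire, while the per-member check reports member 1, which is the behavior this request is asking for.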

I would like to see an easy way to build an alert for switch stack members that lets you alert on the CPU and Memory values of the individual switches, and that identifies which switch in the stack the values belong to.

12 Comments

This looks like a job for a custom Universal Device Poller (UnDP).

Orion is already pulling the CPU and memory values for the individual switches (as you can see if you view the stack details on a Node). I don't see why I would need to pull this info again via a UnDP and double the space in the database, then go through the process of building an alert on the custom poller. I just want to be able to alert on the data that already exists, in an easy manner. There is an option for Stack Member details when building alerts; being able to alert on the member CPU and Memory should be included in this. Having to do a bunch of one-off solutions might be fine in a smaller environment, but it won't scale well (multiple instances of Orion, co-managed instances, databases exceeding 1TB, monitoring a hundred thousand nodes, etc.). I am all for a UnDP when it's needed, but simple tasks should stay simple.

You're right.

But where you can't have what you need, maybe you can build what you want with a UnDP.  Agreed, it shouldn't be required.

Your stack should fire off a trap or syslog message when one of the units' CPU or Memory hits a threshold. You can easily build an Alert based on the info in that trap or syslog message, if you don't want to dig into UnDPs.

Yep, that would be a good option, but unfortunately not all the instances we co-manage are sending Traps or Syslog to Orion... possibly in the future. And I have no issue using UnDPs or Traps/Syslog to build out the alerts. I just think that this is a feature that should be included out of the box. Orion is already pulling the data, and already displays info for these individual members, so why not be able to easily build alerts on what's there?

I can't disagree with your logic.

Here's hoping my suggestions could possibly get you results today instead of waiting months or years for SW to change their solution.

Appreciate it. I actually do have a workaround to get the data into an alert: a custom SQL alert that pulls the data (which is in different SQL tables, of course). Custom SQL alerts are pretty limited because the Select From is pre-populated, but it does the job. I'm just requesting the feature because I felt it should have already been there.

Custom SQL Alert?  Talk about a hassle!

But I'm impressed you've got the skillset to make it work.

+1

Can you share your SQL query? We've had memory leaks on 3650s, but we haven't been able to get good alerts on increasing memory. The issue is on the stack master, but it looks like Orion averages out the memory across the stack.

For your Alert, in the Trigger Condition, use "Custom SQL Alert":

For Memory the SQL is:

JOIN [dbo].[MemoryMultiLoad_Current] c ON Nodes.NodeID = c.NodeID
JOIN [dbo].[NPM_SwitchStackMember] ss ON Nodes.NodeID = ss.NodeID
WHERE c.AvgPercentMemoryUsed > 80

We set it to greater than 80%, and that looks at each individual switch. Adjust that value as needed.

For CPU, create a different alert the same way, and the SQL is:

JOIN [dbo].[CPUMultiLoad_Current] c ON Nodes.NodeID = c.NodeID
JOIN [dbo].[NPM_SwitchStackMember] ss ON Nodes.NodeID = ss.NodeID
WHERE c.AvgLoad > 70

These will get you alerts for the individual CPU or Memory of the switches in the Stacks.
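For anyone who wants to try the logic outside Orion first, here is a rough Python sketch that runs the memory fragment against throwaway in-memory SQLite tables. Everything except the table and column names quoted in the fragment is an assumption: the `SELECT DISTINCT Nodes.NodeID FROM Nodes` prefix is only a stand-in for whatever the alert editor pre-populates, the stand-in tables have just the columns the fragment touches, the sample rows are made up, and SQLite is not SQL Server.

```python
import sqlite3

# The memory fragment from the post, with a numeric threshold.
FRAGMENT = """
JOIN [dbo].[MemoryMultiLoad_Current] c ON Nodes.NodeID = c.NodeID
JOIN [dbo].[NPM_SwitchStackMember] ss ON Nodes.NodeID = ss.NodeID
WHERE c.AvgPercentMemoryUsed > 80
"""

# Assumed stand-in for the SELECT ... FROM that the alert editor pre-populates.
PREFIX = "SELECT DISTINCT Nodes.NodeID FROM Nodes"

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named "dbo" so the [dbo].[...] names resolve.
conn.execute("ATTACH DATABASE ':memory:' AS dbo")
conn.executescript("""
CREATE TABLE Nodes (NodeID INTEGER PRIMARY KEY);
CREATE TABLE dbo.MemoryMultiLoad_Current (NodeID INTEGER, AvgPercentMemoryUsed REAL);
CREATE TABLE dbo.NPM_SwitchStackMember (NodeID INTEGER);

INSERT INTO Nodes VALUES (1), (2);
-- Node 1: three members, one at 95% (the three rows average only 45%).
INSERT INTO dbo.MemoryMultiLoad_Current VALUES (1, 95.0), (1, 20.0), (1, 20.0);
-- Node 2: two members, both healthy.
INSERT INTO dbo.MemoryMultiLoad_Current VALUES (2, 30.0), (2, 35.0);
INSERT INTO dbo.NPM_SwitchStackMember VALUES (1), (1), (1), (2), (2);
""")

# Nodes for which the alert condition would match.
triggered = [row[0] for row in conn.execute(PREFIX + FRAGMENT)]
print(triggered)
```

In this toy data only node 1 matches, even though its three rows average 45%. Note that both joins are keyed on NodeID alone, so a row over the threshold is repeated once per stack member row; the DISTINCT in the assumed prefix collapses those repeats here.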

Rebump.