QoE is great, but I'm finding it very difficult to track down the cause of an over-threshold event.
I do all my QoE probing at the node rather than on a SPAN port or similar.
Since we have an agent on the local machine, it seems like it would be easy to make the agent threshold-aware (rather than kicking off a server-based job, which may run too late) and have it collect the following at the moment a threshold is exceeded:
- Source and destination IP addresses of the offending connection
- A snapshot of the process list and CPU consumption
- A snapshot of memory usage (I know this is in NPM but I need it at the exact moment of the offense)
- Event log entries for the last X minutes (maybe one, maybe five; I'm not sure)
Wrap all this up as an event that gives me easy-to-access information that might give me a clue as to why something just shot over the threshold.
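To make the idea concrete, here is a rough Python sketch of what I mean by a threshold-aware agent. Everything in it is made up for illustration: `ThresholdEvent`, `check_sample`, the metric name, and the stand-in collectors are hypothetical and are not part of any existing QoE or agent API. The point is only that the check and the collection happen in the same place, on the node, so the snapshots are taken at the moment of the offense.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# A collector is any zero-argument function that returns a snapshot
# (process list, memory usage, recent event log entries, ...).
Collector = Callable[[], Any]


@dataclass
class ThresholdEvent:
    """Hypothetical event record bundling the context around a breach."""
    metric: str
    value: float
    threshold: float
    timestamp: float
    src_ip: str
    dst_ip: str
    snapshots: Dict[str, Any] = field(default_factory=dict)


def check_sample(
    metric: str,
    value: float,
    threshold: float,
    src_ip: str,
    dst_ip: str,
    collectors: Dict[str, Collector],
) -> Optional[ThresholdEvent]:
    """Return a ThresholdEvent if value exceeds threshold, else None.

    Collection runs locally and immediately, so nothing depends on a
    server-side job being scheduled after the fact.
    """
    if value <= threshold:
        return None
    event = ThresholdEvent(metric, value, threshold, time.time(), src_ip, dst_ip)
    for name, collect in collectors.items():
        try:
            event.snapshots[name] = collect()
        except Exception as exc:
            # One failed collector should not lose the rest of the event.
            event.snapshots[name] = f"collection failed: {exc}"
    return event


if __name__ == "__main__":
    # Stand-in collectors with fixed data; a real agent would query the OS.
    demo_collectors = {
        "processes": lambda: [{"name": "example.exe", "cpu_percent": 93.0}],
        "memory": lambda: {"used_mb": 7400, "total_mb": 8192},
        "event_log_last_5_min": lambda: ["example log entry"],
    }
    event = check_sample(
        "network_response_time_ms", 250.0, 200.0,
        "10.0.0.5", "10.0.0.9", demo_collectors,
    )
    print(event)
```

The collectors are passed in as plain functions so the list of things to grab can grow without touching the threshold check itself.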
I think this would be killer.
Thanks.
Dave