Difference between Events and Node Status: High Response Time, High Packet Loss

I am doing a PerfStack analysis of a problem node. Interestingly, PerfStack shows Events recorded for High Response Time and High Packet Loss on the node, but those statistics do not show up when I add the Max Response Time, Avg Response Time, or Packet Loss resources listed for the node. I went to what I assume are the sources to verify the information.

PerfStack shows the events but not the graphs for Average Response Time or % Packet Loss:

pastedImage_1.png

pastedImage_0.png

I went to Message Center to verify that the Event Log does record the high Average Response Time events. Message Center also confirms Packet Loss above threshold. These events last up to 30 minutes.

I went to the Node. Here is the Node graph showing the Average Response Time and Packet Loss. You can see that the graph does record this information at times. I have circled in red the time period of the PerfStack graphs above. When I zoom in on that time frame, it says no data available.

pastedImage_5.png

So, it seems PerfStack agrees with the data at the sources. I guess my question is not a PerfStack question then, because PerfStack is just reporting the information from the sources. My question becomes: "Why would there be no data available, when it seems evident that Orion must have known the data in order to trigger the alerts that made it into the Event Log?" Here are the Event Log entries for the time period above.

pastedImage_6.png

Please note, this is not an isolated incident. It seems to happen on a nightly basis at about the same time. Hence, PerfStack, a great time-based tool, seemed like a good way to discover correlations and possible causes. So, the Alerts system has enough information to trigger these alerts, but the database doesn't have enough information to show the numbers that triggered the alerts on a graph.
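
To make the comparison concrete, here is a minimal sketch of the check I have in mind: given the event timestamps and the timestamps of the polled samples, list the events that have no sample anywhere near them. The function name, the 10-minute window, and all timestamps are hypothetical placeholders for illustration, not values exported from Orion.

```python
from datetime import datetime, timedelta

def events_without_samples(event_times, sample_times, window=timedelta(minutes=10)):
    """Return the events that have no polled sample within `window` of them."""
    orphans = []
    for ev in event_times:
        # An event is "covered" if at least one sample lies close to it in time.
        covered = any(abs(s - ev) <= window for s in sample_times)
        if not covered:
            orphans.append(ev)
    return orphans

# Hypothetical data: one nightly event inside a gap, one daytime event with samples nearby.
events = [datetime(2018, 1, 10, 1, 5), datetime(2018, 1, 10, 14, 0)]
samples = [
    datetime(2018, 1, 10, 0, 30),
    datetime(2018, 1, 10, 13, 55),
    datetime(2018, 1, 10, 14, 5),
]

orphans = events_without_samples(events, samples)
for ev in orphans:
    print("Event with no nearby sample:", ev)
```

If the nightly events consistently come back as having no nearby sample, that would at least confirm the gap is in the stored statistics and not in how the graphs are drawn.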

Any ideas?

Thanks for your help.

  • I chatted this over with a senior-level support tech while doing the Dev-Assisted upgrade to RC2. He agrees that data would have to be present to trigger the alerts. Unfortunately, we didn't have the opportunity to delve into the cause of the issue.