We recently had a node where the platform owner asked us to disable CPU and Memory utilization data collection. The node is still polled by SNMP and we are still collecting other statistics (disk and interface stats, etc.) but not CPU and memory usage. To remove the stats we did a 'List Resources' and unchecked the 'CPU and Memory' check box. Easy as pie, right?
A few days later the platform team complained that they were still getting CPU load alerts. How could that be? Granted, our CPU alert was The Ultimate CPU Alert (courtesy of adatole), but we no longer had any current CPU or memory usage data. Why would this alert trigger? We did some digging and this is what we found:
1) The CPULoad_Detail table contains the detailed CPU and memory usage samples for the node for up to the last 192 hours (8 days, depending on when the roll-up jobs run). The last values in CPULoad_Detail.AvgLoad and CPULoad_Detail.AvgPercentMemoryUsed were 98 and 16.0877 respectively.
2) The Nodes table contains data from the LAST entry in the CPULoad_Detail table. Specifically, it has Nodes.CPULoad (=CPULoad_Detail.AvgLoad) and Nodes.PercentMemoryUsed (=CPULoad_Detail.AvgPercentMemoryUsed).
3) The gauges on the node view show the average CPU and memory utilization values from the Nodes table.
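The relationship described in the findings above can be illustrated with a small comparison query. This is only a sketch in Python against an in-memory SQLite stand-in, not the real Orion database: the NodeID and DateTime columns, the sample rows, and the helper names compare_latest_detail_to_nodes and build_demo_db are assumptions made for illustration. On SQL Server the LIMIT 1 would be written as TOP 1.

```python
import sqlite3


def compare_latest_detail_to_nodes(conn, node_id):
    """Return the Nodes values next to the newest CPULoad_Detail row."""
    row = conn.execute(
        """
        SELECT n.CPULoad, n.PercentMemoryUsed,
               d.AvgLoad, d.AvgPercentMemoryUsed, d.DateTime
        FROM Nodes n
        JOIN CPULoad_Detail d ON d.NodeID = n.NodeID
        WHERE n.NodeID = ?
        ORDER BY d.DateTime DESC
        LIMIT 1
        """,
        (node_id,),
    ).fetchone()
    if row is None:
        return None  # unknown node, or no detail rows left for it
    keys = ("nodes_cpu", "nodes_mem", "detail_cpu", "detail_mem", "last_sample")
    return dict(zip(keys, row))


def build_demo_db():
    """Build a tiny stand-in schema (column names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Nodes (
            NodeID INTEGER PRIMARY KEY,
            CPULoad INTEGER,
            PercentMemoryUsed REAL
        );
        CREATE TABLE CPULoad_Detail (
            NodeID INTEGER,
            DateTime TEXT,
            AvgLoad INTEGER,
            AvgPercentMemoryUsed REAL
        );
        -- Nodes still carries the values of the last sample taken
        INSERT INTO Nodes VALUES (1, 98, 16.0877);
        INSERT INTO CPULoad_Detail VALUES (1, '2024-01-01T10:00:00', 40, 15.2);
        INSERT INTO CPULoad_Detail VALUES (1, '2024-01-01T10:10:00', 98, 16.0877);
        """
    )
    return conn


if __name__ == "__main__":
    print(compare_latest_detail_to_nodes(build_demo_db(), 1))
```

If the Nodes values match the newest detail row and last_sample is well in the past, the gauges and any alert reading the Nodes table are working from stale data.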
A couple of outstanding questions:
1) Once the data in the CPULoad_Detail table has been rolled up to the CPULoad_Hourly table, and given that we are no longer updating CPULoad_Detail for this node, will Nodes.CPULoad and Nodes.PercentMemoryUsed in the Nodes table clear?
2) If the values in the Nodes table do not clear, is there any way to clear them without deleting and re-adding the node (without the CPU and memory, of course)?
3) Thoughts on modifying The Ultimate CPU Alert (and likely The Ultimate CPU Alert ... for Linux! since it is built on the same logic) to catch aged data? It could be something as simple as a datediff on the most recent entry in the CPULoad_Detail table, checking whether the result is greater than Nodes.StatCollection.
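The aged-data check suggested in question 3 could look roughly like the sketch below. It runs in Python against an in-memory SQLite stand-in rather than the Orion database, and it rests on several assumptions: that CPULoad_Detail has NodeID and DateTime columns, that Nodes.StatCollection holds the polling interval in minutes, and the helper names is_cpu_data_stale and build_demo_db, which are hypothetical. An alert would then fire only when the CPU threshold is exceeded and the data is not stale.

```python
import sqlite3
from datetime import datetime, timedelta


def is_cpu_data_stale(conn, node_id, now):
    """True when the newest CPULoad_Detail row for the node is older than
    the node's StatCollection interval (assumed to be in minutes)."""
    row = conn.execute(
        """
        SELECT n.StatCollection, MAX(d.DateTime)
        FROM Nodes n
        LEFT JOIN CPULoad_Detail d ON d.NodeID = n.NodeID
        WHERE n.NodeID = ?
        GROUP BY n.NodeID, n.StatCollection
        """,
        (node_id,),
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown node {node_id}")
    interval_minutes, last_sample = row
    if last_sample is None:
        return True  # no detail rows at all, so nothing current to alert on
    age = now - datetime.fromisoformat(last_sample)
    return age > timedelta(minutes=interval_minutes)


def build_demo_db(now):
    """Stand-in schema with one stale node (1) and one fresh node (2)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Nodes (
            NodeID INTEGER PRIMARY KEY,
            CPULoad INTEGER,
            StatCollection INTEGER
        );
        CREATE TABLE CPULoad_Detail (
            NodeID INTEGER,
            DateTime TEXT,
            AvgLoad INTEGER
        );
        INSERT INTO Nodes VALUES (1, 98, 10);
        INSERT INTO Nodes VALUES (2, 35, 10);
        """
    )
    samples = [
        (1, (now - timedelta(days=3)).isoformat(), 98),    # collection disabled
        (2, (now - timedelta(minutes=5)).isoformat(), 35),  # still collecting
    ]
    conn.executemany("INSERT INTO CPULoad_Detail VALUES (?, ?, ?)", samples)
    return conn


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0, 0)
    conn = build_demo_db(now)
    for node_id in (1, 2):
        print(node_id, is_cpu_data_stale(conn, node_id, now))
```

In a real alert the same idea would be expressed as an extra condition in the alert's SQL, comparing the age of the newest detail row with the polling interval. In practice some slack (for example two or three polling intervals) would probably be needed to avoid suppressing alerts after a single missed poll.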
Case 737340 has been opened with support for this one. The first response was to delete the node and re-add it without the CPU and memory, but I've asked for an alternative so we don't lose the historical data. As a last resort, I'll delete and re-add.