
We continue to get a 'down' status when trying to use APM TEMPLATES to monitor VM counters, mostly performance counters.
Does anyone have these issues and is there a fix?
The components work for a while, then go down for hours, and then start working again out of the blue!
Thanks for any help.
Could you post a screenshot with the error message?
When they are first built they look good and can be selected fine. Like in this screenshot:
But over time they go down and APM shows this:
Then, once APM polls them while they are down, they show up empty, with no performance counters offered.
And maybe hours or days later they show back up and go green.
I would suggest opening a support case and referencing this thread.
Case #314922
lchance, I see your case with support is closed. Hopefully you were able to get to the bottom of what was causing the issue.
it's an ugly picture...
After building literally hundreds of APM Components to monitor VM systems, each VM system I attempt to use with Performance Counters/WMI will eventually stop working.
The eventual error is CATEGORY DOES NOT EXIST.
For instance, when pointing Windows PERFMON at the remote system, you can see only a subset of the WMI objects after these errors.
However, if you sit at the system (server) and run PERFMON there, you'll see the full set of WMI objects. So the problem must be in the transmission of these WMI objects.
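One way to pin down exactly which counter categories go missing remotely is to diff the local and remote listings. The sketch below is only an illustration: it assumes the listings come from `typeperf -q` (with `-s <host>` for the remote view), that each counter path looks like `\Object(instance)\Counter`, and that remote listings may carry a `\\HOST` prefix. The function names and the host name in the usage note are hypothetical.

```python
import subprocess
from typing import List, Optional, Set


def parse_categories(typeperf_output: str) -> Set[str]:
    """Extract counter category (object) names from `typeperf -q` style output."""
    categories = set()
    for line in typeperf_output.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and status messages
        parts = line.lstrip("\\").split("\\")
        # Assumption: remote listings may look like \\HOST\Object\Counter
        if line.startswith("\\\\") and len(parts) >= 3:
            parts = parts[1:]
        if len(parts) < 2:
            continue
        # Drop an instance suffix such as "(*)" or "(_Total)"
        categories.add(parts[0].split("(")[0])
    return categories


def list_categories(host: Optional[str] = None) -> Set[str]:
    """Run typeperf locally, or against a remote host when one is given."""
    cmd = ["typeperf", "-q"]
    if host:
        cmd += ["-s", host]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_categories(result.stdout)


def missing_remotely(local: Set[str], remote: Set[str]) -> List[str]:
    """Categories visible on the server itself but not over the network."""
    return sorted(local - remote)
```

For example, save the output of `list_categories()` taken on the VM itself, then compare it with `list_categories("VMHOST01")` taken from the polling server. Whatever `missing_remotely` returns would be the categories behind the CATEGORY DOES NOT EXIST errors, if the assumptions above hold.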
Maybe this is happening with RPC or even Kerberos....I don't know!
But it is hurting my attempts to monitor high-profile applications we're deploying...
Sometimes Performance Counters get corrupted and it is necessary to rebuild them using the 'lodctr /R' command as described here. Could you give it a try?
Also, when it stops working, are there any errors logged in the Windows Event Log on that remote VM?
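If the rebuild has to be repeated on many VMs, it could be wrapped in a small script. This is a rough sketch, not a supported procedure: only `lodctr /R` comes from this thread, the follow-up `winmgmt /resyncperf` step is my own assumption (it re-registers the performance libraries with WMI), and the function name is made up. It would need to run from an elevated prompt on the affected VM, and it defaults to a dry run that only reports the commands.

```python
import subprocess
from typing import List

# Commands to run on the affected VM, in order.
REBUILD_STEPS: List[List[str]] = [
    ["lodctr", "/R"],            # rebuild the performance counter registry settings
    ["winmgmt", "/resyncperf"],  # assumption: resync performance libraries with WMI
]


def rebuild_counters(dry_run: bool = True) -> List[str]:
    """Return the commands as strings; execute them only when dry_run is False."""
    issued = []
    for step in REBUILD_STEPS:
        issued.append(" ".join(step))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails
            subprocess.run(step, check=True)
    return issued
```

Calling `rebuild_counters()` just lists what would be run; `rebuild_counters(dry_run=False)` actually executes the steps.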
Based on the screenshots you provided above the issue you seem to be having is with WMI. Have you tried using Windows Performance Counters instead?
I have not, no. Will look into that right now. What could be wrong w/ WMI, and how could I try and fix that?
It's difficult to say for certain but I would recommend looking at the Windows Event Log of the remotely monitored machine to see what errors are listed when these component monitors fail. This should give you a good idea of where to start troubleshooting.
Based solely on what you've provided here, this looks on the surface to be an authentication issue, possibly with the domain controller. That is unlikely to be the real cause, though, and is more likely a symptom of another issue entirely. It's also possible that you've reached some limit on the maximum number of WMI connections/queries you can run simultaneously. It would be good to know how many component monitors you have running against this machine, how frequently you're polling this information, and the operating system and version of the monitored host. Ideally I would recommend opening a case with support so we can troubleshoot this issue properly.
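To put a rough number on the polling load, something like the following back-of-the-envelope calculation may help. It assumes each component monitor issues one WMI query per polling cycle, which may not match how APM actually batches its polls, and the figures in the usage note are hypothetical.

```python
def wmi_queries_per_minute(component_monitors: int,
                           polling_interval_seconds: float) -> float:
    """Estimate queries per minute against one host.

    Assumption: one WMI query per component monitor per polling cycle.
    """
    if polling_interval_seconds <= 0:
        raise ValueError("polling interval must be positive")
    return component_monitors * 60.0 / polling_interval_seconds
```

For instance, 40 component monitors polled every 300 seconds would come to `wmi_queries_per_minute(40, 300)`, or about 8 queries per minute against that one host.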
I opened a case (Case #314922) and it was closed with a solution referencing a Microsoft KB article. But that's not the long-term fix.
I still have my problems with using APM and Performance Counters/WMI in large volumes.
Based on the findings of support, the issue is not that APM is unable to monitor the WMI objects, but rather that the WMI objects are missing on the remote host. Unfortunately the reason why these WMI objects are disappearing is unknown. I've included a link to our WMI troubleshooting guide that may help to resolve the issue.
Optionally you can use RPC instead of WMI to monitor your Processes & Services, which is a simple pull-down box change in the component monitor itself. You can also utilize Windows Performance Counters to monitor statistical information you're currently collecting via WMI.
As a final option you can contact Microsoft support directly. They should be able to help troubleshoot why these WMI objects are disappearing from your host.
I've been seeing a similar but different issue when polling multiple WMI components on a server. We keep seeing an out-of-memory condition, specifically when polling services. I have had to change several components to RPC. It's as if the WMI queries are overloading the system. I did see articles regarding WMI memory leaks, but those were related to 2003 systems; these have all been 2008.
mdriskell, your issue is likely unrelated and may be resolved by applying the following Microsoft hotfix to your monitored servers.
Thanks Alterego....
I had searched and searched but couldn't find a KB on Win2008; the only things that came up were for 2003. I will get this over to our Wintel team.