Server performance monitors having issues.

The performance monitors that use WMI fail when WMI does not reply, which is a problem. The monitors work normally when tested, but then fail.

My theory is that the server is busy and is not replying in time. That is a problem, because the reason I'm monitoring is to track server load, and if the server fails to answer when it's busy... what's the use of monitoring?

Has anyone run into this?  Any ideas on how to address?

Glenn.
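A minimal sketch of how a poller could record "did not answer in time" as its own result instead of a silent gap. It assumes a hypothetical `query_fn` that performs the actual WMI query (not an Orion or WMI API); the function name `poll_with_timeout` and the timeout values are also illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def poll_with_timeout(query_fn, timeout_s):
    """Run query_fn in a worker thread and report a timeout as its own status.

    query_fn is a hypothetical stand-in for whatever issues the WMI query.
    Returns ("ok", value), ("timeout", None) or ("error", description).
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(query_fn)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except FutureTimeout:
        # The server did not answer in time: keep that fact instead of
        # silently dropping the sample.
        return ("timeout", None)
    except Exception as exc:
        # The query itself raised (e.g. access denied or an RPC failure).
        return ("error", repr(exc))
    finally:
        # Don't block on a hung query; the worker finishes in the background.
        executor.shutdown(wait=False)


def slow_query():
    time.sleep(0.5)  # simulates a server too busy to answer promptly
    return 99.0


print(poll_with_timeout(lambda: 42.0, timeout_s=1.0))  # ('ok', 42.0)
print(poll_with_timeout(slow_query, timeout_s=0.1))    # ('timeout', None)
```

Keeping "timeout" separate from "error" would at least show whether failures coincide with the server being slow or with something else going wrong.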

  • Yes. The WMI template that grabs performance data will sometimes fail. I don't know how to track down the cause of these failures.

    What happens is that a performance event occurs, but Orion does not report it because the WMI query has failed. After a while, WMI starts working again.
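To help track down the cause, one option is to log *why* each poll failed rather than only *that* it failed. The sketch below assumes hypothetical exception classes `WmiUnavailable` and `WmiAccessDenied` standing in for real WMI error conditions; `classify_poll` is illustrative and is not how Orion itself decides that a component is "unknown":

```python
from typing import Optional, Sequence


class WmiUnavailable(Exception):
    """Hypothetical: the WMI service or RPC endpoint could not be reached."""


class WmiAccessDenied(Exception):
    """Hypothetical: the polling account was refused access to WMI."""


def classify_poll(values: Optional[Sequence[float]] = None,
                  error: Optional[BaseException] = None) -> str:
    """Map one poll attempt to a status with a reason (illustrative only)."""
    if isinstance(error, WmiAccessDenied):
        return "unknown: access denied"
    if isinstance(error, WmiUnavailable):
        return "unknown: WMI unavailable"
    if error is not None:
        return f"unknown: other error ({type(error).__name__})"
    if not values:
        # The query ran but came back empty.
        return "unknown: no values returned"
    return "up"


print(classify_poll(values=[12.5]))            # up
print(classify_poll(values=[]))                # unknown: no values returned
print(classify_poll(error=WmiAccessDenied()))  # unknown: access denied
```

Counting these reasons over a few days would show whether the failures are mostly permission problems, connectivity problems or empty responses.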

  • Hey gbutler--

    I'm going to move this to the APM Forum.

    M

  • If the server is so overloaded that it's unable to report its status, there is no way for us to get it. If you need to monitor processes, you can try SNMP monitors, which are far less demanding. WMI is more complicated, so it puts more load on the server to actually get the data.

  • I don't know that the reason the server is not responding to the WMI query is that it's overloaded; that is only a guess on my part. What I do know is this: we get an alarm failure that states:

    IIS Admin Service for Internet Information Server (via WMI) on AFI-ARC-WBSVC01 is in an unknown state

    As I understand it, this is the default message Orion uses when its WMI query returns no values.

    When you log onto the server, things look fine. The polling graphs are 15 minutes apart, so it's quite possible that we had a spike between intervals that exceeded the thresholds and made WMI unresponsive... that is only a theory. I already have my monitors set to 5-minute intervals, but the graphs only show 15-minute points.

    As it stands, my server teams and developers get these false alarms, which make Orion appear not to report accurately.
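A rough illustration of why a short spike can be invisible on 15-minute graphs even with 5-minute polling. It assumes the graph plots point samples rather than averages (the thread doesn't say which); the helper `sample` and the load series are made up for the example:

```python
def sample(series, every_min):
    """Values a poller would record if it read `series` every `every_min` minutes."""
    return [series[t] for t in range(0, len(series), every_min)]


# Synthetic per-minute utilization (%) for one hour: a 20% baseline with a
# three-minute spike to 100% at minutes 24-26.
load = [20.0] * 60
for minute in (24, 25, 26):
    load[minute] = 100.0

print(max(sample(load, 15)))  # 20.0  (points at 0, 15, 30, 45 all miss the spike)
print(max(sample(load, 5)))   # 100.0 (the point at minute 25 catches it)
```

If the graphs average each 15-minute bucket instead, a short spike is diluted rather than missed, but it can still look unremarkable.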

  • This message can appear if WMI is not available, if access to WMI is denied, or if some other WMI-related error occurs. If your server is fine most of the time and you are not experiencing overloads during operation, but the monitor sometimes fails this way, you can try opening a support ticket.

    Does this occur for all your servers or just for specific ones, and are they always the same ones? Also, what is the average CPU and memory consumption around these failures? Is it constantly high, or is it normal with no clues about a possible failure?
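One way to answer the CPU and memory question systematically is to average the samples in a window before each failure. The sketch below assumes the performance data has been exported as `(timestamp, cpu_pct, mem_pct)` tuples; the function name `averages_before`, the 30-minute window and the data are all hypothetical:

```python
from datetime import datetime, timedelta
from statistics import mean


def averages_before(samples, failure_times, window=timedelta(minutes=30)):
    """For each failure time, average CPU and memory over the preceding window.

    samples: (timestamp, cpu_pct, mem_pct) tuples, a hypothetical export format.
    Returns (failure_time, avg_cpu, avg_mem); the averages are None when no
    samples fall inside the window.
    """
    results = []
    for failed_at in failure_times:
        in_window = [s for s in samples if failed_at - window <= s[0] < failed_at]
        if not in_window:
            results.append((failed_at, None, None))
            continue
        results.append((failed_at,
                        mean(s[1] for s in in_window),
                        mean(s[2] for s in in_window)))
    return results


# Synthetic data: CPU flat at 10%, memory climbing from 75% to 100% in 5-minute steps.
start = datetime(2000, 1, 1, 5, 0)
samples = [(start + timedelta(minutes=5 * i), 10.0, 75.0 + 5 * i) for i in range(6)]
failure = start + timedelta(minutes=29)

_, cpu, mem = averages_before(samples, [failure])[0]
print(cpu, mem)  # 10.0 87.5
```

A window average can hide a steady climb, so it may be worth looking at the last sample before each failure as well.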

  • Thanks Jiri,

    You pointed me in the right direction, which I should have checked in the first place rather than assuming anything... CPU was sitting at a steady 5-20%... memory was another story. Both vmem and pmem tell the same story...

    The graph (which I would post here if I knew how) shows a steady line climbing to 100% utilization at 5:29 AM, which is when the WMI query stopped answering. So the server simply did not have the resources to answer the query.

    Glenn.
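Since memory climbed steadily rather than spiking, a trend-based check could, in principle, warn before the server runs out of headroom. The sketch below is a hypothetical helper, not an Orion feature: `minutes_until_full` fits a least-squares straight line to recent `(minute, mem_pct)` samples and estimates how many minutes remain until the fit reaches 100%. The straight-line fit is a simplifying assumption; real memory growth may not be linear.

```python
def minutes_until_full(samples, limit=100.0):
    """Fit a least-squares line to (minute, mem_pct) samples and estimate how
    many minutes after the latest sample the fit reaches `limit`.

    Returns None when there are too few samples or memory is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = [float(t) for t, _ in samples]
    ys = [float(m) for _, m in samples]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return None  # all samples at the same time: no slope to fit
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    t_full = (limit - intercept) / slope
    return max(0.0, t_full - max(xs))


# Synthetic samples: memory rising 1 percentage point per minute.
print(minutes_until_full([(0, 60.0), (5, 65.0), (10, 70.0), (15, 75.0)]))  # 25.0
```

An alert on "less than N minutes until full" would fire while WMI is still answering, instead of relying on the query that fails once memory is exhausted.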

  • Hi Glenn--

    thwack offers two ways to attach a graphic:

    1. Click the Insert Media icon.
    2. Click Upload or select a file.
    3. Click Upload File.
    4. Click OK.

    Or you can:

    1. Click Reply to reply to the post. You'll see three tabs at the top of the reply page.
    2. Click the Options tab.
    3. Click the Add/Update button.
    4. Select the Upload File option, browse for the file and attach.
    5. Click the Save button.

    Give that a try and let me know how it works for you.

    M

  • Cool! Thanks Marie!  I was doing the "quick reply".... trying to save time yah know! :-)  Here is my graph: