Microsoft Lync Server (Front-End Role)

This template assesses the status and overall health of the services, as well as the performance, of a Microsoft Lync Server Front-End Server.

Prerequisites: WMI access to the target server.

Credentials: Windows Administrator on the target server.

Monitored Components

Note: You need to set thresholds for these counters according to your environment. It is recommended to monitor these counters for some period of time to understand potential value ranges and then set the thresholds accordingly. For more information, see http://knowledgebase.solarwinds.com/kb/questions/2415.
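One way to turn the recommended observation period into numbers is to collect samples of a counter during normal operation and then place thresholds a few standard deviations above the mean. The sketch below assumes this mean-plus-k-standard-deviations rule; the `baseline_thresholds` function and the multipliers of 2 and 3 are illustrative choices, not values defined by SAM or Lync.

```python
import statistics


def baseline_thresholds(samples: list[float], warn_k: float = 2.0, crit_k: float = 3.0) -> tuple[float, float]:
    """Return (warning, critical) thresholds as mean + k * standard deviation."""
    if len(samples) < 2:
        raise ValueError("at least two baseline samples are needed")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    return mean + warn_k * stdev, mean + crit_k * stdev


# Hypothetical Incoming Requests/sec readings collected during normal operation
baseline = [120.0, 135.0, 128.0, 140.0, 131.0, 126.0]
warning, critical = baseline_thresholds(baseline)
print(f"warning above {warning:.1f}, critical above {critical:.1f}")
```

This rule suits counters that should stay low, such as Average Flow-Control Delay; for heavily skewed counters, a high percentile of the baseline may be a better cut-off.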

Service: Lync Server Audio Test Service

This component monitor returns the CPU and memory usage of the Lync Server Audio Test Service. This service offers users the ability to subjectively test the quality of a call before placing the call. The user checks the call quality by making a test call.

Service: Lync Server File Transfer Agent

This component monitor returns the CPU and memory usage of the Lync Server File Transfer Agent. The File Transfer Agent is responsible for replicating configuration settings with the Replica Replicator Agent that runs on every Lync Server.

Service: Lync Server Front-End

This component monitor returns the CPU and memory usage of the Lync Server Front-End service. The Front-End Servers maintain transient information, such as logged-on state and control information for an IM, Web, or audio/video (A/V) conference.

Service: Lync Server IM Conferencing

This component monitor returns the CPU and memory usage of the Lync Server IM Conferencing service. The IM Conferencing service is responsible for multiplexing the instant message data feed from the leader to all participants in the session.

Service: Lync Server Master Replicator Agent

This component monitor returns the CPU and memory usage of the Lync Server Master Replicator Agent. This service is used by the File Transfer Agent to replicate configuration settings.

Service: Lync Server Replica Replicator Agent

This component monitor returns the CPU and memory usage of the Lync Server Replica Replicator Agent. This service is used by the File Transfer Agent to replicate configuration settings.

SIP Peers: Connections Active

This component monitor returns the number of established connections that are currently active. A connection is considered established when the peer credentials are verified (for example, via MTLS) or the peer receives a 2xx response. You will need to baseline this counter by testing and monitoring the user load. The returned value should be less than 15,000 connections per Front-End Server.

SIP Peers: TLS Connections Active

This component monitor returns the number of established TLS connections that are currently active. A TLS connection is considered established when the peer certificate, and possibly the host name, are verified for a trust relationship. You will need to baseline this counter by testing and monitoring the user load.

SIP Peers: Sends Outstanding

This component monitor returns the number of messages currently in the outgoing queues. If you receive SIP 504 errors, examine this counter to determine which servers are having problems. To do so, change the counter instance from _Total to the hostname of the server in question. You can view the available instances in Performance Monitor (perfmon.exe).

SIP Peers: Average Outgoing Queue Delay

This component monitor returns the average time, in seconds, that messages have been delayed in outgoing queues. Check the Outgoing Queue Delay for delays in sending messages to other servers or clients that could be causing messages to accumulate on the server. The server will drop client connections if it is in a throttle state and messages stay in the outgoing queue for more than 32 seconds.
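The following sketch applies the 32-second figure to an Average Outgoing Queue Delay reading. Because the counter is an average while the drop rule applies to individual queued messages, this is only an approximation. The `throttling` flag is assumed to come from another source (for example, the load-management counters), and the 50 percent early-warning margin is a hypothetical choice.

```python
DROP_AFTER_SECONDS = 32.0  # documented: a throttled server drops clients after 32 s in the queue


def queue_delay_status(avg_delay_s: float, throttling: bool, warn_fraction: float = 0.5) -> str:
    """Classify an Average Outgoing Queue Delay reading (approximation)."""
    if throttling and avg_delay_s > DROP_AFTER_SECONDS:
        # Documented drop condition: throttled and messages queued for more than 32 s
        return "critical"
    if avg_delay_s > DROP_AFTER_SECONDS * warn_fraction:
        # Hypothetical early warning before the drop condition can be reached
        return "warning"
    return "ok"


print(queue_delay_status(40.0, throttling=True))   # critical
print(queue_delay_status(20.0, throttling=False))  # warning
print(queue_delay_status(2.5, throttling=False))   # ok
```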

SIP Peers: Flow-controlled Connections Dropped

This component monitor returns the total number of connections dropped because of excessive flow-control. You will need to baseline this counter by testing and monitoring the server's health. The returned value should be as low as possible.

SIP Peers: Average Flow-Control Delay

This component monitor returns the average delay, in seconds, in message processing when the socket is flow-controlled. You will need to baseline this counter by testing and monitoring the server's health. The returned value should be as low as possible.

SIP Peers: Incoming Requests/sec

This component monitor returns the rate of received requests, per second. You will need to baseline this counter by testing and monitoring the user load.
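Windows rate counters such as Incoming Requests/sec are generally derived by taking two samples of a cumulative count and dividing the difference by the elapsed time. The sketch below shows that arithmetic under this assumption; the `Sample` type and the `rate_per_second` function are hypothetical helpers, not part of Lync or Performance Monitor.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    count: int        # cumulative raw count (hypothetical field)
    timestamp: float  # seconds at which the sample was taken


def rate_per_second(prev: Sample, curr: Sample) -> float:
    """Average rate between two samples of a cumulative counter."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = curr.count - prev.count
    if delta < 0:
        # The count went backwards, for example after a service restart
        raise ValueError("counter reset between samples")
    return delta / elapsed


print(rate_per_second(Sample(10_000, 0.0), Sample(13_000, 30.0)))  # 100.0
```

The same subtraction is useful for cumulative totals such as Flow-controlled Connections Dropped, where the increase between two polls is usually more telling than the absolute value.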

SIP Protocol: Incoming Messages/sec

This component monitor returns the rate of received messages, per second. You will need to baseline this counter by testing and monitoring the user load.

SIP Protocol: Events In Processing

This component monitor returns the number of SIP transactions, or dialog state change events, that are currently being processed. You will need to baseline this counter by testing and monitoring the user load.

SIP Responses: Local 500 Responses/sec

This component monitor returns the rate of 500 responses generated by the server, per second. This can indicate that there is a server component that is not functioning correctly.

SIP Responses: Local 503 Responses/sec

This component monitor returns the rate of 503 responses generated by the server, per second. The 503 code indicates that the server is unavailable. On a healthy server, you should not receive this code at a steady rate. However, during ramp-up after a server has been brought back online, there may be some 503 responses. Once all users have signed back in and the server returns to a stable state, no more 503 responses should be returned.

SIP Responses: Local 504 Responses/sec

This component monitor returns the rate of 504 responses generated by the server, per second. A few 504 responses to clients (for clients disconnecting abruptly) are to be expected, but this counter mainly indicates connectivity issues with other servers. It can indicate connection failures or delays connecting to remote servers.

SIP Load Management: Average Holding Time For Incoming Messages

This component monitor returns the average time that the server has held the incoming messages currently being processed. This should usually be less than one second on average, but short spikes of up to three seconds are normal. The server throttles new incoming messages once it goes above the high watermark and continues to do so until the number of messages falls below the low watermark. The server starts rejecting new connections when the average holding time exceeds the overload time of 15 seconds.
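The high/low watermark behavior is a form of hysteresis: throttling switches on above one level and switches off only below a lower one, so the server does not flap around a single cut-off. The sketch below models this, assuming the watermarks apply to the number of held messages as the description states. The source gives no watermark values, so `high` and `low` are hypothetical parameters; the 15-second overload time is taken from the description.

```python
OVERLOAD_SECONDS = 15.0  # documented: new connections rejected above this average holding time


class LoadThrottle:
    """Hysteresis between a high and a low watermark (hypothetical values)."""

    def __init__(self, high: int, low: int):
        if low >= high:
            raise ValueError("the low watermark must be below the high watermark")
        self.high = high
        self.low = low
        self.throttling = False

    def update(self, held_messages: int, avg_holding_s: float) -> dict:
        if not self.throttling and held_messages > self.high:
            self.throttling = True   # crossed the high watermark: start throttling
        elif self.throttling and held_messages < self.low:
            self.throttling = False  # fell below the low watermark: stop throttling
        return {
            "throttle_incoming": self.throttling,
            "reject_new_connections": avg_holding_s > OVERLOAD_SECONDS,
        }


throttle = LoadThrottle(high=1000, low=500)
for held, holding in [(800, 0.4), (1200, 2.8), (700, 1.5), (400, 0.6), (900, 16.0)]:
    print(held, throttle.update(held, holding))
```

At 700 held messages the server is still throttling, because it has not yet dropped below the low watermark since crossing the high one.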

SIP Load Management: Address space usage

This component monitor returns the percentage of available address space currently in use by the server process. The returned value should be as low as possible.

SIP Load Management: Page file usage

This component monitor returns the percentage of available page file space currently in use by the server process. The returned value should be as low as possible.

IM Conferences: Active Conferences

This component monitor returns the number of active conferences. You will need to baseline this counter by testing and monitoring the user load.

IM Conferences: Connected Users

This component monitor returns the number of connected users in all conferences. You will need to baseline this counter by testing and monitoring the user load.

IM Conferences: Throttled Sip Connections

This component monitor returns the number of throttled SIP connections. A value greater than ten could indicate that a peer is not processing requests in a timely fashion, which can happen if the peer machine is overloaded. Peers are the connected servers, adjacent Front-End Servers, or MCUs in the same Enterprise Edition (EE) pool. The same set of counters applies.
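Assuming this counter exposes one instance per peer in addition to `_Total`, as the SIP Peers counters do, a per-peer check against the documented limit of ten could look like the following. The instance names and the `slow_peers` helper are hypothetical.

```python
THROTTLED_SIP_LIMIT = 10  # documented: more than ten may indicate a slow peer


def slow_peers(throttled_by_instance: dict[str, int]) -> list[str]:
    """Return the peer instances whose throttled SIP connection count exceeds the limit."""
    return sorted(
        instance
        for instance, count in throttled_by_instance.items()
        if instance != "_Total" and count > THROTTLED_SIP_LIMIT
    )


# Hypothetical per-instance readings
readings = {"_Total": 26, "fe01.contoso.local": 12, "fe02.contoso.local": 3, "mcu01.contoso.local": 11}
print(slow_peers(readings))  # ['fe01.contoso.local', 'mcu01.contoso.local']
```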

IM MCU Health And Performance: MCU Health State

This component monitor returns the current health of the MCU.

Possible values:
0 = Normal.
1 = Loaded.
2 = Full.
3 = Unavailable.
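The documented values map naturally onto an enumeration. The sketch below decodes a raw MCU Health State reading; the mapping to monitor statuses (`up`, `warning`, `critical`) is a hypothetical alerting policy, not something the template defines.

```python
from enum import IntEnum


class McuHealthState(IntEnum):
    NORMAL = 0
    LOADED = 1
    FULL = 2
    UNAVAILABLE = 3


# Hypothetical mapping to a monitor status; adjust it to your alerting policy.
STATUS = {
    McuHealthState.NORMAL: "up",
    McuHealthState.LOADED: "warning",
    McuHealthState.FULL: "critical",
    McuHealthState.UNAVAILABLE: "critical",
}


def mcu_health_status(raw: int) -> tuple[str, str]:
    """Return (state name, monitor status) for a raw MCU Health State value."""
    try:
        state = McuHealthState(raw)
    except ValueError:
        # A value outside 0-3 is not documented
        return f"UNKNOWN({raw})", "unknown"
    return state.name, STATUS[state]


print(mcu_health_status(1))  # ('LOADED', 'warning')
```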

IM MCU Health And Performance: MCU Draining State

This component monitor returns the current draining status of the MCU.

Possible values:
0 = Not requesting to drain.
1 = Requesting to drain.
2 = Draining.

When a server is drained, it stops taking new connections and calls. These new connections and calls are routed through other servers in the pool. A server being drained allows its sessions on existing connections to continue until they naturally end. When all existing sessions have ended, the server is ready to be taken offline.
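The draining sequence can be illustrated with a toy model of a pool: new sessions go only to servers that are not draining, existing sessions on a draining server run until they end, and the server is ready to be taken offline once it has none left. `FrontEndNode` and `place_session` are hypothetical names, and the model assumes that only state 2 (Draining) refuses new sessions; it does not reproduce how Lync actually routes traffic.

```python
from enum import IntEnum


class DrainingState(IntEnum):
    NOT_REQUESTING = 0  # Not requesting to drain
    REQUESTING = 1      # Requesting to drain
    DRAINING = 2        # Draining


class FrontEndNode:
    """Toy model of one server in a pool (hypothetical, not a Lync API)."""

    def __init__(self, name: str):
        self.name = name
        self.state = DrainingState.NOT_REQUESTING
        self.sessions: set[str] = set()

    def accepts_new_sessions(self) -> bool:
        # Assumption: only the Draining state refuses new sessions.
        return self.state != DrainingState.DRAINING

    def end_session(self, session_id: str) -> None:
        # Existing sessions are allowed to finish naturally.
        self.sessions.discard(session_id)

    def ready_for_offline(self) -> bool:
        return self.state == DrainingState.DRAINING and not self.sessions


def place_session(pool: list[FrontEndNode], session_id: str) -> str:
    """Route a new session to the first server that still accepts one."""
    for node in pool:
        if node.accepts_new_sessions():
            node.sessions.add(session_id)
            return node.name
    raise RuntimeError("no server in the pool accepts new sessions")


fe01, fe02 = FrontEndNode("fe01"), FrontEndNode("fe02")
pool = [fe01, fe02]
first = place_session(pool, "call-1")   # lands on fe01
fe01.state = DrainingState.DRAINING
second = place_session(pool, "call-2")  # routed to fe02 instead
print(first, second, fe01.ready_for_offline())  # fe01 fe02 False
fe01.end_session("call-1")
print(fe01.ready_for_offline())         # True
```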

User Services - DBStore: Queue Latency (msec)

This component monitor returns the average time, in milliseconds, that a request is held in the database queue, that is, the time a request spends in the queue of the Back-End Database Server. If the topology is healthy, this counter averages less than 100 ms; occasional spikes are acceptable. The value will be higher on Front-End Servers located at a different site from the Back-End Database Servers. It can also increase if the Back-End Database Server is having performance problems or if network latency is too high, so if the returned value is high, check both network latency and the health of the Back-End Database Server. Server health degrades as latency increases toward 12 seconds, the point at which server throttling begins.

User Services - DBStore: Sproc Latency (msec)

This component monitor returns the average time, in milliseconds, it takes to execute a stored procedure call. A healthy state is considered to be less than 100 ms. Server health degrades as latency increases toward 12 seconds, the point at which server throttling begins.
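Both DBStore counters share the same documented reference points: under 100 ms is healthy, and throttling begins at 12 seconds (12,000 ms). The sketch below classifies readings against those two points. The intermediate `degraded` label and the use of the median over a polling window (so that an occasional spike, which the Queue Latency description calls acceptable, does not change the status) are illustrative choices.

```python
import statistics

HEALTHY_MS = 100.0      # documented: healthy average is below 100 ms
THROTTLE_MS = 12_000.0  # documented: server throttling begins at 12 seconds


def dbstore_latency_status(latency_ms: float) -> str:
    """Classify a Queue Latency or Sproc Latency reading in milliseconds."""
    if latency_ms < HEALTHY_MS:
        return "healthy"
    if latency_ms < THROTTLE_MS:
        return "degraded"  # illustrative label between the two documented points
    return "throttling"


def window_status(samples_ms: list[float]) -> str:
    # Median of the polling window, so one isolated spike does not change the result
    return dbstore_latency_status(statistics.median(samples_ms))


print(dbstore_latency_status(250.0))         # degraded
print(window_status([40, 60, 55, 900, 70]))  # healthy (median is 60)
```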

User Services - Https Transport: Number of failed connection attempts / Sec

This component monitor returns the rate of connection attempt failures, per second. You will need to baseline this counter by testing and monitoring the server's health.

Portions of this document were originally created by and are excerpted from the following sources:

Microsoft Corporation, “TechNet Library,” Copyright 2012 Microsoft Corporation.
All rights reserved. Available at
http://technet.microsoft.com/en-us/library/gg670897.aspx