Distributed File System (DFS)

This template assesses the status and overall performance of a Microsoft Distributed File System (DFS) service. It uses Windows Performance Counters, WMI monitors, and the Windows DFS Replication event log.

Prerequisites: WMI access to the target server.

Credentials: Windows Administrator on the target server.

Monitored Components

Note: You need to set thresholds for counters according to your environment. It is recommended to monitor counters for some period of time to understand potential value ranges and then set the thresholds accordingly. All Windows Event Log monitors (beginning with Warning or Error) should return zero values. Returned values other than zero indicate an abnormality. Examining the Windows DFS Replication log file should provide information pertaining to the issue.
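The baseline approach described in the note can be sketched as follows. This is a minimal illustration, not part of the template: the function names and the choice of mean plus a multiple of the standard deviation are assumptions, and any other statistic (for example, a percentile) could be substituted.

```python
import statistics


def suggest_thresholds(samples, warn_sigmas=2.0, crit_sigmas=3.0):
    """Suggest thresholds from counter values observed over a baseline period.

    The sigma multipliers are illustrative defaults, not values from the template.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate the spread")
    mean = statistics.fmean(samples)
    spread = statistics.stdev(samples)  # sample standard deviation
    return {
        "warning": mean + warn_sigmas * spread,
        "critical": mean + crit_sigmas * spread,
    }


def event_monitor_is_healthy(event_count):
    """Event Log monitors should return zero; anything else is an abnormality."""
    return event_count == 0
```

For example, suggest_thresholds([10, 12, 11, 13, 14]) places the warning threshold a little above the highest observed value, which may be a reasonable starting point before tuning.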

Service: DFS Namespace

This monitor checks the DFS Namespace service, which enables you to group shared folders located on different servers into one or more logically structured namespaces. Each namespace appears to users as a single shared folder with a series of subfolders.

Service: DFS Replication

This monitor checks the DFS Replication service, which enables you to synchronize folders on multiple servers across local or wide area network connections. This service uses the Remote Differential Compression (RDC) protocol to update only the portions of files that have changed since the last replication.

Replication Folders: Conflict Space In Use (B)

This counter returns the total size (in bytes) of the conflict loser files and folders currently in the Conflict and Deleted folder used by the DFS Replication service. The DFS Replication service automatically detects and resolves conflicts encountered in replicated folders and moves the losing version to the Conflict and Deleted folder. The service automatically cleans up the Conflict and Deleted folder when it exceeds a pre-configured threshold of the quota.

Note: The instance field is installation-specific. To see the available instances, open perfmon, find the DFS Replicated Folders object, and choose any counter in this object (for example: test-{79E95064-B701-449D-9B3C-32F58932B96B}).

Replication Folders: Deleted Space In Use (B)

This counter returns the total size (in bytes) of the deleted files and folders currently in the Conflict and Deleted folder used by the DFS Replication service. The DFS Replication service detects remote deletes from its sending partner and moves the file or folder to the Conflict and Deleted folder. The service automatically cleans up the Conflict and Deleted folder when it exceeds a pre-configured threshold of the quota.

Note: The instance field is installation-specific. To see the available instances, open perfmon, find the DFS Replicated Folders object, and choose any counter in this object (for example: test-{79E95064-B701-449D-9B3C-32F58932B96B}).

Replication Folders: Staging Space In Use (B)

This counter returns the total size (in bytes) of the files and folders currently in the staging folder used by the DFS Replication service. This counter will fluctuate as staging space is reclaimed. The DFS Replication service stages files and folders in the staging folder before they are replicated, and automatically cleans up the staging folder when it exceeds a pre-configured threshold of the quota.

Note: The instance field is installation-specific. To see the available instances, open perfmon, find the DFS Replicated Folders object, and choose any counter in this object (for example: test-{79E95064-B701-449D-9B3C-32F58932B96B}).

Replication Folders: Updates Dropped

This counter returns the number of redundant file replication update records that were ignored by the DFS Replication service because they did not change the replicated file or folder. For example, dropped updates can occur when access control lists (ACLs) are overwritten with identical ACLs on a file or folder.

Note: The instance field is installation-specific. To see the available instances, open perfmon, find the DFS Replicated Folders object, and choose any counter in this object (for example: test-{79E95064-B701-449D-9B3C-32F58932B96B}).

Replication Folders: File Installs Retried

This counter returns the number of file installs that are being retried due to sharing violations or other errors encountered when installing the files. The DFS Replication service replicates staged files into the staging folder, decompresses them in the installing folder and renames them to the target location. The second and third steps of this process are known as installing the file.

This counter should be as low as possible.

Note: The instance field is installation-specific. To see the available instances, open perfmon, find the DFS Replicated Folders object, and choose any counter in this object (for example: test-{79E95064-B701-449D-9B3C-32F58932B96B}).
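Once the instance name is known, the full counter paths for the replicated folder monitors above can be assembled. The sketch below assumes the standard perfmon path layout \Object(instance)\Counter and assumes the counter names match the monitor titles with the "(B)" unit suffix dropped; the helper names and the FS01 computer name are hypothetical.

```python
# Performance object named in the notes above.
DFSR_FOLDER_OBJECT = "DFS Replicated Folders"

# Counter names assumed from the monitor titles, without the "(B)" unit suffix.
REPLICATED_FOLDER_COUNTERS = (
    "Conflict Space In Use",
    "Deleted Space In Use",
    "Staging Space In Use",
    "Updates Dropped",
    "File Installs Retried",
)


def counter_path(instance, counter, computer=None):
    """Build a perfmon-style counter path for one replicated folder instance."""
    path = f"\\{DFSR_FOLDER_OBJECT}({instance})\\{counter}"
    if computer:
        # Remote paths are prefixed with \\computer.
        path = f"\\\\{computer}{path}"
    return path


def all_counter_paths(instance, computer=None):
    """Return one path per replicated folder counter listed above."""
    return [counter_path(instance, name, computer) for name in REPLICATED_FOLDER_COUNTERS]


if __name__ == "__main__":
    # Instance name taken from the example in the notes above.
    for line in all_counter_paths("test-{79E95064-B701-449D-9B3C-32F58932B96B}", "FS01"):
        print(line)
```

Paths in this form can typically be pasted into perfmon or passed to a command-line collector such as typeperf to confirm that the instance name is correct.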

DFS Replication State

This counter shows the current state of the DFS Replication service.

Possible values:
0 - Service Starting.
1 - Service Running.
2 - Service Degraded.
3 - Service Shutting Down.

DFS Volume State

This counter shows the current DFS volume state.

Possible values:
0 - Initialized.
1 - Shutting Down.
2 - In Error.
3 - Auto Recovery.
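
The numeric values returned by the DFS Replication State and DFS Volume State monitors can be translated into readable labels. The following is a minimal sketch using the value lists above; the enum and function names are illustrative, and how the raw value is retrieved (for example, through a WMI query) is left out.

```python
from enum import IntEnum


class ReplicationState(IntEnum):
    """Values listed for the DFS Replication State monitor."""
    SERVICE_STARTING = 0
    SERVICE_RUNNING = 1
    SERVICE_DEGRADED = 2
    SERVICE_SHUTTING_DOWN = 3


class VolumeState(IntEnum):
    """Values listed for the DFS Volume State monitor."""
    INITIALIZED = 0
    SHUTTING_DOWN = 1
    IN_ERROR = 2
    AUTO_RECOVERY = 3


def describe(state_enum, raw_value):
    """Return a readable label for a raw state value, or mark it as unknown."""
    try:
        return state_enum(raw_value).name.replace("_", " ").title()
    except ValueError:
        return f"Unknown ({raw_value})"


if __name__ == "__main__":
    print(describe(ReplicationState, 2))
    print(describe(VolumeState, 3))
```

Unlisted values are reported as unknown rather than raising an error, so a monitoring script keeps running if the service returns a value that is not documented here.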

Warning: Failed to Contact Configuration on DC

This monitor returns the number of events when the DFS Replication service failed to contact the domain controller to access configuration information.

Type of event: Warning. Event ID: 1204.

You should check your network connection.

Warning: Staging Space above High Watermark

This monitor returns the number of events when the DFS Replication service detected that the staging space usage exceeds the staging quota for the replicated folder. The service might fail to replicate some large files and the replicated folder might get out of sync. The service will attempt to clean up the staging space automatically.

Type of event: Warning. Event ID: 4202.

Staging files might be purged prematurely because the replicated folder contains files that are larger than the configured staging quota, or because the configured maximum staging size has been exceeded. This purging can cause excessive hard drive activity and CPU usage.

To avoid this error, increase the quota of the staging folder.
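
To judge how close a replicated folder is to this condition, the Staging Space In Use counter value can be compared with the configured staging quota. The sketch below is an illustration only: the function names are hypothetical, and the 90 percent default for the high watermark is an assumption that should be replaced with the value configured in your environment.

```python
def staging_usage_percent(in_use_bytes, quota_bytes):
    """Return staging space usage as a percentage of the staging quota."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    return 100.0 * in_use_bytes / quota_bytes


def cleanup_expected(in_use_bytes, quota_bytes, high_watermark_percent=90.0):
    """Return True when usage has reached the high watermark.

    high_watermark_percent is an assumed default, not a value from the template.
    """
    return staging_usage_percent(in_use_bytes, quota_bytes) >= high_watermark_percent


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical reading: 3.5 GiB staged against a 4 GiB quota.
    print(staging_usage_percent(7 * GIB // 2, 4 * GIB))
    print(cleanup_expected(7 * GIB // 2, 4 * GIB))
```

If usage regularly approaches the watermark, that is a hint that the quota is too small for the files being replicated.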

Warning: Failed to Clean Old Staging Data

This monitor returns the number of events when the DFS Replication service failed to clean up old staging files for the replicated folder at the local path. The service might fail to replicate some large files and the replicated folder might get out of sync. The service will automatically retry staging space cleanup at 30-minute intervals. The service may start cleanup earlier if it detects that some staging files have been unlocked.

Type of event: Warning. Event ID: 4206.

It is recommended to increase the quota of the staging folder.

Warning: Staging Space above Staging Quota

This monitor returns the number of events when the DFS Replication service detected that the staging space usage is above the staging quota for the replicated folder at the local path. The service might fail to replicate some large files and the replicated folder might get out of sync. The service will attempt to clean up the staging space automatically.

Type of event: Warning. Event ID: 4208.

It is recommended to increase the quota of the staging folder.

Warning: File Prevented from Replication

This monitor returns the number of events when the DFS Replication service has been repeatedly prevented from replicating a file due to consistent sharing violations encountered on the file.

Type of event: Warning. Event ID: 4302 or 4304.

Event 4302: A local sharing violation occurs when the service cannot receive and update the file because the local file is being used. This occurs on the "receive" side of the file change. The file is already replicated. However, it cannot be moved from the installing directory to the final destination.

Event 4304: The service cannot stage a file for replication because of a sharing violation. This occurs on the "send" side of the file change. DFSR wants to stage or copy the file for replication; however, an exclusive lock prevents this.

Warning: No Configured Connections for Replication Folder

This monitor returns the number of events when the DFS Replication service has detected that no connections are configured for the replication group. No data is being replicated for this replication group.

Type of event: Warning. Event ID: 6804.

If the data replicates through DFS Replication without any issues, ignore this event. If problems with replication exist, review the replicated folder configuration closely.

Error: Failed to Contact DC

This monitor returns the number of events when the DFS Replication service failed to contact the domain controller to access configuration information. Replication is stopped. The service will try again during the next configuration polling cycle.

Type of event: Error. Event ID: 1202.

The DFS Replication service could not contact the domain controller to obtain new configuration information. If replication was previously working and this error is reported, the service will use cached configuration, stored locally, but will not respond to any configuration changes until the issue is resolved.

This event can be caused by TCP/IP connectivity, firewall, Active Directory, or DNS errors or misconfigurations.

Error: Replication Stopped

This monitor returns the number of events when the DFS Replication service stopped replication on the replicated folder on the local path.

Type of event: Error. Event ID: 4004.

When the DFS Replication service initializes the replicated folders for the replication process, it traverses all related paths to check whether the replicated folders are reparse points that act as symbolic links or that act as mount points.

The DFS Replication service expects to open synchronous handles to access these paths. However, it uses the asynchronous handles incorrectly. The DFS Replication service cannot handle the I/O requests that are held by a filter driver. Therefore, the DFS Replication service stops responding.

If the DFS Replication service stops responding, you may need to install the following HotFix: http://support.microsoft.com/kb/977381/EN-US.

Error: File Changed on Multiple Servers

This monitor returns the number of events when the DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.

Type of event: Error. Event ID: 4412.

Error: Communication with Partner

This monitor returns the number of events when the DFS Replication service encountered an error communicating with a partner for the listed replication group.

Type of event: Error. Event ID: 5002.

This error usually appears when the DFS Replication service is unable to set up a Remote Procedure Call (RPC) binding to communicate with the partner. It may also be caused by RPC blocking at a firewall between partners, or a DNS error. Finally, this error may appear when the two partner computers are running different versions of Dfsr.exe.

Verify that normal communication between the two computers is working. Ensure that the same version of DFSR.exe is running on both partner computers. The file is located at %SystemRoot%\System32\DFSR.exe. You might need to install service packs, downloads, or hotfixes on one or both computers in order to run matching versions of the service.
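
Once the file version of DFSR.exe has been read on each partner, the two values can be compared. The sketch below assumes the versions are available as dotted strings; the function names and the version numbers in the example are hypothetical, and retrieving the version from the file is not shown.

```python
def parse_version(text):
    """Convert a dotted version string such as '6.1.7601.17514' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def versions_match(local_version, partner_version):
    """Return True when both partners report the same DFSR.exe version."""
    return parse_version(local_version) == parse_version(partner_version)


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    print(versions_match("6.1.7601.17514", "6.1.7600.16385"))
```

Comparing parsed tuples rather than raw strings avoids false mismatches caused by stray whitespace or leading zeros.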

Error: Host Unreachable

This monitor returns the number of events when the DFS Replication service failed to communicate with a partner for the replication group. This error can occur if the host is unreachable or if the DFS Replication service is not running on the server.

Type of event: Error. Event ID: 5008.

Check network connectivity.

Error: Partner Didn't Recognize the Connection

This monitor returns the number of events when the DFS Replication service failed to communicate with a partner for the replication group. The partner did not recognize the connection or the replication group configuration.

Type of event: Error. Event ID: 5012.

This error usually occurs when one partner attempts to establish an RPC connection with another member, but is unable to. The problem may be intermittent and resolve itself automatically. If the two members obtain configuration data from different domain controllers, they may have mismatched configuration data due to Active Directory replication reaching one domain controller before the other.

The service will retry the connection periodically. If this problem persists, please verify that Active Directory replication is working and that the service is able to reach a domain controller.

Error: Connection with Partner Removed or Disabled

This monitor returns the number of events when the DFS Replication service detected that the connection with a partner for the replication group has been removed or disabled.

Type of event: Error. Event ID: 5016.

Check network connectivity.

Error: Invalid Local Path to Replication Folder

This monitor returns the number of events when the replicated folder has an invalid local path.

Type of event: Error. Event ID: 6404.

DFS Replication cannot replicate the replicated folder because the configured local path is not the fully qualified path name of an existing, accessible local folder. This replicated folder is not replicating to or from this server.

Fix this problem by configuring the replicated folder with a valid local path using the DFS Management snap-in or the Dfsradmin.exe command-line tool.

Warning: No Free Space for Replication

This monitor returns the number of events when the DFS Replication service encountered errors replicating one or more files because adequate free space was not available on the volume. This volume contains the replicated folder, the staging folder, or both. The service will retry replication periodically.

Type of event: Error. Event ID: 4502.

Please make sure that enough free space is available on this volume for replication to proceed.
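
The event IDs covered by this template can be collected into a single lookup table, which is convenient when post-processing exported DFS Replication log entries. This is a minimal sketch: the table and function names are illustrative, severities follow the "Type of event" lines above, and the input is assumed to be a plain list of event IDs obtained by some other means.

```python
from collections import Counter

# Event ID -> (type of event, monitor name), as listed in this document.
DFSR_EVENTS = {
    1204: ("Warning", "Failed to Contact Configuration on DC"),
    4202: ("Warning", "Staging Space above High Watermark"),
    4206: ("Warning", "Failed to Clean Old Staging Data"),
    4208: ("Warning", "Staging Space above Staging Quota"),
    4302: ("Warning", "File Prevented from Replication"),
    4304: ("Warning", "File Prevented from Replication"),
    6804: ("Warning", "No Configured Connections for Replication Folder"),
    1202: ("Error", "Failed to Contact DC"),
    4004: ("Error", "Replication Stopped"),
    4412: ("Error", "File Changed on Multiple Servers"),
    5002: ("Error", "Communication with Partner"),
    5008: ("Error", "Host Unreachable"),
    5012: ("Error", "Partner Didn't Recognize the Connection"),
    5016: ("Error", "Connection with Partner Removed or Disabled"),
    6404: ("Error", "Invalid Local Path to Replication Folder"),
    4502: ("Error", "No Free Space for Replication"),
}


def count_by_monitor(event_ids):
    """Count events per monitor name, ignoring IDs this template does not cover."""
    counts = Counter()
    for event_id in event_ids:
        entry = DFSR_EVENTS.get(event_id)
        if entry is not None:
            counts[entry[1]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample of event IDs read from the log.
    print(count_by_monitor([4302, 4304, 5008, 9999]))
```

Any monitor that appears in the result has a non-zero count and therefore indicates an abnormality worth investigating in the DFS Replication log.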
