
Patch Management Policies

I'm putting together a patch management process from scratch.  Up to now patching has been managed by simply running Windows Update every so often and deploying what's available.

There is a lot of information available about how to download, deploy, and report on patches and patch compliance, but I don't see anyone discussing the discretionary parts of the process.  When do you force patch deployments and reboots on your end users?  How long do you wait before applying patches to production servers?  Do you really test each patch, or simply run them for a while on less important systems and install to Prod if nothing crashes?

I'm interested in hearing from other administrators how they handle patch compliance, what has worked or not worked, and why.

Thanks.

  • There is a lot of information available about how to download, deploy, and report on patches and patch compliance, but I don't see anyone discussing the discretionary parts of the process.

    Your observation is consistent with mine, and I think the reason for this is the very word you used: discretionary.

    When something is discretionary, it's also usually organizationally specific or organizationally unique, which doesn't make it particularly good material for a generic how-to. At best you'll get a "Your Mileage May Vary" suggestion for what to do.


    When do you force patch deployments and reboots on your end users?

    How long do you wait before applying patches to production servers?

    Do you really test each patch or simply run them for a while on less important systems and install to Prod if nothing crashes?

    The answer to each of those three questions is: "It depends."  It depends on what your particular organization needs in terms of security and patch management, and what it can tolerate in terms of operational disruption.


    Now, having said that, we can talk about what software vendors (e.g. Microsoft, if we want to focus on the Windows OS for a moment) expect with respect to those questions:

    • Patch deployments and reboots, regardless of whether to end-users or servers, should be implemented as soon as is reasonably practical, i.e. after appropriate (for your organization's risk level) pre-deployment testing, and at a time that will minimize operational disruptions. As an example, it would probably be a bad idea to deploy updates in early afternoon on the Wednesday after Patch Tuesday. Such an event would probably not be consistent with risk-level considerations, and almost certainly would result in operational disruptions. Yet, if you're talking about deploying in a testing lab, or to designated (read: identified and aware) pilot users, Wednesday afternoon might be perfectly acceptable.
    • The primary difference between deployment to end-users and to servers is going to be the risk and timing associated with reboots. If an end-user workstation fails to restart after patching (a rare occurrence, but not unheard of), only one (or maybe a few) individuals are negatively impacted. If a server fails to restart, the entire organization may be severely affected. The significance here is in how closely those reboot cycles are monitored. For end-users, you've effectively got a dedicated staff member at each workstation who will monitor the success of the reboot and let you know at the start of the workday if it failed. But you don't want to wait until the start of the workday for a dozen or more staff members to inform you that a server is offline. That you need to know about immediately, so proactive, real-time monitoring of a server reboot is the critical component. (A rough sketch of such a check appears after the list of approaches below.)
    • Nobody does thorough acceptability testing on every patch; that is simply not practical. From what I've seen, most organizations have a mix of one or more of these philosophies -- again relevant to the organizational risk involved in the particular patch as well as the targeted system.
      1. Wait a week and see if it blows up for anybody else. If not, deploy the patches to everybody.
      2. Deploy the patches to end-users and accept the risk that something may go wrong. (I strongly advise against this because rollback of defective patches is a very expensive operation; rather, designating a "short list" of highly trusted and generally self-sufficient power users as a pilot for end-user patches is preferable.)
      3. Patch servers from lowest-risk to highest-risk over a period of time. (This period is quite often driven by when you can take servers offline.) Except for application-specific patches, if a common patch is problematic it will generally show up on the lower-risk servers before you get the opportunity to deploy it to the higher-risk servers. (A rough scheduling sketch along these lines follows this list.)
      4. Build out a testing lab with reproduction of production services as close as possible. Deploy some or all server/service/application patches (again, the choice of testing may be risk-motivated) to the test lab first and verify normal operations of the affected services/applications.
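    To put options 1 and 3 in more concrete terms, here is a rough Python sketch of a tiered rollout calendar keyed off Patch Tuesday (the second Tuesday of the month). The server names, risk tiers, and deferral offsets are purely illustrative assumptions; your own risk tolerance and maintenance windows should drive the real numbers.

      from datetime import date, timedelta

      # Hypothetical inventory: lower tier = lower risk = patched earlier.
      SERVERS = [
          ("print01", 1),
          ("file01", 2),
          ("sql01", 3),
          ("erp01", 3),
      ]

      # Illustrative deferral, in days after Patch Tuesday, for each risk tier.
      DAYS_AFTER_PATCH_TUESDAY = {1: 7, 2: 10, 3: 14}

      def patch_tuesday(year, month):
          """Return the second Tuesday of the month (Microsoft's monthly release day)."""
          first = date(year, month, 1)
          days_to_first_tuesday = (1 - first.weekday()) % 7  # Monday=0, Tuesday=1
          return first + timedelta(days=days_to_first_tuesday + 7)

      def rollout_plan(year, month):
          """Order servers lowest-risk first and assign each a target patch date."""
          release = patch_tuesday(year, month)
          return [(name, tier, release + timedelta(days=DAYS_AFTER_PATCH_TUESDAY[tier]))
                  for name, tier in sorted(SERVERS, key=lambda s: s[1])]

      if __name__ == "__main__":
          for name, tier, when in rollout_plan(2013, 6):
              print("tier %d: patch %s on %s" % (tier, name, when.isoformat()))

    Whether you key the offsets off Patch Tuesday, off your approval date, or off whenever you can actually take each box offline is, again, an organizational decision.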

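    Going back to the point above about watching server reboots, below is a minimal sketch of a proactive post-patch check. After the maintenance window it polls a service port on each patched server and raises an alert if anything never comes back. The hostnames, ports, and timeouts are assumptions for illustration; wire the alert into whatever paging or monitoring your organization already uses.

      import socket
      import time

      # Hypothetical servers and a TCP port that should answer once each box is back up.
      SERVERS = [
          ("dc01.example.local", 389),   # LDAP on a domain controller
          ("mail01.example.local", 25),  # SMTP on a mail server
          ("web01.example.local", 443),  # HTTPS on a web front end
      ]

      def wait_for_server(host, port, timeout_minutes=20):
          """Poll a TCP port until it accepts a connection or the timeout expires."""
          deadline = time.time() + timeout_minutes * 60
          while time.time() < deadline:
              try:
                  socket.create_connection((host, port), timeout=5).close()
                  return True
              except OSError:
                  time.sleep(30)  # probably still rebooting; try again shortly
          return False

      if __name__ == "__main__":
          failed = [host for host, port in SERVERS if not wait_for_server(host, port)]
          if failed:
              print("ALERT: not back after patching: " + ", ".join(failed))
          else:
              print("All servers answered on their service ports after the patch window.")
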

  • Lawrence Garvin wrote:

    • Nobody does thorough acceptability testing on every patch; that is simply not practical. From what I've seen, most organizations have a mix of one or more of these philosophies -- again relevant to the organizational risk involved in the particular patch as well as the targeted system.
      1. Wait a week and see if it blows up for anybody else. If not, deploy the patches to everybody.
      2. Deploy the patches to end-users and accept the risk that something may go wrong. (I strongly advise against this because rollback of defective patches is a very expensive operation; rather, designating a "short list" of highly trusted and generally self-sufficient power users as a pilot for end-user patches is preferable.)
      3. Patch servers from lowest-risk to highest-risk over a period of time. (This period of time is quite often driven by when you can take servers offline.) Except for application specific patches, generally if a common patch is problematic it will show up on the lower-risk servers before you get the opportunity to deploy to the higher-risk servers.
      4. Build out a testing lab with reproduction of production services as close as possible. Deploy some or all server/service/application patches (again, the choice of testing may be risk-motivated) to the test lab first and verify normal operations of the affected services/applications.


    I use a combination of 1 and 4 in our environment. There are exceptions to these guidelines as well. For instance, I recently had a patch that seemed to work well on the test computers, but in the week after the patches came out other organizations reported issues with it. I used my discretionary powers to withhold approval of that patch for the rest of my organization until most of the issues were corrected and/or the patch expired.
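
    In case it's useful to anyone, here is a minimal sketch of how that kind of discretionary hold can be tracked outside the patching console: a hand-maintained set of deferred updates, checked before anything gets approved for the broader organization. The KB numbers and candidate list below are made-up placeholders.

      # Hand-maintained list of updates being held back, with a note on why.
      DEFERRED = {
          "KB0000001": "fine on test computers, but problem reports from other orgs; revisit next cycle",
      }

      def split_approvals(candidates):
          """Split this month's candidate updates into approve-now and hold-back lists."""
          approve = [kb for kb in candidates if kb not in DEFERRED]
          hold = {kb: DEFERRED[kb] for kb in candidates if kb in DEFERRED}
          return approve, hold

      if __name__ == "__main__":
          approve, hold = split_approvals(["KB0000001", "KB0000002", "KB0000003"])
          print("Approve: " + ", ".join(approve))
          for kb, reason in hold.items():
              print("Hold %s: %s" % (kb, reason))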