Patch manager issues

We deployed Patch Manager a few months ago and tested it against several machines set up for that purpose. These tests seemed to go well, with the patches installing as they should and the machines rebooting as expected.

After this testing, in October, we ramped it up to patch most of our environment. That didn't go quite so well.

I have servers broken out into groups - Domain Controllers, File Servers, SQL Servers, Terminal Servers and Utility Servers - so that I can control when each type reboots. I prefer DCs to go first, followed by File, then SQL, and after that any order.

The main issue we had that cycle was that a lot of machines seemed to get patched but didn't reboot. There were also a lot of patches that didn't install for one reason or another. We attributed it to the fact that, when our main DC/DNS server rebooted, DNS didn't come back properly, causing DNS lookup issues. Since that was the only DNS server our machines were using, lookups failed while it wasn't working. Since then, all servers have been set to use two DNS servers, so if one isn't available, they should still be able to resolve against the second.
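A quick way to confirm that both DNS servers are at least listening before a patch window is a TCP connect to port 53 on each. This is only a rough sketch: the function names are made up for illustration, and an open port does not prove that the DNS service is resolving correctly, only that something is answering on that port.

```python
import socket

def dns_port_open(server, port=53, timeout=2.0):
    """Return True if a TCP connection to server:port succeeds."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_dns_servers(servers, port=53, timeout=2.0):
    """Map each DNS server address to whether its port accepted a connection."""
    return {s: dns_port_open(s, port, timeout) for s in servers}

# Usage (the addresses are placeholders, not from the original post):
# check_dns_servers(["192.0.2.10", "192.0.2.11"])
```

If either entry comes back False after the DC group reboots, it is probably worth holding the remaining groups until DNS is confirmed healthy.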

This month we had similar issues. The two smaller patch tasks (DCs and File Servers), which only have two and four machines each at the moment, both returned 100% success.

According to the task history for the jobs, for the Utility servers group, only 28 of the 42 servers in the group had any tasks performed on them - successful or otherwise. The Terminal Servers group had similar results. Only 24 out of 33 servers in the group had any action performed on them by Patch Manager.
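To see exactly which machines were skipped, the group membership can be compared against the machines that appear in the task history. The sketch below assumes both lists have been exported by hand into plain lists of host names; the function names and sample host names are hypothetical and are not part of Patch Manager.

```python
def find_untouched(group_members, task_history_hosts):
    """Return group members that never appear in the task history."""
    # Normalise case and whitespace so "UTIL01 " and "util01" match.
    seen = {h.strip().lower() for h in task_history_hosts}
    return sorted(
        m.strip().lower()
        for m in group_members
        if m.strip().lower() not in seen
    )

def coverage(group_size, acted_on):
    """Percentage of the group that had any task performed, to one decimal."""
    return round(100.0 * acted_on / group_size, 1)

# Hypothetical sample data for illustration only.
members = ["UTIL01", "UTIL02", "UTIL03", "UTIL04"]
history = ["util01", "UTIL03 "]
print(find_untouched(members, history))  # ['util02', 'util04']

# Figures from this month's run.
print(coverage(42, 28))  # Utility servers: 66.7
print(coverage(33, 24))  # Terminal servers: 72.7
```

Having the list of skipped host names should make it easier to spot a pattern (for example, every other server in the group) before calling support.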

Of those servers that had actions performed on them, there were some failures. I believe some of them may be just 'normal' Windows Update failures, and some seem to be related to a pre-install reboot (the server may have taken more than five minutes to come back to a state where it could process PM commands), so right now I don't think those need much attention. One thing that did happen, though, is that some of the machines that were acted upon installed successfully, and PM said the post-install reboot was initiated successfully, but the machines did not actually reboot. When users logged in, they were prompted with 'you need to reboot to finish installing updates'.

The fact that a number of machines in the groups were not acted upon in any way is concerning, as is the fact that machines PM reported as successfully sent a post-install reboot command did not actually reboot.
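One way to catch the "reboot reported but never happened" case is to compare each machine's last boot time with the time PM says it sent the reboot command. The sketch below assumes those two timestamps have already been collected per host (the last boot time could come from something like the WMI `Win32_OperatingSystem.LastBootUpTime` property); the function names, record layout and sample values are hypothetical.

```python
from datetime import datetime

def reboot_not_applied(reboot_sent_at, last_boot_at):
    """True if the machine's last boot predates the reboot command."""
    return last_boot_at < reboot_sent_at

def flag_missed_reboots(records):
    """records: iterable of (host, reboot_sent_at, last_boot_at) tuples.

    Returns the hosts whose last boot is older than the reboot command,
    i.e. the reboot was reported as sent but did not take effect.
    """
    return [host for host, sent, boot in records
            if reboot_not_applied(sent, boot)]

# Hypothetical sample data for illustration only.
sample = [
    ("TS01", datetime(2020, 1, 11, 2, 0), datetime(2020, 1, 11, 2, 6)),
    ("TS02", datetime(2020, 1, 11, 2, 0), datetime(2020, 1, 1, 3, 15)),
]
print(flag_missed_reboots(sample))  # ['TS02']
```

Both timestamps need to be in the same time zone for the comparison to mean anything.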

I saw a strange issue back when I was setting up groups: exactly half of the machines I selected to add to a group actually got added (every other server). I had to keep going back and adding machines over and over until they were all finally added. I don't know if this lack of action on servers in the group is in any way related to that.

I figured I'd post here to see if anyone has any thoughts before I call into support.

Does anyone have any thoughts or suggestions?

  • Hello,

    We have had the same issue since going to v2.1.1 - when we patch, a large percentage of servers issue the reboot request but do NOT actually reboot.  They maintain a connection with the PM server for quite some time and eventually go away, but never actually reboot.  If I catch it quickly enough, I log in, tell it to reboot, and then PM completes the job.

    Is there any answer/update on this question?  With over 1700 servers to patch, I can't be logging into each one just to issue the reboot, and after our last fiasco with catroot, we don't REALLY want to issue random reboots.  My only 'workaround' is to tell them to never reboot, and then issue reboots via PM in between patching rounds, which clearly adds a LOT of time to our jobs. I have a feeling it might be related to WMI in some way, because I have noticed that some servers behave better after I run "Check computer Connectivity" and repair.

    Thanks!