
What Maintenance Window?

Not quite two years ago, SAP founder Hasso Plattner was asked why his company was developing an appliance for running an entire SAP instance in RAM.  His response was surprising. 

"The simple reality is that 15 seconds is the longest anyone will wait for an answer on an iPhone," Plattner said. And a moment later he added, "So the real key is not so much the speed of our technology, but rather how we apply the speed to business."

In my opinion, this one statement encapsulates a recent paradigm shift in IT toward speed, agility, and heroic levels of availability. A decade ago, if the e-mail server went down, it wasn't the end of the world as long as external customers weren't impacted. These days, your internal customers are as demanding as external ones, and they know your phone number. People demand 24/7 access to all their applications, and they're getting it from their personal and social apps. Why shouldn't you be able to provide the same?

In my experience, this translates into ever-smaller maintenance windows. In the past, we might have had all weekend to get an application upgraded, or to troubleshoot a failed patch. That's just not the case anymore. In addition to shrinking time windows, more and more applications that aren't customer-facing are being held to the same stricter maintenance demands.

What I want to know is how this shift is affecting your patching schedules, methodologies, and applications. How can we possibly be expected to keep applications patched and available in an environment like this? The number of applications is growing, and so is the number of "servers," thanks to virtualization. With the ever-increasing emphasis on security, the sheer multitude of patches released every month is now mind-boggling. And to top it all off, the increasing number of regulatory audits means you have to apply even the most irrelevant patches IMMEDIATELY.

So how are IT people out there managing patching when the battle is so asymmetrical? How are you balancing the need for availability with the need for constant patching? Also, in your most critical production environments, how are you managing to patch quickly in an automated fashion, while still providing for necessary post-patch testing and verification in a shrinking maintenance window?

I hope we can have some fruitful discussion over the next few weeks on these topics that will help us, and our fellow troops!

  • Luckily, our environment is not 24/7... yet. So our patch processes on servers can take place outside of the working hours of Monday-Friday, 7am - 5pm. I am interested in other people's responses, just in case our environment starts to change. The patches on our workstations happen during the workday for most users, usually 15 minutes into the workday.

  • We are an MSP and cloud service provider, so I have an opportunity to work with customers with all sorts of different availability requirements. I am also in the process of replacing our current patching solution with a new system. The unfortunate truth of patching is that you need to do it, and it takes both time and reboots. The important thing for us was finding a system that would easily support flexible scheduling so that we can accommodate different patching schedule requirements. We also wanted a solution that would get the entire job done in one patch window with as few reboots as possible.

    When it comes to availability, I have found that it's less about the patching solution and more about how the environment is built. If you want high availability, you need to build the environment for it using clustering and other HA technologies. By doing this, you can bring one half of an environment down without ever taking the important services offline, which lets you patch one half at a time. Building these types of environments can be expensive and complex.

    We evaluated several patching solutions, and what I found is that the patching products out there have not gone beyond just patching to address the needs that surround it. I would like to see more integration with hypervisors and clustering technologies, so the patching solution can be configured to automatically patch environments by leveraging the HA technologies, and do things like manage VM snapshots of the environment to allow a fail-back point in the event patching doesn't go well.

    This has been my experience; I am eager to hear from others on this topic!

  • I don't really have any insight on this topic but found it amusing that I'm reading this thread while sitting here at 5:15 waiting on MS patches to apply to my SolarWinds servers.

  • Excellent points.  If one is fortunate enough to have modern applications that lend themselves to clustering and load balancing, then it's a much easier job. 

  • In our environment, patching is done outside working hours (9am - 7pm), but because batch processes run late at night or on weekends, it's a bit difficult to manage sometimes.

    To test post-patch, I designed the patching solution so that a few selected users act as beta testers, and their computers are patched every night if any patch is available and approved. If all goes well with those computers and users, the patches approved for that group are approved for everyone else at the beginning of the following week. Computers outside the beta tester group are patched on weekends (mainly Saturdays), when it is less likely that someone is working or that batch processes are scheduled. We work in financial markets, and on weekends all stock exchanges are closed!

    With the servers, that's another story... For our mission-critical applications we have clusters, and we are currently evaluating the option of sending them to a cloud (or down the drain if it goes bad!). With that solution, patching those servers will be subject to the contract's SLA. Currently, any upgrade or patching is under contract even though the servers are physically hosted in our datacenter, and scheduling maintenance downtime is a big problem (they work at night and on weekends). The rest of the servers are almost all virtualized, and with snapshots we can schedule maintenance downtime fairly easily.

    I like byrona's idea of a solution that would integrate patching with the snapshot/fallback capability of virtualization solutions.

  • No maintenance window is just about right. We aren't a 24x7 shop in the Windows arena as there are other parts of the business that are 24x7, but mostly there is no acceptable window to get maintenance done. Not to mention how much fun patch management can be... Patch Manager makes it better, but it is still closer to pulling teeth than set-it-and-forget-it...

  • Interesting replies, all. So am I incorrect in my assumption about users demanding 24/7 access to what we would deem non-mission-critical apps? Or are people moving that stuff to the cloud? I am in financial services as well, and public cloud just isn't a step most fiservs are willing to take yet. I just find it fascinating that if someone's Blackberry can't get mail, the calls go out faster than if a customer can't send a massive transaction.

    I do love the idea of a patching solution that would automate snaps and failback. If someone can come up with that, plus integrate post-patch testing and scripting, it's a gold mine. The problem is that everyone has their own way of how they want snaps done, so it might be impossible. Thanks for the input!

  • Hey Brandon,

    You aren't incorrect about your assumption that users demand access 24/7. The thing is whether their "demand" is valid. Ask our payroll folks when's a good time to take down the tax and payroll processing servers (we're an ASP/PEO), and they'll tell you "never". Give them two options to choose from or simply tell them when it's happening based on your experience of their truly relevant demands, and things will go just fine.

    VMware's short-lived vSphere patching solution for Windows/Linux did incorporate the pre-patch snap, etc, but I haven't tracked where that product/technology went when they detached it from their host patching. You're right, though, that everyone's standards for validation are different, even if the capability is there. Heck, sometimes even that many snaps with all the crunching that takes place during patches like .NET Framework will complicate life while trying to help it (snap + massive data change + high SAN I/O + snap removal = downtime).

    --Chris
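
For anyone who wants to picture the workflow that keeps coming up in this thread (snapshot, patch one half at a time, verify, fail back), here is a minimal sketch in Python. It is not any vendor's API: `Node`, `rolling_patch`, `fake_patch`, and `fake_check` are hypothetical names invented for illustration, and a real version would call your hypervisor's and load balancer's own tooling where this one only flips fields in memory.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one server in an HA pair or cluster."""
    name: str
    version: int = 1
    in_service: bool = True
    snapshot: Optional[int] = None  # state captured just before patching


def rolling_patch(
    groups: List[List[Node]],
    apply_patch: Callable[[Node], None],
    health_check: Callable[[Node], bool],
) -> bool:
    """Patch one group at a time; roll back and stop if a group fails."""
    for group in groups:
        # Drain the group and take the pre-patch snapshot (the fail-back point).
        for node in group:
            node.in_service = False
            node.snapshot = node.version

        # Patch every node in the group, then run post-patch verification.
        for node in group:
            apply_patch(node)
        healthy = all(health_check(node) for node in group)

        if not healthy:
            # Fail back: restore the whole group to its snapshot.
            for node in group:
                node.version = node.snapshot

        # Drop the snapshot and return the group to service either way.
        for node in group:
            node.snapshot = None
            node.in_service = True

        if not healthy:
            return False  # leave the remaining groups untouched
    return True


# Example: two halves of a four-node environment; the patch moves 1 -> 2.
half_a = [Node("app-a1"), Node("app-a2")]
half_b = [Node("app-b1"), Node("app-b2")]


def fake_patch(node: Node) -> None:
    node.version = 2


def fake_check(node: Node) -> bool:
    return node.name != "app-b1"  # pretend app-b1 fails verification


ok = rolling_patch([half_a, half_b], fake_patch, fake_check)
```

In this run the first half stays patched, the second half is restored to its snapshot and put back in service, and `ok` is `False`, so the operator knows the job is only half done. Dropping each snapshot as soon as its group is verified or rolled back is a deliberate choice here, given Chris's point that long-lived snaps plus heavy patch I/O can cause their own downtime.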