Geek Speak


Today’s data center is full of moving parts. If your data center is hosted on-premises, there’s a lot to do day-in and day-out to make sure everything is functioning as planned. If your data center is a SaaS data center hosted in the cloud, there are still things you need to do, but far fewer compared to an on-premises data center. Each data center carries different workloads, but there’s a set of common technologies that need to be monitored. When VM performance isn’t monitored, you can miss a CPU overload or max out memory. When the right enterprise monitoring tools are in place, it’s easier to manage the workloads and monitor their performance. The following is a list of tools I believe every data center should have.

 

Network Monitoring Tools

Networking is so important to the health of any data center. Both internal and external networking play a key role in the day-to-day usage of the data center. Without networking, your infrastructure goes nowhere. Installing a network monitoring tool that tracks bandwidth usage, networking metrics, and more allows a more proactive or preventative approach to solving networking issues. Furthermore, an IP address management tool that stores all the available IP addresses and ranges and dynamically updates them as addresses get handed out will go a long way in keeping you organized.

 

Virtual Machine Health/Performance Monitoring Tools

Virtualization has taken over the data center landscape. I don’t know of a data center that doesn’t have some element of the software-defined data center in use. Industry-leading hypervisor vendors have created so many tools and advanced options for virtualization that go above and beyond server virtualization. With so many advanced features and software in place, it’s important to have a tool that monitors not just your VMs, but your entire virtualization stack. Software-defined networking (SDN) has become popular, and while that might end up falling under the networking section, most SDN configuration can be handled directly from the virtual administration console. Find a tool that will provide more metrics than you think you need; you may scale out and need them at some point. A VM monitoring tool can catch issues like CPU contention, lack of VM memory, resource contention, and more.

 

Storage Monitoring Tools

You can’t have a data center without some form of storage, whether it be slow disks, fast disks, or a combination of both. Implementing a storage monitoring tool will help the administrator catch issues that disrupt business continuity, such as a storage path dropping, a storage mapping being lost, loss of connectivity to a specific disk shelf, a bad disk, or a bad motherboard in a controller head unit. Data is king in today’s data center, so it’s imperative to have a storage monitoring tool in place to catch anomalies or issues that might hurt business continuity or compromise data integrity.

Environment Monitoring Tools          

Last, but definitely not least, a data center environment monitoring tool can keep you from losing hardware and data altogether. This type of tool protects a data center against physical or environmental issues. A good environment monitoring tool will alert you to the presence of excess moisture in the room, an extreme drop in temperature, or a spike in temperature. These tools usually include a video component for visual monitoring, plus sensors installed in the data center room to track environmental factors. Water can do serious damage in your data center, so great monitoring tools will have sensors installed near the floor to detect moisture and a build-up of water.

 

You can’t be too careful when it comes to protecting your enterprise data center. Monitoring tools like the ones listed above are a good place to start. Check your data center against my list and see if it matches up. Today, there are many tools that encompass all these areas in one package, making it convenient for the administrator to manage it all from a single screen.

Small-to-medium-sized businesses (SMBs) tend to get overlooked when it comes to solutions that fit their needs for infrastructure monitoring. There are many tools that exist today that cater to the enterprise, where there’s a much larger architecture with many moving parts. The enterprise data center requires solutions for monitoring all the systems within it that an SMB might not have. The SMB or remote office/branch office (ROBO) is a much smaller operation, sometimes using a few servers and some smaller networking gear. There may be a storage solution in the back-end, but it’s more likely that the data for their systems is stored on a local drive within one or all of their servers.

 

It seems unfair for the SMB to be ignored as developers typically work to create solutions that will fit better in an enterprise architecture than in a smaller architecture, but that’s the nature of the beast. There’s more money in enterprise solutions, with enterprise licensing agreements (ELAs) reaching into the millions of dollars for some clients. It would make sense that enterprise software is more readily available than SMB software, but that doesn’t mean there aren’t solutions out there for the SMB.

Find What’s Right for YOU

Don’t pick a solution with more than what you need for the infrastructure within your organization. If your organization consists of a single office with three physical servers, a modem/router, and a direct attached storage (DAS) solution, you don’t need an enterprise solution to handle your systems monitoring. There are many less expensive or open-source server monitoring tools out there that come with plenty of documentation to help you get through installation and configuration. Enterprise solutions aren’t always the answer just because they are “enterprise solutions.” Bigger isn’t always better. If a solution with a support agreement is more in line with your expectations, there are quite a few providers that can offer an SMB-class monitoring solution for you.

 

Don’t Overpay

Software salespeople are all about selling, selling, selling. Many times, salespeople are sent on a client call with a solutions engineer (SE) who has more technical experience than the salesperson. Focus more attention on the SE and less on the salesperson. There’s no need to shell out a ton of money for an ELA that’s way more than what you need for your SMB infrastructure. Quality is often associated with cost, and that’s just plain false. When it comes to choosing a monitoring tool for your SMB, "quality over quantity" should be your mantra. If you don’t require 24/7/365 monitoring and platinum-level SLAs with two-minute response times, don’t buy it. Find a tool that fits your SMB budget, and don’t feel slighted because you didn’t buy the shinier, more expensive enterprise solution.

 

Pick a Vendor That Will Work for YOU

Software vendors, especially those that work to develop large enterprise monitoring solutions, don’t always have the best interests of the SMB in mind when building a tool. By focusing your search on vendors that scale to the SMB market, you’ll find that the sales process and the support will be tailored to the needs of your organization. With vendors building scalable tools, customization becomes a key selling point. Vendors can cater to the needs and requirements of the customer, not the market.

 

Peer Pressure is Real

Don’t take calls from software vendors that cater only to the needs of enterprise-scale monitoring solutions. Nothing against enterprise monitoring solutions—they’re needed for the enterprise. However, focusing on the chatter and the “latest and greatest” types of marketing will make your SMB feel even smaller. There’s no competition. Pick what works for your SMB. Don’t overpay. Find a vendor that will support you. By putting all these tips in place, you can find a monitoring tool for your SMB that won’t make you feel like you had to settle.

Monitoring tools are vital to any infrastructure. They analyze and give feedback on what’s going on in your data center. If there are anomalies in traffic, a network monitoring tool will catch them and alert the administrators. When disk space is getting full on a critical server, a server monitoring tool alerts the server administrators that they need to add space. Some tools are only network tools, or only systems tools; these may not always provide all the analysis you need. There are additional monitoring tools that can cover everything happening within your environment.

 

In searching for a monitoring tool that fits the needs of your organization, it can be difficult to find one that’s the right size for your environment. Not all monitoring tools are one-size-fits-all. If you’re searching for a network monitoring tool, you don’t need to purchase one that covers server performance, storage metrics, and more. There are several things to consider when choosing a monitoring tool that fits your environment.

 

Run an Analysis on Your Environment

The first order of business when trying to determine which monitoring tool best fits your needs is to analyze your current environment. There are tools on the market today that help map out your network environment and gather key information such as operating systems, IP addresses, and more. Knowing which systems are in your data center, what types of technologies are present, and what application or applications they support will help you decide which tools are the best fit.

 

Define Your Requirements

There may be legal requirements defining what tools need to be present in your environment. Understanding these specific requirements will likely narrow down the list of potential tools that will work for you. If you’re running a Windows environment, there are many built-in tools that perform some of the needed tasks. If your organization is already using these built-in tools, it may not be necessary to spend money on another tool to do the same thing.

 

Know Your Budget

Budgetary demands typically drive these decisions for most organizations. Analyzing your budget will help you understand which tools you can afford and will narrow the list down. Many tools do more than you actually need, so there’s no reason to spend more on a tool that’s outside your budget.

 

On-prem or Cloud?

When picking a monitoring tool, it’s important to research whether you want an on-premises tool or a cloud-based one. SaaS tools are very flexible and can store the information the tool gathers in the cloud. On the other hand, having an on-premises tool keeps everything in-house and provides a more secure option for data gathered. Choosing an on-prem tool gives you the ability to see your data 24/7/365 and have complete ownership of it. With a SaaS tool, it’s likely you could lose some visibility into how things are operating on a daily basis. Picking the right hosting option should be strictly based on your requirements and comfort with the accessibility of your data.

 

Just Pick One Already

This isn’t meant to be harsh, but spending too much time researching and looking for a tool that fits your needs may put you in a bad position. While you’re trying to choose between the best network monitoring tools, you could be missing out on what’s actually going on inside your systems. Analyze your environment, define your requirements, know your budget, pick a hosting model, and then make your selection. A monitoring solution that fits the needs of your environment will pay dividends in the end.

 

Many organizations grow each year in business scope and footprint. When I say footprint, it’s not merely the size of the organization, but the number of devices, hardware, and other data center-related items. New technologies creep up every year, and many of those technologies live on data center hardware, servers, networking equipment, and even mobile devices. Keeping track of the systems within your organization’s data center can be tricky. Simply knowing where the device is and if it’s powered on isn’t enough information to get an accurate assessment of the systems' performance and health.

 

Data center monitoring tools provide the administrator(s) with a clear view of what’s in the data center, the health of the systems, and their performance. There are many data center monitoring tools available depending on your needs, including network monitoring, server monitoring, and virtual environment monitoring, and it’s important to consider both the open-source and proprietary tools available.

 

Network Monitoring Tools for Data Centers

 

Networking can get complicated, even for the most seasoned network pros. Depending on the size of the network you operate and maintain, managing it without a dedicated monitoring tool can be overwhelming. Most large organizations will have multiple subnets, VLANs, and devices connected across the network fabric. Deploying a networking tool will go a long way in understanding what network is what, and whether or not there are any issues with your networking configuration.

 

An effective networking tool for a data center is more than just a script that pings hosts or devices across the network. A good network tool monitors everything down to the packet. Areas in the network where throughput is crawling will be captured and reported within the GUI or through email (SMTP) alerts. High error rates and slow response times will also be captured and reported. Network administrators can customize the views and reports that are fed to the GUI to their specifications. If networking is bad or broken, things will escalate quickly. The best network monitoring tools can help avoid this.

 

Data Center Server Monitoring Tools

 

Much of the work that a server or virtual machine monitoring tool does can also be accomplished using a good network monitoring tool. However, there are nuances within server/VM monitoring tools that go above and beyond the work of a network monitoring tool. For example, there are tools designed specifically to monitor your virtual environment.

 

A virtual environment essentially contains the entire data center stack, from storage to networking to compute. Monitoring that entire stack takes more than simple reachability checks and SNMP polling. It’s imperative to deploy a data center monitoring solution that understands things at the hypervisor level, where transactions are brokered between the kernel and the guest OS. You need a tool that does more than tell you the lights are still green on your server. You need a tool that will alert you if your server light turns amber, tell you why it’s amber, and tell you how to turn it back to green.

 

Some tools offer automation in their systems monitoring. For instance, if one of your servers is running high on CPU utilization, the tool can migrate that VM to a cluster with more available CPU. That kind of monitoring is helpful, especially when things go wrong in the middle of the night and you’re on call.
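To make the idea concrete, here’s a minimal sketch of the decision logic behind that kind of automation. It isn’t any vendor’s implementation: the VM and cluster numbers are made-up stand-ins for what a real monitoring tool would collect, and migrate_vm() just prints what would normally be handed off to the hypervisor.

    # Sketch: move a CPU-hot VM to the cluster with the most CPU headroom.
    CPU_THRESHOLD = 85.0  # percent utilization that should trigger action

    # Stand-in data; a real monitoring tool collects this for you.
    vm_cpu = {"web01": 92.5, "db01": 40.0, "app02": 88.1}
    cluster_free_cpu = {"cluster-a": 12.0, "cluster-b": 55.0}

    def migrate_vm(vm, cluster):
        # A real tool would call the hypervisor API here (e.g., trigger a live migration).
        print(f"Migrating {vm} to {cluster}")

    for vm, cpu in vm_cpu.items():
        if cpu > CPU_THRESHOLD:
            # Pick whichever cluster currently has the most spare CPU.
            target = max(cluster_free_cpu, key=cluster_free_cpu.get)
            migrate_vm(vm, target)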

 

Application Monitoring Tools

 

Applications are the lifeblood of most organizations. Depending on the customer, some organizations may have to manage and monitor several different applications. Having a solid application performance monitoring (APM) tool in place is crucial to ensuring that your applications are running smoothly and the end users are happy.

 

APM tools allow administrators to see up-to-date information on application usage, performance metrics, and any potential issues that may arise. If an application begins to deliver a poor end-user experience, you want to know about it as far ahead of the end user as possible. APM tools track client CPU utilization, bandwidth consumption, and many other performance metrics. If you’re managing multiple applications in your environment, don’t leave out an APM tool—it might save you when you need it most.

 

Finding the Best Data Center Monitoring Tools for Your Needs

 

Ensure that you have one or all of these types of tools in your data center. It saves you time and money in the long run. Having a clear view of all aspects of your data center and their performance and health helps build confidence in the reliability of your systems and applications.

Cost plays a factor in most IT decisions. Whether the costs are hardware- or software-related, understanding how the tool’s cost will affect the bottom line is important. Typically, it’s the engineer’s or administrator’s task to research tools and/or hardware to fit the organization's needs both fiscally and technologically. Organizations have multiple options, from open-source tools to proprietary, off-the-shelf tools available for purchase. Many organizations prefer to either build their own tools or purchase off-the-shelf solutions that have been tried and tested. However, open-source software has become increasingly popular and has been adopted by many organizations in both the public and private sector. Open-source software is built, maintained, and updated by a community of individuals on the internet, and it can change on the fly. This poses the question: is open-source software suitable for the enterprise? There are both pros and cons that can make that decision easier.

 

The Pros of Open-source Software

 

Open-source software is cost-effective. Most open-source software is free to use. In cases where third-party products are involved, such as plug-ins, there may be a small cost incurred. However, open-source software is meant for anyone to download and do with as they please, to some extent based on licensing. With budgets being tight for many, open-source could be the solution to help stretch your IT dollars.

 

Constant improvements are a hallmark of open-source software. The idea of open-source software is that it can and will be improved as users see flaws and room for improvement. Open-source software is just that: open, and anyone can update it or improve it. A user who finds a bug can fix it and post the updated iteration of the software. Most large-scale enterprise software solutions require major releases to fix bugs and are bound by major release schedules to get the latest and greatest out to their customers.

 

The Cons of Open-source Software

 

Open-source software might not stick around. There’s a possibility that the open-source software your organization has bet on simply goes away. When the community behind updating the software and writing changes to the source code closes up shop, you’re the one now tasked with maintaining it and writing any changes pertinent to your organization. The possibility of this happening makes open-source a riskier choice for your organization.

 

Support isn’t always reliable. When there is an issue with your software or tool, it’s nice to be able to turn to support for help resolving your issue. With open-source software, this isn’t always guaranteed, and if there is support, there aren’t usually the kind of SLAs in place that you would expect with a proprietary enterprise-class software suite.

 

Security becomes a major issue. Anyone can be hacked. However, the risk is far less when it comes to proprietary software. Due to the nature of open-source software allowing anyone to update the code, the risk of downloading malicious code is much higher. One source referred to using open-source software as “eating from a dirty fork.” When you reach in the drawer for a clean fork, you could be pulling out a dirty utensil. That analogy is right on the money.

 

The Verdict

 

Swim at your own risk. Much like the sign you see at a swimming pool when there is no lifeguard present, you have to swim at your own risk. If you are planning on downloading and installing an open-source software package, do your best to scan it and be prepared to accept the risk of using it. There are pros and cons, and it’s important to weigh them with your goals in mind to decide whether or not to use open-source.

When it comes to getting something done in IT, the question of "How?" can be overwhelming. There are many different solutions on the market to achieve our data center monitoring and management goals. The best way to achieve success is for project managers and engineers to work together to determine what tools best fit the needs of the project.

 

With most projects, budgets are a major factor in deciding what to use. This can make initial decisions relatively easy or increasingly difficult. In the market, you’ll find a spectrum from custom implementations of enterprise management software to smaller, more nimble solutions. There are pros and cons to each type of solution (look for a future post!), and depending on the project, some cons can be deal-breakers. Here are a couple of points to think about when deciding on a tool/solution to get your project across the finish line.

 

Budget, Anyone?

 

Budgets run the IT world. Large companies with healthy revenues have large budgets to match. Smaller organizations have budgets more in line with the IT services they need to operate. Each of these types of companies needs a solution that fits its needs without causing problems with its budget. There are enterprise management systems to fit a variety of budgets, for sure. Some are big, sprawling systems with complicated licensing and costs to match. Others consist of community-managed tools that have lower costs associated with them, but also less support. And, of course, there are tools that fit in the middle of those two extremes.

 

Don’t think that having a limitless budget means you should just buy the most expensive tool out there. You need to find a solution that first and foremost fits your needs. Likewise, don’t feel like a small budget means you can only go after free solutions or tools with limited support. Investigating all the options and knowing what you need are the keys to finding good software at reasonable cost.

 

Do I Have the Right People?

Having the right people on your IT staff also helps when choosing what type of management tool to use. Typically, IT pros love researching tools on their own and spend hours in community threads talking about tools. If you have a seasoned and dedicated staff, go with a more nimble tool. It usually costs less, and your staff will ensure it gets used properly.

 

Conversely, if your IT staff is lacking, or is filled with junior level admins, a nimble tool might not be the best solution. An enterprise solution often comes with more support and a technical account manager assigned directly to your account. Enterprise solutions often offer professional services to install the tool and configure it to meet the demands of your infrastructure. Some enterprise management software vendors offer on-site training to get your staff up to speed on the tool’s use.

 

Don’t forget that sometimes the best person for the job may not even be a person at all! Enterprise management systems often provide tools that can automate a large number of tasks, such as data collection or performance optimization. If your staff finds itself overworked or lacking in certain areas, you may be able to rely on your EMS platform to help you streamline the things you need to accomplish and fill in the gaps where necessary. Not everyone needs a huge IT team, but using your platforms as a force multiplier can give you an advantage.

 

There are many other points to discuss when deciding on an enterprise monitoring or management system versus a nimble tool. However, the points discussed above should be the most pertinent to your discussions. Do not make any decisions on a solution without taking the time to make some proper assessments first. Trust your staff to be honest about their capabilities, ensure your budgetary constraints are met, and choose a tool that will be the best fit for the project. In the end, what matters most is delivering a solution that meets your customer and/or company’s needs.

I don’t think there’s anyone out there that truly loves systems monitoring. I may be wrong, but traditionally, it’s not the most awesome tool to play with. When you think of systems monitoring, what comes to mind? For me, as an admin, a vision of reading through logs and looking at events and timestamps keeps me up at night. I definitely see that there is a need for monitoring, and your CIO or IT manager definitely wants you to set up the tooling to get any and all metrics and data you can dig up on performance. Then there’s the ‘root cause’ issue. The decision makers want to know what the root cause was when their application crashed and went down for four hours. You get that data from a good monitoring tool. Well, time to put on a happy face and implement a good tool. Not just any tool will do though--you want a tool that isn’t just going to show you a bunch of red and green lights. For it to be successful, there has to be something in it for you! I’m going to lay out my top three things that a good monitoring tool can do for you, the admin or engineer in the trenches day in and day out!

 

Find the Root Cause

 

Probably the single best thing a (good) systems monitoring tool can do is find the root cause of an issue that has become seriously distressing for your team. If you’ve been in IT long enough, the experience of having an unexplained outage is all too familiar. After the outage is finally fixed and things are back online, the first thing the higher-ups want to know is “why?” or “what was the root cause?” I cringe whenever I hear this. It means I need to dig through system logs, application event logs, networking logs, and any other avenue I might have to find the fabled root cause. Most great monitoring tools today have root cause analysis (RCA) built in. RCA can literally save you hours and days of poring over logs. In discussions about implementing a systems monitoring tool, make sure RCA is high on your list of requirements.

 

Establish a Performance Baseline

How are you supposed to know what’s an actual event and what’s just a false positive? How could you point out something that’s out of the norm for your environment? Well, you can’t, unless you have a monitoring tool in place that learns what normal activity looks like and which events are simply anomalies. With some tools that offer high-frequency polling, you can pull baseline statistics for behavior down to the second. Any good monitoring tool will take a while to collect data and analyze it before producing metrics that have meaning to your organization. Over time, the tool will learn adaptively from the metrics it collects and constantly provide you with the most up-to-date, accurate picture. Things like false positives can eat up a lot of resources for nothing.
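To give a rough sense of what “learning a baseline” means under the hood, here’s a minimal sketch that builds a mean/standard-deviation baseline from historical samples and flags new readings that land far outside it. Real tools use far more sophisticated, adaptive models; the sample values below are made up.

    import statistics

    # Made-up history of a metric (say, response time in ms) gathered by the poller.
    history = [102, 98, 110, 105, 99, 101, 97, 103, 108, 100]

    baseline_mean = statistics.mean(history)
    baseline_stdev = statistics.stdev(history)

    def is_anomaly(sample, tolerance=3.0):
        """Flag samples more than `tolerance` standard deviations from the baseline."""
        return abs(sample - baseline_mean) > tolerance * baseline_stdev

    print(is_anomaly(104))   # False: within normal variation, not worth an alert
    print(is_anomaly(250))   # True: far outside the learned baseline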

 

Reports, Reports, Reports

When issues arise, or RCA needs to be done, you want the systems monitoring tool to be capable of producing reports. Reports can come in the form of an exportable .csv, .xls, or .pdf file. Some managers like a printout, a hard copy they can write on and mark up. With the ability to produce reports, you have a solid history of network or systems behavior that you can store in SharePoint or whatever file share you use. Most tools keep an archive or history of reports, but it’s always good to have the option of exporting for backup and recovery purposes. I’ve found that a sortable Excel file I can search through comes in very handy when I need to really dig in and find an issue that might be hiding in the metrics.
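Even when a tool’s canned reports don’t cut it, most tools can dump raw metrics that you can reshape yourself. As a minimal sketch (with made-up sample data), this writes collected metrics to a sortable .csv file you could open in Excel or drop into SharePoint:

    import csv

    # Made-up sample of what an exported metrics history might look like.
    rows = [
        {"timestamp": "2018-06-01 02:00", "host": "web01", "cpu_pct": 42, "mem_pct": 61},
        {"timestamp": "2018-06-01 02:05", "host": "web01", "cpu_pct": 95, "mem_pct": 64},
        {"timestamp": "2018-06-01 02:05", "host": "db01", "cpu_pct": 38, "mem_pct": 80},
    ]

    with open("metrics_report.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", "host", "cpu_pct", "mem_pct"])
        writer.writeheader()
        writer.writerows(rows)
    # The resulting file sorts and filters cleanly in Excel when you need to dig in.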

 

Systems monitoring tools can do so much for your organization, and more importantly, you! Make sure that when you are looking for a systems monitoring tool, sift through all the bells and whistles and be sure that there are at least these three features built in… it might save your hide one day, trust me!

When it comes to IT, things go wrong from time to time. Servers crash, memory goes bad, power supplies die, files get corrupted, backups get corrupted...there are so many things that can go wrong. When things do go wrong, you work to troubleshoot the issue and end up bringing it all back online as quickly as humanly possible. It feels good; you might even high-five or fist-bump your co-worker. For the admin, this is a win. However, for the higher-ups, this is where the finger pointing begins. Have you ever had a manager ask you, “So what was the root cause?” or say, “Let’s drill down and find the root cause”?

 

 

I have nightmares of having to write after action reports (AARs) on what happened and what the root cause was. In my imagination, the root cause is a nasty monster that wreaks havoc in your data center, the kind of monster that lived under your bed when you were 8 years old, only now it lives in your data center. This monster barely leaves a trace of evidence as to what he did to bring your systems down or corrupt them. This is where a good systems monitoring tool steps in to save the day and help sniff out the root cause. 

 

Three Things to Look for in a Good Root Cause Analysis Tool

A good root cause analysis (RCA) tool can accomplish three things for you, which can provide you with the best track on what the root cause most likely is and how to prevent it in the future. 

  1. A good RCA tool will…be both reactive and predictive. You don’t want a tool that simply points to logs or directories where there might be issues. You want a tool that can describe what happened in detail and point to the location of the issue. You can't begin to track down the issue if you don’t understand what happened and have a clear timeline of events.  Second, the tool can learn patterns of activity within the data center that allow it to become predictive in the future if it sees things going downhill. 
  2. A good RCA tool will…build a baseline and continue to update that baseline as time goes by.  The idea here is for the RCA tool to really understand what looks “normal” to you, what is a normal set of activities and events that take place within your systems. When a consistent and accurate baseline is learned, the RCA tool can get much more accurate as to what a root cause might be when things happen outside of what’s normal. 
  3. A good RCA tool will…sort out what matters, and what doesn’t matter. The last thing you want is a false positive when it comes to root cause analysis. The best tools can accurately measure false positives against real events that can do serious damage to your systems. 

 

Use More Than One Method if Necessary

Letting your RCA tool become a crutch for your team can be problematic. There will be times when an issue is so severe and confusing that it’s necessary to reach out for help. The best monitoring tools do a good job of bundling log files for export should you need to bring in a vendor support technician. Use the info gathered from logs, plus the RCA tool output and vendor support, for those times when critical systems are down hard and your business is losing money every minute they’re down.

If you are old enough, you might remember the commercial that played late at night and went something like this: “It’s 10 p.m., do you know where your children are?” This was a super-short commercial that ran on late-night TV in the 1980s; it always kind of creeped me out a bit. The title of this post is slightly different, changing the time to 2 a.m., because accessing our data is much more than just a 10 p.m. or earlier affair these days. We want access to our data 24/7/365! The premise of that commercial was all about the safety of children after “curfew” hours. If you knew your children were asleep in their beds at 10 p.m., then you were good. If not, you had better start beating the bushes to find out where they were. Back then you couldn’t just send a text saying “Get home now!!!” with an angry emoji. Things are different now. We’re storing entire data centers in the cloud, so I think it’s time to look back at this creepy commercial from late-night 80s TV and apply it to our data in the cloud. “It’s 2 a.m., do you know where your data is?”

 

Here are some ways we can help ensure the safety of our data and know where it is, even at 2 a.m.

 

Understanding the Cloud Hosting Agreement

Much like anything else, read the fine print! How often do we actually read it? I’m guilty of rarely reading it unless I’m buying a house or dealing with a legal matter. But for a cloud hosting agreement, you need to read the fine print and understand what you are gaining or losing by choosing that provider for your data. I’m going to use Amazon Web Services (AWS) as an example (this is by no means an endorsement of Amazon). Amazon has done a really good job of publishing the fine print in a way that’s actually easy to read and won’t turn you blind. Here’s an excerpt from the data privacy page on their website:

Ownership and Control of customer content:

Access: As a customer, you manage access to your content and user access to AWS services and resources. We provide an advanced set of access, encryption, and logging features to help you do this effectively (such as AWS CloudTrail). We do not access or use your content for any purpose without your consent. We never use your content or derive information from it for marketing or advertising.

Storage: You choose the AWS Region(s) in which your content is stored. We do not move or replicate your content outside of your chosen AWS Region(s) without your consent.

Security: You choose how your content is secured. We offer you strong encryption for your content in transit and at rest, and we provide you with the option to manage your own encryption keys. 

 

 

Choose Where Your Data Lives

Cloud storage companies don’t put all their eggs, or more accurately, all your eggs, in one basket. Amazon, Microsoft, and Google have data centers all over the world. Most of them allow you to choose which region you wish to store your data in. Here’s an example with Microsoft Azure: in the U.S. alone, Microsoft has 8 data centers from coast to coast where your data is stored at rest. The locations are Quincy, WA; Santa Clara, CA; Cheyenne, WY; San Antonio, TX; Des Moines, IA; Chicago, IL; Blue Ridge, VA; and Boydton, VA. AWS offers its customers the choice of which region to store their data in, with the assurance that they “won’t move or replicate your content outside of your chosen AWS Region(s) without your consent” (https://aws.amazon.com/compliance/data-privacy-faq/). With both of these options, it’s easy to know where your data lives at rest.

 

 

Figure 1 Microsoft Azure Data Centers in the US
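If you script against AWS, the region choice is explicit in the API call itself. Here’s a minimal sketch using boto3, assuming your credentials are already configured; the bucket name is just an example, and outside of us-east-1 the region has to be stated as a LocationConstraint.

    import boto3

    region = "us-west-2"  # the region where this data will live at rest
    s3 = boto3.client("s3", region_name=region)

    s3.create_bucket(
        Bucket="example-company-backups",  # example only; bucket names are globally unique
        CreateBucketConfiguration={"LocationConstraint": region},
    )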

 

Monitor, Monitor, Monitor

It all comes back to systems monitoring.  There are so many cloud monitoring tools out there. Find the right one for your situation and configure it to monitor your systems the way you want them to be monitored.  Create custom dashboards, run health checks, manage backups, and make sure it’s all working how you want it to work.  If there’s a feature you wish was included in your systems monitoring tool, ping the provider and let them know.  For most good companies, feedback is valued and feature requests are honestly considered for future implementation.

Systems monitoring has become a very important piece of the infrastructure puzzle. There might not be a more important part of your overall design than having a good systems monitoring practice in place. There are good options for cloud hosted infrastructures, on-premises, and hybrid designs. Whatever situation you are in, it is important that you choose a systems monitoring tool that works best for your organization and delivers the metrics that are crucial to its success. When the decision has been made and the systems monitoring tool(s) have been implemented, it’s time to look at the best practices involved in ensuring the tool works to deliver all it is expected to for the most return on investment.

 

 

The term “best practice” is known to be overused by slick salespeople the world over; however, there is a place for it in the discussion of monitoring tools. The last thing anyone wants to do is purchase a monitoring tool and install it, just for it to slowly die and become shelfware. So, let’s look at what I consider to be the top 5 best practices for systems monitoring.

 

1. Prediction and Prevention              

We’ve all heard the adage that “an ounce of prevention is worth a pound of cure.”  Is your systems monitoring tool delivering metrics that help point out where things might go wrong in the near future? Are you over-taxing your CPU? Running out of memory? Are there networking bottlenecks that need to be addressed? A good monitoring tool will include a prediction engine that will alert you to issues before they become catastrophic. 

 

2. Customize and Streamline Monitoring        

As an administrator, when tasked with implementing systems monitoring, it can bring lots of anxiety and visions of endless, seemingly useless emails filling up your inbox. It doesn’t have to be that way. The admin needs to triage what will trigger an email alert and customize the reporting accordingly. Along with email alerts, most tools allow you to create custom dashboards to monitor what is most important to your organization. Without a level of customization involved, systems monitoring can quickly become an annoying, confusing mess.

 

3. Include Automation

Automation can be a very powerful tool and can save the administrator a ton of time. In short, automation makes life better, so long as it’s implemented correctly. Many tools today have an automation feature where you can either create your own automation scripts or choose from a list of common, out-of-the-box automation scripts. This best practice goes along with the first one in this list, prediction and prevention. When the tool notices that a certain VM is running low on memory, it can reach back to vCenter and add more memory before it’s too late, assuming it has been configured to do so. This makes life much easier, but proceed with caution, as you don’t want your monitoring tool doing too much. It’s easy to be overly aggressive with automation.
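One simple guard against overly aggressive automation is to act only when a condition has persisted across several polling cycles, not on a single noisy sample. Here’s a minimal sketch of that idea; the threshold and sample values are made up, and the actual remediation (say, asking vCenter to hot-add memory) is left to whatever your tool supports.

    # Sketch: only remediate when a condition persists, so one noisy sample can't trigger action.
    REQUIRED_CONSECUTIVE = 3   # how many polls in a row must breach the threshold
    THRESHOLD = 90.0           # e.g., percent of guest memory in use

    def should_act(samples):
        """Return True only if the last REQUIRED_CONSECUTIVE samples all breach the threshold."""
        recent = samples[-REQUIRED_CONSECUTIVE:]
        return len(recent) == REQUIRED_CONSECUTIVE and all(s > THRESHOLD for s in recent)

    print(should_act([88, 95, 91]))   # False: pressure hasn't been sustained long enough
    print(should_act([91, 95, 93]))   # True: sustained pressure, safe to remediate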

 

4. Documentation Saves the Day

Document, document, document everything you do with your systems monitoring tool. The last thing you want is to have an alert come up and the night shift guy on your operations team not know what to do with it. “Ah, I’ll just acknowledge the alarm and reset it to green, I don’t even know what IOPS are anyways.” Yikes! If you have a “run book” or manual that outlines everything about the tool, where to look for alerts, who to call, how to log in, and so on, then you can relax and know that if something goes wrong, you can rely on the guy with the manual to know what to do. Ensure that you also track changes to the document because you want to monitor what changes are being made and check that they are legit, approved changes.

 

5. Choose Wisely

Last, but definitely not least, pick the right tool for the job. If you migrated your entire workload to the cloud, don’t mess around with an on-premises solution for systems monitoring. Use the cloud provider’s tool and run with it. That being said, get educated on their tool and make sure you can customize it to your liking. Don’t pick a tool based on price alone. Shop around and focus on the options and customization available in the tool. Always choose a tool that achieves your organization's goals for systems monitoring. The latest isn’t always the greatest.

 

Putting monitoring best practices in place is a smart way to approach a plan to help ensure your tool of choice is going to perform its best and give you the metrics you need to feel good about what’s going on in your data center.

What happens to our applications and infrastructure when we place them in the cloud?  Have you ever felt like you’ve lost insight into your infrastructure after migrating it to the cloud?  There seems to be a common complaint among organizations that at one point in time had an on-premises infrastructure or application package. After migrating those workloads to the cloud, they feel like they don’t have as much ownership and insight into it as they used to.

 

That is expected when you migrate an on-premises workload to the cloud: it no longer physically exists within your workplace.  On top of your applications or infrastructure being out of your sight physically, there is now a web service (depending on the cloud service) that adds another layer of separation between you and your data. This is the world we now live in; the cloud has become a legitimate option to store not only personal data, but enterprise data and even government data. It’s going to be a long road to 100% trust in storing workloads in the cloud, so here are some ways you can still feel good about monitoring your systems/infrastructures/applications that you’ve migrated to the cloud.

 

Cloud Systems Monitoring Tools

Depending on your cloud hosting vendor, you may have some built-in tools that you can use to maintain visibility into your infrastructure and applications. Here’s a look at each of the big players in the cloud hosting game and the built-in tools they offer for systems monitoring:

 

Amazon Web Services CloudWatch

AWS has become a titan in the cloud hosting space, and it doesn't look like they're slowing down anytime soon. Amazon offers a utility called Amazon CloudWatch that gives you complete visibility into your cloud resources and applications. CloudWatch allows you to see metrics such as CPU utilization, memory utilization, and other key metrics that you define. Amazon’s website summarizes CloudWatch as follows:

“Amazon CloudWatch is a monitoring and management service built for developers, system operators, site reliability engineers (SRE), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, understand and respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing you with a unified view of AWS resources, applications and services that run on AWS, and on-premises servers. You can use CloudWatch to set high resolution alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to optimize your applications, and ensure they are running smoothly (AWS CloudWatch).”
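CloudWatch alarms can also be created programmatically. As a rough sketch using boto3 (assuming the library is installed and credentials are configured, with a made-up instance ID), this sets an alarm that fires when an EC2 instance averages over 80% CPU for two consecutive five-minute periods:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    cloudwatch.put_metric_alarm(
        AlarmName="web01-high-cpu",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # example instance
        Statistic="Average",
        Period=300,                  # five-minute periods
        EvaluationPeriods=2,         # two periods in a row
        Threshold=80.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmDescription="Example alarm: sustained high CPU on web01",
    )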

 

Microsoft Azure Monitor:

Azure Monitor is a monitoring tool that allows users to navigate key metrics gathered from applications, application logs, the guest OS, host VMs, and activity logs within the Azure infrastructure. Azure Monitor visualizes those key metrics through graphics, portal views, dashboards, and different charts. Through Azure Monitor’s landing page, admins can onboard, configure, and manage their infrastructure and application metrics. Microsoft describes Azure Monitor as follows:

“Azure Monitor provides base-level infrastructure metrics and logs for most services in Microsoft Azure. Azure services that do not yet put their data into Azure Monitor will put it there in the future (MS Azure website)… Azure Monitor enables core monitoring for Azure services by allowing the collection of metrics, activity logs, and diagnostic logs. For example, the activity log tells you when new resources are created or modified.”

 

Full-Stack Monitoring, Powered by Google:

Google Cloud has made leaps and bounds in the cloud hosting space in the last few years and is poised to be Amazon’s main competitor. Much like Microsoft and Amazon, Google Cloud offers a robust monitoring tool called Full-Stack Monitoring, Powered by Google. Full-Stack works to offer the administrator complete visibility into their applications and platform. Full-Stack presents the admin with a rich dashboard of metrics such as performance, uptime, and health of the cloud-powered applications stored in Google Cloud. Google lays out a great explanation and list of benefits that Full-Stack Monitoring provides to the end user:

“Stackdriver Monitoring provides visibility into the performance, uptime, and overall health of cloud-powered applications. Stackdriver collects metrics, events, and metadata from Google Cloud Platform, Amazon Web Services, hosted uptime probes, application instrumentation, and a variety of common application components including Cassandra, Nginx, Apache Web Server, Elasticsearch, and many others. Stackdriver ingests that data and generates insights via dashboards, charts, and alerts. Stackdriver alerting helps you collaborate by integrating with Slack, PagerDuty, HipChat, Campfire, and more (Google Cloud website).”

 

Trust but Verify

While there are several great proprietary tools provided by the cloud vendor of choice, it’s imperative to verify that the metrics gathered are accurate. There are many free tools out there that can be run against your cloud infrastructure or cloud-driven applications. While it’s become increasingly acceptable to trust large cloud vendors such as Google, Amazon, and Microsoft, the burden rests on the organization to verify the data it receives in return.

Before we get into my list of patch management tools: we've all used WSUS, and some of us have become proficient at SCCM, but those tools aren't in my top 3 list... they don't even crack my top 10!  However, from an enterprise point of view, in an enterprise that is primarily Windows, those tools are great and they get the job done.  I want to talk about 3 tools that are easy to set up, easy to use, and provide good value to the admin team when it comes to managing updates and patches.  Administrators who have to manage patches (which is just about all of us) want an easy solution that's not going to require a ton of overhead.  I feel like SCCM is a monster when it comes to management and overhead; maybe that's not your experience.  The end result we all desire is to move away from manual patching and find a solution that will do that work for us.  My list is by no means definitive; these are tools that I've actually had interaction with in the past and that I've found to be helpful and easy to use.  Without further ado, here's my top 3 list of patch management tools (in no particular order), each with an accompanying video:

 

LANDesk

 

 

 

GFI LanGuard

 

 

 

SolarWinds Patch Manager

 

 

 

What do you think?  Am I way off?  Did I leave off any good tools that some of you are using out there?  I'd love to hear from you.

It goes without saying that patching and updating your systems is a necessity.  No one wants to deal with the aftermath of a security breach because you forgot to manually patch your servers over the weekend, or because your SCCM/WSUS/YUM solution wasn't configured correctly.  So how do you craft a solid plan of attack for patching?  There are many different ways you can approach patching.  In previous posts I talked about what you are patching and how to patch Linux systems, but we need to discuss creating a strategic plan for ensuring patch and update management don't let you down.  What I've done is lay out a step-by-step process for creating a Patching Plan of Attack, or PPoA (not an official acronym, but it looks like one).

 

Step 1: Do you even know what needs to be patched?

The first step in our PPoA is to do an assessment or inventory to see what's out there in your environment that needs to be patched: servers, networking gear, firewalls, desktop systems, etc.  If you don't know what's out there in your environment, how can you be confident in creating a PPoA?  You can't!  For some this might be easy due to the smaller size of their environment, but for others who work in a large enterprise with hundreds of devices, it can get tricky.  Thankfully, tools like SolarWinds LAN Surveyor and SNMP v3 can help you map out your network and see what's out there.  Hopefully you are already doing regular data center health checks where you actually set your Cheetos and Mt. Dew aside, get out of your chair, and walk to the actual data center (please clean the orange dust off your fingers first!).
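If you don't have a discovery tool handy, even a quick ping sweep gives you a rough starting inventory of what answers on a subnet. Here's a minimal sketch; it assumes a Linux-style ping command and a /24 you actually own, and it's slow because it checks hosts one at a time.

    import subprocess

    subnet = "192.168.1"   # example /24; only sweep networks you're responsible for

    def host_is_up(ip):
        """Send one ping with a 1-second timeout and report whether the host answered."""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    live_hosts = [f"{subnet}.{n}" for n in range(1, 255) if host_is_up(f"{subnet}.{n}")]
    print(f"{len(live_hosts)} hosts responded:", live_hosts)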

 

Step 2:  Being like everyone else is sometimes easier!

How many flavors of Linux are in your environment?  How many different versions are you supporting?  Do you have Win7, XP, and Win8 all in your environment?  It can get tricky if you have a bunch of different operating systems out there, and even trickier if they are all at different service pack levels.  Keep everything the same; if everything is the same, you'll have an easier time putting together your PPoA and streamlining the process of patching.  Patching is mind-numbing and painful; you don't want to add complexity to it if you can avoid it.

 

Step 3:  Beep, beep, beep.... Back it up!  Please!

Before you even think about applying any patches, your PPoA must include a process for backing up all of your systems prior to and after patching.  The last thing anyone wants is an RGE (resume-generating event) on their hands!  We shouldn't even be talking about this; if you aren't backing up your systems, run and hide and don't tell anyone else (I'll keep your secret).  If you don't have the storage space to back up your systems, find it.  If you are already backing up your systems, good for you; here's a virtual pat on the back!

 

Step 4:  Assess, Mitigate, Allow

I'm sure I've got you all out there reading this super excited and jonesing to go out and patch away.  Calm down, I know it's exciting, but let me ask you a question first.  Do you need to apply every patch that comes out?  Are all of your systems "mission critical"?  Before applying patches and creating an elaborate PPoA, do a risk assessment to see if you really need to patch everything you have.  The overhead that comes with patching can get out of hand if you apply every available patch to every system you have.  For some, e.g., federal environments, you have to apply them all, but for others it might not be necessary.  Can you mitigate the risk before patching it?  Are there things you can do ahead of time to reduce the risk or exposure of a certain system or group of systems?  Finally, what kinds of risks are you going to allow in your environment?  These are all aspects of good risk management that you can apply to your planning.

 

Step 5:  Patch away!

Now you have your PPoA and you are ready to get patching, so go for it.  If you have a good plan of attack and you feel confident that everything has been backed up and all risks have been assessed and mitigated, then have at it.  Occasionally you are going to run into a patch that your systems aren't going to like, and they will stop working.  Hopefully you've backed up your systems or, better yet, you are working with VMs and can revert to an earlier snapshot.  Keep these 5 steps in mind when building out your PPoA so you can feel confident tackling probably the most annoying task in all of IT.

Let's talk about patching for our good friend Tux the Linux penguin.  How many of us out there work in a Linux-heavy environment?  In the past it might have been a much smaller number; however, with the emergence of virtualization and the ability to run Linux and Windows VMs on the same hardware, it's become a common occurrence to support both OS platforms.  Today I thought we'd talk about patching techniques and methods specifically related to Linux systems.   Below I've compiled a list of the 3 most common methods I've used for patching Linux systems.  After reading the list, you may have a dozen methods that are more successful and easier to use than the ones I've listed here; I encourage you to share your list with the forum so we get the best coverage of methods for patching Linux systems.

 

Open Source Patching Tools

There are a few good open-source tools out there for patching your Linux systems.  One tool I've tested in the past is called Spacewalk.  Spacewalk is used to patch systems that are derivatives of Red Hat, such as Fedora and CentOS.  Most federal government Linux systems are running Red Hat Enterprise Linux; in that case you would be better off utilizing the Red Hat Satellite suite of tools to manage patches and updates for your Red Hat systems.  If your government or commercial client allows Fedora/CentOS as well as open-source tools for managing updates, then Spacewalk is a viable option.

 

 

YUMmy for my tummy!

No, this has nothing to do with Cheetos; everybody calm down.  Configuring a YUM repository is another good method for managing patches in a Linux environment.  If you have the space (and even if you don't, make the space), configure a YUM repository.  Once you have the repository created, you can build some of your own scripts to pull down patches and apply them on demand or on a configured schedule.  It's easy to set up a YUM repository, especially when utilizing the createrepo tool.
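For a sense of the moving parts, here's a small sketch that turns a directory of already-downloaded RPMs into a local repository and writes the client-side .repo file. It assumes the createrepo package is installed and that you have the privileges to write to these example paths; it's an illustration, not a hardened script.

    import subprocess
    from pathlib import Path

    repo_dir = Path("/var/www/html/localrepo")   # example directory already holding your RPMs
    repo_dir.mkdir(parents=True, exist_ok=True)

    # Generate the repodata/ metadata that yum clients read.
    subprocess.run(["createrepo", str(repo_dir)], check=True)

    # Client-side repo definition pointing at the local directory (or a web server in front of it).
    repo_config = "\n".join([
        "[localrepo]",
        "name=Local patch repository",
        "baseurl=file:///var/www/html/localrepo",
        "enabled=1",
        "gpgcheck=0",
        "",
    ])
    Path("/etc/yum.repos.d/localrepo.repo").write_text(repo_config)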

 

 

Manual Patching from Vendor Sites

Obviously the last method I'm going to talk about is manual patching.  For the record, I abhor manual patching; it's a long process and it can become quite tedious if you have a large environment.  I will preface this section by stating that if you can test a scripted/automated process for patching and it's successful enough to deploy, then please, by all means, go that route.  If you simply don't have the time or aptitude for scripting, then manual patching it is.  The most important thing to remember when you are downloading patches from an FTP site is to ensure that it's a trustworthy site.  With Red Hat and SUSE, you're going to get a trusted and secured FTP site to download your patches from; however, with other distros of Linux such as Ubuntu (Debian-based) or CentOS, you're going to have to find a trustworthy mirror site that won't introduce a Trojan to your network.  The major drawback of manual patching is security; unfortunately, there are a ton of bad sites out there that will help you introduce malware into your systems and corrupt your network.  Be careful!
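Whatever source you end up downloading from, verify what you got before installing it. Checking the package's GPG signature is the gold standard; at a minimum, compare the file against the checksum the vendor publishes. Here's a small sketch of that checksum comparison; the file name and expected hash are placeholders you'd replace with the real values.

    import hashlib

    def sha256sum(path):
        """Compute the SHA-256 digest of a file in chunks so large packages don't exhaust memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    expected = "<paste the checksum published by the vendor here>"
    actual = sha256sum("example-package.rpm")   # placeholder file name

    if actual != expected:
        raise SystemExit("Checksum mismatch -- do not install this package!")
    print("Checksum verified.")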



That's all, folks!  Does any of this seem familiar to you?  What do you use to patch your Linux systems?  If you've set up an elaborate YUM or APT repository, please share the love!


Tux out!!

All of us who have any experience in the IT field have had to deal with patching at some point in time.  It's a necessary evil.  Why an evil?  Well, if you've had to deal with patches, then you know it can be a major pain.  When I hear words like SCCM or Patch Tuesday, I cringe, especially if I'm in charge of patch management.  We all love Microsoft (ahem), but let's be honest: they have more patches than any other software vendor in this galaxy!  VMware has its patching and Linux machines get patched, but with Windows servers there's some heavy lifting when it comes to patching.  Most of my memories of staying up past 12 a.m. to do IT work revolve around patching, and again, it's not something that everybody jumps to volunteer for.  While it's definitely not riveting work, it is crucial to the security of your servers, network devices, desktops, <plug in system here>.  Most software vendors, such as Microsoft, are good about pushing out up-to-date patches for their systems; however, there are other types of systems that we as IT staff have to go out and pull patches down for from the vendor's site, and this adds more complexity to the patching.

 

My question is: what are you doing to manage your organization's patching?  Are you using SCCM, WSUS, or some other type of patch management?  Or are you out there still banging away at manually patching your systems?  Hopefully not, but maybe you aren't a full-blown enterprise.  I'm curious, because to me patching is the most mundane and painful process out there, especially if you are doing it manually.
