Log Management for a System Admin

    With rob.johnson

     

    Systems administrators face a unique challenge when tasked with managing log collection because they have to support a diverse environment. Typically, SysAdmins are responsible for servers and applications, which are deployed across several different operating systems, database platforms, and, of course, virtualization and storage. Because of this diversity, it is very important to put a plan in place for centralizing your log collection before you actually start collecting logs.

     

    Your first inclination may be to start enabling every audit feature possible on your systems and applications. However, within just a few minutes, you will find out just how much log data can be generated in a short time, and then you have to turn that information into usable data.

     

    Regardless of how you plan to manage your log collection, there are a few items to consider about which logs you should collect and what impact the process will have on your network. Below, I present a few recommendations that may help you with planning and collecting logs. Since you are professionals in your field, I would be interested to learn about your experiences, best practices, gotchas, and general advice on log collection. Feel free to lend your thoughts on the topic.

     

    1. How will log collection and storage impact your system resources? Whether it’s a free Syslog Server on a VM or a feature-rich log management solution, you will need to provide resources (i.e. memory, CPU, and storage).
      • Storage is almost always the most expensive resource. Make sure you have a detailed understanding of how your logging solution interacts with storage and whether it offers compression, either configurable or automatic. Virtual appliances, or even VM-based Syslog Servers, are more flexible because their storage can be adjusted later. However, if you plan to use a standalone physical box, you will need to know your storage costs ahead of time. Performance monitoring tools in your hypervisor or on the physical servers are a great way to test the impact of logging. I recommend using one of your busiest systems for testing, such as a domain controller. Configure the maximum level of auditing or logging, then monitor disk performance to measure the impact.
      • Memory and CPU will take the next hit. Real-time collection tends to use more sustained memory and CPU, and the amounts vary with log volume because collection and storage occur simultaneously. Scheduled collection or polling may create a spike in usage for a short time; however, you can create a collection schedule for off-peak hours to minimize the impact. Spend some time figuring out how you want to use the logs you collect. Is it just for storage and reporting? Real-time incident response? Knowing how your logs will be used helps you make the right choices when it comes to collection. As described in the previous bullet, you can perform tests with some high-volume systems to get a good idea of resource impact.
      • Review your documentation to understand how logging will impact your devices, servers, and applications once it’s enabled. For example, Windows®-based systems allow you to limit the number of events that are stored locally. I advise storing only a small amount of data on the local server, then setting your event logs to “Overwrite as needed.” Microsoft® has a useful document that details Windows Audit Policy configuration: http://technet.microsoft.com/en-us/library/ee513968(WS.10).aspx. Auditing on Linux® operating systems is usually controlled through the audit daemon (auditd). auditctl (Audit Control) rules are used to define exactly what you want to audit from the OS. Here is an article that breaks down the use of Linux Audit: http://doc.opensuse.org/products/draft/SLES/SLES-security_sd_draft/cha.audit.comp.html
      • Applications that reside on the operating system will normally log to separate log files, and these can vary in type from flat text files to database tables. Because the applications also sit on the server, it is best to test the impact of logging on the local server as well as on the application. Pay close attention to how the log files are created and stored by the application. Applications like IIS create hourly or daily files that will use up space quickly if you don’t configure a rotation plan. The same applies to Linux auditing; however, you will usually have to add an entry to a configuration file within the OS or the app itself. If I have a logging solution, I will always err on the side of caution and attempt to store as little data as possible on the local server to limit resource impact.
    2. Determine logging requirements. Regulatory requirements are usually the biggest influence on centralized log storage. However, troubleshooting and root cause analysis are strong runners-up. Either way, a detailed understanding of any requirement ensures that you collect the right information from the right devices.
    3. Understand what information your systems log, and how. As I mentioned above, the amount of logging generated by different devices, operating systems, and applications can be enormous. So, along with determining the actual requirements for which logs you need to collect, it is important to understand how each different system generates logs.
      • For example, Windows operating systems offer a detailed audit configuration (see image below) that allows you to be selective about which logs will be generated (e.g. logons/logoffs, privilege use, system events, and others). Additionally, you can choose whether to audit successful events, failed events, or both, which allows you to take a more granular approach to log configuration.
        • winauditconfig.png
      • Linux can also be configured for granular auditing based on auditctl rules, as mentioned earlier.
        • auditctrl.png
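
    The storage planning described in item 1 comes down to simple arithmetic once you have measured a busy system. The following is a minimal Python sketch of that estimate, not a feature of any particular logging product. The function name estimate_log_storage_gb and every number in the example call (event rate, average event size, retention period, compression ratio) are hypothetical; substitute the values you measure in your own tests.

```python
def estimate_log_storage_gb(events_per_second, avg_event_bytes,
                            retention_days, compression_ratio=1.0):
    """Rough estimate, in decimal gigabytes, of storage needed for retained logs.

    compression_ratio is raw size divided by stored size, so 5.0 means the
    logging solution stores data at one fifth of its raw size (an assumption
    you should verify against your own product).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    seconds_per_day = 86_400
    # Raw bytes generated over the whole retention window.
    raw_bytes = (events_per_second * avg_event_bytes
                 * seconds_per_day * retention_days)
    # Apply compression, then convert bytes to decimal gigabytes.
    return raw_bytes / compression_ratio / 1e9


if __name__ == "__main__":
    # Hypothetical figures: 200 events/s, 500 bytes per event,
    # 90 days of retention, 5:1 compression.
    needed = estimate_log_storage_gb(200, 500, 90, compression_ratio=5.0)
    print(f"Estimated storage: {needed:.1f} GB")
```

    Running the estimate at both your normal and your maximum audit level gives a range you can compare against storage costs before you commit to a physical box or size a virtual appliance.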

     

    Hopefully, the suggestions above provide some insight into preparing your logging environment.  Logs contain a ton of useful information. When they are collected properly, they can help you improve security and quickly resolve network and systems issues. Creating a logging plan before you set up your log collection process can be the difference between long hours of digging through useless data and quickly finding what you need.

     

    This post is part of our Best Practices for Log Management Series. For more best practices, check out the index post here: Best Practices for Log Management