Log Management for a Network Admin

Version 2

    With rob.johnson


    As a Network Admin or Engineer, you may be asked, or may simply need, to centralize your network device logs in a log management system. Network devices are some of your busiest systems, and as a result, they create large volumes of log data. With that in mind, it’s critical to have a plan in place BEFORE you start collecting logs.

     

    What logs should you collect from networking devices and what should you plan for?

     

    Your first instinct may be to enable debug logging on every device. However, within the first few minutes, you will be buried in millions of logs. Making sense out of a huge pile of logs is a challenge, to say the least. Network Admins have some specific use cases for log collection, such as traffic analysis, troubleshooting, compliance, and change monitoring.

     

    Below, I will present some best practices that I have learned while working with Network Admins and in preparing my own network for log collection. Please share your own tips and tweaks that have helped you with log management, and be sure to warn us of any pitfalls you have experienced in the process!

     

    1. What will be the impact of log collection on your log management system? Log management systems range from a simple Syslog Server to feature-rich log management systems and SIEMs deployed as physical or virtual appliances. Whichever you use, you will have to consider the memory, CPU, and storage impact that log collection will have on these systems.
      • Storage is almost always the most expensive resource. Make sure you have a detailed understanding of how your logging solution interacts with storage and whether it offers compression, either configurable or automatic. Virtual appliances, or even VM-based Syslog Servers, offer more flexibility because they can attach to different types of storage (e.g., SAN or NAS). However, it is wise to test the disk’s read and write performance on the host system. I highly recommend using dedicated or reserved resources for virtual log management systems. Physical appliances have fixed resources, so it is important to get an idea of how much data you plan to collect. Typically, this is calculated in Events Per Second (EPS) or total volume per day in megabytes and gigabytes.
      • Memory and CPU will take the next hit. In my experience, the majority of network devices that send logs will either use Syslog or SNMP traps. This means they will be sending data in real-time. Collecting data in real-time tends to use more sustained memory and CPU. The amounts vary based on volume because log collection and storage occur simultaneously. Some devices offer scheduled collection or polling, which may create a spike in resource usage for a short time. However, to minimize the impact, you can set the collection schedule for off-peak hours. Spend some time and figure out how you want to use the logs you collect. Is it just for storage and reporting? Real-time incident response? Knowing how you will use your logs helps you make the right choices when it comes to collection and storage.
    2. How will logging impact your network devices and bandwidth?
      • Review your documentation to understand the impact that logging will have on your network devices. I have found that, in most cases, it is important to disable or limit any “local” or "console" logging. As an example, here is an article from Cisco® on logging best practices for an ASA: http://www.cisco.com/web/about/security/intelligence/firewall-best-practices.html#_Toc332805980. This article briefly describes the impact that console logging will have on the firewall. If you have deployed a log management solution, I recommend turning console logging off, if possible.
      • Bandwidth is ALWAYS a concern, and it seems that the more you have, the more you use. Syslog and SNMP will have an impact on bandwidth, primarily because they produce a constant stream of data that is typically uncompressed. To minimize the impact on bandwidth, try localizing your log collection at remote sites with Syslog Servers or remote collectors, and forward only filtered data up to a primary system. Many log management systems also offer an agent that provides encryption and compression. Another option is to install something like a Snare agent that has a bandwidth rate limitation feature.
    3. Determine whether you have any specific requirements for logging. Compliance regulations usually drive this conversation. However, you may also have specific internal requirements for legal purposes. Either way, a detailed understanding of any requirement ensures that you collect the right information from the right devices.
      • For example, PCI provides some detail on what should be collected from your audit logs. See page 56 of this document for a complete list of details: https://www.pcisecuritystandards.org/documents/pci_dss_v2.pdf
        • User identification
        • Type of event
        • Date and time
        • Success or failure indication
        • Origination of event
        • Identity or name of affected data, system component, or resource
      • HIPAA is a bit more general:
      • Finally, if compliance is not driving log collection, then what is? As a Network Admin, your approach may be to collect TCP, UDP, and other network traffic logs to troubleshoot connectivity issues or monitor for changes. Systems Administrators may want to monitor servers and applications for errors or exceptions related to operating systems and application failures. Security professionals may want to collect security and change events from everything in the network. Defining the problem you want to solve with centralized log collection helps with my next suggestion regarding actual log configuration levels.
    4. Understand what your systems log, and how. As I mentioned in the previous suggestion, the amount of logging generated by different devices, especially perimeter devices like firewalls and Web filtering systems, can be enormous. So, along with determining the actual requirements for which logs you need to collect, it is important to understand how each system logs information. For example, Windows® operating systems offer a detailed audit configuration that allows you to be selective about which logs will be generated (e.g., logons/logoffs, privileged use, system events, and others). Additionally, you can choose whether to audit successful events, failed events, or both, which allows you to take a more granular approach to log configuration.
      • Networking devices, like routers and switches, tend to focus on severity levels like informational, critical, and debug. Each level provides different information, so your use case will determine which level is appropriate. The table below is an example of Cisco's logging levels and the information that will be contained in the logs.
        • [Image: ciscologlevels.png, table of Cisco logging levels]
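
As a rough illustration of choosing a logging level, here is a minimal Python sketch that reads the severity from a standard syslog priority header and drops anything chattier than a chosen threshold. The severity numbers are the standard syslog ones (0 is most severe, 7 is debug), with the keywords Cisco devices use for them. The function names, the default threshold of 5, and the sample message are assumptions made for this example, not part of any particular product.

```python
import re

# Standard syslog severities (0 = most severe). Cisco devices use the same
# numbers with these keywords.
SEVERITY_NAMES = {
    0: "emergencies",
    1: "alerts",
    2: "critical",
    3: "errors",
    4: "warnings",
    5: "notifications",
    6: "informational",
    7: "debugging",
}

# A syslog message starts with "<PRI>", where PRI = facility * 8 + severity.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def parse_severity(message):
    """Return the severity (0-7) from the PRI header, or None if absent/invalid."""
    match = PRI_PATTERN.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid PRI: facility 23, severity 7
        return None
    return pri % 8


def should_keep(message, max_severity=5):
    """Keep messages at or below max_severity; keep unclassifiable ones too."""
    severity = parse_severity(message)
    # Messages without a usable header are kept so nothing is silently lost.
    return severity is None or severity <= max_severity


if __name__ == "__main__":
    # Hypothetical message: PRI 189 = facility 23 (local7) * 8 + severity 5.
    sample = "<189>Interface GigabitEthernet0/1, changed state to up"
    level = parse_severity(sample)
    print(level, SEVERITY_NAMES[level], should_keep(sample))
```

Filtering like this is best done as close to the source as possible (ideally by setting the level on the device itself), so that unwanted messages never consume bandwidth or storage in the first place.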

     

    Hopefully, the suggestions above provide some insight into preparing your logging environment. Logs contain a ton of useful information, and when they are collected properly, they can help you improve security and quickly resolve network and systems issues. Creating a logging plan before log collection can be the difference between long hours of digging through useless data and quickly finding what you need.
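
To make the planning step a little more concrete, the Events Per Second (EPS) figure mentioned in the first suggestion can be turned into back-of-the-envelope storage and bandwidth numbers. The Python sketch below is only an estimate under stated assumptions: the 300-byte average event size, the compression ratio, and the function names are placeholders I picked for illustration, so measure your own devices before sizing anything.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_gb(eps, avg_event_bytes=300, compression_ratio=1.0):
    """Estimate stored volume per day in gigabytes (1 GB = 10**9 bytes).

    avg_event_bytes and compression_ratio are assumptions; a ratio of 1.0
    means no compression, 5.0 means the data shrinks to one fifth.
    """
    raw_bytes = eps * avg_event_bytes * SECONDS_PER_DAY
    return raw_bytes / compression_ratio / 1e9


def retention_storage_gb(eps, retention_days, avg_event_bytes=300,
                         compression_ratio=1.0):
    """Estimate total storage needed to keep logs for retention_days."""
    per_day = daily_volume_gb(eps, avg_event_bytes, compression_ratio)
    return per_day * retention_days


def sustained_bandwidth_mbps(eps, avg_event_bytes=300):
    """Estimate the steady payload bandwidth of uncompressed logs in Mbit/s.

    Ignores protocol overhead (IP/UDP/TCP headers), so treat it as a floor.
    """
    return eps * avg_event_bytes * 8 / 1e6


if __name__ == "__main__":
    eps = 1000  # hypothetical sustained rate across all devices
    print(f"Per day:   {daily_volume_gb(eps):.2f} GB")
    print(f"90 days:   {retention_storage_gb(eps, 90):.1f} GB")
    print(f"Bandwidth: {sustained_bandwidth_mbps(eps):.2f} Mbit/s")
```

Running the numbers for each remote site separately also shows where a local collector or a compressing agent would save the most bandwidth.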

     

    This post is part of our Best Practices for Log Management Series. For more best practices, check out the index post here: Best Practices for Log Management