SNMP Polling
Simple Network Management Protocol (SNMP) polling is a method used to gather information about network devices such as routers, switches, and servers. A network management system (NMS) sends queries (polls) to SNMP-enabled devices to retrieve performance metrics, status information, or configurations. This is achieved by using Object Identifiers (OIDs) defined in the device's Management Information Base (MIB).
In simple terms, polling is the NMS sending queries to a device to get information. SNMP polling is widely used for monitoring device health and status and for tracking resource usage such as CPU, memory, and bandwidth.
Best Practices for SNMP Polling
- Optimize Polling Intervals:
- Set appropriate polling intervals to balance the frequency of data collection with network bandwidth usage. For critical devices, use shorter intervals (e.g., 3-5 minutes); for less critical devices, longer intervals (e.g., 10-30 minutes) are sufficient.
- Use SNMP Versions Securely:
- Use SNMPv3, which provides authentication and encryption. Avoid SNMPv1 and SNMPv2c unless security is not a concern, since both send community strings in clear text.
- Limit the Scope of Polling:
- Only poll essential OIDs to minimize the load on network devices and reduce unnecessary traffic.
- Implement Access Control:
- Restrict SNMP access to specific IP addresses and networks using Access Control Lists (ACLs) on the devices.
- Monitor Polling Performance:
- Regularly assess the performance of the polling process to identify delays or missed data caused by overloaded devices or network bottlenecks.
- Plan for Scalability:
- Design SNMP polling with growth in mind to accommodate additional devices or metrics as the network expands.
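As an illustration of a narrowly scoped, secure poll, the sketch below assembles an SNMPv3 query for a short list of OIDs using the Net-SNMP `snmpget` command-line tool. It only builds and prints the command; it does not run it. The host address, user name, and passphrases are placeholders, and the choice of SHA and AES is an assumption that must match what the device is actually configured for.

```python
import shlex

# Well-known MIB-II object: sysUpTime.0
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def build_snmpget_command(host, oids, user, auth_pass, priv_pass):
    """Build an SNMPv3 authPriv snmpget command for a short list of OIDs."""
    if not oids:
        raise ValueError("poll at least one OID")
    return [
        "snmpget",
        "-v", "3",            # SNMP version 3
        "-l", "authPriv",     # require authentication and encryption
        "-u", user,
        "-a", "SHA", "-A", auth_pass,   # assumed authentication protocol
        "-x", "AES", "-X", priv_pass,   # assumed privacy protocol
        host,
        *oids,                # only the essential OIDs
    ]


# Placeholder host and credentials, for illustration only.
cmd = build_snmpget_command(
    "192.0.2.10", [SYS_UPTIME_OID], "monitor", "auth-secret", "priv-secret"
)
print(shlex.join(cmd))
```

In practice the credentials would come from a secrets store rather than from source code, and the command would be run from an NMS host that the device's ACL permits.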
SNMP Traps
SNMP traps are asynchronous notifications sent by SNMP-enabled devices to a network management system (NMS) to report significant events or changes in status. Unlike SNMP polling, which requires the NMS to request data, traps are initiated by the device itself when a predefined condition or threshold is met. For example, a trap might be sent when a device experiences high CPU usage, a link goes down, or a power supply fails.
In simple terms, a trap is a notification that a device sends to the NMS when a specific condition arises on the device. Traps are essential for real-time monitoring, allowing network administrators to quickly respond to critical issues without waiting for the next polling cycle.
Best Practices for SNMP Traps Implementation
- Use SNMPv3 for Security:
- Configure SNMPv3 to ensure secure communication, providing authentication and encryption for trap messages. Avoid SNMPv1 and SNMPv2c if security is a priority.
- Define Meaningful Traps:
- Configure devices to send traps only for critical events. Specific traps make the system more efficient; for example, an Internet-facing router can be configured to send BGP-related traps.
- Centralized Trap Receiver:
- Use a centralized NMS or trap receiver to collect and analyze all traps, simplifying management and troubleshooting.
- Filter and Prioritize Traps:
- Apply filters and categorize traps by severity (e.g., critical, warning, informational) so that high-priority events get prompt action.
- Regularly Review Trap Configuration:
- Periodically review and update trap configurations to align with changes in the network and operational priorities.
- Monitor Trap Volume and Performance:
- Keep an eye on the volume of traps to avoid overwhelming the NMS, and optimize device configurations to reduce excessive or redundant traps.
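The filtering and prioritization practice above can be sketched as a small lookup-and-sort step on the trap receiver. The notification OIDs are the standard ones for coldStart, linkDown, linkUp, and authenticationFailure; the severity assigned to each, the `prioritize` helper, and the shape of the trap records are assumptions made for illustration.

```python
# Standard notification OIDs from SNMPv2-MIB and IF-MIB.
COLD_START = "1.3.6.1.6.3.1.1.5.1"
LINK_DOWN = "1.3.6.1.6.3.1.1.5.3"
LINK_UP = "1.3.6.1.6.3.1.1.5.4"
AUTH_FAILURE = "1.3.6.1.6.3.1.1.5.5"

# Assumed severity policy; adjust to local operational priorities.
SEVERITY_BY_OID = {
    LINK_DOWN: "critical",
    COLD_START: "warning",
    AUTH_FAILURE: "warning",
    LINK_UP: "informational",
}
SEVERITY_RANK = {"critical": 0, "warning": 1, "informational": 2}


def prioritize(traps, minimum="warning"):
    """Drop traps below `minimum` severity and sort the rest, most severe first."""
    cutoff = SEVERITY_RANK[minimum]
    kept = []
    for trap in traps:
        # Unknown trap OIDs are treated as informational.
        severity = SEVERITY_BY_OID.get(trap["oid"], "informational")
        if SEVERITY_RANK[severity] <= cutoff:
            kept.append({**trap, "severity": severity})
    return sorted(kept, key=lambda t: SEVERITY_RANK[t["severity"]])


# Hypothetical traps as a receiver might record them.
received = [
    {"source": "192.0.2.1", "oid": LINK_UP},
    {"source": "192.0.2.2", "oid": AUTH_FAILURE},
    {"source": "192.0.2.1", "oid": LINK_DOWN},
]
for trap in prioritize(received):
    print(trap["severity"], trap["source"], trap["oid"])
```

With the default cutoff, the linkUp trap is filtered out and the linkDown trap is listed ahead of the authentication failure.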
Achieving the right balance between Polling & Traps
Increasing the polling frequency means the NMS sends queries to devices more often to collect status information and performance statistics. Setting the frequency too low could mean losing vital performance data and status information.
Increased polling frequency could mean:
- Increased Database Size: More frequent polling will cause the database to grow rapidly.
- Increased Network load: Frequent query responses from the devices can lead to a huge amount of network traffic.
- Higher Load on the Device: The device must process and answer every query. This may degrade device performance and impact critical device functions.
- Higher Load on the Polling Engine: This could result in performance issues or strain on the polling engine.
- Higher Load on the Database Engine: Frequent data collection also means the database engine writes to disk more often, straining CPU, memory, and disk I/O.
- Potential Polling or Performance Problems: Polling too often may cause delays or failures in other processes.
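A rough estimate makes these costs concrete. The sketch below assumes one stored sample per OID per poll and an interval that divides evenly into a day; the device count and OID count are illustrative numbers, not measurements.

```python
SECONDS_PER_DAY = 86_400


def polling_load(devices, oids_per_device, interval_seconds):
    """Return (OID values requested per second, samples stored per day)."""
    values_per_poll = devices * oids_per_device
    values_per_second = values_per_poll / interval_seconds
    polls_per_day = SECONDS_PER_DAY // interval_seconds
    return values_per_second, values_per_poll * polls_per_day


# Illustrative network: 500 devices, 20 OIDs each.
for interval in (600, 60):
    per_second, per_day = polling_load(500, 20, interval)
    print(f"{interval:>4} s interval: {per_second:7.1f} values/s, {per_day:,} samples/day")
```

Under these assumptions, moving from a 10-minute to a 1-minute interval multiplies both the request rate and the daily database growth by ten.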
Recommendations on Polling & Traps
- Faster Polling for critical devices only: NMS platforms should have configurable polling intervals. Critical devices may be polled more frequently than other devices. Keep in mind the database requirements, network bandwidth consumption, and processing requirements.
- NMS platforms such as SolarWinds offer three polling options: status polling, statistics polling, and device rediscovery. A practical approach is a 2-minute status poll to check availability, a 10-minute statistics poll for performance metrics, and a 30-minute rediscovery to identify any device changes.
- Device Metrics: Routers and switches maintain counters for statistics such as input rate and output rate, which are refreshed every 5 minutes by default. Polling faster than the device refreshes these values returns no new data, so it is not recommended.
- Using Traps: To strike the right balance, configure the device to send traps (notifications) to the NMS whenever a condition arises. For example, when the VLAN topology of a core switch changes, the switch should send a trap immediately rather than wait for the next polling cycle.
- Configure meaningful traps on each device, for example routing-protocol traps on WAN devices and VLAN traps on core switches.
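The tiered intervals recommended above can be sketched as a small schedule. The three intervals and the 5-minute counter refresh come from the text; the `PollJob` class and the helper functions are hypothetical names, not part of any NMS product's API.

```python
from dataclasses import dataclass

DEVICE_COUNTER_REFRESH = 300  # seconds; the 5-minute default noted above


@dataclass(frozen=True)
class PollJob:
    name: str
    interval: int  # seconds


# Intervals taken from the recommendation in the text.
JOBS = [
    PollJob("status", 2 * 60),
    PollJob("statistics", 10 * 60),
    PollJob("rediscovery", 30 * 60),
]


def effective_statistics_interval(requested, refresh=DEVICE_COUNTER_REFRESH):
    """Never poll statistics faster than the device refreshes them."""
    return max(requested, refresh)


def jobs_due(elapsed_seconds, jobs=JOBS):
    """Return the names of jobs whose interval divides the elapsed time."""
    return [job.name for job in jobs if elapsed_seconds % job.interval == 0]


for minute in (2, 10, 30):
    print(minute, jobs_due(minute * 60))
```

A request to poll statistics every 60 seconds would be raised to 300 seconds by `effective_statistics_interval`, while events that cannot wait for the next cycle are left to traps.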
By following these practices, organizations can effectively leverage SNMP polling & SNMP Traps to maintain a robust and well-monitored network infrastructure, while ensuring scalability, reliability, and security.