In part 1 we discussed the following points:

  • Identify server hardware failure – when, where and how
  • Determine if the applications running on the server are performing properly or creating issues
  • Ensure server security, high availability and performance to meet business service requirements

The other side of server monitoring is the hardware.

Hard Disk Monitoring:

The hard disk is the device that the server uses to store data. Stored data is persistent (unlike RAM, it survives a reboot) and remains available until it is deliberately erased.

It’s important to monitor the hard disk for a couple of reasons. The operating system needs disk space for normal operation, including paging files and caches. The applications running on the server also need space to write temporary and cached data. Low free space on a drive is also one cause of file system fragmentation, which can severely degrade performance.
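As a rough illustration of free-space monitoring, the sketch below reads disk usage with Python's standard `shutil.disk_usage` and classifies the result. The function names and the 20% and 10% thresholds are assumptions chosen for the example, not recommended values; pick limits that suit your own workload.

```python
import shutil


def free_space_percent(path="/"):
    """Return the percentage of free space on the volume containing path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100


def disk_status(free_percent, warn_below=20.0, critical_below=10.0):
    """Classify free space; both thresholds are illustrative assumptions."""
    if free_percent < critical_below:
        return "critical"
    if free_percent < warn_below:
        return "warning"
    return "ok"


if __name__ == "__main__":
    pct = free_space_percent("/")
    print(f"{pct:.1f}% free: {disk_status(pct)}")
```

Running a check like this on a schedule and recording the result gives you a trend line, which helps you spot a drive that is filling up before it hits the critical threshold.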

Server Hardware Monitoring:

A server has many hardware components that need monitoring. Server performance issues may be due to a malfunctioning or failing hardware component.

  • CPU Fan

The fan cools the CPU by moving air across its heat sink to draw heat away. If the CPU fan fails, the server will eventually overheat and become unavailable. To prevent this, monitor the CPU fan speed. Keeping historical fan RPM data lets you watch for sudden spikes or drops in fan speed.
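One way to watch historical fan data for sudden changes is to compare each new reading with a rolling average of recent readings. The sketch below assumes the RPM values have already been collected from the hardware; the function name, window size, 25% tolerance, and 500 RPM floor are all hypothetical values chosen for illustration.

```python
from collections import deque
from statistics import mean


def find_rpm_anomalies(readings, window=5, tolerance=0.25, min_rpm=500):
    """Return (index, rpm, reason) for readings that look abnormal.

    A reading is flagged if it falls below min_rpm, or if it differs
    from the average of the previous `window` readings by more than
    `tolerance` (a fraction of that average).
    """
    history = deque(maxlen=window)
    anomalies = []
    for index, rpm in enumerate(readings):
        if rpm < min_rpm:
            anomalies.append((index, rpm, "below minimum"))
        elif len(history) == window:
            baseline = mean(history)
            if abs(rpm - baseline) > tolerance * baseline:
                anomalies.append((index, rpm, "sudden change"))
        history.append(rpm)
    return anomalies


if __name__ == "__main__":
    sample = [2000, 2010, 1990, 2005, 1995, 2000, 3200, 2000, 0]
    for index, rpm, reason in find_rpm_anomalies(sample):
        print(f"reading {index}: {rpm} RPM ({reason})")
```

The fixed floor catches a stalled fan, while the rolling comparison catches a fan that suddenly speeds up, which often means the system is working harder to shed heat.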

  • Power Supply

A power supply unit (PSU) converts AC mains power to low-voltage regulated DC power. To keep visibility into the health of the PSU, monitor the amperage, voltage, and wattage of the power supply.
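The three power metrics are related: wattage is voltage multiplied by amperage. The sketch below derives wattage from the other two readings and checks whether a measured voltage stays close to its nominal value. The function names and the 5% tolerance are assumptions for illustration; check the specification for your PSU for the actual allowed range.

```python
def wattage(voltage, amperage):
    """Power in watts drawn on a DC rail: P = V * I."""
    return voltage * amperage


def voltage_in_range(measured, nominal, tolerance=0.05):
    """True if measured is within tolerance (a fraction) of nominal.

    The 5% default is an assumption for this example.
    """
    return abs(measured - nominal) <= tolerance * abs(nominal)


if __name__ == "__main__":
    measured_volts, measured_amps = 12.3, 10.0  # hypothetical readings
    print(f"{wattage(measured_volts, measured_amps):.1f} W")
    print("voltage ok:", voltage_in_range(measured_volts, nominal=12.0))
```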

  • Temperature

This refers to the temperature of the system board (motherboard). Unusually high temperatures can cause permanent damage to the server and will adversely affect its performance. Safe operating temperature limits can be obtained from the manufacturer. Monitor the temperature to ensure it does not exceed this safe range.
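A simple way to act on manufacturer limits is to compare each sensor reading with its limit and raise an early warning when a reading gets close. In the sketch below, the sensor names, the limits, and the 5 degree margin are hypothetical; use the figures published for your hardware.

```python
def check_temperatures(readings, limits, margin=5.0):
    """Return {sensor: status} for sensors at or near their limit.

    readings and limits map sensor names to degrees Celsius. A sensor
    is "over limit" at or above its limit, and "near limit" within
    `margin` degrees below it. Sensors in the safe range are omitted.
    """
    alerts = {}
    for sensor, celsius in readings.items():
        limit = limits[sensor]
        if celsius >= limit:
            alerts[sensor] = "over limit"
        elif celsius >= limit - margin:
            alerts[sensor] = "near limit"
    return alerts


if __name__ == "__main__":
    # Hypothetical values, not real manufacturer limits.
    limits = {"system_board": 50.0, "cpu": 80.0, "inlet": 35.0}
    readings = {"system_board": 48.0, "cpu": 83.0, "inlet": 22.0}
    print(check_temperatures(readings, limits))
```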

  • Environmental Factors

Temperature, air flow, and humidity are important parameters to monitor. Temperature problems may be a direct result of faulty A/C, improper air flow, or dangerous humidity levels.

Other components to monitor include the CMOS battery, disk array health, intrusion detection, and CPU hardware status.

