Compared to logging monitoring is nice and clean. In monitoring you are used to look at data that is already normalized. So you basically have the same look and feel for statistics from different sources like e.g.

switches, routers and servers. Of course you will have different services and checks across the different device types but some of these interface statistics can be compared easily with each other. Here you are always

looking at normalized data. In the Logging world you are facing very different Log types and formats. An interface that is down will look identically in the monitoring no matter if it is on the switch or the

connected server interface. If you now want to find the error massage for the interface that is down in the Logs of the switch and the server you will find two completely different outputs. Even how you can access

the logs is different. On a switch it is usually a ssh connection and a show command and on a windows based server eventually a RDP session and the "Event Viewer".

This is mostly a manual and time consuming process to compare the different logs with each other and find the root cause for the problem. An other problem is that many devices have only a limited storage for logs or

even worse loosing all stored logs after a reboot. Sometimes after an unexpected reboot of a device you end up with nothing in your hands to figure out what has caused the reboot.

 

We can do better with sending all the Logs to a centralized Logging server. It stores all log data independently from the origin. That reduces also the needed time for information gathering. Often you will see once all

the logs are concentrated at one point that many devices have different time stamps on their log massages. To make the logs easy consumable it is important that all log sources have the same time source and pointing

to a synchronized NTP server. Once the centralization problem is solved the biggest benefit comes from the normalization of the Log Data into logical fields that are searchable. This is something that is often done by

a SIEM solution that has been implemented to address the security aspect of logging. But I have seen a lot of SIEM projects where the centralized logging and normalization approach also improves the troubleshooting

capabilities significantly. With all the logs on the same place and format you can find dependency that are not visible in the monitoring. For example I was facing periodically reboots on a series of modular routers.

In the monitoring all the performance graphs looked normal and the router was answering to all SNMP and ICMP based checks as expected until it reboots without any warnings. So I looked into the log data and found

that 24 hours before the reboot was happening that on all of the routers a "memory error massage" was showing up.

Because the vendor needed some time to deliver a bug fix release that addressed this issue we needed a proper alarming for that. So every time we captured the " memory error massage" that triggered the reboot on the

centralized logging server we created an alarm , so that we could at least prepare a scheduled reboot that was manually initialized in a time frame when it effected less users. That was a blind spot in the monitoring

system and sometimes we can improve the alarming with the combination of Logging and active checks. So afterwards you have found the root cause for your problem ask yourself how you can prevent it from causing

an outage the next time. When there is the possibility to achieve that with logging this is worth the effort. You can start small and add more log massages over time that trigger events that are important for you.