Log aggregation

Way back in the past I used to view logs only after an event had happened. This was painfully slow, especially when viewing the logs of many systems at the same time.

Recently I've been a big fan of log aggregators. On the backend it's a standard log server, while all the new intelligence is on the front end.

One of the best uses of this in my experience is seeing what events have occurred and which users made changes just before. Most errors I've seen are human error. Someone has either fat-fingered something or failed to take into account all the variables or effects their change could have. The aggregator can very quickly show you that x number of routers have OSPF flapping, and that user x made a change 5 minutes ago.
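For anyone who wants to see the idea in code, here is a rough Python sketch of that kind of correlation. It is not how any particular aggregator does it. The log line layout (ISO timestamp, hostname, message), the host and user names, and the `correlate` function are all made up for illustration. The two message mnemonics are the usual Cisco IOS ones, but check them against what your devices actually send.

```python
import re
from datetime import datetime, timedelta

# Cisco IOS style mnemonics; the surrounding line layout is an assumption.
CONFIG_RE = re.compile(r"%SYS-5-CONFIG_I: Configured from \S+ by (\S+)")
OSPF_RE = re.compile(r"%OSPF-5-ADJCHG")


def parse(line):
    """Split 'ISO-timestamp host message' into (datetime, host, message)."""
    ts, host, msg = line.split(" ", 2)
    return datetime.fromisoformat(ts), host, msg


def correlate(lines, window=timedelta(minutes=5)):
    """Return (hosts with OSPF flaps, config changes made within
    `window` before the first flap) as (set, [(user, host, time)])."""
    events = sorted(parse(line) for line in lines)
    flaps = [(ts, host) for ts, host, msg in events if OSPF_RE.search(msg)]
    if not flaps:
        return set(), []
    first_flap = flaps[0][0]
    suspects = []
    for ts, host, msg in events:
        match = CONFIG_RE.search(msg)
        if match and first_flap - window <= ts <= first_flap:
            suspects.append((match.group(1), host, ts))
    return {host for _, host in flaps}, suspects


# Hypothetical sample data, not real logs.
SAMPLE = [
    "2024-01-01T09:00:00 core2 %SYS-5-CONFIG_I: Configured from console by bob on vty1 (10.0.0.6)",
    "2024-01-01T10:00:00 core1 %SYS-5-CONFIG_I: Configured from console by alice on vty0 (10.0.0.5)",
    "2024-01-01T10:03:10 rtr1 %OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.1 on Gi0/1 from FULL to DOWN, Neighbor Down: Dead timer expired",
    "2024-01-01T10:03:12 rtr2 %OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.1 on Gi0/2 from FULL to DOWN, Neighbor Down: Dead timer expired",
]

flapping, suspects = correlate(SAMPLE)
print(f"{len(flapping)} routers flapping: {sorted(flapping)}")
for user, host, ts in suspects:
    print(f"{user} changed {host} at {ts}")
```

With the sample above, only the change by alice falls inside the 5 minute window before the first flap; the change by bob an hour earlier is ignored. The window size is a guess you would tune to your own environment.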

What kind of intelligent systems are you using on your logs? Do you use external tools, or perhaps homegrown tools, to run through your logs, pull out the relevant information, and inform you? Or do you simply keep logs in the background and only go through them when something goes wrong?

Parents
  • We spun up a VM for a PoC. You can use Splunk for free up to 500MB of indexing volume/day, but they gave us a short-term unlimited license so we could really test it out. We installed several of the pre-made apps, such as Cisco IOS, Cisco Security Suite, and Cisco ISE. I think we were in PoC phase for a couple of months, but we got value almost from the start. The Security and IOS apps identified issues within a few days of pulling in logs.

    We didn’t hire professional services, but we had Splunk come on site on several occasions answering questions and providing some training.

    We have since gotten other teams onboard sending logs: Active Directory, Citrix, VMware, BlueCat DHCP/DNS, etc. I am still in a mode of spending time with the data to see what else we can glean from it, but we already have some custom dashboards that are used by our NOC for troubleshooting VPN and authentication issues.

    Romeo
