Closed due to inactivity. Received 27 votes, with the last vote on 12 Dec 2019.

Improved Log Aggregation / Correlation / Search

Presently, NPM is the center of your network monitoring solution, and you likely collect syslog / traps with it. It works pretty well, but we can always make it better. Think about how cool it would be if we were able to aggregate Windows event logs, IIS logs, random flat-file logs, etc., and wrap an instant search and filtering capability around all this data slurping. Go on, drill down through time ranges and find your needle in the haystack. Maybe we even have the ability to visualize the results over time, plus some nifty canned reports for common issues. What data sources are important to you? How would you use your new-found power? Would you want to integrate with any other existing systems you have? Discussion below...

  • What sort of criteria will you allow for third-party application log file names?

    Will there be the ability to generate alerts based on matched strings from specific log files?

    How will it handle log files that roll every few hours?

    How close to real time (or near-real time) will the "slurping" of log file data be?

  • We are investigating using Splunk to watch for certain events in various event logs and then forward them to Orion in order to generate alerts and help provide input to some component monitors.

    Our biggest pain is that some event logs have unique names that change throughout the day, depending on how often they roll due to size or some other constraint. We have other challenges where we have to watch 10 different logs for a number of different strings on the same server...times 3 - 10 servers. That's a whole lot of PowerShell doing things across the network (a rough sketch of that polling overhead follows this comment).

    By using Splunk we can handle most of those issues, parsing on the fly and indexing only the items we are looking for. So it preprocesses the logs and reduces much of our overhead.

    I am a firm believer that a log file parser needs to be agent-based, especially if you have a rather verbose log file.
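A minimal sketch of the polling overhead described above, in Python rather than PowerShell for brevity. The UNC paths and match strings are hypothetical, and the globbing stands in for the "unique names that change throughout the day":

```python
import glob
import re

# Hypothetical watch list: each server rolls its logs under unique,
# timestamped names, so we glob for them rather than hard-code file names.
WATCHES = {
    r"\\server01\logs\app-*.log": [r"ERROR", r"timeout"],
    r"\\server02\logs\billing-*.log": [r"FATAL", r"deadlock"],
}

def scan_once():
    """One polling pass: expand each glob, grep every matching file for every pattern."""
    hits = []
    for pattern_glob, needles in WATCHES.items():
        compiled = [re.compile(n) for n in needles]
        for path in glob.glob(pattern_glob):
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    for rx in compiled:
                        if rx.search(line):
                            hits.append((path, lineno, line.rstrip()))
    return hits

if __name__ == "__main__":
    for path, lineno, line in scan_once():
        print(f"{path}:{lineno}: {line}")
```

Every pass re-reads every file over the network, which is exactly why an agent that tails locally and forwards only the matches scales better.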

  • Richard,

    Thanks a bunch for the detail. We'll ping you offline to see whether you might have time to chat. Most appreciated, sir.

  • In a PCI/HIPAA environment it's useful to have Splunk cleanse the log files of any credit card numbers, PII, or other sensitive data before loading them into the server, and then restrict the types of search (a rough sketch of that scrubbing step follows this comment). If you're handling PPT (position of public trust) data (e.g. student loans), then you can restrict the data to those with security clearance. Authorization is a big deal in big data environments.
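A minimal sketch of that cleansing step, assuming simple regex redaction. The patterns are illustrative only; a real PCI/HIPAA pipeline would add Luhn validation and broader PII rules before anything reaches the index:

```python
import re

# Illustrative patterns only: real scrubbing needs stricter rules
# (Luhn checks, name/address detection, etc.) and must run pre-indexing.
PAN = re.compile(r"\b(?:\d[ -]?){13,16}\b")   # candidate credit card numbers
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")    # US Social Security numbers

def scrub(line: str) -> str:
    """Replace sensitive fields with fixed tokens so raw values never hit the index."""
    line = PAN.sub("[PAN-REDACTED]", line)
    line = SSN.sub("[SSN-REDACTED]", line)
    return line

print(scrub("charge card 4111 1111 1111 1111 declined for 123-45-6789"))
# -> charge card [PAN-REDACTED] declined for [SSN-REDACTED]
```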

  • OK, as a start I would like all of the trace and log files written by all of the SolarWinds applications to be automatically loaded into this database, so we don't need to look in random directories on the server trying to find the log file with the SQL error in it.

    We're using FortiAnalyzer for our Fortinet logs; Splunk for our academic applications; grep for the server and historical network logs; SolarWinds for the current network logs; DHCP server logs go onto the enterprise message bus for consumption by other applications. I've no idea if the Windows logs go anywhere.

    If you're targeting a Splunk replacement then this might be of interest, and I could talk about our ideal architecture and how we would need to scale it out. You would be competing on price with Splunk (and its competitors), which may be an insurmountable challenge.

  • So let's say in this magical future we have a separate DB to handle all this data and indexing. Do you use something presently for log aggregation?

  • Our full syslog data for the network devices alone is over 50 GB/day; it would be bad to drop this into the NPM operational database. Trap data is quite a bit lighter at only a couple of GB/day. The server logs are orders of magnitude larger. Fortunately I don't have to take the whole feed for network events, and I am working on steadily converting syslog events into actionable alerts (a rough sketch of that conversion follows this comment).

    I would like a Splunk-like query language for the NetFlow data.
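A minimal sketch of that syslog-to-alert conversion, assuming a plain-text feed on stdin. The rule table is hypothetical, though the message mnemonics are standard Cisco ones:

```python
import re
import sys

# Hypothetical rule table mapping syslog patterns to alert names and severities.
# A real deployment would load these from config and dedupe/throttle matches.
RULES = [
    (re.compile(r"%LINK-3-UPDOWN.*down", re.I), "interface-down", "critical"),
    (re.compile(r"%BGP-5-ADJCHANGE.*Down", re.I), "bgp-neighbor-down", "major"),
    (re.compile(r"configured from", re.I), "config-change", "info"),
]

def to_alerts(lines):
    """Yield (severity, alert, raw line) for lines matching a rule; drop the rest."""
    for line in lines:
        for rx, alert, severity in RULES:
            if rx.search(line):
                yield severity, alert, line.rstrip()
                break

if __name__ == "__main__":
    for severity, alert, raw in to_alerts(sys.stdin):
        print(f"[{severity}] {alert}: {raw}")
```

Dropping non-matching lines before anything is stored is what keeps a 50 GB/day feed out of the operational database.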

  • I wouldn't say there's any real alert management, just alert creation. The alerts are sent to a distribution group and then individuals action those alerts as needed.

  • Interesting benefit.

    Then where do you send the alerts? How do you manage them?