Logging: Without a complete picture, what’s the point?

Just the other day (okay, it was a few weeks ago) I was having a discussion about logging with a “small” Fortune 50 company. Their problem: they wanted a more intelligent way to analyze the information they are logging, so they could more easily troubleshoot and understand problems in their environment. This is obviously a capability we would all love: intelligence out of our data collection, systems, and event log subsystems.

Oh, but logging intelligence doesn’t come without its challenges. Tell me if you experience some of the same challenges they expressed, because these really throw a wrench into the works.

- Only collecting logs from some systems, not every single one of them

- Not collecting Windows Event Logs, Syslog, or detailed logging from every server or device

- Inability to ingest the information from the logs that are already being collected

- Unable to retain logs for as long as compliance requires, due to lack of allocated storage

Now let’s not even bring compliance or regulatory requirements into this, because imagine the above challenges at scale, and then add retention over the course of 7 to 10 years, depending upon whose “rules” you need to follow.
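To put rough numbers on that retention problem, here is a back-of-the-envelope sketch. The daily volume and compression ratio are made-up assumptions for illustration, not figures from the company in question, and the function name `retention_storage_tb` is hypothetical; plug in your own values.

```python
def retention_storage_tb(daily_gb: float, years: float, compression_ratio: float = 1.0) -> float:
    """Estimate storage (in decimal TB) needed to retain logs for `years`.

    daily_gb          -- raw log volume collected per day, in GB (assumed input)
    compression_ratio -- e.g. 10 means logs shrink to 1/10th of raw size (assumed input)
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    total_gb = daily_gb * 365 * years / compression_ratio
    return total_gb / 1000  # 1 TB = 1000 GB


if __name__ == "__main__":
    # Hypothetical environment: 50 GB/day of raw logs, 10:1 compression.
    for years in (7, 10):
        tb = retention_storage_tb(daily_gb=50, years=years, compression_ratio=10)
        print(f"{years} years of retention: about {tb:.2f} TB")
```

Even this simple estimate ignores growth in log volume, indexing overhead, and replication, so a real figure would likely be higher.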

You might be asking yourself, just as I was while we were discussing this: if you’re unable to collect the data fast enough, don’t have enough space to store it for long enough, and are working from an incomplete picture of your entire infrastructure… what’s the point? I mean, what if we were trying to do more than merely troubleshoot a problem and had to react or respond to a breach, which seems to be all the rage these days?

With breaches like the ones making all the news, having some element of intelligence to analyze, interpret, and act upon the data would be ideal. However, without a complete picture of the environment, or with only selective logging, we have an incomplete ability to react and respond to incidents.
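One way to find out how incomplete the picture is: compare the systems you know you own against the systems that are actually sending logs. The sketch below assumes you can export both lists as plain host names; the function name `coverage_report` and the sample hosts are hypothetical.

```python
def coverage_report(inventory, reporting):
    """Compare an asset inventory against the hosts seen in collected logs.

    Returns the fraction of inventoried hosts that are logging, the hosts
    that are silent, and any hosts logging that are not in the inventory.
    """
    inventory = set(inventory)
    reporting = set(reporting)
    covered = inventory & reporting
    coverage = len(covered) / len(inventory) if inventory else 1.0
    return {
        "coverage": coverage,
        "missing": sorted(inventory - reporting),   # owned, but sending no logs
        "unknown": sorted(reporting - inventory),   # sending logs, but not in inventory
    }


if __name__ == "__main__":
    # Hypothetical host names for illustration only.
    report = coverage_report(
        inventory=["web01", "web02", "db01", "fw01"],
        reporting=["web01", "db01", "lab99"],
    )
    print(f"Coverage: {report['coverage']:.0%}")
    print("Not logging:", ", ".join(report["missing"]))
    print("Not in inventory:", ", ".join(report["unknown"]))
```

The "not in inventory" list can be just as interesting as the gaps, since it points at systems nobody knew to account for.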

Overcoming the challenges we all face with log collection is critical to a well-defended and well-understood infrastructure.

Are there other challenges you see organizations face?

Do you find logging to be more of a ‘set it and forget it’ affair, never to be looked at unless you are troubleshooting or responding to an incident?

I know it’s difficult to ask these questions without implicitly exposing your environment by saying, “Yes we have an incomplete logging solution” which is why it can be a sensitive topic to discuss.

What do you think? Is this off the mark, with these issues being few and far between? I’d love to hear your thoughts on this matter.

  • Almost all of these comments talk of filtering, but of course so many investigations are looking to correlate different activities on different nodes. Filtering at source will by definition mean that there are huge gaps in the captured information, and if you don't capture the data you can't filter it. I know it costs space to store & grunt to process, but if it's necessary, then well - that's that then.

    It's the age-old decision of how important is it & how do I get from here to there. First ping a node, then add standard SNMP polling to it, then WMI, SAM, AppInsight & log/event capture, but implement it in an incremental way and reasonably consistently across the entire estate. To get an enterprise-wide view of the estate's operation, there's no point having a handful of nodes monitored to death and then another bunch just pinging.

  • If you can afford to log everything, you will be better able to convince yourself of what is not happening. If your logging is sporadic or overly narrowed, the clues you need may never have been captured and you may have lingering doubt about what has really been ruled out. If your retention is minimal, good luck investigating an issue that you become aware of much, much later. For whatever you can't afford to log, ensure that your management is aware. Don't let visibility on the issue peter out. Document the tradeoffs your management may force you into. You may need that information when you are explaining why you can't provide the data you've been asked to provide.

  • It's a great justification for our obscene bill rates :)

  • What is pertinent is sometimes discovered later, but it is generally discovered through knowledge of and experience with a specific environment (i.e., by an SME).

    It is not a great answer, but it is the best I can give at this point in time.

  • The relevance of log data is often, unfortunately, determined at a time when you need it but don't have it.

    I guess it comes from past experience. When a situation occurs and you find out that part of the log isn't available, you quickly make the necessary changes but that particular situation may never occur again.
