Log aggregation

Level 9

Way back in the past, I used to view logs only after an event had happened. This was painfully slow, especially when viewing the logs of many systems at the same time.

Recently I've been a big fan of log aggregators. On the backend it's a standard log server, while all the new intelligence is on the front end.

One of the best uses of this in my experience is seeing what events have occurred and which users have made changes just before. Most errors I've seen are human error: someone has either fat-fingered something or failed to take into account all the variables or effects their change could have. The aggregator can very quickly show you that X routers have OSPF flapping and that user Y made a change 5 minutes ago.
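A minimal sketch of that kind of correlation, assuming parsed log events with hypothetical fields (device, user, msg, time); a real aggregator would supply the events and the query language:

```python
from datetime import datetime, timedelta

# Hypothetical parsed log events; an aggregator would supply these.
events = [
    {"time": datetime(2014, 11, 3, 9, 55), "device": "rtr-core-1",
     "user": "jdoe", "msg": "config change: router ospf 1"},
    {"time": datetime(2014, 11, 3, 10, 0), "device": "rtr-core-1",
     "user": None, "msg": "OSPF-5-ADJCHG: neighbor down"},
    {"time": datetime(2014, 11, 3, 10, 1), "device": "rtr-core-2",
     "user": None, "msg": "OSPF-5-ADJCHG: neighbor down"},
]

WINDOW = timedelta(minutes=10)

# For each OSPF flap, list config changes made shortly before it.
flaps = [e for e in events if "OSPF-5-ADJCHG" in e["msg"]]
changes = [e for e in events if "config change" in e["msg"]]

for flap in flaps:
    suspects = [c for c in changes
                if timedelta(0) <= flap["time"] - c["time"] <= WINDOW]
    for c in suspects:
        print(f'{flap["device"]}: flap at {flap["time"]:%H:%M}, '
              f'{c["user"]} changed {c["device"]} at {c["time"]:%H:%M}')
```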

What kind of intelligent systems are you using on your logs? Do you use external tools, or perhaps home-grown tools, to run through your logs, pull out relevant information, and notify you? Or do you simply treat logs as a generic record in the background, only to be gone through when something goes wrong?

26 Comments
zackm
Level 15

Generally, the clients I work with do not have log aggregators. And the ones that do were not really using them for "forensics" so to speak. More aptly, they used them as a single IP address for all logs to go to so they could simplify their configurations across the enterprise. (These are generally global companies or companies with every single device sending informational-level syslogs, etc.)

To be fair, I have worked with a good number of companies that are not sure what the security team does or what technologies they own. I have always found this a little weird. Coming from a background where everyone was expected to know how to perform their neighbor's job at a functional level, it's just strange to encounter the opposite mindset, I guess.

I generally try to fashion my LEM engagements in the manner that you are referring to. It is OK to see that X happened, but if we can narrow down the causes to Y number of possible correlations... well, then we have moved from being a plain-Jane SIEM to a really cool, $$$-saving, functional tool that engineers will use for more than creating reports to send to the auditors. On that note, I also try to impress that policy requirements, while important, are not even the tip of the iceberg when it comes to the capabilities of these tools. Out-of-the-box thinking with the technology almost always turns up some process automation hiding in there that saves you a few headaches a month!

theflyingwombat
Level 9

I have never used log aggregation in the past. I need to start doing some research and see if this is something that will work for us. Currently logs are just being dumped on a log server and only looked at if there are issues.

RomeoG
Level 10

We (our Network team) implemented Splunk this year. We are also grabbing firewall logs and server logs from our virtual environment. It's pretty crazy the kind of searches and dashboards that can be built quickly, once you understand the search language. And it's not just about pretty dashboards for management. It's easy to identify anomalies looking at trending charts/graphs. Our NOC has been able to leverage the correlated data from various sources to help troubleshoot real issues.
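To illustrate the trend-based anomaly spotting described above (a sketch only, not RomeoG's actual dashboards; the counts and threshold are made up), flagging an hour whose log volume jumps well past a robust baseline:

```python
import statistics

# Hypothetical hourly event counts pulled from an aggregator.
hourly_counts = [1210, 1180, 1250, 1195, 1230, 5400, 1205, 1220]

# Use the median as the baseline so a big spike can't mask itself
# by inflating the mean.
baseline = statistics.median(hourly_counts)

for hour, count in enumerate(hourly_counts):
    if count > 3 * baseline:
        print(f"hour {hour}: {count} events (baseline {baseline:.0f}) -- spike")
```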

clubjuggle
Level 13

We are using log aggregation and it has proven extremely helpful. We use it both proactively and reactively.

tcbene
Level 11

I guess we're still old school. Our logs are used in manual searches for troubleshooting and in cursory reviews throughout the day.

byrona
Level 21

RomeoG I would love to hear what your Splunk roll-out process looked like from the time you purchased the product until the time you started receiving value from it.  I haven't spent a lot of time with it which is why I am curious.  Also, did you use any professional services or was the project all done in house?

byrona
Level 21

zackm what does your typical LEM engagement look like?  What specific things help your clients receive the most value from the product?

byrona
Level 21

Most of my log aggregation implementations have been specifically to satisfy compliance requirements. I have leveraged these implementations using some intelligence to harvest malicious IP addresses and then watch for any internal activity by those IPs. I have also created rules to look for problems with VPN tunnels. Those are the two most recent use cases that come to mind.
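A rough sketch of the malicious-IP rule, assuming a harvested blocklist and simple parsed firewall entries (all names and fields hypothetical):

```python
# Hypothetical harvested blocklist; in practice this comes from a feed.
malicious_ips = {"203.0.113.9", "198.51.100.77"}

# Hypothetical parsed firewall log entries.
log_entries = [
    {"src": "10.1.2.3", "dst": "203.0.113.9", "action": "allowed"},
    {"src": "10.1.2.4", "dst": "192.0.2.15", "action": "allowed"},
]

# Alert on any internal traffic touching a known-bad address.
for entry in log_entries:
    if entry["src"] in malicious_ips or entry["dst"] in malicious_ips:
        print(f'ALERT: {entry["src"]} -> {entry["dst"]} ({entry["action"]})')
```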

We are just now starting up a project to roll out a robust centralized log management and aggregation solution and I am hoping to use LEM as part of that solution.

aaron.j.denning
Level 12

We built a log aggregator. It's pretty much like all the others out there, just built specifically for our company.

syldra
Level 12

It's a plan for next year. Currently we're only looking at the logs when a problem arises, and then not always...

syldra
Level 12

I'll be following this discussion closely because as much as I would love to use our logs for prevention, I see no easy (and cheap) way to do this. Since only someone in the know could see the benefit of doing this, I guess the board might object to paying for it.

RomeoG
Level 10

We spun up a VM for a PoC. You can use Splunk for free up to 500MB of indexing volume/day, but they gave us a short-term unlimited license so we could really test it out. We installed several of the pre-made apps, such as Cisco IOS, Cisco Security Suite, and Cisco ISE. I think we were in PoC phase for a couple of months, but we got value almost from the start. The Security and IOS apps identified issues within a few days of pulling in logs.

We didn't hire professional services, but we had Splunk come on site on several occasions to answer questions and provide some training.

We have since gotten other teams on board sending logs: Active Directory, Citrix, VMware, BlueCat DHCP/DNS, etc. I am still in a mode of spending time with the data to see what else we can glean from it, but we already have some custom dashboards that are used by our NOC for troubleshooting VPN and authentication issues.

Romeo

cahunt
Level 17

A world where everyone lives and breathes and plays together... logs get along with other logs and the world is magical.

  .... I think logs are the last coveted thing around here.

       Share them?   Preposterous!

lfaulkner
Level 9

Have to agree with cahunt. Log aggregators do indeed sound like the thing of dreams.

cahunt
Level 17

Share and share alike... it's okay with me, as everyone keeps talking about this new thing called transparency. There are those old dogs who like to be siloed off from the rest of us. Change the atmosphere; you might have to change the location to get that done.

I would prefer aggregation, or at least for everyone to just add my Orion server IP wherever logs are sent. There I go, dreaming again.

byrona
Level 21

If you don't mind sharing, what bits of information did you specifically glean value from? I ask because log management and SIEM deployments can be really hit and miss, so I am always interested to see the specific points where people are gaining value.

patrick.mchenry
Level 11

We are currently using LogRhythm and we can see logs in real time to find issues. We can also go back to a certain date and view logs.

jim.couch
Level 11

As Patrick said, LogRhythm, but I only wish that more teams had visibility into those logs. I think an "if my nodes are sending you logs, then I get access to them" rule should apply. Otherwise, it does me no good.

jswan
Level 13

We've been rolling out the ELK stack (Elasticsearch/Logstash/Kibana). Right now we have about 45 days of logs (about 200 million entries) on tap, but we're still some way from having all devices reporting. Searching and aggregating is near-instantaneous for the types of queries we're usually doing. Most of our teams have embraced it wholeheartedly, which was not the case with any commercial products we've tried.

It does require substantial compute and storage resources, and there's no one-size-fits-all implementation template, so we're figuring it out as we go along.

We did our PoC on a single Linux VM, then moved to a 3-node Elasticsearch cluster running on Windows Server 2012 with a single Logstash collector, also running Server 2012.
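For a sense of what those near-instant aggregations look like, here is a minimal sketch using the Python Elasticsearch client; the index pattern, field name, and host are assumptions, not jswan's actual setup:

```python
from elasticsearch import Elasticsearch

# Assumes a reachable cluster and daily logstash-* indices.
es = Elasticsearch(["http://es-node-1:9200"])

# Count log entries per host over the last 24 hours.
resp = es.search(index="logstash-*", body={
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-24h"}}},
    "aggs": {"per_host": {"terms": {"field": "host", "size": 10}}},
})

for bucket in resp["aggregations"]["per_host"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```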

byrona
Level 21

jswan what do you think the difference was between this solution and the commercial solutions that caused the teams to be more willing to embrace it?

Jfrazier
Level 18

Splunk looks like it will provide most of what we need here. There are some freeware correlation tools that may be useful to some, especially if you can run them on a Unix-based host.

SEC (Simple Event Correlator) is Perl-based and allows for many types of event rules and actions.

It was originally designed to process ASA firewall logs.

jswan
Level 13

Probably the ease of use: we are using nxlog to deliver Windows event logs to Logstash, which is really easy for Windows admins to set up on their own. ELK is very JSON oriented, and nxlog has a native JSON output, so the whole log ingestion pipeline requires very little configuration. Then, since Kibana has a very simple GUI on top of a super-fast free-text search engine, people can get going on their own with little guidance.

Splunk is the closest thing I've seen in this regard, but we demoed it several major versions ago when it wasn't quite as polished.

The other tools we've used all required a lot of preconfiguration before you could see results, and seemed designed around specific workflows that don't fit everyone. ELK is much more free-form, due in part to its full-text, schema-free design. This is also one of its drawbacks: it's not SQL, so if you need to do traditional SQL operations it's not for you. Off the top of my head:

Advantages

- Full-text search
- Extremely fast for searching, aggregating, counting, and charting
- Very scalable
- Relatively easy to use
- Core apps are all free (commercial support available)
- Logstash is incredibly flexible and can handle almost any kind of data you can think of
- Cross-platform (Java-based, so it works anywhere Java does)
- Good community support
- Very resilient to failures if well designed

Disadvantages

- No built-in alerting (there are many roll-your-own solutions for alert-like behavior, but none of them are easy)
- No built-in RBAC or security (a commercial RBAC/security plugin was just announced, and there are various roll-your-own approaches)
- Compute/storage intensive
- Backup and restore is very non-traditional
- Not designed as a long-term archiving solution (although, again, there are roll-your-own workarounds)

The main complaints I see are coming from people who dive into it thinking that it's a) a free Splunk clone (it's not, but there are definitely similarities), b) a SQL-like database, or c) an alerting/correlation engine. There are various alerting/correlation engines built on top of Elasticsearch back-ends, but that's not an out-of-the-box capability. I guess another caveat is that for large deployments you will need some sort of message buffering infrastructure. This is not a native part of ELK and is a whole Linux world unto itself; the most common tools used are RabbitMQ and Redis, but I've seen many others mentioned. We haven't needed that layer yet.
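As a rough illustration of that buffering layer (a sketch under assumptions: a local Redis, the redis-py client, and a made-up queue name; normally Logstash's redis input would be the consumer):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Shipper side: push log events onto a Redis list acting as a queue.
event = {"host": "rtr-core-1", "msg": "OSPF-5-ADJCHG: neighbor down"}
r.lpush("logstash-queue", json.dumps(event))

# Consumer side: block until an event arrives, then process it.
_key, raw = r.brpop("logstash-queue")
print(json.loads(raw))
```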

jswan
Level 13

If you're a Linux shop, I'd also throw out a plug for ELSA. It uses a MySQL back-end with a Sphinx front-end for full-text indexing. It has a steeper learning curve and requires more care and feeding than ELK on the data ingestion side, but it offers SQL capabilities as well as alerting and LDAP integration. It's also pre-installed and fully integrated with the open Security Onion NSM project, which is the best network security monitoring tool there is, commercial or not.

byrona
Level 21

Thanks for sharing and being so honest with your information!

goodzhere
Level 14

Unfortunately, we tend to go through the logs only when there is an issue. We really need to set up some alerts to trigger when certain logs are received. We certainly have a ways to go in our environment in terms of logging.
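A minimal sketch of that kind of trigger, tailing a syslog file and flagging a pattern; the path and pattern are hypothetical, and a real setup would hook in email or paging instead of a print:

```python
import time

LOGFILE = "/var/log/syslog"    # hypothetical path
PATTERN = "OSPF-5-ADJCHG"      # hypothetical trigger string

# Follow the file like `tail -f` and flag matching lines.
with open(LOGFILE) as f:
    f.seek(0, 2)               # start at the end of the file
    while True:
        line = f.readline()
        if not line:
            time.sleep(1)
            continue
        if PATTERN in line:
            print("ALERT:", line.strip())  # hook in notification here
```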

jkump
Level 15

We're looking at implementing a SIEM with log aggregation for this purpose. We definitely need to be able to dig forensically deep when something happens. Fortunately, we have an extremely communicative department that discusses and reviews changes before they are rolled out. This gives us a good baseline to check against the logs to verify what changed.