
    Drowning in data while searching for information

    Jeremy Stretch

      Where I work, we collect logging information. A lot of it. From core systems, from customer systems, from internal systems -- probably even from systems we have no business collecting logging information from. Everybody wants to make sure their stuff has logs, so they turn it on and point it to a syslog server. Mission accomplished, right?

       

      Log data is of little use when it just sits on a disk somewhere, only to be eventually overwritten by newer log data which will again eventually be overwritten by more log data in an endless cycle. Logs are of course helpful when we're already aware that a problem exists and need to dig through a system's history to determine a cause or scope. But how many of us are actually processing log data as a routine task, looking for errors and other indicators proactively?

       

      Turning gigabytes of log data into something of value is a complex task. The idea calls to mind the old saying, "drowning in data while searching for information." The difference between the two is that data grows linearly in size, while information is created only when distinct pieces of data are correlated with one another. There is no doubt plenty of valuable information is hiding in all of our log data; the trick is getting to it.

       

      There are a number of tools out there for correlating log data, but it seems like automated software can only go so far. What successes have people had in correlating log data? Any unorthodox approaches people can share are especially appreciated.

        • Re: Drowning in data while searching for information
          joelgarnick

          We haven't had a lot of success, but we are looking for ways to make it happen.  We are currently using Symantec SIM from a security perspective to track when changes are made outside of approved COs, but that's about it.  We looked briefly at SolarWinds LEM a few months back, but we didn't really have the budget for it at the time, so it didn't go anywhere.  At the moment, other than SIM we are just logging to a couple of Kiwi Syslog servers and haven't really done much checking, as we only recently got that set up.

            • Re: Drowning in data while searching for information
              andymcl

              In my experience with different SIEM products, the issue has always been normalizing the data so you can find your needle in the haystack. I have used several different products, but the crux of the issue has always been finding an intelligent way to aggregate and normalize the data so that your searches actually return something useful.

               

              In my experience, the aggregation and correlation engines in the open source tools only go so far. I would encourage you to look at the tool that SolarWinds has now. I've had it since before SolarWinds acquired it, and it has made a huge difference in our efforts to correlate and normalize our data.

               

              Cheers!

            • Re: Drowning in data while searching for information
              byrona

              For years we have sent all of our log data (both Syslog and Eventlog) to a central repository where that data is rarely used, except maybe by our network team for troubleshooting.  Ultimately we are storing a lot of useless data, and finding anything useful would be like looking for a needle in a haystack.

               

              It seems to me that storing all of that log data is less important than generating alerts for the few things that you do care about and then purging the rest of the data.  Unfortunately, not everybody I work with agrees; I think some of them are the IT world's equivalent of hoarders!

               

              When it comes to SIEM products, what I have found is that many of them are not cost effective.  I have also found that many of them require too much configuration, or configuration that is too difficult, to be worth the trouble.

               

              Recently I have had the opportunity to work with SolarWinds LEM and have been very impressed with its balance of simplicity and functionality.  It really does a great job of getting you the data that you care about in a way that works.  It does have a steep initial learning curve, but once you get past that it becomes very easy to work with, which I can't say for some of its competitors that I have used.  I also really love its small footprint in the environment and the virtual appliance approach, which equates to me spending less time managing it and more time using it; it has all the qualities I look for in a good IT solution.

              • Re: Drowning in data while searching for information
                Sohail Bhamani

                In my network engineer days, we always sent logs to syslog.  This was only a slight pain given the volume, and since we were only sending network device syslogs to a syslog-ng server, it was not too bad.  Nowadays, customers seem to log everything, including networking devices, servers and so on.  With this vast quantity of data, it most definitely becomes the needle-in-the-haystack problem that has been mentioned before.

                 

                I have investigated Splunk, but its correlation abilities really aren't fully geared for all devices... it's definitely lacking on the network side of the house.  I recommend LEM to any of my customers who want this log correlation ability, and what usually sells them is the active responses.

                 

                LEM has to be one of the best (if not the best) acquisitions by SolarWinds in recent history, for me.  Being able to send all logs via syslog or gather logs via the agent really opens up its abilities.  It stands out from the pack with the active responses, easy deployment model, and gorgeous GUI, and so far it is definitely helping the customers I work with who have purchased it.

                 

                The only drawback is the slight learning curve it takes to get used to the program.  Once you get past this, you would be hard pressed to find a solution which provides this much value.  I definitely do not miss grepping through tons of syslog data.

                 

                Sohail Bhamani

                Loop1 Systems

                  • Re: Drowning in data while searching for information
                    byrona

                    Sohail

                    It's very validating to hear that other folks in the community have had the same experience with products that I have had.

                    • Re: Drowning in data while searching for information
                      Jeremy Stretch

                      Cool, I'll have to try out LEM. We messed around with Splunk at one point but had a difficult time gauging our licensing requirements.

                        • Re: Drowning in data while searching for information
                          byrona

                          I tried Splunk out at one point too, as it has always been the product everybody suggested to me in the Log Management space.  I personally found the interface very clunky and the configurations overly complex.

                            • Re: Drowning in data while searching for information
                              Ryan Adzima

                              I've grown quite fond of Splunk in the past year or two; with the 4.3 upgrade the interface is full HTML5, and their "language" is not too hard to understand. Sizing the license properly was definitely the hardest part for us, and we still run across the occasional license violation. The data correlation abilities are phenomenal if you have the time to put into it. One specific function I have developed with it is a user-experience view of our web sites. It correlates the data between the proxies, web servers, and databases, and spits it all out into easy-to-read charts (that took me a few months to fine tune). I keep it in rotation with my NPM page on a separate monitor in my office for monitoring network health.

                               

                              I will be checking out LEM though; if I can have it all in one location and still have everything I need ready to go, that makes it much easier.

                                • Re: Drowning in data while searching for information
                                  byrona

                                  Ryan

                                   

                                  What exactly did you develop for user experience?  The mention of that was intriguing to me and I am interested in knowing more about what you are doing.

                                   

                                  I guess for us as an MSP, I don't really want to have to take the time to learn a language, and I don't have months to work on fine tuning the functionality of the system.  Once we finish developing our new SIEM Management service that will leverage LEM, I anticipate being able to quickly deploy, configure and customize SIEM environments for customers.  Since LEM normalizes the logs, the correlation piece becomes very simple with drag and drop rule creation.

                                   

                                  I know Splunk provides a more open framework with their language so you probably have more flexibility there assuming you are willing to invest the time.  My experience with Splunk is somewhat dated so I am also glad to hear that they have improved their interface.  They are a big name in the SIEM and Log Management space so they are definitely doing something right.

                                   

                                  Please post back once you have had a chance to work with LEM as I would be really interested to hear your take on the two products side-by-side.

                                    • Re: Drowning in data while searching for information
                                      Ryan Adzima

                                      byrona,

                                       

                                      The user experience piece basically monitors and correlates all transactions across the entire web infrastructure and outputs pretty little charts and graphs (even maps with requestor location based on IP) that give me the ability to quickly see if there's an influx of traffic, an issue in one of our databases, or a proxy that isn't load balancing properly. It's not a pretty setup (on the backend), it wasn't easy, and I _know_ it would have to be completely rewritten to be even remotely usable by someone else, but I was determined to do it and am happy with the results for my environment.

                                       

                                      I like the idea of LEM since I have so many other SW products; I'd imagine (hope) that it can correlate more than just log entries, since it's also watching so many other parts of my infrastructure. I really haven't looked into it at all yet, but I hope to get on it as soon as the students are back and password resets settle down.

                            • Re: Drowning in data while searching for information
                              jswan

                              Here are a couple of simple things that cost nothing and pay off in time saved troubleshooting:

                               

                              1) Collect syslog from all your network infrastructure gear, if you don't already. If you're a SolarWinds and/or Microsoft shop you can use SolarWinds Kiwi Syslog for free. Use a loopback interface on all your routers, and use "logging source-interface loopback0" to ensure that your log messages are easily correlated to a device by IP address.

                               

                              2) Build a daily report that sorts unique log messages by the number received, aggregated both across all devices and by individual device. There are lots of ways to do this; I use a Python script that I run off the Kiwi server with the schedule function built into Kiwi (I've found this to be more reliable than the Windows Task Scheduler), and a rough sketch of that kind of script follows the sample output below. However, you could use other scripting tools, SQL, or even Excel if you're crazy. Here's some sample output showing interesting events:

                               

                              2 %BGP-5-ADJCHANGE:

                              6 %PORT_SECURITY-2-PSECURE_VIOLATION:

                              2877 %ENVMON-3-FAN_FAILED:

                              4477 %SERVICE_MODULE-4-WICNOTREADY:

                               

                              I review this every day, first thing in the morning. In this case I can see that I need to investigate a BGP flap, possibly a port security issue, a fan failure, and a strange looking WIC error message. Another part of the report (not shown) lists the messages sorted by sending device.
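
                              For what it's worth, a stripped-down sketch of that kind of counting script might look something like the one below. It assumes the syslog server writes plain-text log files containing Cisco-style %FACILITY-SEVERITY-MNEMONIC: messages; the file path and regex are illustrative placeholders rather than my exact setup.

                                  #!/usr/bin/env python
                                  # Sketch: count unique Cisco-style syslog mnemonics in a plain-text log file.
                                  # The path and regex below are placeholders; adjust them for your environment.
                                  import re
                                  from collections import Counter

                                  LOGFILE = "SyslogCatchAll.txt"  # placeholder: your syslog server's daily log file
                                  MSG_RE = re.compile(r"%[A-Z0-9_]+-\d-[A-Z0-9_]+:")  # e.g. %BGP-5-ADJCHANGE:

                                  counts = Counter()
                                  with open(LOGFILE) as f:
                                      for line in f:
                                          match = MSG_RE.search(line)
                                          if match:
                                              counts[match.group(0)] += 1

                                  # Least common first, so the rare (and usually interesting) events sit at the top.
                                  for msg, count in sorted(counts.items(), key=lambda item: item[1]):
                                      print("%d %s" % (count, msg))

                              The per-device breakdown is the same idea, just keyed on the sending IP address in each line as well as the mnemonic.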

                               

                              Just doing this much is really valuable for giving you operational insight into your network, and it costs nothing besides some messing around.

                               

                              You can also have Kiwi send emails based on certain conditions; I have one that emails me any syslog messages with "critical" or "emergency" priority from certain devices.
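
                              If your syslog server doesn't have that kind of alerting built in, the same idea is simple enough to script yourself. Here's a rough sketch that mails any message at severity 2 (critical) or worse from a watched set of devices; the SMTP relay, addresses, device list, and log file name are placeholders, not my actual setup.

                                  #!/usr/bin/env python
                                  # Sketch: email Cisco syslog messages at severity 2 (critical) or worse
                                  # from a watched set of devices. All names below are placeholders.
                                  import re
                                  import smtplib
                                  from email.mime.text import MIMEText

                                  WATCHED_DEVICES = ("10.0.0.1", "10.0.0.2")   # placeholder loopback addresses
                                  SMTP_HOST = "mail.example.com"               # placeholder SMTP relay
                                  MAIL_FROM = "syslog@example.com"
                                  MAIL_TO = "noc@example.com"
                                  SEV_RE = re.compile(r"%[A-Z0-9_]+-([0-7])-[A-Z0-9_]+:")

                                  def interesting(line):
                                      """True if the line is from a watched device at severity 0-2."""
                                      if not any(dev in line for dev in WATCHED_DEVICES):
                                          return False
                                      match = SEV_RE.search(line)
                                      return bool(match) and int(match.group(1)) <= 2

                                  def send_alert(line):
                                      msg = MIMEText(line)
                                      msg["Subject"] = "Syslog alert: " + line[:60]
                                      msg["From"] = MAIL_FROM
                                      msg["To"] = MAIL_TO
                                      server = smtplib.SMTP(SMTP_HOST)
                                      server.sendmail(MAIL_FROM, [MAIL_TO], msg.as_string())
                                      server.quit()

                                  if __name__ == "__main__":
                                      with open("SyslogCatchAll.txt") as f:    # placeholder log file
                                          for line in f:
                                              if interesting(line):
                                                  send_alert(line)

                              In practice you'd only want to run this against new lines (or a tail of the file), but the filtering logic is the same.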

                               

                              You could write similar reports for stuff like VPN user access, TACACS/RADIUS logons, firewall logs, etc.