29 Replies Latest reply on May 31, 2013 10:44 AM by michael2907

    Big Data or Big Garbage?

    Mrs. Y.

      At a special panel at RSA Security's gargantuan conference this year, Jerry Sto. Tomas, director of global information security at Allergan, said, "I don't call it Big Data, I call it garbage data," when discussing the challenges security professionals face in gathering useful information. I felt vindicated, based upon the work I've been doing that challenges the current trend toward log and event gluttony. News flash: Information technology groups aren't failing in their attempts at being proactive because they aren't collecting enough events. Most organizations have enough SNMP, syslog, and NetFlow data to circle the planet a few times. I believe it's because most of the applications available in the market to correlate data are so complicated to use that they require a PhD in physics. Then there's the impossible task of trying to find the performance "sweet spot" of these applications. I recently needed to perform some log correlation for an investigation, and after carefully constructing the query in our event reporting and analysis tool, I still hadn't received the full results after four hours. As I yearned for the bygone days of using the grep/sed/awk triple-threat on some text files, I thought, "Since when do you need a supercomputer for log and event correlation?"
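      For the nostalgic, here is a minimal sketch of that grep/sed/awk triple-threat, run against a made-up auth.log excerpt (the log lines, path, and field layout are purely illustrative, not from any real system), counting failed logins per source IP:

```shell
#!/bin/sh
# Made-up auth.log excerpt for illustration only.
cat > /tmp/auth.log <<'EOF'
May 30 10:01:02 host sshd[111]: Failed password for root from 10.0.0.5 port 4000 ssh2
May 30 10:01:05 host sshd[112]: Failed password for admin from 10.0.0.5 port 4001 ssh2
May 30 10:02:11 host sshd[113]: Accepted password for alice from 10.0.0.9 port 4002 ssh2
May 30 10:03:17 host sshd[114]: Failed password for root from 10.0.0.7 port 4003 ssh2
EOF

# grep filters the events, sed strips everything but the source IP,
# awk aggregates: failed logins per source address.
grep 'Failed password' /tmp/auth.log \
  | sed 's/.*from \([0-9.]*\) port.*/\1/' \
  | awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' \
  | sort
# Prints:
# 10.0.0.5 2
# 10.0.0.7 1
```

      No supercomputer required; the whole pipeline finishes before most SIEM query forms even load.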

        • Re: Big Data or Big Garbage?
          bsciencefiction.tv

          As our society relies more heavily on the latest, greatest thing and the "more is better, bigger is better" mantra, at the end of the day it is still the people, not the machines, that save the day.

           

          As processors and storage have increased, it seems programmers have gotten lazier, just throwing layers of unfiltered or useless information at us to be able to say, "Look how much more we can give you." Someone has to be able to interpret and validate the information. And truth be told, in our industry, I still see people predict cyber attacks and vulnerability issues with anecdotal evidence or gut feeling based on real-time observation.

           

          We can throw all the data we want at people, but someone with real-world experience and that "it" factor can do more with a little data than someone who put in four years at college and a run at the CCNA Security and CISSP certs -- who just thought this would be a well-paying job or give them their dictator fix -- can do with a terabyte of data.

          • Re: Big Data or Big Garbage?
            rharland2012

            Since when, indeed...

            I guess it's since some time in the not-so-distant past when application/solution suppliers seemingly heard that what everyone wanted was event/log/forensic information as an available output, erring on the side of information hemorrhage just to be safe.

            Those of us who have only had to keep track of logging for routing/switching/firewalls for years are pretty fortunate nowadays - syslog is easily digestible and standardized for decent readability without a lot of finish work.

            Application/platform/other event creation, however, doesn't always work as easily - because we need more specific information from our applications, don't we? The more info we want to get, the greater the complexity of the event creation logic used to forward that info, and as a result, the tools to manipulate this data into something consumable increase in complexity as well. Not to mention integrating application-level events with transport-level with some intelligence.

             

            With SDN rolling down the tracks at increasing speed, I can see that inroads may be made with event correlation and intelligence when transport/routing/switching systems are more tightly integrated with - and intelligent to - the compute/memory/storage pieces that live alongside them. It's early days yet, I think.

            • Re: Big Data or Big Garbage?
              superfly99

              I agree. Unless there's a product on the market that can analyze the data correctly and easily, then most data collected is in fact useless. Luckily for me, I only collect data from switches/routers. I remember attempting to use HP OpenView, and it was way too complicated at the time. CiscoWorks was a little easier but still required a course to make it really useful. SolarWinds, on the other hand, makes everything a breeze in comparison.

               

              But as so much data comes in (have to make sure everything is collected in case of an "event"), a supercomputer is needed to be able to get the info as quickly as possible. Nobody wants to wait 30 minutes for a report.

              • Re: Big Data or Big Garbage?
                mdriskell

                We are in the process of implementing LogRhythm for this purpose.  It is owned by another team, so I won't get the opportunity to play with it too much.  One of the key reasons it was chosen was for its correlation engine.  We will see how well it performs once it's fully functioning.

                  • Re: Big Data or Big Garbage?
                    byrona

                    mdriskell, while I realize this is being done by a different team, I thought I would throw out there that I had the opportunity to talk to a Gartner researcher regarding SIEM products, and he indicated that, of the dozen or so products he was tracking, both LogRhythm and SolarWinds Log & Event Manager are great products and very comparable.

                    • Re: Big Data or Big Garbage?
                      Mrs. Y.

                      SIEM, the promise that never seems to really deliver.... Would *love* to know the cost of the product you mentioned and how much PS is involved to get it up and running. I don't know which is worse, semi-correlation or none at all. At least when I have to manually retrieve all the logs, I know what I'm up against. With the log correlation engine we use at my $dayjob, I've been trying to get some application logs based upon a query for about a week for an investigation. I talked to our CISO about this last night and she said that they're going to bring in PS to re-engineer the architecture. Great. Feels like some of these products use a mafia protection support model. I have to keep paying them to get *any* value at all out of their product. When will the madness end?

                    • Re: Big Data or Big Garbage?
                      byrona

                      Mrs. Y, I always find your discussion topics to be very interesting and thought provoking.

                       

                      I completely agree that a lot of the data being collected is "garbage data" and that data gluttony is a real problem, which has me wondering what the driver for all of it is.  I don't think it's generally the technical folks behind the problem; I think it's a combination of management and compliance requirements, or even a failure to understand compliance requirements.  The scenario I am thinking of goes something like this: management wants to meet some set of compliance requirements, reads a document on it, takes away that all data needs to be collected and kept for some large number of years, and sends those requirements on to the technical folks.  While this scenario is a bit of an exaggeration, I have seen it more than once already.

                       

                      I think that SIEM products provide a lot of value, and there is a lot of value in log data, so long as you scope things appropriately and have realistic expectations of the system; don't get caught up in the hype that many of these vendors want to sell you on.

                        • Re: Big Data or Big Garbage?
                          rharland2012

                          The hype works very well to sell solutions... to the same management-level folks who actually sign the checks! This is why a good CIO is so valuable in shops large enough to warrant the hire. Stop the madness before it starts, normalize or boost expectations, and get the right tool for the job in the hands of those who do the job.

                        • Re: Big Data or Big Garbage?
                          jswan

                          Mrs. Y, your problem is one of the reasons I continue to squirrel away plain-text logs whenever possible... so when the tool du jour blows up, I can go back to grep/awk/sed.

                           

                          At the Bro Exchange last year this topic came up and I was fascinated to hear operators from one of the largest installations state that they had tried a huge laundry list of commercial and OSS log management tools before going back to a home-built tool that just runs grep in parallel across many machines.
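                          That home-built tool isn't public, but the core trick -- fanning a single grep out in parallel -- can be sketched locally with xargs -P over files (in the distributed version, each worker would be an ssh to a log host; all paths and file contents below are made up for illustration):

```shell
#!/bin/sh
# Made-up log files standing in for per-machine log stores.
mkdir -p /tmp/logs
printf 'ok\nERROR disk full\n' > /tmp/logs/a.log
printf 'ok\nok\n'              > /tmp/logs/b.log
printf 'ERROR link down\n'     > /tmp/logs/c.log

# -P 4 runs up to four greps at once; -l prints only matching file names.
# sort makes the (otherwise nondeterministic) parallel output stable.
find /tmp/logs -name '*.log' -print0 \
  | xargs -0 -P 4 grep -l 'ERROR' \
  | sort
# Prints:
# /tmp/logs/a.log
# /tmp/logs/c.log
```

                          It's crude, but it scales the way the operators described: add more workers (or machines), not more query-engine complexity.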

                          • Re: Big Data or Big Garbage?
                            matt.matheus

                            I think the difficulty is not in gathering enough useful information, but in finding the information you want in a reasonable amount of time.  When you are looking through 27 months of logs from a particular site, things that should come up as red alerts become buried in the chaff.  Knowledge on how to find what needs to be found / correlated... and the computing power to do it are in short supply for many organizations.  A good log management tool should be able to interpret logs and automatically parse for problems after being told what to look for.

                             

                            Another significant problem is that the people conducting the investigations and reviewing the logs aren't the ones writing the checks.  A salesperson can come in and talk at length about their whiz-bang product, throw in terms like compliance and visibility with a few buzzwords (like BIG DATA), and walk out with a purchase order.  Fast forward a few weeks to when the product arrives, and the people who will actually need to use it are forced to make a square peg fit into a round hole.

                            • Re: Big Data or Big Garbage?
                              freid.42

                              I really think the difficulty comes from the number of logs coming in from the number of devices. The more devices you add, the more logging events there are going to be. As the devices become more powerful and smarter, the more they are going to put out. I feel that the number of logs coming from a bigger company can become unwieldy. This has become a real burden for shops of any size.

                               

                              We have started to use Symantec MSS for log evaluation. They collect logs from us, evaluate them, and contact us about any issues that arise. This is convenient for the IT shop, but not everything is being logged. It's a cost-versus-risk kind of idea.

                              • Re: Big Data or Big Garbage?
                                IGFCSS.DSI

                                Hi Mrs. Y,

                                 

                                I was reading your post and immediately thought of the book I'm currently reading on my Kindle, "The Signal and the Noise: Why So Many Predictions Fail-but Some Don't" by Nate Silver. At first glance (and even reading the first few pages) you might think it's about another subject, but really it's about BIG DATA. Today we create petabytes of data every day, and one of the major challenges we face is to filter the "signal" from the "noise". Or, in Jerry Sto. Tomas's words, filter the data from the garbage...

                                 

                                In the past, forgotten times of the "grep/sed/awk triple-threat," we really had to know what we were looking for; otherwise, those grep regexes wouldn't work or would return garbage. Somehow we passed from those "geek times" to a new generation of big data, fast data, and click-click-click on a mouse button. Companies that produce and develop software, for example the analysis tools that you've mentioned, just deliver it and leave it to us to have (or not) the aforementioned PhD.

                                I'm remembering a famous piece of software I used a few years ago that is one of the best for SNMP monitoring. After installing it, I just added the IP address of a 48-port Layer 3 switch and "voilà"... a huge amount of graphs, counters, etc. And I thought, "I don't need all of these graphs and counters. I don't want to graph bandwidth, in/out packets, and in/out bytes for every switch port!!!" I spent more time deleting the garbage than I spent installing the darn thing.

                                 

                                Log analysis and correlation is even more difficult because our systems produce more and more information, making it almost a "Herculean task" for us to filter it.

                                 

                                Today, when I need to implement such tools, I spend more and more time on planning and design; then I implement the "sweet spot", or what I think is my "sweet spot"...

                                  • Re: Big Data or Big Garbage?
                                    byrona
                                    Today, when I need to implement such tools, I spend more and more time on planning and design; then I implement the "sweet spot", or what I think is my "sweet spot"...

                                    Great point!  I think there is often not enough time spent in the plan/design stage.  When it comes to SIEM products, I think a lot of people go out and buy a product thinking "this will solve all of my log problems" based on the hype around it, without taking the time to identify the specific problems they are trying to solve and then test the product to see if it will actually do as expected.  If you don't do this, you end up with a solution in search of a problem.

                                  • Re: Big Data or Big Garbage?
                                    Aaron Denning

                                    I agree. I don't get why I need so many servers just to house the log files; it seems like every time we get a new server, we need three more just for log files. But I also do a lot of planning before I implement, just to try to make sure that sweet spot is hit.

                                    • Re: Big Data or Big Garbage?
                                      mikegrocket

                                      Big data or big garbage? Since it is all electrical bits now, it seems no one is throwing anything away anymore. How many of us are guilty in our own lives: taking a trip and coming home with hundreds of digital pictures. Do we sit down and go through them, getting rid of the ones that are no good? Not usually; we just leave them on our hard drive and move on. That seems to be the prevailing attitude with data today. I've heard people say things like, "Why not? Storage is cheap." Maybe so, but managing that storage can become costly in time. These attitudes tend to enable software to continue to grow and grow while giving rise to logging everything. Where does it end? It makes my head spin. As to the question of SIEM, which is right for me? There are so many solutions, ranging across the whole spectrum of costs. I just don't know...

                                      • Re: Big Data or Big Garbage?
                                        Aforsythe

                                        It really depends on why we are collecting data. If it's for event correlation, then use something to filter out the data you don't need and just log it somewhere. On a day-to-day basis, I'm really only interested in alerts from my SonicWall, but when I get a call asking if I can prove an e-mail left our Exchange server, I'm very happy that I'm archiving EVERYTHING when it keeps us out of court, provides evidence, etc.

                                        • Re: Big Data or Big Garbage?
                                          RichardLetts

                                          In another life I worked for a SaaS vendor. For the display tier there were logs from firewalls, web servers, database engines, and application servers. We consciously built a log-handling service (using Splunk) to consume these inputs and analyze them as fast as the professional services and support staff needed to do their jobs (i.e., a few minutes was okay). The benefit was that we could put access control in place, cleanse the data (PCI/HIPAA/etc.), and be more responsive to customer inquiries. Too often the log-file storage infrastructure at organizations is an afterthought. Though Linux geeks and sysadmins can grep logfiles, I feel it was so much better to give fast access to the data to the people who need it. I think Splunk is a great product, but it's quite expensive, and in the past couple of years competitors like LogRhythm and Logstash have come along.

                                           

                                          When talking about "big data" we should qualify it. One person's "big" is another person's "small." Where I work, a year's worth of logs for the network infrastructure is probably well over 20TB, but this pales in comparison with what researchers here consider "big": the data analysis systems for the UW eScience Institute support several PB of storage, and I think they're up over 1,000 nodes. (They also offer the free Coursera course on data science, https://www.coursera.org/course/datasci, as well as a Ph.D. in Big Data.)

                                           

                                           

                                          • Re: Big Data or Big Garbage?
                                            dougeria

                                            Big Garbage it is, if the people going through the data don't know the good from the bad.  It will take a program to read the program which reads the data.  How many levels of complexity do we need?  Has the payoff been worth it?

                                            • Re: Big Data or Big Garbage?
                                              zackm

                                              I've always viewed (perhaps naively) event logs in the same way that I have viewed packet captures... a royal pain in the foot.

                                               

                                              The inability to run a simple query through the mounds and mounds of information you receive and get back reliable, accurate results astounds me. It's akin to the gasoline engine: you know there has to be something better out there, but no one seems to be able to create a full-featured alternative yet.

                                               

                                              The person who patents a SIEM that can handle the large amounts of data generated in global data center environments and can correlate the logs from multiple sources... that's the next billionaire in this industry.