I agree. I don't get why I need so many servers just to house the log files; it seems like every time we get a new server, we need three more just for log files. But I also do a lot of planning before I implement, just to try to make sure that sweet spot is hit.
Big data or big garbage? Since it's all just bits now, it seems no one is throwing anything away anymore. How many of us are guilty of this in our own lives: taking a trip and coming home with hundreds of digital pictures. Do we sit down and go through them, getting rid of the ones that are no good? Not usually; we just leave them on our hard drive and move on. That seems to be the prevailing attitude with data today. I've heard people say things like, "Why not? Storage is cheap." Maybe so, but managing that storage can become costly in time. These attitudes let software continue to grow and grow while encouraging us to log everything. Where does it end? It makes my head spin. As for the SIEM question, which is right for me? There are so many solutions, ranging across the whole cost spectrum. I just don't know...
It really depends on why we are collecting data. If it's for event correlation, then use something to filter out the data you don't need and just log the rest somewhere. On a day-to-day basis I'm really only interested in alerts from my SonicWall, but when I get a call asking if I can prove that an e-mail actually left our Exchange server, I'm very happy that I'm archiving EVERYTHING; it keeps us out of court, provides evidence, etc.
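For what it's worth, the "watch a few things, archive everything" approach can be sketched in a few lines. This is only a minimal illustration; the filenames and the alert-matching keywords are assumptions for the example, not real SonicWall log fields:

```python
# Minimal sketch: read a syslog feed on stdin, archive every line (compressed),
# and copy only the lines watched day-to-day into a small alerts file.
# The markers and filenames below are placeholders, not real SonicWall fields.
import gzip
import sys

ALERT_MARKERS = ("priority=alert", "priority=critical")  # assumed markers

with open("alerts.log", "a") as alerts, gzip.open("archive.log.gz", "at") as archive:
    for line in sys.stdin:
        archive.write(line)                     # keep EVERYTHING, in case it's ever needed as evidence
        if any(marker in line.lower() for marker in ALERT_MARKERS):
            alerts.write(line)                  # the few lines actually reviewed daily
```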
In another life I worked for a SaaS vendor. For the display tier there were logs from firewalls, web servers, database engines, and application servers. We consciously built a log handling service (using Splunk) to consume these inputs and analyze them as fast as the professional services and support staff needed to do their jobs (i.e., a few minutes was okay). The benefit was that we could apply access control, cleanse the data (PCI/HIPAA, etc.), and be more responsive to customer inquiries. Too often the log-file storage infrastructure at organizations is an afterthought. Linux geeks and sysadmins can grep logfiles, but I feel it was so much better to give fast access to the data to the people who need it. I think Splunk is a great product, but it's quite expensive, and in the past couple of years competitors like LogRhythm and Logstash have come along.
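As a small illustration of "fast access for the people who need it" without a commercial tool, here's a hypothetical sketch that loads one-line log events into SQLite so support staff can query by customer instead of grepping raw files. The log format and field names are invented for this example; this is not the Splunk pipeline described above:

```python
# Hypothetical sketch: index simple "timestamp customer status message" log lines
# into SQLite so non-sysadmins can run quick structured queries.
import re
import sqlite3

LINE_RE = re.compile(r"(?P<ts>\S+) (?P<customer>\S+) (?P<status>\d{3}) (?P<msg>.*)")

db = sqlite3.connect("logs.db")
db.execute("CREATE TABLE IF NOT EXISTS events (ts TEXT, customer TEXT, status INTEGER, msg TEXT)")

with open("display_tier.log") as f:          # assumed input file
    for line in f:
        m = LINE_RE.match(line)
        if m:
            db.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                       (m["ts"], m["customer"], int(m["status"]), m["msg"]))
db.commit()

# A support person can now answer "what errors did customer acme see?" in seconds:
for row in db.execute("SELECT ts, status, msg FROM events WHERE customer = ? AND status >= 500",
                      ("acme",)):
    print(row)
```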
When talking about 'big data' we should qualify it. One person's 'big' is another person's 'small'. Where I work, a year's worth of logs for the network infrastructure is probably well over 20TB, but this pales in comparison with what researchers here consider 'big': the data analysis systems for the UW eScience Institute support several PB of storage, and I think they're up over 1,000 nodes. (They also offer a free Coursera course on data science, https://www.coursera.org/course/datasci, as well as a Ph.D. in Big Data.)
Big Garbage it is, if the people going through the data don't know the good from the bad. It will take a program to read the program which reads the data. How many levels of complexity do we need? Has the payoff been worth it?
I've always viewed event logs (perhaps naively) in the same way I've viewed packet captures... a royal pain in the foot.
The inability to run a simple query through the mounds and mounds of information you receive and get back reliable, accurate results astounds me. It's akin to the gasoline engine: you know there has to be something better out there, but no one seems able to create a full-featured alternative yet.
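To make that complaint concrete, here's a small hypothetical illustration (the log lines and formats are invented for this example): one "simple query" run across logs from different sources skips some events entirely and mislabels others, because each source structures its lines differently.

```python
# Invented sample lines from three different sources, plus a stack-trace continuation.
lines = [
    "2014-03-01T12:00:01Z fw1 DENY src=10.0.0.5 dst=10.0.0.9",
    "Mar  1 12:00:02 web1 apache: GET /index.html 200",
    "01/03/2014 12:00:03 app1 ERROR NullPointerException",
    "    at com.example.Handler.process(Handler.java:42)",
]

import re

# A naive "simple query": grab a timestamp and a hostname from the start of each line.
query = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<rest>.*)")

for line in lines:
    m = query.match(line)
    if m:
        print("parsed:", m.group("ts"), m.group("host"))
    else:
        print("missed:", line.strip())

# Result: the first line parses correctly, the double-spaced syslog line is missed,
# the third line "parses" but puts the time in the host field, and the stack-trace
# continuation line is lost entirely.
```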
The person who patents a SIEM that can handle the large amounts of data generated in global data center environments and can correlate the logs from multiple sources... that's the next billionaire in this industry.