Andrew Hay recently did a great post on Dark Reading on whether big data is really just a buzzword being thrown around by SIEM vendors to get attention, and I have to say I tend to agree with the essence of his conclusion. SIEM (and log management) vendors with 10-year-old architectures who claim big data is what they do probably can’t deliver unless they’ve embraced new big data technology and concepts. But I don’t think that’s the end of the story.
While the concepts of big data can be used to analyze everything from finance to the weather, most of you are probably more interested in what it means to you when your boss asks what you’re doing about big data (probably right after they sign off on buying that new array). So let’s talk about it. Big data for IT can be used to manage security threats, troubleshoot application performance issues, surface business insights that go beyond basic trends, and catch operational issues before they become operational problems. All of these are worthy uses of technology, and probably of your time, but getting these insights from big data ain’t so simple (“ain’t” is the little bit of Texas coming out in me).
There are two fundamental problems in getting value from big data.
- You need to collect it all. As Andrew pointed out in his Dark Reading post, there are several issues with systems architected around decade-old technology. The problems lie both in the storage systems Andrew called out (i.e., the database) and in the organizational structure used to keep and analyze the data (i.e., transactional vs. OLAP schemas).
- You need to analyze the data, identify problems, and then automate that identification for the future. This is more challenging because big data doesn’t come with a walk-through guide; you have to figure it out. It takes someone with real computational science expertise to find the needle in the haystack and know that it’s meaningful to your business. And once a problem is identified, it’s not clear that the same system used for analysis would be any good for automation and real-time detection. In fact, they are likely to be different applications at a minimum.
For many of you these two challenges may make big data an impractical luxury, but if you’re in the business of securing your network or really trying to deliver high availability, you may want to read on. You have a few options:
- Jump in with both feet: build a system based on big data technologies like Hadoop and Google’s MapReduce, or even more platform-oriented products like Splunk, hire your computational scientist, and go at it. It’s not impossibly hard – check out this case study about Zions Bank. But set your expectations right, because this isn’t going to be a deploy-it, gather-data, and-voilà-results kind of project.
- Or you could look for a product that gives you a few more practical tools. First, make sure you can collect machine data and act on it in real time. If you are writing the data to a database and then making it available to an analysis engine, you won’t be able to react fast enough to a security threat. Second, get a product with visualization tools for the data; things like word clouds, treemaps, bubble charts, and histograms will help you begin your data exploration exercise and figure out what to search for – IT search by itself isn’t going to cut it. Third, make sure it’s easy to build rules – no handwritten query languages, please; we are in the age of drag-and-drop. Lastly, make sure your system can take action: if all it can do is alert you, it’s not really helping to stop the problem, it’s just helping you know that there’s a problem – big difference.
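To make the first option a little more concrete, here is a minimal, purely illustrative sketch of the MapReduce model that Hadoop popularized: a map phase that emits key/count pairs from raw log lines and a reduce phase that sums them. The log format, field positions, and data are all assumptions for the sake of the example, not taken from any particular product – on a real cluster the same two functions would run across many machines.

```python
from collections import defaultdict

# Toy authentication log lines; in a real deployment these would be
# split across many worker nodes rather than held in one list.
LOG_LINES = [
    "2013-01-07 10:00:01 auth FAILED user=alice",
    "2013-01-07 10:00:02 auth OK user=bob",
    "2013-01-07 10:00:03 auth FAILED user=alice",
]

def map_phase(line):
    """Emit a (status, 1) pair for each log record (assumed format)."""
    fields = line.split()
    status = fields[3]  # fourth whitespace-separated field is the outcome
    yield (status, 1)

def reduce_phase(pairs):
    """Sum the counts per key, as a MapReduce reducer would."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

pairs = [pair for line in LOG_LINES for pair in map_phase(line)]
print(reduce_phase(pairs))  # {'FAILED': 2, 'OK': 1}
```

The point of the pattern is that map and reduce are each trivially parallel, which is what lets the approach scale to the data volumes this post is talking about.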
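And to illustrate the real-time, act-don’t-just-alert point from the second option, here is a hedged sketch of a sliding-window rule evaluated as events arrive, rather than after they land in a database. The event shape, thresholds, and blocking action are all hypothetical stand-ins for whatever your product provides:

```python
from collections import deque

WINDOW_SECONDS = 60  # illustrative threshold values, not vendor defaults
MAX_FAILURES = 3

def make_rule(action):
    """Return a handler that fires `action` when failures exceed the
    threshold inside the sliding window -- act, don't just alert."""
    recent = deque()  # timestamps of recent failures

    def handle(event):
        if event["status"] != "FAILED":
            return False
        recent.append(event["time"])
        # Drop failures that have aged out of the window.
        while recent and event["time"] - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MAX_FAILURES:
            action(event["source"])  # e.g. block the offending source
            return True
        return False

    return handle

blocked = []
rule = make_rule(blocked.append)  # the "action": record/block the source
events = [
    {"time": t, "status": "FAILED", "source": "10.0.0.5"}
    for t in (1, 10, 20)
]
for event in events:
    rule(event)
print(blocked)  # ['10.0.0.5']
```

The design choice worth noticing is that the rule evaluates each event as it streams in; writing events to storage first and querying later would reintroduce exactly the latency the paragraph above warns about.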
Big data is here to stay, but my advice is to get practical: if you don’t know what you want to do with the data, stop. If you do know what you want to do with it, then find a solution that gets you the key parts of the value without having to hire your very own computational scientist.