Seeing the Big Picture: Give Me All of the Data

Level 10

The collection of operational and analytics information can be an addictive habit, especially on an interesting and active network. However, this information can quickly and easily overwhelm an aggregation system once the constant fire hose of information begins. Assuming the desire is to collect and use this information, it becomes clear that a proper strategy is required. This strategy should be composed of a number of elements, including consideration of needs and requirements before beginning a project of this scope.

In practice, gathering requirements will likely happen either in parallel or, as with many things in networking, be adjusted on demand, building the airplane while it is in flight. Course correction should be an oft-used tool in any technologist's toolbox. New peaks and valleys, pitfalls, and advantages should feed into the constant evaluation that occurs in any dynamic environment. The critical parts of this strategy apply to nearly all endeavors of this kind. But even before that, the reasoning and use cases should be identified.

A few of the more important questions that need to be answered are:

  • What data is available?
  • What do we expect to do with the data?
  • How can we access the data?
  • Who can access what aspects of the data types?
  • Where does the data live?
  • What is the retention policy on each data type?
  • What is the storage model of the data? Is it encrypted at rest? Is it encrypted in transit?
  • How is the data ingested?

Starting with these questions can dramatically simplify and smooth the execution process of each part of the project. The answers to these questions may change, too. There is no fault in course correction, as mentioned above. It is part of the continuous re-evaluation process that often marks a successful plan.
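
As a minimal, hypothetical sketch, the answers to these questions can be captured up front as a simple data inventory. Everything in this example (the data types, access groups, retention periods, and ingest paths) is a placeholder assumption, not a prescription:

```python
# Hypothetical data inventory: one entry per data type, recording the answers
# to the planning questions above. All values are illustrative placeholders.
DATA_INVENTORY = {
    "netflow": {
        "source": "edge routers",
        "access": ["network operations", "security"],
        "storage": "indexer cluster",
        "retention_days": 90,
        "encrypted_at_rest": False,
        "encrypted_in_transit": False,
        "ingest": "flow exporter -> collector",
    },
    "syslog": {
        "source": "routers, switches, hosts",
        "access": ["network operations", "security", "audit"],
        "storage": "indexer cluster",
        "retention_days": 365,
        "encrypted_at_rest": True,
        "encrypted_in_transit": False,   # plain UDP syslog from routers
        "ingest": "syslog receiver -> indexer",
    },
    "snmp_timeseries": {
        "source": "polled interface and system OIDs",
        "access": ["network operations"],
        "storage": "round-robin database (RRD)",
        "retention_days": 730,
        "encrypted_at_rest": False,
        "encrypted_in_transit": False,
        "ingest": "poller -> RRD",
    },
}

if __name__ == "__main__":
    for name, details in DATA_INVENTORY.items():
        print(f"{name}: keep {details['retention_days']} days in {details['storage']}")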

Given these questions, let’s walk through a workflow to understand the reasoning for them and illuminate the usefulness of a solid monitoring and analytics plan. “What data is available?” will drive a huge number of questions and their answers. Let’s assume that the goal is to consume network flow information, system and host log data, polled SNMP time series data, and latency information. Clearly this is a large set of very diverse information, all of which should be readily available. The first mistake most engineers make is diving into the weeds of what tools to use straight away. This is a solved problem, and frankly it is far less relevant to the overall project than the rest of the questions. Use the tools that you understand, can afford to operate (both fiscally and operationally), and that provide the interfaces that you need. Set that detail aside, as answers to some of the other questions may decide it for you.
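
As one hedged illustration of these feeds, here is a minimal latency poller that writes samples to a flat CSV time series. The target hosts, interval, and output file are assumptions, and the ping flags shown are the Linux variants; a real deployment would feed a proper poller or time series database instead:

```python
# Minimal sketch of one of the data feeds mentioned above: polled latency
# samples appended to a CSV time series. Targets and interval are placeholders.
import csv
import subprocess
import time
from datetime import datetime, timezone
from typing import Optional

HOSTS = ["192.0.2.1", "192.0.2.2"]   # placeholder targets (TEST-NET-1 range)
INTERVAL_SECONDS = 60

def ping_rtt_ms(host: str) -> Optional[float]:
    """Return the RTT in ms from a single ICMP echo, or None on failure."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],     # Linux ping flags
            capture_output=True, text=True, timeout=5,
        )
        for line in result.stdout.splitlines():
            if "time=" in line:
                return float(line.split("time=")[1].split()[0])
    except (subprocess.TimeoutExpired, ValueError):
        pass
    return None

if __name__ == "__main__":
    with open("latency.csv", "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            timestamp = datetime.now(timezone.utc).isoformat()
            for host in HOSTS:
                writer.writerow([timestamp, host, ping_rtt_ms(host)])
            f.flush()
            time.sleep(INTERVAL_SECONDS)
```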

How will we store the data? Time series is easy: that typically goes into an RRD (round-robin database). Will there be a need for complex queries against things like NetFlow and other text, such as syslog? If so, there may be a need for an indexing tool. There are many commercial and open source options. Keep in mind that this is one of the more nuanced parts, as answers to this question may change the answers to the others, specifically retention, access, and storage location. Data storage is the hidden bane of an analytics system. Disk isn't expensive, but provisioning it correctly, and on budget, is hard. Whatever disk space is required, always, always, always add headroom. It will be needed later; otherwise the retention policy will have to be adjusted.
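
A quick back-of-the-envelope sizing exercise makes the headroom point concrete. Every number in this sketch is a placeholder assumption, not a measured ingest rate:

```python
# Rough disk sizing for the retention discussion above.
# Daily ingest rates and retention periods are illustrative assumptions.
DAILY_INGEST_GB = {"netflow": 40.0, "syslog": 15.0, "snmp_timeseries": 2.0}
RETENTION_DAYS = {"netflow": 90, "syslog": 365, "snmp_timeseries": 730}
HEADROOM = 1.5   # always, always, always add headroom

if __name__ == "__main__":
    total = 0.0
    for name, daily_gb in DAILY_INGEST_GB.items():
        raw = daily_gb * RETENTION_DAYS[name]
        total += raw * HEADROOM
        print(f"{name}: {raw:,.0f} GB raw, {raw * HEADROOM:,.0f} GB with headroom")
    print(f"total provisioned: {total:,.0f} GB")
```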

Encryption comes into play here as well. Typical security practice is to encrypt in flight and at rest, but in many cases this isn't feasible (think router syslog). Encryption at rest also incurs a fairly heavy cost, both one-time (CPU cycles to encrypt) and perpetual (decryption on every access). In many cases, the cost of encryption simply isn't justified. Exceptions should be documented and the risks formally accepted by management, so there is a clear record of the decision on the off chance that sensitive information is leaked or exfiltrated.
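
To make the one-time and perpetual costs concrete, here is a small, hedged benchmark sketch using the third-party cryptography package (pip install cryptography); the record contents and counts are made up, and real syslog volumes will differ:

```python
# Hedged illustration of the one-time (encrypt on ingest) and perpetual
# (decrypt on every read) costs of encryption at rest.
import time
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice the key lives in a KMS or vault
fernet = Fernet(key)

# Pretend these are archived syslog lines.
records = [f"<134>router1: interface Gi0/{i} changed state".encode() for i in range(10_000)]

start = time.perf_counter()
encrypted = [fernet.encrypt(r) for r in records]     # one-time cost at write
encrypt_s = time.perf_counter() - start

start = time.perf_counter()
_ = [fernet.decrypt(t) for t in encrypted]           # paid again on every access
decrypt_s = time.perf_counter() - start

print(f"encrypted {len(records)} records in {encrypt_s:.2f}s, decrypted in {decrypt_s:.2f}s")
```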

With all of this data, what is the real end goal? Simple: a baseline. Nearly all monitoring and measurement systems provide, at their most elemental level, a baseline. Knowing how something normally operates is fundamental to successful management of any resource, and networks are no exception. With stored statistical information, it becomes significantly easier to identify issues. Functionally, any data collected will likely be useful at some point if it is available and referenced. A solid plan for how the statistical data is handled is the foundation for ensuring those deliverables are met.
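
As a minimal sketch of what "baseline" means in practice, the following flags samples that stray too far from a rolling mean and standard deviation. The sample traffic values and the 3-sigma threshold are illustrative assumptions, not a recommended detection method:

```python
# Rolling-baseline anomaly check: describe "normal" from recent samples,
# then flag anything far outside that range.
from collections import deque
from statistics import mean, stdev

WINDOW = 20        # samples used to describe "normal"
THRESHOLD = 3.0    # flag anything more than 3 standard deviations out

def anomalies(samples):
    window = deque(maxlen=WINDOW)
    for index, value in enumerate(samples):
        if len(window) == WINDOW:
            mu, sigma = mean(window), stdev(window)
            if sigma > 0 and abs(value - mu) > THRESHOLD * sigma:
                yield index, value, mu
        window.append(value)

if __name__ == "__main__":
    traffic_mbps = [100 + (i % 5) for i in range(60)] + [400] + [100] * 10
    for index, value, baseline in anomalies(traffic_mbps):
        print(f"sample {index}: {value} Mb/s vs baseline of ~{baseline:.0f} Mb/s")
```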

12 Comments
Level 13

Thanks, good article. Some of this boils down to the basics, and it's good to be reminded of that from time to time. We often try to leap to conclusions to speed up the process, and that's when the errors creep in or we miss something basic.

Level 13

Thanks for the article!

Level 15

Thanks for the write up!

I agree--having all the puzzle pieces really helps solve the puzzle!

On the other hand, having too much data, extraneous data, "data noise", and un-actionable alerts or information can be so distracting that a person can't get the job done at all.  It's like having pieces from MANY different puzzles while believing there's only ONE puzzle picture to be created from them all.

I'm thinking data filtering needs to take a quantum leap forward.  Intelligent heuristics filtering traps and syslogs seems only to be a start.  I'm envisioning a future where we accept that there's more than any one team can handle.  A future where AI actually becomes mandatory to prevent or efficiently identify problems.

MVP

Having lots of data is great if you don't know the question you want to answer, but the cost is how to hang on to data for which you have no idea what the retention should be or when it might be useful.

It almost feels like having a garage crammed with stuff "just in case I need it, sometime in the future"

To put this into perspective.....my garage has a car in it

MVP

That is why you need a data warehouse...to keep all that data out of your garage. Then your daily database is lean and mean. Your nearline database (data warehouse) is used for all the heavy queries and reporting.

Level 13

Good article.  Baselines are critical - if you don't know what normal looks like it's really hard to know if it changed.

MVP

Good article

Level 15

Data Warehouse = Sticker Shock = Nevermind


MVP

Agreed. Analyzing what to keep, where to keep it, and for how long is usually done post-disaster. Doing so in advance saves a lot of issues in the future and makes much better use of existing resources and preparation for expansion.

Level 20

Data science engineering is one of the biggest new growth areas in computer science right now.

Level 10

I am one that has a two-car garage with one car in it. As a former Boy Scout, I'd rather have it and not need it than need it and not have it, because if I am considering keeping it I usually end up needing it. That obviously comes with a price, both monetarily and operationally. My current mechanism, which I have found to be very, very useful, is to sock all of that data into an indexer with an API. There are a few options, both commercial and FOSS, and they're all well developed to do exactly this. I prefer something with an API so that I can write my own queries against it, but realistically it doesn't really matter as long as there is an access and query method that works within your operational model.
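
Just to sketch that pattern (not any particular product's API; the endpoint, token, and query syntax here are hypothetical placeholders):

```python
# Sketch of querying an indexer's search API. Substitute whatever your
# indexer (commercial or FOSS) actually exposes; these values are made up.
import json
import urllib.request

INDEXER_URL = "https://indexer.example.net/api/search"   # hypothetical endpoint
API_TOKEN = "changeme"                                    # hypothetical token

def search(query, earliest="-24h"):
    payload = json.dumps({"query": query, "earliest": earliest}).encode()
    request = urllib.request.Request(
        INDEXER_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("results", [])

if __name__ == "__main__":
    for event in search('sourcetype=syslog "LINK-3-UPDOWN"'):
        print(event)
```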

I would also assert that having lots of data is necessary if you DO know the questions you need to answer, because the answers often (and usually do) lead to more questions. The real issue I think you're hinting at is retention. How *long* do I need to keep it? Having all of the data is one thing. Storing it indefinitely is quite another. Long-term storage can be on a less expensive, cold medium, while ready-access data can be kept for weeks or a few months. The larger the environment, the more important it is to keep more data longer.

About the Author
15+ years IT experience ranging from networking, UNIX, security policy, incident response, and anything else interesting. Mostly just a networking guy with hobbies including film, beer brewing, boxing, MMA, jiu jitsu/catch wrestling/grappling, skateboarding, cycling, and being a Husband and Dad. I don't sleep much.