    Traffic flow sampling can be a bit of a mystery at times. It’s often difficult to determine the impact of enabling sampling and to understand how much management traffic that sampling places on the network. The good news is that the sampling rate is a lever you control: adjusting it significantly changes both the sample arrival rate at your sFlow Collector and the estimated accuracy of the results at different confidence levels.


    Enter the sFlow Traffic Characterization Tool!


    This is a free tool from SolarWinds that can help you assess the impacts of sampling in a typical data center environment. This simple worksheet allows you to enter your own values for the number and type of switch interfaces, the interface sampling rate, and the average utilization rate for the interface.


    The tool uses an average frame size and some assumptions about the sFlow datagrams to estimate the sample arrival rate and the bandwidth this management traffic consumes as it is sent to the collector for analysis. You can then see and evaluate the impact of changing your sampling rates, or of your utilization increasing over time.
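    To make the arithmetic concrete, here is a minimal Python sketch of this kind of estimate. It is not the worksheet's own formula: the `InterfaceGroup` type and both functions are hypothetical, and the number of samples packed into each sFlow datagram and the datagram size are assumptions you would replace with values observed on your own network.

```python
from dataclasses import dataclass


@dataclass
class InterfaceGroup:
    count: int          # number of switch interfaces of this type
    speed_bps: float    # interface speed in bits per second
    utilization: float  # average utilization, 0.0 to 1.0
    sampling_rate: int  # 1-in-N packet sampling


def sample_rate_per_second(group: InterfaceGroup, avg_frame_bytes: float) -> float:
    """Estimated sFlow samples per second generated by all interfaces in the group."""
    frames_per_second = group.speed_bps * group.utilization / (avg_frame_bytes * 8)
    return group.count * frames_per_second / group.sampling_rate


def sflow_bandwidth_bps(samples_per_second: float,
                        samples_per_datagram: int,
                        datagram_bytes: int) -> float:
    """Estimated bandwidth of the sFlow datagrams carrying those samples.

    samples_per_datagram and datagram_bytes are assumptions, not sFlow constants.
    """
    datagrams_per_second = samples_per_second / samples_per_datagram
    return datagrams_per_second * datagram_bytes * 8


# Hypothetical example: 48 x 10 Gbps interfaces at 30% utilization, 1-in-10,000 sampling.
group = InterfaceGroup(count=48, speed_bps=10e9, utilization=0.30, sampling_rate=10_000)
samples = sample_rate_per_second(group, avg_frame_bytes=500)
bandwidth = sflow_bandwidth_bps(samples, samples_per_datagram=6, datagram_bytes=1_400)
print(f"{samples:,.0f} samples/s, {bandwidth / 1e6:.2f} Mbps of sFlow traffic")
```

    With these made-up inputs, the interfaces produce 3,600 samples per second, and at an assumed six samples per 1,400-byte datagram that works out to roughly 6.7 Mbps of sFlow traffic. Doubling the sampling rate denominator (sampling less often) halves both numbers.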


    The second part of the worksheet can help you understand the calculated margin of error at different confidence levels over time. Traffic sampling uses random samples to construct a statistical model of the traffic visible on an interface, and the accuracy of that model can be expressed as a margin of error with a simple calculation. Accuracy depends on the total number of samples collected, so as samples accumulate over a longer interval, the margin of error shrinks.
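    The margin of error itself is a short calculation. The sketch below assumes the approximation commonly quoted in sFlow packet-sampling material, error ≤ z × 100 × √(1/c) percent, where c is the number of samples and z is the two-sided standard normal critical value for the chosen confidence level (about 1.96 at 95%). Treat it as an illustration under that assumption; the worksheet may round or present its figures differently. Strictly, c is the number of samples that fall into whatever you are estimating, which for total traffic is every sample.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error_pct(samples: int, confidence: float = 0.95) -> float:
    """Approximate relative margin of error, in percent, for an estimate
    built from `samples` random packet samples at the given confidence level.

    Assumes error <= z * 100 * sqrt(1 / samples), with z the two-sided
    standard normal critical value (about 1.96 for 95% confidence).
    """
    if samples <= 0:
        raise ValueError("need at least one sample")
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * 100 * sqrt(1 / samples)


for level in (0.90, 0.95, 0.99):
    err = margin_of_error_pct(10_000, level)
    print(f"{level:.0%} confidence, 10,000 samples: +/-{err:.2f}%")
```

    Note that quadrupling the number of samples only halves the margin of error, because the error scales with the inverse square root of the sample count.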


    The tables at the bottom suggest different operational applications for specific time intervals, along with their calculated margins of error. These are based on the sample arrival rates shown in the top half of the worksheet, so you can see how accuracy changes as you vary your sampling rates and utilization.
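    Applying the same assumed formula across time intervals shows why longer windows are more accurate: at a steady arrival rate, the sample count grows linearly with time, so the margin of error falls with the square root of the interval. The interval labels and the 75 samples per second used below are hypothetical, not the worksheet's own tables.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical reporting intervals, in seconds; the worksheet's tables may differ.
INTERVALS = {
    "1 minute": 60,
    "5 minutes": 300,
    "1 hour": 3_600,
    "1 day": 86_400,
}


def margin_of_error_pct(samples: float, confidence: float) -> float:
    """Assumed approximation: z * 100 * sqrt(1 / samples), in percent."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * 100 * sqrt(1 / samples)


def error_table(samples_per_second: float, confidence: float = 0.95) -> dict[str, float]:
    """Margin of error (percent) for each interval at a steady sample arrival rate."""
    return {
        label: margin_of_error_pct(samples_per_second * seconds, confidence)
        for label, seconds in INTERVALS.items()
    }


for label, err in error_table(samples_per_second=75).items():
    print(f"{label:>9}: +/-{err:.2f}%")
```

    Raising the sampling rate (sampling more often) or seeing higher utilization increases `samples_per_second`, which shifts every row of the table toward smaller errors.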


    One aspect of sampling that often surprises people is that accuracy depends solely on the total number of samples collected in an interval of time. Intuitively, we tend to think that sampling must see a large proportion of the total traffic in order to be accurate. However, that’s simply not the case. If you’d like to take a deeper look at this, there’s an excellent explanation of packet sampling theory, including the more detailed mathematics, available on the website here.
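    A quick simulation illustrates the point. This sketch, an illustration rather than the source's method, draws 1,000 uniform random samples from a small and a very large population of packets in which a hypothetical traffic class makes up 20% of the packets, then measures the root-mean-square relative error of the estimated share. Simple random sampling without replacement stands in for a switch's 1-in-N sampling, and the population sizes, share, trial count, and seed are arbitrary assumptions.

```python
import random
from math import sqrt


def rms_relative_error(population: int, share: float, samples: int,
                       trials: int, rng: random.Random) -> float:
    """Root-mean-square relative error of the estimated traffic share.

    The first share * population packet indices belong to the class being
    measured (for example, one application's traffic).
    """
    class_size = int(share * population)
    sq_errors = 0.0
    for _ in range(trials):
        # Uniform random sample of packet indices, without replacement.
        picked = rng.sample(range(population), samples)
        hits = sum(1 for i in picked if i < class_size)
        estimate = hits / samples
        sq_errors += ((estimate - share) / share) ** 2
    return sqrt(sq_errors / trials)


rng = random.Random(2018)  # fixed seed so runs are repeatable
for population in (100_000, 100_000_000):
    err = rms_relative_error(population, share=0.2, samples=1_000, trials=200, rng=rng)
    print(f"{population:>11,} packets, 1,000 samples: RMS error about {err:.1%}")
```

    Both populations should give an error of roughly 6%, close to the theoretical √((1 − f)/(f·c)) for share f and c samples, even though one population is a thousand times larger. This holds as long as the samples are a small fraction of the traffic, which is the usual situation at typical sampling rates.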


    We’ll discuss this topic in more detail during THWACKcamp 2018, which is taking place October 17 – 18, during the “Visibility in the Data Center” session. We hope to see you there!