
How are packet loss and latency calculated?

Hello,

First, I hope that I posted this in the correct location.

I am trying to understand why I am seeing high packet loss and latency, and to explain it from a packet capture I am analyzing. I am not sure whether this is something that can be shared, but if it can, it will be very helpful for what I am troubleshooting. I realize I am asking a pretty specific question and understand if it is something that cannot be shared. I am asking so I can confirm that the results I am getting are not artificially introduced by something I am doing.

A service is created with a probing interval of 10 minutes to an IP address using TCP port 80. Just the other day I saw 48% packet loss when looking at the last 24 hours, which is super high.

In the packet captures it looks like the probing process is to create a TCP connection and then FIN it. Next, there are three TCP connections that are half-opened and then reset in succession (SYN, SYN-ACK, RST). These three connections reuse the same source port, and this process is repeated multiple times; in one example I saw it happen 8 times.

My questions:

Latency - is this calculated from the RTT of the three half-open TCP connections (the time from SYN to SYN-ACK) and then averaged?

Packet loss - is this calculated from the failure to receive an ICMP TTL-expired reply (the TCP connections begin with an IP TTL of 1)?
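
For reference, here is a rough sketch (Python with Scapy) of what I think the probe traffic in my capture amounts to: send a SYN from a fixed source port, time the SYN-ACK, then reset the half-open connection. This is only my mental model for illustration, not SolarWinds' actual implementation, and the target address, port, and probe count are placeholders:

    import random
    import statistics
    import time
    from scapy.all import IP, TCP, send, sr1

    TARGET, PORT, SPORT, PROBES = "192.0.2.10", 80, 50000, 3   # placeholder values

    rtts = []
    for _ in range(PROBES):
        # same source port for every probe, mirroring the reuse seen in the capture
        syn = IP(dst=TARGET) / TCP(sport=SPORT, dport=PORT, flags="S",
                                   seq=random.randint(0, 2**32 - 1))
        t0 = time.perf_counter()
        synack = sr1(syn, timeout=2, verbose=0)        # wait for the SYN-ACK
        t1 = time.perf_counter()
        if synack is not None and synack.haslayer(TCP) and synack[TCP].flags == 0x12:  # SYN-ACK
            rtts.append((t1 - t0) * 1000)              # ms, includes local overhead
            # tear the half-open connection down with a RST instead of completing it
            send(IP(dst=TARGET) / TCP(sport=SPORT, dport=PORT, flags="R",
                                      seq=synack[TCP].ack), verbose=0)

    lost = PROBES - len(rtts)
    print(f"loss: {100 * lost / PROBES:.0f}%")
    if rtts:
        print(f"avg RTT: {statistics.mean(rtts):.1f} ms")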

Thanks for any pointers anyone can provide.

  • Wow... I thought that someone would have responded to this question. My friend, you have asked a loaded question, but it can be figured out. I know that you are specifically referencing SolarWinds, but that answer depends, too!

    Latency

    Queuing theory tells us that the busier a link gets, the longer packets have to wait. For instance, a 10 Gbps link running at 8 Gbps, or 80% utilization, means that on average, when a packet arrives, there are four others already waiting. At 99% utilization, the queue grows to 99 packets. Back when link speeds were much lower, this could add a good amount of extra latency, but on a 10 Gbps link, draining a queue of 99 packets averaging 500 bytes each takes only about 0.04 milliseconds. These days, buffering in routers in the core of the network isn't going to add a meaningful amount of delay unless links are massively oversubscribed.
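
    If you want to sanity-check those numbers, here is a quick back-of-the-envelope sketch in Python, assuming the simple M/M/1 queueing model implied above (average queue length = utilization / (1 - utilization)); the link speed and packet size are just the example values from the paragraph:

        def queue_delay_ms(link_bps, utilization, avg_pkt_bytes):
            """Return (average packets waiting, milliseconds to drain that backlog)."""
            waiting = utilization / (1 - utilization)          # M/M/1 mean queue length
            drain_ms = waiting * avg_pkt_bytes * 8 / link_bps * 1000
            return waiting, drain_ms

        for util in (0.80, 0.99):
            pkts, ms = queue_delay_ms(10e9, util, 500)         # 10 Gbps link, 500-byte packets
            print(f"{util:.0%} utilization: ~{pkts:.0f} packets queued, ~{ms:.3f} ms extra delay")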

    Packet loss

    Ideally, a network should never lose a single packet! Of course, in the real world they do, and for two reasons. The first is bit errors: every transmission medium will flip a bit once in a while, and then the whole packet is lost. Wireless networks typically send extra error-correction bits, but those can only do so much. The second is congestion: when a link's queue is full, routers have no choice but to drop arriving packets. Either way, the lost packet needs to be retransmitted, which can hold up a transfer.

    If network latency or packet loss gets too high, TCP will run out of buffer space and the transfer has to stall until the retransmitted lost packet has been received. In other words: high latency or high loss alone is not great, but still workable. High latency and high loss together can slow TCP to a crawl.
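
    To put rough numbers on that, the classic Mathis et al. approximation says a single TCP flow's throughput is roughly MSS × C / (RTT × √loss). A small sketch with purely illustrative RTT and loss values (not measurements from your capture):

        from math import sqrt

        def mathis_ceiling_mbps(mss_bytes, rtt_ms, loss, c=1.22):
            # rough upper bound on one TCP flow's throughput under random loss
            return (mss_bytes * 8 * c) / ((rtt_ms / 1000) * sqrt(loss)) / 1e6

        for rtt_ms, loss in [(10, 0.0001), (200, 0.0001), (10, 0.01), (200, 0.01)]:
            print(f"RTT {rtt_ms:>3} ms, loss {loss:.2%}: "
                  f"~{mathis_ceiling_mbps(1460, rtt_ms, loss):,.1f} Mbps ceiling")

    Either a long RTT or a high loss rate alone still leaves usable throughput; together they collapse it.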

    The port reuse is of great interest; this is where TIME_WAIT comes into play!

    This is a valid TCP state. Imagine the scenario:

    1. Client opens a connection “A” to Server
    2. Normal TCP operation (3-way handshake, data transfer, ACK, so on so forth)
    3. Client and server terminate their connection “A” via the use of FIN packets
    4. Client opens another connection “B” to Server
    5. Normal TCP operation
    6. For whatever reason (network congestion, latency, high CPU on intermediate nodes, and so on), a delayed TCP packet from connection “A” arrives at the server. Should the server accept this packet? Mark it as a duplicate? Deny it?

    To solve the above problem, we have the TIME_WAIT state. TCP requires that the endpoint that initiates an active close of the connection eventually enters TIME_WAIT, and that state lasts 2×MSL (twice the maximum segment lifetime). In other words:

    If a client or server initiates an active close (using FIN packets), it must wait 2×MSL before the same socket pair can be used again (i.e. the same IP addresses / TCP port numbers).
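
    If you want to see that wait in action, here is a minimal sketch in Python (assuming a Linux or macOS host and a reachable test server; the server address and source port below are placeholders, not taken from your capture). The side that closes first holds the socket in TIME_WAIT, so immediately reusing the same source port fails:

        import socket

        SERVER = ("192.0.2.10", 80)   # placeholder test server
        SRC_PORT = 50000              # fixed source port, mirroring the reuse in your capture

        def probe_once():
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind(("", SRC_PORT))    # force the same source port every time
            s.connect(SERVER)         # full 3-way handshake
            s.close()                 # active close -> FIN -> this end enters TIME_WAIT

        probe_once()
        try:
            probe_once()              # second attempt well inside 2*MSL
        except OSError as exc:
            # typically EADDRINUSE: the old (address, port) pair is still parked in TIME_WAIT
            print(f"immediate reuse of port {SRC_PORT} refused: {exc}")

    That refusal is the kernel enforcing exactly the 2MSL rule above; a prober that insists on reusing the same source port has to either wait out TIME_WAIT or reset the connection instead of closing it cleanly.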

    Does this sound like the scenario that you are experiencing?

  • Thank you very much for the detailed response. I appreciate you taking the time to respond like you did; that is an awesome response.

    Yes, my questions were specific to SolarWinds and how the packet loss and latency results are calculated, given the way SolarWinds probes: using half-open TCP connections, and opening a TCP connection fully and then immediately closing it without transferring any application data over it.

    Not quite the scenario I am experiencing. At this point I have not been able to obtain packet captures while the counters are going up. I am trying to determine a way to capture this data, but it will take some effort, as I will need packet captures on both the side initiating the probe and the side being probed. I was hoping that by understanding how SolarWinds calculates packet loss and latency, I could determine what was happening at the packet level, specifically with the probing process.

    Again, thank you for your response.

