33 Replies. Latest reply on Aug 9, 2017 10:11 AM by rschroeder

    NetPath Accuracy

    mtoto

      I have a NetPath connection from my SolarWinds Orion server to a customer. The first hop on the path shows 30% packet loss with 13 ms latency. That hop is my local firewall. Pinging the firewall directly from my SolarWinds Orion server returns 1 ms and no packet loss. Any ideas? About the subnet change in the image: this firewall has multiple LAN and WAN addresses. I'm assuming NPM is using the default LAN address, even though the SolarWinds server uses 10.101.101.1 to get out. To verify it's not a routing issue, I've also pinged 10.0.1.1 from the SolarWinds server and got the same result as when pinging 10.101.101.1: low latency and no packet loss.

      [Attached screenshots: firewall_high_latency.JPG, firewall_high_latency2.JPG]

        • Re: NetPath Accuracy
          slebbon

          Just a thought, but it's a firewall/security device, right? Does it have dynamic or behavior-based rules that block or filter "suspicious" traffic? The scanning traffic from NetPath might be triggering some protections, and the firewall might rate-limit or block some of it. Is there a way to whitelist the SolarWinds server in its rules?
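To make the rate-limiting idea concrete, here is a minimal sketch, assuming the device answers probe traffic through a token bucket (a common rate-limiting scheme; whether this particular firewall does so, and with what limits, is an assumption). The function and parameter names are hypothetical. It shows how a limit on replies alone can produce large apparent loss at a hop that forwards real traffic without any trouble.

```python
def apparent_loss_pct(num_probes: int, interval_ms: int,
                      replies_per_s: float, burst: int) -> float:
    """Apparent loss when a device answers probes through a token bucket.

    The bucket holds at most `burst` tokens and refills at `replies_per_s`.
    Each reply costs one token; a probe that arrives to an empty bucket gets
    no reply and is counted as lost, even if the device is forwarding real
    traffic perfectly.
    """
    tokens = float(burst)  # bucket starts full
    answered = 0
    for i in range(num_probes):
        if i > 0:
            # Refill for the time since the previous probe, capped at `burst`.
            tokens = min(float(burst), tokens + interval_ms * replies_per_s / 1000.0)
        if tokens >= 1.0:
            tokens -= 1.0
            answered += 1
    return 100.0 * (num_probes - answered) / num_probes


# Probes every 100 ms (10/s) against a device that replies at most 5 times/s.
print(apparent_loss_pct(10, 100, 5.0, 1))  # 50.0
```

With probes arriving at twice the allowed reply rate, every other probe goes unanswered, which would look like 50% loss at that hop.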

          • Re: NetPath Accuracy
            jeff.stewart

            There is a hotfix for NPM that should address the 13ms latency issue between those two devices. Can you apply that and see if it helps?

             

            Jeff

              • Re: NetPath Accuracy
                rschroeder

                jeff.stewart, would you please post more information about that fix? I have the same problem.

                Also, if that's not it, remember that ICMP is not a high-priority protocol. A firewall, router, or switch will answer ICMP requests if it's configured to do so, but only after its main duties (forwarding packets to and from computers and servers) are done.

                I, too, have seen unexpectedly high latency from my NPM server to its gateway. I later found that the gateway (a 7609 doing L2 and L3 services, along with WCCP) often had one of its eight CPUs at 100%. Finding and correcting the cause of that utilization produced a mapped and recorded decrease in apparent latency between the Orion box and its gateway.

                If yours turns out to be a similar issue, you can look at NetPath as a useful tool that has discovered a potential problem on your router, and then spend your cycles finding and correcting the cause.

                  • Re: NetPath Accuracy
                    rmatejka

                    NetPath does not use ICMP, because of those accuracy issues. Instead it uses TCP, in a way similar to how an actual application would behave.

                     

                    Rick Matejka
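As a rough illustration of measuring latency and loss with TCP instead of ICMP, here is a minimal Python sketch that times the TCP three-way handshake to a service port (similar in spirit to tcping). It is not NetPath's implementation: the function names are hypothetical, and it only measures end-to-end connect time to one host and port, not per-hop behavior.

```python
import socket
import time
from typing import List, Optional, Tuple


def tcp_connect_ms(host: str, port: int, timeout_s: float = 2.0) -> Optional[float]:
    """Time one TCP handshake to host:port.

    Returns the connect time in milliseconds, or None if the connection
    was refused or timed out (treated here as a lost probe). Pass an IP
    address so DNS lookup time is not included in the measurement.
    """
    start = time.perf_counter()
    try:
        # create_connection() returns once the three-way handshake completes.
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError:  # covers refused connections and socket timeouts
        return None
    return (time.perf_counter() - start) * 1000.0


def tcp_probe_summary(host: str, port: int, count: int = 5,
                      timeout_s: float = 2.0) -> Tuple[float, Optional[float]]:
    """Probe host:port `count` times; return (loss %, average connect ms)."""
    results: List[Optional[float]] = [
        tcp_connect_ms(host, port, timeout_s) for _ in range(count)
    ]
    ok = [r for r in results if r is not None]
    loss = 100.0 * (count - len(ok)) / count
    avg = sum(ok) / len(ok) if ok else None
    return loss, avg
```

Point it at a host and port that actually accept connections; a port that refuses or ignores connections will show up as 100% loss.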

                      • Re: NetPath Accuracy
                        ecklerwr1

                        It actually does use ICMP as well:

                        Ports

                        Open the following ports on your firewall for network connectivity used by NetPath™:

                        Port | Protocol | Direction | Source | Destination | Description
                        17778 | TCP | Outgoing | NetPath™ probe | Polling engine | Used to send information back to your Orion server.
                        11 (ICMP Time Exceeded) | ICMP | Incoming | Networking devices along your path | NetPath™ probe | Used by the NetPath™ probe to discover network paths.
                        User configured (any ports of the monitored services that are assigned to the probe) | TCP | Outgoing | NetPath™ probe | Endpoint service | Used by the NetPath™ probe to discover service status.
                        43, 443 | TCP | Outgoing | Main polling engine | BGP data providers and announcements | Used by NetPath™ to query BGP information about the discovered IP addresses.
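For context on the ICMP row: traceroute-style path discovery sends probes with increasing TTL values, and each router that decrements a probe's TTL to zero discards it and returns ICMP type 11 (Time Exceeded), code 0, which reveals that hop. How closely NetPath follows this beyond what the table states is an assumption. A minimal sketch of classifying such replies from a raw IPv4 datagram follows; the function names are hypothetical, and actually receiving these packets requires a raw socket and elevated privileges, which this sketch does not attempt.

```python
import struct
from typing import Tuple

ICMP_ECHO_REPLY = 0
ICMP_DEST_UNREACHABLE = 3
ICMP_TIME_EXCEEDED = 11  # type 11, code 0 = TTL exceeded in transit


def icmp_header_from_ipv4(datagram: bytes) -> Tuple[int, int]:
    """Return (type, code) of the ICMP message carried in an IPv4 datagram."""
    if len(datagram) < 20:
        raise ValueError("too short for an IPv4 header")
    if datagram[9] != 1:  # IPv4 protocol field: 1 = ICMP
        raise ValueError("not an ICMP datagram")
    ihl_bytes = (datagram[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if len(datagram) < ihl_bytes + 4:
        raise ValueError("truncated ICMP header")
    icmp_type, code = struct.unpack_from("!BB", datagram, ihl_bytes)
    return icmp_type, code


def classify_reply(datagram: bytes) -> str:
    """Describe what a reply says about the path."""
    icmp_type, code = icmp_header_from_ipv4(datagram)
    if icmp_type == ICMP_TIME_EXCEEDED and code == 0:
        return "intermediate hop (TTL exceeded in transit)"
    if icmp_type == ICMP_DEST_UNREACHABLE:
        return "destination unreachable"
    if icmp_type == ICMP_ECHO_REPLY:
        return "echo reply from target"
    return f"other ICMP type {icmp_type}, code {code}"
```

If a firewall blocks incoming type 11 messages, hops beyond or at that point cannot be discovered this way, which is why the table lists it as a required inbound rule.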
                    • Re: NetPath Accuracy
                      byrona

                      jeff.stewart, we are also seeing packet loss similar to others but are unable to verify that packet loss anywhere. Also, when I look at the NetPath Packet Loss graph (which you need to add, as it isn't there by default), it doesn't show any packet loss.

                       

                      Thoughts?

                    • Re: NetPath Accuracy
                      mtoto

                      Thank you all for the suggestions. Jeff, thanks for the heads-up on the hotfix; it did fix my gateway latency issue. I'm still seeing a lot of packet loss, though. Maybe that's legitimate and I need to figure that one out.

                      Thank you.

                      • Re: NetPath Accuracy
                        vmorales1234

                        We have applied the NPM hotfix that fixes the high latency reported for the first hop. However, we are seeing the same packet loss issue from the core switch to our enclave firewall, and between the edge router and the perimeter firewall. The packet loss reported by NetPath is between 30% and 50% at times. Can someone explain how NetPath calculates packet loss, and whether there is any issue specifically with hops to Cisco ASA firewalls? That seems to be the common element.

                        We are looking at the hardware, but we don't want to waste time troubleshooting if this is a known issue.

                        Thanks
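The thread doesn't answer how NetPath itself calculates loss, and the sketch below does not describe its algorithm. It shows the generic definition (probes without a reply as a percentage of probes sent) plus a common heuristic for reading traceroute-style per-hop results: loss that appears at an intermediate hop but does not carry through to later hops usually means that device is slow to answer or rate-limits probes aimed at itself, not that it drops forwarded traffic. The function names and the 5-point tolerance are hypothetical.

```python
from typing import List


def loss_pct(sent: int, received: int) -> float:
    """Generic packet loss: probes that got no reply, as a % of probes sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def suspect_hops(per_hop_loss: List[float], tolerance: float = 5.0) -> List[int]:
    """Return indexes of intermediate hops whose loss does not carry through.

    Heuristic: if hop i really dropped forwarded traffic, probes to every
    later hop pass through it too, so the final hop should show roughly as
    much loss. Loss seen at hop i but not at the end more likely reflects
    hop i deprioritizing or rate-limiting replies to itself.
    """
    if not per_hop_loss:
        return []
    final = per_hop_loss[-1]
    return [i for i, loss in enumerate(per_hop_loss[:-1])
            if loss > final + tolerance]
```

For example, per-hop loss of `[0.0, 30.0, 0.0, 0.0]` flags hop 1 (the firewall in the cases above) as a likely measurement artifact rather than real loss.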

                        • Re: NetPath Accuracy
                          kv@mcg.dk

                          Hello everyone,

                          Does anyone know if there is a fix for the packet loss issue?

                          We are currently running NPM 12.0.1 and are seeing packet loss above 40% where there is most likely none.

                          With 40% packet loss between the probe and the destination, the network would be unusable.

                            • Re: NetPath Accuracy
                              rschroeder

                              If you haven't already verified or resolved the packet loss issue, what have you done to prove whether there's loss or not?

                              If you remote into the NPM poller that reports packet loss and run a continuous ping to the nodes that show high packet loss in NPM, the results should show whether the loss is real. Have you tried confirming with pings?
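A minimal sketch for summarizing such a ping test: it parses the statistics that ping prints at the end (on Windows, stopping `ping -t` with Ctrl+C prints this block). It assumes English-language output in the usual Windows (`Sent = N, Received = M`) and Linux/macOS (`N packets transmitted, M received`) formats; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Windows:      "Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),"
_WINDOWS = re.compile(r"Sent = (\d+), Received = (\d+)")
# Linux/macOS:  "4 packets transmitted, 4 received, 0% packet loss"
#               "4 packets transmitted, 4 packets received, 0.0% packet loss"
_UNIX = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def ping_loss(output: str) -> Optional[Tuple[int, int, float]]:
    """Return (sent, received, loss %) from ping's summary, or None if absent."""
    for pattern in (_WINDOWS, _UNIX):
        match = pattern.search(output)
        if match:
            sent, received = int(match.group(1)), int(match.group(2))
            loss = 100.0 * (sent - received) / sent if sent else 0.0
            return sent, received, loss
    return None
```

Comparing this figure with what NPM reports for the same node over the same period is the quickest way to tell whether the reported loss is real.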

                                • Re: NetPath Accuracy
                                  byrona

                                  I have basically done all of those things. NPM shows no drops or discards on any of the interfaces. A continuous ping (while I realize this is not what NetPath does, it's still a valid test) also shows no packet loss. Also, with packet loss as high as is being indicated, the application wouldn't work, and it's working just fine.

                                    • Re: NetPath Accuracy
                                      rschroeder

                                      The only other test might be to send full-size packets in the ping and verify that the larger packets (1500 bytes) are not being dropped. Use "ping x.x.x.x -t -w 1 -l 1500". Ordinary ping tests from Windows boxes send only 32 bytes of data instead of 1500.
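One detail worth checking before relying on that test: Windows `-l` sets the size of the ICMP data, not of the whole packet. The arithmetic below (IPv4 without options; helper names are hypothetical) shows that `-l 1500` produces a 1528-byte IP packet, which Windows will fragment on a 1500-byte-MTU path unless `-f` (Don't Fragment) is also given. A single full-size, unfragmented probe is therefore `-f -l 1472`.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def ip_packet_size(ping_data_bytes: int) -> int:
    """Size of the IPv4 packet produced by a ping carrying this much data (-l)."""
    return ping_data_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def max_ping_data(mtu: int) -> int:
    """Largest -l value whose packet still fits in `mtu` without fragmenting."""
    return mtu - ICMP_HEADER_BYTES - IPV4_HEADER_BYTES


print(ip_packet_size(32))    # 60: the Windows default data size
print(ip_packet_size(1500))  # 1528: larger than a 1500-byte MTU
print(max_ping_data(1500))   # 1472
```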

                                       

                                      I've seen cases where small (default) ping packets passed successfully, but packets larger than a specific size (e.g. 1424, 1432, 1472, 1488) were dropped. This was caused by intermediate equipment that couldn't handle the packet sizes required for Q-in-Q. "Normal" traffic, like typical PC traffic and ICMP tests, all worked perfectly. "Special" traffic that needed all 1500 bytes, plus 22 more for tagging VLANs inside the WAN provider's VLANs, was being dropped.

                                       

                                      In that case the WAN provider's gear couldn't handle packets larger than 1500 bytes, while our WAN routers' MTUs handled up to 1532 quite nicely. The WAN provider had to replace their gear with something more robust.
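When small pings pass but large ones fail, the cutoff size can be found by bisection instead of trial and error. A minimal sketch: `probe` is a hypothetical callable you would wire to a single Don't Fragment ping of the given data size (for example, running Windows `ping -n 1 -f -l <size>` and checking whether a reply came back). It assumes results are consistent: if a size fails, every larger size fails too.

```python
from typing import Callable


def largest_passing_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest size in [low, high] for which probe(size) succeeds."""
    if not probe(low):
        raise ValueError("probe fails even at the smallest size")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid        # mid passed: the cutoff is at or above mid
        else:
            high = mid - 1   # mid failed: the cutoff is below mid
    return low
```

Searching the range 32 to 1472 takes about 11 probes, and the result tells you exactly which packet size the intermediate gear stops handling.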

                                       

                                      If your full-size packets fail, it's time to open a technical support case with your network hardware service provider and find out why.

                                      If they DON'T fail, it's time for a technical support call to SolarWinds to get a resolution.

                                      All of this assumes the interface stats on the NPM poller and the polled device show no errors and no loss.

                                • Re: NetPath Accuracy
                                  worto03

                                  I'm seeing this too. Has anyone raised it with SolarWinds? It doesn't happen on all NetPaths, but some of ours show high packet loss that, from what I can tell, isn't a real issue.

                                  A tcping on the same port shows no packet loss, and there are no complaints from customers whose traffic would use this route.