
Understanding Errors and Discards

Level 12

I was working with a customer the other day and we were analyzing some of the data that Orion NPM is collecting from his core routers. On some of his gigabit interfaces we noticed that every few hours we got a couple of hundred discards (all at once, not spread through the hours). This caused us to investigate the root cause and also got us talking about errors and discards, and the more I thought about it, the more I thought that some of this might be useful to other people.

First off, when you see errors or discards within your network management system, you need to ask yourself two questions:

a) Do you trust the NMS?

b) Are you seeing any issues on those interfaces?

I mention trusting your NMS first as I've definitely seen cases where the network management software misreported these stats. If your software is from us here at SolarWinds, then you can skip this part, as in the 10+ years I've been helping to create and use these applications I've never seen them misbehave in this particular way.

When it comes to the second question, what I mean is: if you hadn't noticed the stats being reported in your NMS, would you have been thinking about this interface? If not, and the number of errors or discards is relatively low, then you might just keep an eye on it to see if it gets any worse.

But let's assume that you've decided to go investigate these stats. One very important thing to understand is that there's a world of difference between discards and errors. Errors indicate packets that were received but couldn't be processed because there was a problem with the packet. In most cases, when you're seeing inbound errors on a router interface, the issue is upstream of that device. It could be a bad cable, a misconfiguration on one end or the other, etc. In most cases, these issues are resolved outside of the router where you're seeing the errors. Error reporting is documented within RFC 1213 (among others, including RFC 1573) and typically is pulled from the IF-MIB (ifInErrors and ifOutErrors).

With discards, the situation is almost the opposite. The packets were received with no errors but were dumped before being passed on to a higher layer protocol. A typical cause of discards is when the router needs to regain some buffer space. In the case of discards, the issue is almost always with the router that's reporting the discards (not with a next hop device, bad cable, etc.). RFC 1213 also documents discard reporting, and the discard counters (ifInDiscards and ifOutDiscards) sit right beside the error counters within the IF-MIB.
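
For anyone who wants to poll these counters directly, here is a minimal Python sketch of where they live in the IF-MIB ifTable and how to turn two successive polls into a per-interval count. The helper names (counter_oid, counter_delta) are made up for illustration, and the wrap handling assumes 32-bit counters that wrap at most once between polls. It does not do any SNMP itself; it only shows the OIDs and the arithmetic.

```python
# OIDs of the error and discard columns in the IF-MIB ifTable,
# without the trailing ifIndex of the interface.
IF_MIB_COUNTERS = {
    "ifInDiscards": "1.3.6.1.2.1.2.2.1.13",
    "ifInErrors": "1.3.6.1.2.1.2.2.1.14",
    "ifOutDiscards": "1.3.6.1.2.1.2.2.1.19",
    "ifOutErrors": "1.3.6.1.2.1.2.2.1.20",
}

# Assumption: these are polled as 32-bit counters, which wrap at 2**32.
COUNTER32_MODULUS = 2 ** 32


def counter_oid(name, if_index):
    """Return the full OID for one counter on one interface (hypothetical helper)."""
    return f"{IF_MIB_COUNTERS[name]}.{if_index}"


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Return the increase between two polls, assuming at most one wrap."""
    return (current - previous) % modulus


if __name__ == "__main__":
    print(counter_oid("ifInDiscards", 3))
    print(counter_delta(100, 350))
```

The counters are cumulative, so what matters is the difference between polls. A burst of a couple of hundred discards shows up as one large delta in a single polling interval, with zero deltas on either side.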

This blog post is getting long so I'll stop the description here, but ping me if you want to know more about this as I never really tire of talking about packets...


Flame on...
Josh


10 Comments
Level 7

Here is a strange thing that I see within my SolarWinds. As I look at this, it might be HP related? Of course I have no idea why we are monitoring printers, but I am stepping into a new office environment and red means bad to me. These printers are in our branch offices and this information is traveling over an MPLS network.

DCALPRN0002    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.63  0 errors  542,126 discards  0 errors  228,401,545,216 discards

NPID92DA6    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.63  0 errors  210 discards  0 errors  205,011,042,304 discards

PRIDLPRN0003    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.63  0 errors  45 discards  0 errors  94,066,180,096 discards

CLTLPRN0002    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.63  0 errors  92 discards  0 errors  81,178,968,064 discards

TPALPRN0003    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.63  0 errors  19 discards  0 errors  71,547,584,512 discards

PHLLPRN0002    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.63  0 errors  318 discards  0 errors  68,795,760,640 discards

NPI104D23    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.63  0 errors  187 discards  0 errors  64,668,016,640 discards

CLTLPRN0003    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.63  0 errors  23 discards  0 errors  57,788,780,544 discards

tlhlprn0002    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.63  0 errors  1,683 discards  0 errors  16,510,980,096 discards

Level 7

Did you figure out more information about these discards?  We're experiencing similar issues.

NPID844C2    HP ETHERNET MULTI-ENVIRONMENT,ROM R.22.01,JETDIRECT,JD95,EEPROM R.25.09  0 errors  4,008,652,288 discards  0 errors  4,008,636,160 discards 

NPI9E9870    HP ETHERNET MULTI-ENVIRONMENT,ROM none,JETDIRECT,JD128,EEPROM V.28.65  0 errors  26,470 discards  0 errors  2,751,846,144 discards 

NPIBE5C9F    HP ETHERNET MULTI-ENVIRONMENT,ROM V.29.11,JETDIRECT,JD115,EEPROM V.29.12  0 errors  20,590 discards  0 errors  0 discards 

sysName not set    SMSC 9100 series ethernet  0 errors  12,467 discards 

Level 7

I am also having issues with Discards (typical setup is Cisco 3750's). Any help would be appreciated. Thanks

Top_10_Errs_Discards_Today.bmp

Level 8

VPN-Conn3030    DEC 21143A PCI Fast Ethernet  0 errors  2,601 discards  0 errors  0 discards

I am trying to find out where this discard information came from. It did not show up under the interface status.

Are these discards really information that we can trust?

GigabitEthernet4/24 is up, line protocol is up (connected)

  Hardware is Gigabit Ethernet Port, address is e05f.b981.3ec7 (bia e05f.b981.3ec7)

  Description: Cisco Con3030

  MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,

     reliability 255/255, txload 1/255, rxload 1/255

  Encapsulation ARPA, loopback not set

  Keepalive set (10 sec)

  Full-duplex, 100Mb/s, link type is auto, media type is 10/100/1000-TX

  input flow-control is off, output flow-control is off

  Auto-MDIX on (operational: on)

  ARP type: ARPA, ARP Timeout 04:00:00

  Last input never, output never, output hang never

  Last clearing of "show interface" counters never

  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 8772

  Queueing strategy: fifo

  Output queue: 0/40 (size/max)

  5 minute input rate 5000 bits/sec, 2 packets/sec

  5 minute output rate 3000 bits/sec, 4 packets/sec

     827636864 packets input, 237042778534 bytes, 0 no buffer

     Received 20541186 broadcasts (0 multicasts)

     0 runts, 0 giants, 0 throttles

     5 input errors, 5 CRC, 0 frame, 0 overrun, 0 ignored

     0 input packets with dribble condition detected

     995670662 packets output, 829667238749 bytes, 0 underruns

     0 output errors, 0 collisions, 3 interface resets

     0 unknown protocol drops

     0 babbles, 0 late collision, 0 deferred

     0 lost carrier, 0 no carrier

     0 output buffer failures, 0 output buffers swapped out

Level 15

Now this post has sparked my curiosity and I will have to look over my 6509 and 6513.  Thanks!

Level 15

Helpful and educational.

Level 8

What would be nice is if we could understand the ratio of discards to the actual number of packets that were transmitted, and then get an idea of the correlated errors, if any. For instance, what this number tells me is that if the ratio gets above a threshold of 3-5% discards, my interfaces are being worked to the max and I may need to look at adding a different resource with more capacity, re-routing uplinks differently, or redesigning my network to allow for the path of least resistance. Is that a correct assumption?
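
The ratio described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 3% default is simply the lower end of the range suggested in this comment rather than an established limit, and both inputs are assumed to be counts for the same polling interval (not lifetime totals).

```python
def discard_percent(discards, packets):
    """Discards as a percentage of packets seen in the same interval."""
    if packets <= 0:
        # No traffic in the interval, so there is nothing to compare against.
        return 0.0
    return 100.0 * discards / packets


def exceeds_threshold(discards, packets, threshold_percent=3.0):
    """True when the discard percentage reaches the chosen threshold.

    The 3.0 default is an assumption taken from the 3-5% range above.
    """
    return discard_percent(discards, packets) >= threshold_percent


if __name__ == "__main__":
    # 25 discards out of 1,000 packets in one polling interval.
    print(discard_percent(25, 1000))
    print(exceeds_threshold(25, 1000))
```

Using per-interval counts matters here: a short burst can push one interval well past the threshold while the lifetime ratio for the interface stays close to zero.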

Level 7

Hi there!

Great read on Discards!

I'm still a little confused about whether I should be worried about my situation, so perhaps you can help. We have two Cisco Access Points on one of our switches, and about once a day, for a minute, they get up to 2.5% discard rates. The rate seems low and short lived, so I'm thinking it's nothing to worry about since the interfaces seem to be properly configured. I'm also thinking that because they are APs they may be more susceptible to this sort of issue, since you have many users on the same interface. Let me know what you think.

Thanks.

Level 9

Is it common to see errors and/or discards when you have a trunk interface not configured correctly on one side of the connection? 

Level 13

Browsing through the old Geek Speak posts and found this. Excellent post. It's always bothered me when folks just ignore this kind of stuff. There is no such thing as data that has no meaning. Do the work to investigate and understand the cause, then make a decision as far as what you're going to do about it. I work with a guy who is constantly saying things like "such and so must have just wigged out - don't worry about it". That doesn't cut it. Whether you're receiving bad packets or dropping them, there is a root cause. Find out what it is, then address it if it needs to be addressed.