I would recommend contacting Metro and telling them what you are seeing. They can tell you if that is normal and what is considered acceptable/usable, at least by their standards.
It depends on the percentage of packets that were errored per second. Errored packets affect some applications to a greater degree than others.
The graph you posted is useless. An errored-packets-per-second alarm is the way to go, along with a graph at much shorter intervals (minutes). If you're lucky enough to be able to send to a CMDB/Splunk and have software to dig in it for root cause, you can relate a ticket/service call to the message. No ticket or complaint, then it really doesn't matter anyway unless the loss is persistent.
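As a rough sketch of how an errors-per-second figure falls out of two counter polls: poll the interface error counter (e.g. ifInErrors) at a short fixed interval and divide the delta by the interval. The counter values and 60-second interval below are made up for illustration.

```python
def rate_per_sec(prev, curr, interval_s, counter_max=2**64):
    """Per-second rate from two successive SNMP counter samples.

    The modulo handles a single counter wrap between polls
    (use 2**32 for 32-bit counters).
    """
    delta = (curr - prev) % counter_max
    return delta / interval_s

# e.g. ifInErrors went from 1200 to 1260 over a 60 s poll
errs_per_sec = rate_per_sec(1200, 1260, 60)
print(errs_per_sec)  # 1.0
```

Alarming on that rate crossing a threshold is what catches a burst that a long-interval average graph smears out.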
Interesting... It's just an area (the METRO) I'm looking into.
It is our public interface to the Internet, and we have users mentioning intermittent slowdowns, so I was curious about these METRO 'receive' errors and whether they play any part.
Do you think the Engineers Toolset 'SNMP Real-Time Graph' would be worthwhile to dive deeper into these errors? And if so, can you suggest the MIB to use?
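For what it's worth, the standard per-interface error counters live in IF-MIB (RFC 2863): ifInErrors and ifOutErrors. If you have the net-snmp tools handy, a quick spot check might look like this (the host, community string, and interface index are placeholders):

```shell
# Read the error counters for interface index 1 via SNMP v2c
snmpget -v2c -c public 192.0.2.1 IF-MIB::ifInErrors.1 IF-MIB::ifOutErrors.1

# Numeric OIDs, in case the MIB files aren't loaded:
#   ifInErrors  = .1.3.6.1.2.1.2.2.1.14
#   ifOutErrors = .1.3.6.1.2.1.2.2.1.20
```

Those are the same counters NPM and the Toolset real-time grapher poll, so either tool should be pointed at IF-MIB.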
I would drill a bit further down and see if the time of day the errors were received corresponds to when your users said performance was bad.
If you can get them to say an exact time like 9:30 am it was slow for 30 minutes, then you may be able to correlate that to what NPM sees.
Also look to see if the errors are only bad when the users are doing one particular task.
I had users with a similar complaint that did correspond to errors I saw at the same time. It turned out the backup process they were running was actually creating the errors I was seeing. After a couple of changes to how they did the backups, my errors went away and so did the performance complaints.
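If you can export per-interval error counts from NPM, the correlation step above is simple enough to script: sum the errors inside the window a user reported and compare to quiet periods. A minimal sketch (the timestamps and error counts here are invented for illustration):

```python
from datetime import datetime, timedelta

# Hypothetical per-minute ifInErrors deltas exported from a poller: (timestamp, errors)
samples = [
    (datetime(2024, 5, 1, 9, 28), 0),
    (datetime(2024, 5, 1, 9, 31), 42),
    (datetime(2024, 5, 1, 9, 45), 37),
    (datetime(2024, 5, 1, 10, 5), 0),
]

def errors_in_window(samples, start, duration_min):
    """Total errors inside a user-reported slowdown window."""
    end = start + timedelta(minutes=duration_min)
    return sum(e for t, e in samples if start <= t < end)

# User report: "9:30 am, slow for 30 minutes"
reported = datetime(2024, 5, 1, 9, 30)
print(errors_in_window(samples, reported, 30))  # 79
```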
The acceptable number of errors is ZERO. I've seen a lot of errors on these Ethernet handoffs. If this were a T1/T3/OC# the error rate would be zero. Yet these telecom vendors seem to hand off a very error-prone, sloppy, poorly engineered "ethernet-like" port. Take a look at the ethernet stats on your servers. Unless you've got problems, they will show Zero Errors--even across months and years.
Many network professionals say low errors can be ignored--in most cases they are wrong. Unless we're talking about a user port where laptops are plugged/unplugged, ethernet errors should be non-existent. Every error is a packet that was deleted. The application that is waiting for that TCP packet will wait 30, 60, even 120 seconds before performing a re-transmit. Ever go to grab a web page and nothing happens... hit refresh and it instantly appears? Packet loss is a common reason for that. If you were willing to wait the 1-2 minutes for TCP re-transmission timeouts, you'd have gotten the page. Is this acceptable?
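Those long waits come from TCP's exponential backoff: the retransmission timer roughly doubles after each failed attempt (RFC 6298 style). The initial 1 s RTO, 64 s cap, and attempt count below are illustrative defaults, not measurements from any particular stack:

```python
def retransmit_waits(initial_rto=1.0, max_rto=64.0, attempts=7):
    """Successive retransmission timeouts under classic exponential backoff."""
    waits, rto = [], initial_rto
    for _ in range(attempts):
        waits.append(rto)
        rto = min(rto * 2, max_rto)  # double the RTO each retry, capped
    return waits

waits = retransmit_waits()
print(waits)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
print(sum(waits))  # 127.0 seconds of dead air from one lost packet
```

That's how a single deleted packet can turn into a minute or two of a user staring at a blank page.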
Packet loss creates catastrophic decreases in throughput for the unlucky TCP stream it randomly hits. And it will randomly hit the CEO or a user you don't care about equally. I've seen duplex mismatches on server ports cause a 98% drop in backup throughput--all due to basic ethernet errors. I've seen shops with packet loss convince themselves that pings randomly fail and that you should only send Alerts that a node is down if it fails 3 or more times, because ping randomly fails. No it doesn't. That's packet loss.
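The throughput hit from random loss can be ballparked with the well-known Mathis approximation for steady-state TCP: BW <= (MSS/RTT) * 1.22/sqrt(p), where p is the loss rate. A quick sketch (the MSS, RTT, and loss rates are example numbers, not from this thread):

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. upper bound on TCP throughput under random loss."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / sqrt(loss_rate))

# 1460-byte MSS, 50 ms RTT: even "low" loss rates gut throughput
for p in (0.0001, 0.001, 0.01):
    mbps = mathis_throughput_bps(1460, 0.05, p) / 1e6
    print(f"loss {p:.2%}: ~{mbps:.1f} Mbit/s ceiling")
```

Under these example numbers, going from 0.01% loss to 1% loss costs roughly a factor of ten in achievable throughput, which is why "only a few errors" on a handoff is not harmless.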
Just because the level hasn't reached a point where the users are gathering the pitchforks and torches, doesn't mean it isn't degrading the quality of the network.