Fault Management (FM) and Performance Management (PM) are two important elements of OAM in layer 2 and layer 3 networks.
FM covers the detection and isolation of faults in connectivity between end stations, while PM covers monitoring the performance of a link or path using statistics such as packet loss, latency and delay variation (also called jitter).
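To make the three PM statistics concrete, here is a minimal sketch of how they could be derived from a batch of probe results. The function name, the input format and the sample values are all made up for illustration, and jitter is computed here as the mean absolute difference between consecutive round-trip times, which is only one of several definitions in use.

```python
from statistics import mean


def summarize_probes(sent, rtts_ms):
    """Summarize one batch of probes.

    sent    -- number of probes sent (hypothetical input)
    rtts_ms -- round-trip times in ms of the probes that were answered,
               in the order they were sent
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    received = len(rtts_ms)
    # Packet loss: share of probes that never came back.
    loss_pct = 100.0 * (sent - received) / sent
    # Latency: average round-trip time of the answered probes.
    latency_ms = mean(rtts_ms) if rtts_ms else None
    # Delay variation: mean absolute difference between consecutive samples
    # (an assumption; other definitions of jitter exist).
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    jitter_ms = mean(diffs) if diffs else 0.0
    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}


# Illustrative numbers only: 10 probes sent, 8 answered.
stats = summarize_probes(10, [20.0, 22.0, 21.0, 25.0, 20.0, 24.0, 23.0, 21.0])
print(stats)
```

Real tools measure these values with timestamps in the probe packets themselves, often one-way rather than round-trip, so treat this purely as an illustration of the arithmetic.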
Here we need to differentiate between layer 2 and layer 3 networks.
For layer 2 networks, FM is usually done with IEEE 802.1ag Connectivity Fault Management (CFM), which relies on continuity check messages (CCMs) along with loopback and linktrace, while PM is done with ITU-T Y.1731, which can monitor all of the parameters mentioned above.
For layer 3 networks, ping and traceroute are the primary FM tools and by far the most widely used tools for troubleshooting, while IP SLA is one of the PM tools on Cisco devices. IP SLA can monitor loss, latency and delay variation at the IP layer (it can also do so at layer 2), in addition to helpful VoIP statistics such as MOS. (Please note that Cisco uses the term IP SLA for both layer 3 and layer 2 links, even though the layer 2 statistics are measured at the Ethernet layer.)
Coming from a carrier Ethernet background in my last job, I can say, looking back, that these tools, especially the PM tools at layer 2, were not used very often. That may be because many people were not aware of them, or because the pass/fail thresholds for performance measurements were not very well defined. Recently, the Metro Ethernet Forum (MEF) has done a great job of standardizing the thresholds and limits for jitter, delay and packet loss. As a result, the PM tools have started gaining acceptance industry-wide and are being rolled out more actively in layer 2 service provider networks.
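Once thresholds are defined, turning measurements into a pass/fail verdict is straightforward. The sketch below assumes measurements and objectives are held in plain dictionaries; the objective values are hypothetical placeholders that I picked for the example, not figures from MEF, Cisco or any other body.

```python
# Hypothetical objectives for illustration only; real values would come
# from the relevant standard or the SLA agreed with the customer.
HYPOTHETICAL_OBJECTIVES = {
    "loss_pct": 0.1,     # maximum acceptable packet loss, in percent
    "latency_ms": 25.0,  # maximum acceptable delay, in ms
    "jitter_ms": 8.0,    # maximum acceptable delay variation, in ms
}


def find_violations(measured, objectives):
    """Return {metric: (measured_value, limit)} for every metric over its limit."""
    return {
        name: (measured[name], limit)
        for name, limit in objectives.items()
        if measured[name] > limit
    }


# Made-up measurement for one service.
measured = {"loss_pct": 0.05, "latency_ms": 31.2, "jitter_ms": 3.4}
violations = find_violations(measured, HYPOTHETICAL_OBJECTIVES)
print("PASS" if not violations else f"FAIL: {violations}")
```

With these made-up numbers only the latency objective is exceeded, so the service would be reported as failing on delay alone.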
However, I am quite curious about how often OAM tools are used in IP networks.
Fault management tools like ping and traceroute are the bread and butter of an IP engineer when it comes to troubleshooting, but I am especially interested in learning more about IP SLA and how it is used in real networks.
So my questions to you are:
- How often do you use IP SLA (or any similar tool) in your network? Do you use it for specific applications like VoIP?
- Do you use it for both layer 2 and layer 3 networks? In enterprise as well as service provider environments?
- Are the thresholds for the PM metrics (delay, jitter and packet loss) well defined, either by Cisco or by a standards body?
Would love to hear your opinion here!