OK, so here is the official word from TAC on our 6500 issue. It appears the effect is broad and they have no intention of fixing it. I don't believe there is a VSS version that isn't 12.2, so this would appear to affect anyone running VSS.
Removing ip sla probe (configured by SNMP) in CLI reloads Standby Sup
Symptom:
The standby supervisor reloads after removing an ip sla probe in CLI:
R7600(config)#no ip sla 1
R7600(config)#
06:53:31: Config Sync: Line-by-Line sync verifying failure on command:
no ip sla 1
due to parser return error
06:53:31: rf_reload_peer_stub: RP sending reload request to Standby. User:
Config-Sync, Reason: Configuration mismatch
R7600(config)#
06:53:31: %RF-SP-5-RF_RELOAD: Peer reload. Reason: Proxy request to reload peer
R7600(config)#
06:53:31: %OIR-SP-3-PWRCYCLE: Card in module 6, is being power-cycled (RF request)
R7600(config)#
06:53:32: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing
to Simplex mode
R7600(config)#
Conditions:
This only occurs if the probe was configured via SNMP.
Workaround:
The standby supervisor doesn't reload if you remove the probe via SNMP.
Further Problem Description:
This issue is applicable to Catalyst 6500 platform running 12.2SX releases.
Yeah, I found a bug exactly like you’re talking about. The bug is CSCtd45679 and more information is available here:
Currently there is no bug fix and no ETA on a fix. There is a workaround: if you remove the probe with SNMP, the issue will not occur.
Unfortunately, since there is no ETA on a bug fix, aside from using the workaround, there isn’t much else I can advise you to do. :-(
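For anyone who wants to script the workaround, here is a minimal sketch of deleting a probe over SNMP instead of with "no ip sla" in the CLI. It assumes the probe is a row in rttMonCtrlAdminTable (CISCO-RTTMON-MIB) and that setting rttMonCtrlAdminStatus to destroy(6) removes it; the OID is my reading of that MIB, so check it against your IOS release first. The host, community string, and function name are hypothetical placeholders, and the function only builds the Net-SNMP snmpset command, it does not run it.

```python
# Sketch only: builds (does not run) a Net-SNMP snmpset command that
# deletes an IP SLA probe by setting its RowStatus to destroy(6).

# Assumed OID for rttMonCtrlAdminStatus in CISCO-RTTMON-MIB; verify before use.
RTT_MON_CTRL_ADMIN_STATUS = "1.3.6.1.4.1.9.9.42.1.2.1.1.9"
ROW_STATUS_DESTROY = 6  # SNMPv2-TC RowStatus value for destroy


def build_probe_delete_cmd(host, community, probe_id):
    """Return the snmpset argv that would destroy IP SLA probe `probe_id`."""
    if probe_id < 1:
        raise ValueError("probe_id must be a positive integer")
    # The table is indexed by the probe number, so append it to the column OID.
    oid = f"{RTT_MON_CTRL_ADMIN_STATUS}.{probe_id}"
    return ["snmpset", "-v2c", "-c", community, host, oid,
            "i", str(ROW_STATUS_DESTROY)]


if __name__ == "__main__":
    # Hypothetical host and community; same intent as "no ip sla 1".
    print(" ".join(build_probe_delete_cmd("192.0.2.1", "private", 1)))
```

Try it against a lab box before going anywhere near a production VSS pair, since a wrong index destroys a different probe.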
Ok so an update for this...
Did some testing involving a 2620 router and a 2611 router. Loaded several versions of 12.2 on the 2620 and wasn’t able to recreate the crash; however, I had several issues just with creating, deleting, and scanning for operations in general. I then received several 2611s and loaded 12.3 on those, and the majority of the errors (faulted states) seemed to have cleared up. Support then brought to my attention that 12.3(14)T had several modifications specific to IP SLA in it.
So at this point I think it’s safe to say Cisco's opinion that the bug applies to the 12.2 "SX" versions is sound. Granted, we had several other problems, but the routers didn’t actually crash on non-"SX" versions of 12.2. The downside of this is that if you have a 6500 VSS (vs-s720-10g), then you are going to want to avoid running IPSLA on it at all costs, because Cisco doesn’t have a version of IOS for VSS that isn’t "SX".
(special thanks to Francois Caron and Adrian Cook)
Thanks for your contribution, on such a sensitive area (IOS crashes).
I also wanted to add this to the thread, which hopefully will give the community a deeper view of IP SLA Manager's behavior, always helpful in this type of situation:
All Orion operations but path operations (Path Echo and Path Jitter) are created, polled and deleted via SNMP. Only Path Echo and Path Jitter are created, polled and deleted via CLI.
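Put together with the bug text quoted earlier in the thread, that rule can be written down as a small lookup. This is only a sketch of the behavior as described in this post, not anything taken from the product itself; the function names are hypothetical, and the operation type strings are just examples.

```python
# Sketch only: which channel Orion uses per operation type, as described above.
CLI_MANAGED = {"path echo", "path jitter"}


def management_channel(operation_type):
    """Return "CLI" for the two path operations and "SNMP" for everything else."""
    return "CLI" if operation_type.strip().lower() in CLI_MANAGED else "SNMP"


def risky_to_remove_from_cli(operation_type):
    """Per the CSCtd45679 text, probes configured via SNMP are the ones whose
    removal from the CLI reloaded the standby supervisor."""
    return management_channel(operation_type) == "SNMP"


if __name__ == "__main__":
    for op in ("UDP Jitter", "ICMP Echo", "Path Echo", "Path Jitter"):
        print(op, management_channel(op), risky_to_remove_from_cli(op))
```

In other words, if the bug conditions hold, the operations to avoid deleting by hand from the CLI would be everything except Path Echo and Path Jitter.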
Other helpful info, not directly related to the IOS issue, but interesting to know in the context of such an issue:
More on the discovery logic:
We have the same SW products and three 6500 VSS pairs running the same code. I have been running IPSLA with no problem - added, modified and removed a few so far during the day no less!
I just want to try to get more detail.
If I understand correctly, the CSCtd45679 bug details say that the crash does not occur if the IPSLA configuration is removed via SNMP, and only occurs when the configuration is entered via SNMP, but removed via CLI. Was that your exact scenario?
I need to rebuild my environment as well!
I would love to speak with you if you don't mind talking to me. Any clarification we can get can be added to this post, and it may help others as well. Send me a PM and we can get in touch.
Salman
So my understanding, after having worked with Adrian and Francois, is that all operations but the path operations (Path Echo and Path Jitter) are created, polled and deleted via SNMP. Only Path Echo and Path Jitter are done via the CLI. So my question to you would be: what is your exact IOS version, and what operations are you using?
I personally saw the problem happen 3 times. When we realized what was happening, we put a freeze on it and didn’t touch anything. Then a power downtime got scheduled later, and we used that window to remove all the IPSLA operations from all 4 of our 6500 VSS nodes at once. When we did that, the crash did not happen. So either we were lucky, or the crash happens when adding or modifying operations, or there is some other mitigating factor we don’t know about.
-b
PS: as someone who tends to be security paranoid, I'd pull that phone number out of your reply.
Rather than run IPSLA on our core or data center infra, we are purchasing 20 Cisco 1800 series routers as "shadow routers" in all our major call centers.
If these shadow routers crash - no harm, no foul.
This will give us visibility on many call center apps (mainly web based) as well as VoIP scores to the Cisco Voice Gateways in the Data Centers.
As a general rule, we will only run IPSLA responder on a production router (C3845s, AS5400s, etc).
There is too much at stake to enable IPSLA operations on a production router.
Hi,
My organisation has just invested in the IP SLA Manager module as part of a larger upgrade of our NPM toolset to the latest release. We've also nearly finished a rolling upgrade of our core Cisco 6509 estate to IOS 12.2(33)SXI5 which after reading this thread puts the use of IP SLA operations across our core network into question.
The majority of 6509s are running as MPLS PEs and we're (were) looking to run IP SLA operations between these core nodes as well as between selected edge sites connected to those PEs.
Can I ask whether anyone has any knowledge of whether SXI5 also contains this bug and what workarounds (if any) people have implemented?
Also, can anyone confirm exactly what causes the crash to happen, as I'm still not totally sure from reading this thread? My current understanding is that if a probe is configured via SNMP using IP SLA Manager and then removed manually from the CLI, you (may) get a crash, but using IP SLA Manager exclusively to add and remove probes is OK?
TIA
Matthew