4 Replies Latest reply on Jul 6, 2016 12:12 PM by b1b3tt3r

    Partial Stack Failures

    b1b3tt3r

      We currently have Cisco/Nortel Edge switches, which both rely on stacking cables.

       

      I was able to generate traps for Nortel but not Alerts.

      I also discovered that since we are watching only trunk ports, not user ports, we cannot see impact (that is, how many ports are actually down), because we are not collecting statistics on the user ports, even if we watch them through UDT.

       

      My goal is to communicate an impact statement during an outage.

       

      What are some of the things being done in your environment, that you can share?

        • Re: Partial Stack Failures
          rschroeder

          If you have UDT or an equivalent product, you can discover all devices attached to every switch port, and build a user-based report on that.  Then keep that data where you can find it in an outage/emergency and use it to generate e-mail or lists of affected users.
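
For a partial stack failure specifically, a saved snapshot like that can be filtered down to the member that died. Here is a minimal sketch, assuming the snapshot is exported as (switch, interface, device) rows and that interfaces follow the Cisco stack naming of member/module/port (for example Gi2/0/7 is on member 2). The switch names, device names, and function names are all made up for illustration; other vendors will need a different pattern.

```python
import re

# Hypothetical snapshot saved before any outage: (switch, interface, device).
SNAPSHOT = [
    ("edge-sw1", "Gi1/0/1", "alice-pc"),
    ("edge-sw1", "Gi1/0/2", "bob-pc"),
    ("edge-sw1", "Gi2/0/1", "carol-pc"),
    ("edge-sw1", "Gi2/0/7", "lobby-printer"),
    ("edge-sw2", "Gi1/0/1", "dave-pc"),
]


def stack_member(interface):
    """Return the stack member number from a name like 'Gi2/0/7', else None."""
    match = re.match(r"^[A-Za-z]+(\d+)/\d+/\d+$", interface)
    return int(match.group(1)) if match else None


def affected_devices(snapshot, switch, failed_member):
    """List devices recorded on the failed member of the given switch stack."""
    return sorted(
        device
        for sw, interface, device in snapshot
        if sw == switch and stack_member(interface) == failed_member
    )


def impact_statement(snapshot, switch, failed_member):
    """Build a one-line impact summary suitable for an alert or a bridge call."""
    devices = affected_devices(snapshot, switch, failed_member)
    return (
        f"{switch} stack member {failed_member} down: "
        f"{len(devices)} device(s) affected ({', '.join(devices)})"
    )


if __name__ == "__main__":
    print(impact_statement(SNAPSHOT, "edge-sw1", 2))
```

The point of working from a stored snapshot is that the lookup still works when the failed member can no longer be polled.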

           

          You could also discover this information manually on a per-switch or per-router basis with the Engineer's Toolset using Switchport Mapper.

           

           

          If you isolate traffic via VLANs, and if you isolate specific VLANs to specific network rooms and to specific network switches, that's easier.  You can discover all devices affected by a scheduled outage simply by pulling the DHCP information for every subnet on the specified switch.  That's a much less labor-intensive way to gather the data--especially if you have no automated tools for it like UDT.
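
As a rough sketch of the DHCP approach, assuming you can export leases as (IP, hostname) pairs and that you keep a list of which subnets live on which switch: the subnets, hostnames, and names below are hypothetical, and how you export the leases depends on your DHCP server.

```python
import ipaddress

# Hypothetical mapping of switch name to the subnets isolated to it.
SWITCH_SUBNETS = {
    "closet-3a-sw1": ["10.20.30.0/24", "10.20.31.0/24"],
}

# Hypothetical lease export from the DHCP server: (ip, hostname).
LEASES = [
    ("10.20.30.15", "alice-pc"),
    ("10.20.31.40", "bob-pc"),
    ("10.20.99.8", "other-pc"),
]


def leases_on_switch(switch, switch_subnets, leases):
    """Return the leases whose address falls in any subnet tied to the switch."""
    networks = [ipaddress.ip_network(s) for s in switch_subnets.get(switch, [])]
    return [
        (ip, host)
        for ip, host in leases
        if any(ipaddress.ip_address(ip) in net for net in networks)
    ]


if __name__ == "__main__":
    for ip, host in leases_on_switch("closet-3a-sw1", SWITCH_SUBNETS, LEASES):
        print(ip, host)
```

This only holds up if the VLAN-to-switch isolation is strict; a subnet that spans several switches would over-report.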

           

          And last, if you're doing NAC, you might just be able to pull all the data about everything attached to one switch from your NAC database.

           

          The scary/ironic things come when someone expects to use e-mail or VoIP to communicate the extent or details of an outage to affected users.  Too many times have I had administrators say "Send them an e-mail telling them your E.T.R."  Oops--the users' PCs are down--how will they receive the e-mail?  But as long as your notification is pre-outage, or isn't expected to reach the affected users during an outage, you're OK.

            • Re: Partial Stack Failures
              b1b3tt3r

              So I actually have UDT, and have run up against some challenges:

              Nexus gear is a mystery: even in NPM, we cannot see all the ports to discover them.

              On our user edge switches, we are getting partial results, as some are layer 3 and some are layer 2 only (router on a stick).

               

              When I run any reports looking for specific devices discovered by UDT, the results show most of the devices under the port channels through the Aggregates, but nothing under the layer 2 devices, and only some of the layer 3 devices.

               

              So I was attempting to include a report in the alert for connected devices, but was unsure how to generate the report as an attachment in the alert.

               

              How are you doing this?

              I am usually on a bridge call giving out the information, but what's more important is to get it to the guys who are on-call, so they know if they have to leave their kid's play-off game or not!

                • Re: Partial Stack Failures
                  rschroeder

                  I isolate traffic by VLAN and network switch, therefore I know all the devices on a switch by the VLAN / subnet / DHCP scope.

                   

                  Further, we rely on LANDesk to report all the users on a subnet, which is a slick way of getting the word out about a pending maintenance window.  Plus, since LANDesk leverages AD, my report automatically discovers and lists the affected users' Managers.

                   

                  We use a fair amount of Nexus hardware; mostly it's in our biggest data centers, and impact there is well understood on a per-port / per-device basis--our DC Managers and Operations team have done great documentation of what's plugged in where, and what apps run on every host.  But we do have some Nexus hardware operating as Distribution or Core or collapsed Distribution/Core solutions; there's no problem discovering attached/affected devices--but I use Switchport Mapper from the Engineer's Toolset.

                   

                  Switchport Mapper should give you all the L2 info you need--I use it to discover affected devices & users when I'm planning maintenance windows.  All you need is SNMP read-only access configured on the switch and its L3 gateway device (which might be the same switch if it's a Layer 3 switch; otherwise you simply point Switchport Mapper at the router or L3 switch that's routing VLANs for your L2 switch).
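
The reason both devices matter is that the L2 switch knows which MAC address sits on which port, while the L3 gateway's ARP table knows which IP address belongs to each MAC. I can't speak to how Switchport Mapper does it internally, but conceptually the result is a join of those two tables. A minimal sketch, assuming both tables have already been collected (over SNMP or otherwise) into plain lists; the sample data and function names are hypothetical.

```python
def normalize_mac(mac):
    """Convert common MAC notations to lowercase colon-separated form."""
    hex_digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(hex_digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


def map_ports(mac_table, arp_table):
    """Join (port, mac) rows from the L2 switch with (ip, mac) rows from the gateway.

    Returns (port, mac, ip) tuples; ip is None when the gateway has no ARP entry.
    """
    ip_by_mac = {normalize_mac(mac): ip for ip, mac in arp_table}
    rows = []
    for port, mac in mac_table:
        key = normalize_mac(mac)
        rows.append((port, key, ip_by_mac.get(key)))
    return rows


if __name__ == "__main__":
    # Hypothetical data: MAC table from the L2 switch, ARP table from the gateway.
    mac_table = [("Gi1/0/1", "0011.2233.4455"), ("Gi1/0/2", "00:11:22:33:44:66")]
    arp_table = [("10.20.30.15", "00-11-22-33-44-55")]
    for row in map_ports(mac_table, arp_table):
        print(row)
```

A port whose MAC has no matching ARP entry still appears in the output with no IP, which is roughly the symptom you would expect on a layer 2 only switch when the gateway has not been queried.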

                   

                  I expect UDT has a way of getting the info you need. Post a query to the Thwack group in the UDT forum--someone will know how to do exactly what you need.