I experienced my first broadcast storm (I think) yesterday. The switches went nuts, and the only solution was to enable RSTP (Rapid Spanning Tree) on each switch and force the core switch to be the root. I'm assuming I can find MIBs for the broadcast and multicast counters for our Force10 switches. Any recommendations people have for keeping an eye on this are appreciated.
Another issue: since my entire LAN went nuts at our main site, it has become apparent to me that I need to implement VLANs, which I do not currently use. We have 90 users at two sites, soon to be one large site. How do people allow Orion to still communicate alerts if the main LAN goes down? The obvious solution, I guess, is to have the monitoring LAN out-of-band (OOB) or on its own VLAN so it will not be impacted. I'm not quite sure how to do this. I was also wishing I had some type of SNMP-based Sensaphone that would call my cell with a message such as "Switch 1 is not responding". Has anyone seen something like that?
Finally, I use our Exchange Server to send alerts. That is no longer going to work, since if it goes offline I no longer receive alerts. We just started using a third-party filtering service, and they were supposed to alert me if SMTP was not responding, but a glitch prevented that from happening. So going forward I will have a minimal level of "everything is down" alerting, but it would still be nice to get a persistent phone call that can wake me up when this happens in the middle of the night.
Comments, ideas, encouragement appreciated.
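On the counter question: the standard IF-MIB defines per-interface broadcast and multicast counters in its ifXTable, so a vendor-specific MIB may not be needed. Whether the Force10 firmware in use exposes the 64-bit versions is an assumption to verify. The minimal Python sketch below shows only the arithmetic for turning two polled counter values into a packets-per-second rate. The polling itself is left out, and the `Sample` class, the function names, and the 1000 pps threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass

# Standard IF-MIB (RFC 2863) ifXTable columns; append ".<ifIndex>" per port.
# Assumption: the switch firmware implements these 64-bit counters.
IF_HC_IN_MULTICAST_PKTS = "1.3.6.1.2.1.31.1.1.1.8"
IF_HC_IN_BROADCAST_PKTS = "1.3.6.1.2.1.31.1.1.1.9"

COUNTER64_MODULUS = 2 ** 64


@dataclass
class Sample:
    """One polled counter value (hypothetical helper type)."""
    timestamp: float  # seconds, e.g. from time.time()
    packets: int      # raw counter value returned by the poll


def packets_per_second(prev, curr, modulus=COUNTER64_MODULUS):
    """Average packet rate between two samples of the same counter."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # The modulo keeps the delta correct if the counter wrapped once.
    delta = (curr.packets - prev.packets) % modulus
    return delta / elapsed


def is_storming(rate_pps, threshold_pps=1000.0):
    """Flag a port whose rate exceeds a threshold.

    The default threshold is an arbitrary placeholder; a sensible value
    depends on what is normal for the network being watched.
    """
    return rate_pps > threshold_pps
```

For example, a port whose broadcast counter goes from 100 to 5100 over ten seconds averages 500 pps. A baseline taken during a normal week gives a more meaningful alert threshold than any fixed number.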
For the size of network you described, your broadcast storm is most likely caused by a topology loop. It's not 100% certain based on your information, but I'd be willing to bet money on it. You need to prevent this from happening first, and monitor second. It sounds like you may not necessarily be a core 'network guy', but if you're looking after the network, here are a few tips for smooth sailing.
Spanning Tree is your friend. There are several more advanced options around these days to prevent topology loops and consolidate network links that technically negate the need for spanning tree, but I've yet to come across anyone who is confident enough to turn it off altogether. It is your safety net, use it. It sounds like you've already enabled it, which is great! Make sure any new switches that go out have STP enabled as part of your baseline.
While you're at it, make sure you remove any hubs from the network. If you want to get a bit fancier, look at MAC address limits/port security on your switchports. You may also want to do a bit of reading on how STP operates and is monitored on the Force10 gear, to track down your loop (look for a port in a 'blocking' state).
Some simple broadcast domain separation using VLANs is also essential. At a minimum, I would recommend a management VLAN, client/access VLAN and one for your servers. That way you don't lose everything at the same time, should something nasty happen.
I've not heard of any SNMP-enabled phones, but I should think that an SMTP-to-SMS forwarding service should be sufficient for most situations. Otherwise I'm sure your users will let you know!
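To make the SMTP-to-SMS idea concrete, here is a minimal Python sketch that builds a short alert and hands it to an SMTP relay other than the Exchange server. Every address and host name in it is a placeholder, and the gateway address format is an assumption, since it depends entirely on the carrier or forwarding service. It also only helps while the monitoring host still has a working path to that relay.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, sms_gateway_address, text):
    """Build a short plain-text alert addressed to an email-to-SMS gateway."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = sms_gateway_address
    msg["Subject"] = "Network alert"
    # Trim the body, since SMS gateways commonly truncate long messages.
    msg.set_content(text[:160])
    return msg


def send_alert(msg, relay_host, port=25, timeout=10):
    """Send through a relay that does not depend on the Exchange server.

    relay_host is hypothetical; an ISP or hosted relay may also require
    TLS and authentication, which this sketch leaves out.
    """
    with smtplib.SMTP(relay_host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)
```

Usage would look like `send_alert(build_alert("orion@example.com", "5551234567@sms.example.net", "Switch 1 is not responding"), "relay.example.net")`, triggered by whatever the monitoring system can run as an alert action.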
Thanks for the info. What is strange is that this happened overnight, with no physical changes. Turning on RSTP definitely appears to have solved the issue. I am not a "core" networking guy, but I am the "only" guy. So switches, Exchange, AD, helpdesk, firewall, etc. I do my best. This network has been solid for the last two years and it just went berserk that evening. I think you are on to something with the hubs/switches. We have a few small 5-port D-Links in the offices of people who need more than one jack, and I'm guessing that one of those went nuts, but I have still not been able to track down the culprit. Time to brush up on Wireshark to get a better view of the protocols on my LAN.
Those small switches are probably where the loop came from. Someone could have plugged one of the small ones into another and bam, loop. You may not have made a physical change, but that doesn't mean one didn't happen. Then it's like a hall-of-mirrors effect: one packet keeps going around and around the loop, and once enough of them pile up they become a problem and take the network down.
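The hall-of-mirrors effect can be shown with a toy model. The Python sketch below is a deliberately simplified simulation, not how any real switch is implemented: three hypothetical switches A, B, and C are cabled in a ring, a host on A sends one broadcast per tick, and every switch floods a received broadcast out of all of its other inter-switch ports. Ethernet frames carry no hop limit, so copies caught in the loop never expire.

```python
def simulate(links, blocked=frozenset(), source="A", ticks=5):
    """Return the number of broadcast copies in flight after each tick.

    links:   pairs of directly cabled switches
    blocked: links that spanning tree has placed in a blocking state
    """
    neighbors = {}
    for a, b in links:
        if (a, b) in blocked or (b, a) in blocked:
            continue  # a blocked link forwards no data frames
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    in_flight = []  # each copy is a (sending_switch, receiving_switch) pair
    history = []
    for _ in range(ticks):
        next_flight = []
        # Each receiver floods the copy out of every port except the
        # one it arrived on.
        for sender, receiver in in_flight:
            for n in neighbors.get(receiver, ()):
                if n != sender:
                    next_flight.append((receiver, n))
        # The host on the source switch sends one new broadcast.
        for n in neighbors.get(source, ()):
            next_flight.append((source, n))
        in_flight = next_flight
        history.append(len(in_flight))
    return history


ring = [("A", "B"), ("B", "C"), ("C", "A")]
print(simulate(ring))                        # [2, 4, 6, 8, 10]
print(simulate(ring, blocked={("C", "A")}))  # [1, 2, 2, 2, 2]
```

With the loop intact, every broadcast ever sent stays on the wire and the total keeps climbing. With one link blocked, which is what spanning tree does, each copy reaches the edge of the tree and disappears, so the count stays flat. In a real network, more redundant links make the growth much faster than in this three-switch ring.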