As you know, we've been doing some testing lately of some WAN accelerators from different hardware vendors. This has caused me to need to generate traffic of varying types and weight and has got me thinking about traffic generators in general. The SolarWinds Toolset has a great tool for generating UDP or TCP echo and discard traffic (WAN Killer), but in this case I needed to generate some more specific types of traffic and some specific TCP transaction types.
So, we're going to write some utilities for generating test traffic here in-house and I'm also looking at possibly purchasing some appliance-based traffic generators. I'm hoping a few of you may write back with a) any experiences you've had with the various traffic generators out there and b) if you'd like to see more advanced traffic generation capabilities within the SolarWinds Toolset.
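To give a feel for what a home-grown traffic utility might look like, here is a minimal Python sketch of a UDP sender. It is not the utility we're writing and it is not how WAN Killer works; the function name `send_udp_burst` and its parameters are made up for illustration.

```python
import socket
import time


def send_udp_burst(host, port, packets, payload_size, interval=0.0):
    """Send `packets` UDP datagrams of `payload_size` bytes each.

    Returns the total number of payload bytes handed to the network stack.
    `interval` is an optional pause (in seconds) between datagrams.
    """
    payload = b"\x00" * payload_size
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(packets):
            sent += sock.sendto(payload, (host, port))
            if interval:
                time.sleep(interval)
    return sent
```

Pointing this at a lab host's discard port (UDP 9) and varying `payload_size` and `interval` gives a crude way to change the weight of the traffic. Anything TCP-transaction-specific would need a good deal more than this.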
As I mentioned, we are doing some testing this week in the lab on WAN acceleration and optimization devices and most recently we've been working with some Riverbed devices. I must say, I'm impressed with the manageability of these devices.
Out of the box, Orion was able to collect stats on CPU, memory, swap, buffer allocation, interface traffic, and interface errors. Additionally, I'm getting NetFlow information directly from this device and it seems to be working swimmingly. It's not often that I see a device that is as easy to manage as these, so I'm impressed.
Anybody else using these devices in their network?
An interesting thing happened to us this week that I thought I'd share here. As a general troubleshooting step when working with Orion customers, I commonly remove the data from within the payload portion of ICMP packets that Orion sends. I've seen a lot of situations over the years where this helped. For instance, I've seen firewalls that wouldn't pass packets with content in the payload, I've seen ethernet switches that would drop ICMP packets with an odd byte count (meaning 17 bytes vs. 16), and I've seen situations where when sending a high load of ICMP packets through firewalls, the firewalls could handle a higher packet load if the packet size was decreased.
So, I asked our Orion team to change the default in the next rev of Orion to make the payload empty by default. My role here allows me to make stupid suggestions like this and sometimes people listen. Turns out, I may have been a bit hasty...
Over the last few years firewall vendors have begun placing rules on the firewalls to block ICMP packets with a NULL payload. This is because an empty payload is a common signature for several known worms and as far as security vendors are concerned - when in doubt, shut it down...
The RFC does not require that any data be present within the payload portion of the packet, and in doing a quick review of several network management products from different vendors it seems that opinions on this subject vary widely. The only opinions that we really care about here are from our customers, so I'd like to hear your opinion on this...
Also, please note, we're only talking about the "default" here. From within the settings you can alter the payload portion of the packet any which way you like.
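To make the payload question concrete, here is a small Python sketch that builds an ICMP echo request with whatever payload you hand it, including none at all. This is not Orion's code; the function names are mine, and the only things taken from the standards are the echo request layout (type 8, code 0, checksum, identifier, sequence) and the usual one's-complement Internet checksum.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit words (the RFC 1071 algorithm)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0) carrying `payload`."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload
```

With an empty payload the ICMP message is just the 8-byte header; a 16-byte payload makes it 24 bytes and a 17-byte payload makes it an odd 25. Actually putting these on the wire needs a raw socket and the privileges that go with it, which I've left out.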
This week we're testing some WAN acceleration gear in the lab, and since it's on my mind now and has been for a few weeks, I thought I'd spend some time talking about WAN performance, optimization, and acceleration.
Management of WAN connections can be a real bugger. Many of us "cut our teeth" as network engineers troubleshooting WAN circuit problems, arguing with our service providers about link states, and trying to figure out DLCI problems on frame-relay PVCs. Seems like nowadays the problems we see on WAN circuits are different. It's not so much that the links are unreliable from an availability perspective, but that they are unreliable from a performance perspective. Some of these problems are self-induced I suppose. When companies decide to roll out new applications, we're sometimes the last ones to hear about it, irrespective of the fact that the applications may have significant requirements in the areas of WAN latency and bandwidth (and they wonder why they don't work right). These applications combined with VoIP, Video over IP, SaaS, and a general increase in internet usage from the workplace have caused a surge in the usage and expectations for our WANs (nobody was watching YouTube 3 years ago).
Anyhow, I'm going to dive into some of the key components of WAN optimization in later posts this week. Drop me a line if you have a particular interest and I'll try to add some content in that area.
p.s. I'm taking a trip out West to meet with a few of the leading vendors that provide WAN optimization and acceleration technologies. Let me know if there's any information I can gather for you while out there.
I get asked a lot about the security implications of enabling SNMP. With most technology, there is a security cost in enabling any non-security related feature. This is true just about anytime you enable a service on a piece of hardware that allows people to access it from the network. So sure, the network would be more secure if we disabled all of the management protocols, web interfaces, and root passwords on our network devices. Problem is, we need to be able to manage and monitor these devices...
There are many ways to help secure SNMP on your network. I won't go into a lot of detail as there are several published whitepapers on this subject, but here are a few tips to keep in mind...
1. Don't use simple community strings. Don't use "public" or "private". Use a long character string that includes numbers, letters, symbols, and multiple cases. Don't make it some derivative of your company name like "S0larW1nd5" - that's the first thing an old hacker like me would try...
2. Access lists - use them. More specifically, allocate a specific subnet to host your network management applications and call this your "management network". Then implement access lists on your devices so that SNMP, ICMP, Telnet, SSH, and any other management protocols that you're using are limited to this subnet. Don't limit it to just the IP address of your Orion server or a small list of hosts - you'll be coming back and changing it all the time.
3. Encryption - If you can, use SNMPv3 with authentication and privacy enabled. This will ensure that your SNMP traffic is encrypted. If you're managing devices across a public network, build a management VPN network and only send management traffic across the encrypted tunnels.
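The first two tips are easy to sketch in a few lines of Python. Everything here is illustrative: the `10.20.30.0/24` management subnet, the symbol set, and the function names are assumptions of mine, and some devices restrict which characters a community string may contain, so check before using something like this for real.

```python
import ipaddress
import secrets
import string

# Hypothetical management subnet; substitute your own.
MANAGEMENT_NET = ipaddress.ip_network("10.20.30.0/24")
SYMBOLS = "!#%+-_=@"  # assumed to be acceptable to the device
ALPHABET = string.ascii_letters + string.digits + SYMBOLS


def random_community(length: int = 24) -> str:
    """Return a random string containing lower, upper, digit, and symbol characters."""
    if length < 4:
        raise ValueError("length must be at least 4")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in SYMBOLS for c in candidate)):
            return candidate


def allowed_source(address: str, network=MANAGEMENT_NET) -> bool:
    """True if `address` falls inside the management subnet."""
    return ipaddress.ip_address(address) in network
```

The subnet check is the same decision an access list makes on the device: permit the whole management network rather than a short list of individual hosts.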
Anyways, that's all for now. Ping me if you have other suggestions or disagree on the above points.
p.s. Be sure that your network management applications support SNMPv3 or that it's at least on the roadmap before purchasing...
I was in a meeting today where we were discussing some features within our products that aren't necessarily apparent to the average user. These are the types of features that you usually don't find out about until you've used the product for quite a while, gotten some tips from the experts, or attended a training class. I really like these types of features. I mean, when I've been using a product for a while and I stumble upon a cool new feature I feel like I just won something. I stumbled upon a new feature in Firefox the other day, and as a matter of fact Joel showed me a nifty trick for using the Windows command prompt just yesterday, and in both cases it was like getting an unexpected bonus. Ease of use is important, but I also like it when a product offers me the ability to increase my productivity by learning more about the product. Maybe I just like knowing things that other people don't know, but I think it has a lot to do with the fact that I like being an expert and I like using products that allow me to become one.
I'd like to know your opinions on this... Drop me a comment if you have some time.
Now to pay the bills...
Speaking of "hidden" features, you may not realize this but you can add SLAs on to the interfaces that you're managing in Orion. This can come in really handy, especially if you're managing your circuit providers to SLAs for performance or availability. I use it to draw SLA lines on my bandwidth utilization charts to represent CIR on frame-relay circuits. This is covered in the docs, but basically you'll want to use the Custom Property Editor to add an "SLA" column to your node and/or interface tables and then enter the respective values for the objects in those tables.
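If it helps to see what an SLA line at CIR is telling you, here's the arithmetic as a tiny Python sketch. It has nothing to do with how Orion stores or draws the value; `cir_report` and the sample numbers are invented for illustration.

```python
def cir_report(samples_bps, cir_bps):
    """Summarize utilization samples (bits per second) against a CIR value."""
    over = [s for s in samples_bps if s > cir_bps]
    return {
        "peak_pct_of_cir": 100.0 * max(samples_bps) / cir_bps,
        "samples_over_cir": len(over),
    }


# Three made-up samples on a circuit with a 256 kbps CIR.
report = cir_report([128000, 256000, 512000], 256000)
```

Here the peak sample is twice the CIR and one of the three samples sits above the line, which is exactly what jumps out visually once the SLA line is on the chart.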
Seems like at least a couple of times per week someone asks me for advice in helping to manage and/or advance their career. This topic is near and dear to most of our hearts for many reasons, one of which is that when the technology market took a nose dive a few years ago the average salaries for network engineers, administrators, and managers decreased as well and sometimes it can be hard to find the "right" position.
When discussing career development people often ask me to help them target specific types of training and certification. Of particular interest are courses within the Cisco and Microsoft Certification tracks as well as classes to help deepen one's knowledge wrt databases and light programming technologies (perl, XML, jscript). While all of these are valuable, I would strongly recommend two additional training areas - negotiation skills and sales.
If you've never taken a quality class in negotiations - you are missing out. Most of these are 2-3 day courses and they will pay for themselves many times over. I can tell you first hand that the skills that I've picked up in negotiations have saved/made my employers millions of dollars over the years and have saved/made me quite a bit of money as well. I very seldom make a retail purchase, no matter how small, where I don't do at least some negotiating. Equipping yourself with the longest, most prestigious list of technical certifications on the planet doesn't do you a bit of good if you can't effectively negotiate your own compensation. Additionally, as you progress up the career ladder, leveraging these skills within your everyday job will become more and more important, thereby increasing the overall value that you provide to your organization.
Now on to sales training. I know that someone's going to flame me for this and that for many of you sales is the "Dark Side" but hear me out. Sales training teaches you some important skills that you can leverage in your day to day job. Every day you have an opportunity to sell/market yourself and your team to your management staff. While it's important to be good at your job, it's equally important to be effective at communicating your success and at knowing what/how/when to communicate bad news. I'm not suggesting that you become a career salesperson, but learning a few key tricks of the trade can help you to elevate your perceived value within an organization and/or help you to sell yourself to a new employer down the road.
Now to pay the bills...
Tonight I've been testing a new beta candidate for Version 9.0 of our ipMonitor product. It's flippin' sweet. Our developers on this project definitely have their propellers screwed on pretty dang tight... In case you're wondering - ipMonitor is sort of an "Orion Lite" for the SMB or small enterprise. It's geared for smaller networks than Orion and offers basic fault/performance management where advanced monitoring, NetFlow, VoIP, etc. aren't required. I mention this here because a) some of you work for smaller companies that aren't candidates for Orion and b) many of you manage smaller networks in your off hours and for these networks ipMonitor is a great solution. I can't offer you a sneak peek at this new release just yet, but it's on its way...
In the previous post I talked about a situation where a simple configuration issue on a couple of ethernet switches was causing a serious problem for the connected users. I failed to mention that I used one of the tools from the Engineer's Toolset to diagnose and resolve this issue.
A lot of you are probably familiar with using the Switch Port Mapper. This tool is a great time saver anytime you need to locate or document users connected to your switches. Something you may not know is that you can also configure Switch Port Mapper to display settings for speed, duplex, current traffic, and more.
Another key point to mention is that when you're looking at the Switch Port Mapper and you see more than one MAC address on a port, there is most definitely another switch plugged into that port that should be mapped as well.
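The multiple-MAC check is simple enough to sketch in Python. This is not how Switch Port Mapper works internally; it just assumes you already have a list of (port, MAC) pairs from somewhere, and the function name is hypothetical.

```python
from collections import defaultdict


def ports_with_multiple_macs(mac_table):
    """Given (port, mac) pairs, return {port: sorted MACs} for ports with more than one MAC."""
    by_port = defaultdict(set)
    for port, mac in mac_table:
        by_port[port].add(mac.lower())  # normalize case so duplicates collapse
    return {port: sorted(macs) for port, macs in by_port.items() if len(macs) > 1}


# Made-up table: Fa0/2 has two distinct MACs behind it.
table = [
    ("Fa0/1", "00:11:22:33:44:55"),
    ("Fa0/2", "00:AA:BB:CC:DD:01"),
    ("Fa0/2", "00:aa:bb:cc:dd:02"),
    ("Fa0/1", "00:11:22:33:44:55"),
]
suspects = ports_with_multiple_macs(table)
```

Any port that shows up in `suspects` is a candidate for a downstream switch that should be mapped too.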
I just finished troubleshooting an issue where the link between a small workgroup switch and the switch upstream in the LDF kept going down. The workgroup switch was set up amongst a small group (10) of users and they'd basically reboot the switch when the issue occurred.
Manually setting the speed and duplex states on both switches resolved the issue. So, I've gotta ask: doesn't this seem a little ridiculous to you? I mean, after all these years we're still having to manually configure these settings to maintain reliability on the switches? Seems pretty freakin' crazy to me...
Are you guys experiencing similar problems or was this an isolated case? I'm not hands-on with the gear as often as I used to be and would like your perspectives...
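For what it's worth, the check I ended up doing by eye (do both ends of the link agree on speed and duplex?) is trivial to express in Python. The dictionaries below are invented sample data, not output from any tool, and `link_mismatches` is a hypothetical helper.

```python
def link_mismatches(end_a, end_b):
    """Return the settings ("speed", "duplex") on which the two link ends disagree."""
    return [key for key in ("speed", "duplex") if end_a.get(key) != end_b.get(key)]


# Made-up configured settings for each side of the uplink.
workgroup_switch = {"speed": "100", "duplex": "auto"}
upstream_switch = {"speed": "100", "duplex": "full"}
problems = link_mismatches(workgroup_switch, upstream_switch)
```

An empty list means the configured settings match on both ends; anything else is worth a look before users start rebooting switches.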
A week or so ago I wrote about the need for disaster recovery/preparedness within the enterprise. Seems like there is a lot of interest in more information on this subject - especially with regard to Orion - so I wanted to add a little more content before completely moving on to something else.
First, with regard to disaster planning, I'm going to broadly generalize this into two main strategy types: multi-site and same-site. In a multi-site solution, a company will usually establish a secondary or backup data center which hosts the redundant systems. Usually in these cases, you are preparing for a site outage vs. a system or application outage.
In a same-site scenario, your primary and secondary systems physically reside under the same roof and your goal is to provide application or system level disaster protection. Tonight we're going to discuss the same-site scenario and within the next few days we'll discuss options for multi-site scenarios.
In a same-site scenario, a few things can generally be assumed. First, because you're planning for a system outage, it is assumed that this system being down is not necessarily indicative of a larger problem. Therefore, the secondary system needs to be able to scale to the same level as the primary system. Second, there is a much higher likelihood of this sort of system actually being used (possibly often) since something as simple as a bad hard drive or Windows update may cause your primary system to be temporarily unavailable. Third, the switch over from primary to secondary and back to primary needs to be automated or at least quick.
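The third point, automated switchover and switch back, boils down to a small state machine. Here's a Python sketch under assumptions I've made up for illustration: a health check result arrives on each polling cycle, and three consecutive failures is the trigger. It isn't tied to Orion or any particular product.

```python
class FailoverMonitor:
    """Track primary health and decide which system should be active."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # consecutive failures before switching over
        self.failures = 0
        self.active = "primary"

    def record(self, primary_healthy: bool) -> str:
        """Feed in one health check result; returns the system that should be active."""
        if primary_healthy:
            self.failures = 0
            self.active = "primary"  # switch back as soon as the primary recovers
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.active = "secondary"
        return self.active
```

Requiring a few failures in a row keeps a single missed check from triggering a switchover, at the cost of a slightly slower reaction to a real outage.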
When planning for Orion system redundancy, it helps to think of Orion in terms of the major components that make up the system. The main components would be the website, the database, the polling engine and application modules, and the receiver based services.
The Orion website operates as a standard, IIS based website. So, when thinking about how to provide redundancy for this component think about it like you would any other website. If you want to do it on the cheap you can provide some basic redundancy through a little DNS creativity, but the best way is probably to front-end the website with an appliance built for this type of role. Also, in terms of what you need from SolarWinds, in addition to the licensing you need for your primary server you'll want to buy a copy of the "Orion Additional Website" to use on the redundant web server.
Orion uses a standard, Microsoft SQL Server database. So, SQL clustering is the way to go in terms of providing database server redundancy. Lots of our customers are utilizing this strategy today with great results.
This is where it gets a little complicated. First, let me admit that our solution in this area is not as comprehensive as we'd like. Let me assure you that we're working on this, and you should expect to read more about it in the future as Joel Dolisy, our Chief Architect, will be doing some guest-blogging for us on this subject.
With regards to the main Orion polling engine (which includes the basic alert engine), the Orion Hot Standby application meets the need for application redundancy/fault tolerance in this area. The Orion Hot Standby can actively "monitor" any number of Orion systems/pollers and take over if one of them goes down. However, remember that it can only actively impersonate one polling engine at a time, so if you have 2 polling engines down and only one Hot Standby you'll lose some data.
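The one-at-a-time limitation is worth spelling out. The toy Python model below is only my illustration of the point above, not the Hot Standby's actual logic; in particular, which poller gets covered first is an assumption (here, whichever failure was detected first).

```python
def standby_coverage(down_pollers, standbys=1):
    """Split failed pollers into those a standby can impersonate and those left uncovered.

    `down_pollers` is a list in the order failures were detected (an assumption).
    Each standby can impersonate exactly one polling engine at a time.
    """
    down = list(down_pollers)
    return {"covered": down[:standbys], "uncovered": down[standbys:]}


# Two pollers down, one Hot Standby: one of them goes unpolled.
result = standby_coverage(["poller-b", "poller-a"], standbys=1)
```

Anything in the `uncovered` list is a poller whose data you're losing until it comes back or another standby is available.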
With regards to the Advanced Alert Engine, Custom Poller, Application Monitor, and VoIP module - for most customers, this isn't an issue. The main concern for most customers is that if the Orion server goes down they still need to be notified about critical system outages and errors so the Hot Standby solution provides all the functionality that is required during a system outage situation.
However, if it is critical to have complete system redundancy, there is a solution. There are a few ways to implement this solution, so I'll stick with my favorite. In this scenario, what you want to do is build the Orion server just the way that you want it (we're just talking about the application server - not necessarily the database server or web server). Then, virtualize the server and host the VM on a separate physical machine. In the event that the primary Orion server is down - simply start up the VM and away you go. As long as the VM operates under the same machine name as the original polling server you should have no issues. You can either do this manually or set up an automated script to do this for you. You will need a second copy of Orion and any application modules that you use for this scenario, but if you talk to your salesperson about the way that it's being used they'll probably cut you a deal.
Receiver Based Services
For receiver based services, you're basically left with the same solution as above where you have a second server (physical or virtual) standing by to be started up if the primary system is unavailable. One key difference though - you'll have to add the IP address of the second server as a destination for traps, Syslog messages, and NetFlow exports.
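On real gear the second destination is a line or two of device configuration, but the idea is easy to show from the sender's side in Python. This sketch sends the same bare-bones `<PRI>message` Syslog datagram over UDP to every receiver in a list; the function name, the message, and the default priority of 134 (facility local0, severity informational) are my choices for illustration.

```python
import socket


def send_syslog(message, destinations, pri=134):
    """Send one UDP Syslog-style datagram to every (host, port) in `destinations`.

    Returns the number of destinations the datagram was sent to.
    """
    datagram = f"<{pri}>{message}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for host, port in destinations:
            sock.sendto(datagram, (host, port))
    return len(destinations)
```

Calling it with a list such as `[("primary-address", 514), ("standby-address", 514)]` (placeholder addresses) means the standby already has the message whether or not the primary was up to receive it.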
If this thread gets too much longer nobody will read it all so I'll end here. If you want more information or have specific questions post a comment or drop me an e-mail and we'll get you the information. We'll talk more about multi-site solutions in the next couple of days.