Here lately I've been asked a lot about using Orion to collect data for PCI compliance. For the most part, this is pretty easy as Orion does a great job of managing routers, switches, firewalls, and servers. As a matter of fact, I've helped hundreds of customers set up special reports in Orion to highlight these features.
Things get a little more complicated when it comes to collecting logs from Windows servers and PCs. One way to tackle this is to use the Windows Event Log forwarder that we provide as a free download. This utility installs on the Windows systems and converts the event log messages into syslog messages and forwards them to Orion's Syslog Server.
Another creative way of solving this problem is to use the SNMP trap service on the Windows machines to forward the event log messages as traps. Orion can then receive, store, and alert on these traps.
Orion includes a highly scalable rules engine for the SNMP Trap Receiver and the Syslog Server that can be used to set up alerts and actions based upon message format, content, source, etc. The users that take advantage of these features love them - but not too many people know about them.
Obviously, there are many other ways to collect this data including collecting it directly via WMI and using other third party agents to provide the logs. I'd love to hear how you're collecting logs from your Windows systems and any hiccups that you've encountered.
Well, I'm out in New Jersey this week talking with the DoD about network management systems, interoperability, and best practices for building NOCs. I'd planned to take my iPhone out and get some cool pics of me standing in front of a tank or something but it's been dark when we finish every day.
It's interesting to hear everyone's opinion about how network management systems should operate and interoperate. Definitely some very smart people here and I'm looking forward to chatting with them more tomorrow.
In the meantime, I've been thinking a bit about how these practices can apply to those of you in the commercial sector. If you've got a cool NOC drop me a note and I'd like to chat and possibly come see you.
Two of the things that I get asked about a lot are a) how often you should poll your devices and b) how SNMP counters work. Let's address the question about polling frequency first.
There are two primary types of polling - polling for status (up/down/warning) and polling for statistics (latency, traffic, errors, CPU, memory, etc). There are many schools of thought on how often you need to poll for status. At the extremes, I've worked with customers that want everything from "once per day" to "real-time or every second". Across the industry 5 minute polling for status is pretty standard, while some products like Orion use a default of 2 minutes.
When thinking about how often you need to poll for status there are a few things you should know. First, a strong fault management strategy will include both status polling and monitoring for SNMP traps. Status polling gives you a guaranteed way of finding outages but doesn't happen in real-time. Traps happen in real-time but you have no guarantee of receiving the trap. By leveraging both technologies together you gain both speed and accuracy.
Second, you need to understand that there are different types of status polling. Most polling for device status is done via ICMP (ping). Status for interfaces, CPUs, volumes, and other sub-elements is usually checked via SNMP, and status checking for applications may use SNMP, WMI, or the actual application protocol.
Polling for statistics is a little different. The first thing that you should know is that most of this type of data is stored within two types of MIB objects - gauges and counters. If you're polling a stat that is stored within a gauge, then you need to know what the gauge represents. For instance, most Cisco devices support three different MIB objects for CPU load - real-time, 1 minute avg, and 5 minute avg. So, if you're polling every 2 minutes stick to the 1 minute average, but if you're polling every 9 minutes go with the 5 minute average. Mature products like Orion figure this stuff out for you, but you can also manually force the behavior so that you can experience the different results.
The other common type of MIB object used for polling statistics is a counter. Using an SNMP counter to calculate a rate is much like using the odometer in your car to calculate gas mileage. For instance, if your tank is full and you drive 100 miles and then add 10 more gallons to the tank to get it back to full, then you've gotten 10 miles to the gallon (not uncommon in a diesel guzzling 4x4 behemoth like I drive). An example using SNMP counters would be the MIB object used to measure traffic on a network interface. It will show you a running total of the octets (bytes) of traffic that have gone in/out of the interface. So, if you poll it and find out that it's sent 1000 octets and then poll it again in 10 minutes to find that it's now sent 10,000 octets, you know that in 10 minutes it sent 9,000 octets, which correlates to 120 bps. ((9000*8)/(10*60))=120.
Generally speaking most people poll these stats every 15 minutes, but poll critical connections every 1-2 minutes. Orion pulls this data every 9 minutes by default. Just like how it's important to use status polling along with SNMP traps, it's important to use a technology like NetFlow along with polling for traffic rates.
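If you want to play with the counter arithmetic yourself, here's a minimal Python sketch of the octet example above. The function name and parameters are mine for illustration, not anything out of Orion. It assumes the counter wrapped at most once between the two polls and that the device didn't reset it (a reboot, for example), and it defaults to a 32-bit counter.

```python
def counter_rate_bps(first_octets, second_octets, interval_seconds, counter_bits=32):
    """Average bits per second between two readings of an SNMP octet counter.

    Hypothetical helper for illustration. Assumes at most one counter wrap
    between the two readings and no counter reset on the device.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    modulus = 2 ** counter_bits
    # Counters roll over to zero at their maximum value, so take the
    # difference modulo the counter size instead of a plain subtraction.
    delta_octets = (second_octets - first_octets) % modulus
    # 8 bits per octet, spread over the polling interval.
    return delta_octets * 8 / interval_seconds


# The example from the text: 1000 octets, then 10,000 octets ten minutes later.
print(counter_rate_bps(1000, 10000, 10 * 60))  # 120.0
```

The modulo step is the part the odometer analogy skips: on a busy link a 32-bit counter can roll over between polls, which is one more reason to poll critical connections more often.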
Anyhow, that's a really brief introduction to this subject. Ping me if you have questions.
The marketing team here is hosting a survey and giving away some pretty sweet prizes. Well, I would never actually purchase a Wii myself, but if I won one I'd sure as heck play it :)
Here's a link to the details. Help us out and take a few minutes to fill it out...
Take a few minutes to share your thoughts in this short online survey. Why, you ask? Well, it's a great opportunity to have your voice heard back at SolarWinds HQ, plus we'll be giving away a Nintendo® Wii® to TWO lucky respondents, so start those mouse buttons clicking!
Well, I'm home now and had a great trip out to Virginia. I got to meet with several customers and some people that are interested in becoming customers so it was a great trip.
For those of you that responded but I was unable to visit this time, I'll be headed out that way again soon.
I'm flying out to Baltimore tomorrow and driving down to Virginia for a meeting Tuesday morning. I think I'll have a couple of extra hours on Tuesday afternoon and would love to go see one of our community members. Ping me on the blog or via e-mail to make contact...
I get asked a lot about using Orion (which requires SQL as a database backend) with a SAN. This usually comes up when people are also leveraging the Orion NetFlow Traffic Analyzer (NTA) which can cause the database to grow very, very quickly.
Before I get started, let me say that I believe that the product documentation and the official stance of our tech support team is that we don't recommend running Orion w/NTA with a SAN, and for good reason based upon our overall experience in this area. You see, SANs are great for moving and storing very large amounts of data. In many cases you can actually read and write data more quickly to a high-performance SAN than to locally attached disk. The problem is that with applications like Orion you're not moving large chunks of data; instead, you're moving ginormous amounts of itty bitty pieces of data and most SANs just don't have the ability to handle this number of I/O transactions in the timeframes that applications like this demand. Time and time again we've seen issues where data is getting dropped when trying to write to a high-performance SAN but after moving the data to even a moderately performing local disk array the problem goes away.
For example, I worked with a customer recently that was seeing holes within some of the data sets that he was collecting and was leveraging a SAN to house his SQL database. Additionally, when trying to query the database for these results the queries would sometimes time out. We turned on some perfmon counters on the SQL server and we were seeing disk queue lengths (read and write) of 200-300. Microsoft recommends that for SQL Servers with high amounts of I/O the disk queue lengths not exceed twice the number of physical disks (which in this case was 13 if I remember correctly). After moving the database to a local disk array (RAID 1+0), the problems went away...
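To put numbers on that rule of thumb, here's a small Python sketch. The function name is made up for illustration, and I'm reading the 13 above as the number of physical disks behind the database, which makes the ceiling 26.

```python
def disk_queue_within_guideline(queue_length, physical_disks):
    """Apply the rule of thumb from the text: the disk queue length should
    not exceed twice the number of physical disks.

    Hypothetical helper for illustration, not part of any product.
    """
    if physical_disks <= 0:
        raise ValueError("physical_disks must be positive")
    return queue_length <= 2 * physical_disks


# Assuming 13 physical disks, the ceiling is 26.
for observed in (20, 200, 300):
    print(observed, disk_queue_within_guideline(observed, 13))
```

Queue lengths of 200-300 against a ceiling of 26 are roughly ten times over the guideline, which lines up with the dropped data and query timeouts we were seeing.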
What inspired me to write this post is that last week at Interop I had a chance to meet with several of the SAN vendors and review some of their newer technology. It seems like SANs may have now evolved to a place where they could be used very effectively in these scenarios and may even outperform local high-speed arrays. I'll have to wait and see, but it definitely seemed promising.
If any of you out there are effectively utilizing SANs in environments like this, please drop a comment with some specifics.
A recent post on the forum for the Orion Network Performance Monitor (NPM) at:
got me thinking about some cool customizations that I've seen implemented on Orion and also a few best practices for using the Orion web interface. Orion's going to be turning 7 years old this year, the website has come a long way, and I've seen some really, really creative customizations. Some of the most useful things I've seen have been in the areas of maps and reports. You can create some really complex and interesting maps and reports within Orion. I've seen maps with literally hundreds of levels and containing tens of thousands of elements. As for reports, one of the coolest is the one that Savell from down under posted at:
There are also some things that you can do to the Orion website that require virtually no effort but can offer huge payback. Here's my Top 5 list of Must Do's for Orion NPM website customization:
Head Geek's Top 5 Must Do's for Orion Web Customizations
5. Don't use the "Network Wide" charts and/or resources on any commonly viewed pages. These resources do exactly what they sound like they do, and therefore can take as long to run as several of the other resources combined. Save these resources for where you really need them (if you need them at all).
4. Right-size the views for the screens that you commonly use. If you have Orion displayed in your NOC on a large HDTV like many customers do, you probably have room for at least 4 columns on the screen.
3. Put your company logo at the top of the page. Your boss will love it and his boss will like it even more.
2. Leverage the "Views by device type" feature. If you're looking at a Node Details view for a router you want to see CPU, memory, buffers, NetFlow stats, and possibly VoIP call paths. But, if you're looking at the Node Details page for an APC UPS you want to see battery life, power input/output values, etc.
1. Build views that include data from all of the components - Orion NPM, NetFlow, APM, VoIP, and Wireless. This is the absolute, number one thing that you can do to make your Orion views more usable and to cut down MTTR for issues.
If you're interested in hearing more about cool things you can do to the Orion web console respond here or send me a note (firstname.lastname@example.org) and we'll schedule an informal webcast with an open Q&A and hit it in more detail.