My last couple of posts on How to Avoid "Monitoring Spam" and How to Monitor Effectively have been received well, and have facilitated some really useful conversation around monitoring and alerting in NPM. For those of you keeping score, I had asked how many people monitor link saturation, and unicast or multicast routing tables. As expected, just about everyone came back with emphatic "yes, yes, doubly yes!" responses to the link saturation question. What was a little surprising to me is how many people aren't monitoring unicast routes. This may be a case where the routing is stable enough, with no redundancy or points where route flaps are likely to occur, but it did catch me off guard a bit.
Not surprisingly, multicast route monitoring seemed to be a corner case for most people. Unless you're in certain industries (stock trading, multi-media streaming, etc.) the reality is that most of us have limited exposure to multicast in the form of a handful of standard addresses (OSPF, RIP, VRRP, etc.) and mostly that takes care of itself.
Today, I'd like to continue with these previous themes and ask for any little tips or tricks that the community has to share. Things that may be obvious to those of us who have used the products for a while, and things that even a seasoned veteran might not know about, are all fair game. Even things that don't completely stay within the SolarWinds product line will work (how many of us have Splunk integrated into our dashboards, for instance?). The goal is to see just how many cool things are being done with NPM that might help the community at large.
The depths of the Orion ecosystem are filled with so many "nerd-knobs" that I'm certain there are hundreds of little things I've personally never seen or touched, but that might prove invaluable to me if I knew about them. Sure, we could all search around on Thwack, but it's a bit like the old joke about needing to know how to spell a word in the dictionary before you can look up how to spell it: we don't necessarily know what to search for. We don't know what we don't know.
Here's a quick-and-dirty one that helps our NTA performance.
On your NTA pages, use a high-level page to show your top talkers and such - but build a custom page with copied default TopXX views and chop the heck out of them, down to the top 5 or so (especially if you're just watching your own shops as opposed to some ginormous number of customer/subscriber networks). It makes the pages load faster, you can fit more resources on a page without things getting nutty, and it's usually one of the top five you're looking for, anyway.
Fool around with the DNS resolution settings in NTA, too...you can do some trade-offs there for performance comparing cached lookups with on-demand. If most of your nodes are local, on-demand's ok - or has been for us.
It's a great tool, but your dog-and-pony show will fall flat if you're all lathered up talking to your bosses/peers about this super-awesome new thing and stuff just hourglasses for 20 seconds.
Super-obvious, sure....but it helps.
A few items, used simply, can add some depth to your view. Custom Properties: we have these for locations (Region, Building, Floor), but also for Power connections and Alert Suppression.
The location properties allow us to alert in groups, or to set devices of like condition together. Some of the ranges are broader than just "this building, floor 2", in case we have a Distribution Region.
There is also a Data Center Region, which drops down into Distribution and then Access Layer for granular alerting (suppression without using the suppression condition).
The Power connections property can hold either circuit info, or simply whether the device is on normal power or emergency power.
A Custom Property turned into a date/time picker allows for a maintenance window, with a start date and time and an end date and time.
* Using a SQL alert, it is easy to flag anything outside this window, which creates an extra layer for your alerts. You can suppress an alert by keeping your maintenance window in place, but still get statistics because you never stopped polling/gathering data.
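The maintenance-window check described here boils down to a simple date comparison. The following is only a minimal sketch of that logic in Python, not the SQL alert itself; the function names and the idea of passing the two custom property values in as `start` and `end` are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional


def in_maintenance_window(now: datetime,
                          start: Optional[datetime],
                          end: Optional[datetime]) -> bool:
    """Return True when `now` falls inside the window.

    `start` and `end` stand in for the two date/time custom
    properties (hypothetical names). An unset window never matches.
    """
    if start is None or end is None:
        return False
    return start <= now <= end


def should_alert(node_is_down: bool,
                 now: datetime,
                 start: Optional[datetime],
                 end: Optional[datetime]) -> bool:
    """Fire only when the node is down AND we are outside the window.

    Polling is untouched; only the alert decision is gated.
    """
    return node_is_down and not in_maintenance_window(now, start, end)
```

Because the gate sits on the alert rather than on polling, statistics keep accumulating through the window, which is the point of the approach.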
Maps Maps Maps.
We use the atlas maps extensively. Some of my maps are linked 4 or 5 deep. I have created our NOC and executive dashboards using the atlas maps. I manually set URLs on the maps and create my own custom graphics for visual alerting.
My most recent NOC view is displayed on a 55" flat screen and occupies the entire display.
It's a lot of work using custom properties and groups to create the links on the maps. But the leadership love it; my director told me they were "sexy". Whatever floats your boat, I guess.
I only wish SolarWinds would focus more on improving the atlas. Little has been done to improve it over the past couple of years.
I am with you on this. I really wish SolarWinds would improve on the atlas. Need more OOB images, backgrounds, text box options, link customization, etc. I can't use the new Google stuff, since I'm not connected to the internet. I would think that Maps would be high on the priority list.
I've been working on an Orion NPM deployment lately, and have found that combining Groups and Maps is a great way to help with troubleshooting. I create groups based on locations where IT assets exist (e.g., various wiring closets, server rooms, telco rooms). Then I create a Map using the floor plan of the facility, and hot link the groups to their location on the floor plan. Now when a group goes into warning or alert, the exact location of that node will appear on the floor plan. This is great for shops where new staff need to learn where servers and switches are.
Keep in mind that nodes can be members of multiple groups. So you can still create a group for various applications (like email) while maintaining separate groups for your map.
If nothing else, this will impress the suits.
My favorite is using the power of SolarWinds Orion as a CMDB to provide asset life cycle compliance or "suspect" reports. For example, once we identify the proper way to provision or decommission nodes, interfaces, IPs, etc., I can run a report that returns anything outside this range. Examples like:
Once you get them down to a manageable level then combine them into larger combined reports.
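To make the "individual reports first, combined report later" idea concrete, here is a rough sketch in Python. It is not SolarWinds report syntax, and the rule shown (a node missing a required property) along with every field and function name is hypothetical, chosen only to illustrate the pattern.

```python
from typing import Callable, Dict, List

Node = Dict[str, str]
# A rule returns True when the node looks "suspect".
Rule = Callable[[Node], bool]


def missing_property(name: str) -> Rule:
    """Build a rule that flags nodes whose property is empty or absent."""
    def rule(node: Node) -> bool:
        return not node.get(name, "").strip()
    return rule


def suspect_report(nodes: List[Node],
                   rules: Dict[str, Rule]) -> Dict[str, List[str]]:
    """One small report per rule: rule label -> captions of flagged nodes."""
    return {label: [n["caption"] for n in nodes if rule(n)]
            for label, rule in rules.items()}


def combined_report(report: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge the per-rule reports: node caption -> labels of rules it failed."""
    combined: Dict[str, List[str]] = {}
    for label, captions in report.items():
        for caption in captions:
            combined.setdefault(caption, []).append(label)
    return combined
```

Keeping each rule separate makes it easy to clean up one class of problem at a time before folding the results together.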
Any additional ideas for suspect reports?
Does anyone have a local password policy that requires changing the NPM SQL password on a regular basis? To do so, you have to run the Configuration Wizard again. When you run the wizard for the first time after the change, it tries the stored (old) password before you are prompted for a new one. That then locks out the SQL account that you just set the new password on. I have a workaround. Go to Program Files-SolarWinds-Orion, look for a file named SWNetPerfMon.DB, and open it with Notepad. If you update the latest connection string with the new password, the wizard will populate with the new password. This file must also be copied and saved to InetPub-SolarWinds. I have to do this every three months. It can only be done with NPM. It does not work for NCM, and I'm not sure about other products yet.
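If you would rather script the edit than do it by hand in Notepad, the core step is swapping the password value inside a connection string. The sketch below assumes the string uses the usual semicolon-separated `Key=Value` form with a `Password=` entry; the exact layout of SWNetPerfMon.DB is not shown in this post, so treat that as an assumption and check your own file first. The function name is hypothetical, and passwords containing a semicolon are not handled.

```python
import re

# Matches the "Password=" key and its value up to the next semicolon.
_PASSWORD_RE = re.compile(r"(\bPassword\s*=\s*)([^;]*)", re.IGNORECASE)


def replace_password(conn_str: str, new_password: str) -> str:
    """Return conn_str with the Password value replaced.

    Raises ValueError if no Password entry is found, so a silent
    no-op cannot leave the old password in place.
    """
    # A function replacement keeps backslashes in the password literal.
    updated, count = _PASSWORD_RE.subn(
        lambda m: m.group(1) + new_password, conn_str, count=1)
    if count == 0:
        raise ValueError("no Password entry found in connection string")
    return updated
```

After updating the string, the edited file still has to be saved in both locations mentioned above.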