Last week I had a great phone interview with Leon Adatole, SolarWinds MVP. Leon works for Cardinal Health as the monitoring architect, where he is replacing “a certain agent-based monitoring solution that uses the color blue prominently” with SolarWinds and saving the company nearly SEVEN figures (in the US, that’s a lot of money)! Way to go, Leon! If you get the opportunity to meet Leon, you will find that he can keep you very entertained, even while discussing the exciting topic of monitoring in an enterprise environment.
JK: How did you get to be a SolarWinds MVP?
LA: Through the cunning use of intrigue, bribes, plates of cookies, and being a general nuisance. Seriously, in my previous job implementing SolarWinds at Sentinel Technologies, an IT service provider, we used NPM, SAM, NCM, NetFlow, and IPSLA. During that time I came up with some interesting workarounds for setting machine-by-machine CPU thresholds, collecting performance metrics without alerts, and so on. From these workarounds I created a series of five tips & tricks blog posts to help the community Stop the Madness. I have also participated in the review of the new thwack, as well as in customer feedback sessions on NCM and NetFlow.
JK: SolarWinds is perceived as a mid-market company. However, Cardinal Health is definitely an enterprise, and Sentinel is an enterprise-type company. If someone were to ask you, “Is SolarWinds well suited for the enterprise?”, what would you say to them?
LA: First, a bit of background: I've implemented various monitoring tools in environments ranging from 250,000 systems in 5,000 locations down to just a few machines in a single location. Most of my installs are in the 10,000-device range. With that said, I believe SolarWinds has a sweet spot in the small to mid-sized market: 2 to 5 IT guys who are completely overworked and can’t possibly learn every little thing about every system. It's inexpensive relative to the market, and it installs in no time, is working in no time, and gives you value in no time.
But even in larger (enterprise-class) environments, implementations are almost a non-event. The installation at Cardinal Health (6 polling engines and 1 additional web server) took only 4 hours, and at that point SolarWinds was ready to work. Adding 5,000 devices as an initial load was a 2-hour activity, after which I could create meaningful alerts.
The biggest concern larger enterprises have with monitoring is scalability: when will the tool max out and force you to start coming up with creative workarounds? While SolarWinds certainly does have its limits (~100,000 elements per cluster of polling engines), my experience is that you get much more mileage per dollar than with other tools, and extending past those limits (with the Enterprise Operations Console) is relatively simple.
Fully transitioning from our current solution to SolarWinds will take 3-4 months, because we have committed to the philosophy of “no monitor left behind.” However, most of those 3-4 months will be spent documenting what we are doing in the agent-based system and translating that to the agentless world. This includes taking a hard look at what each alert means: what is the end user doing with that alert, and is it worth the effort to monitor it?
Implementing SolarWinds instead of the incumbent in FY2013 represents savings of over US$1 million in the first year and over US$500,000 in software maintenance costs each year after that.
We will also see an opportunity-cost benefit in staffing. Monitoring 5,000 server devices today requires a staff of 8 guys who are just keeping the monitoring tools alive, not even responding to the alerts. SolarWinds will probably require half that staff, if that, to keep the monitoring tool going. With those staffing savings, we can go out and add more value: focus more on creating business process monitors and be proactive in our monitoring approach.
JK: You have worked with a lot of monitoring solutions over the years (Tivoli, BMC, HP OpenView, Nagios, SCOM, SolarWinds), both agentless and agent-based. When choosing a solution at Cardinal Health, agentless was your top requirement. Why was that?
LA: Based on the tools I have used in the past, I believe that easily 90% of any company’s monitoring requirements can be fulfilled with agentless monitoring. The time and energy spent keeping agents up to date and working outweighs the benefits of that additional 10% of capability.
Again, we have a staff of 8 guys who spend all their time keeping the agents up and running for 5,000 devices. And by the way, we aren't even playing in the network space. Once SolarWinds is in place, we'll be adding ~10,000 network devices to the mix. While agent-based solutions have ways to get to a router, it's usually an after-the-fact solution. At the end of the day, we just could not monitor network devices and elements with our current solution as easily or as clearly as SolarWinds can.
JK: If you were going to give guidance to SolarWinds users in terms of how to organize a monitoring environment, what would you tell them?
LA: I'd go with: “Plan to fly by the seat of your pants, stay up late, mainline caffeine, regret every decision you make, and weep pitiably”.
Honestly, no matter how much planning you do, you always look back, so the first piece of philosophical advice is to be mentally flexible. You aren't going to “get it right” the first time. The bad news is that there is no “right”. The good news is that there is no “first time” either.
In terms of concrete advice, be very thoughtful about custom properties. Leverage custom properties to group and sort for the purpose of display, reporting, and alerting. Having a CMDB of some kind would be great for getting those custom properties in advance, but many companies do not have this kind of repository, so you might be the one building it. My second piece of philosophical advice is to NEVER mention the words “SolarWinds” and “CMDB” in the same sentence. Just keep building your solution and let people come to their own conclusions.
The next concrete thing is to engage with the people who get the alerts. Too often the monitoring people engage with the developers and set up alerts for what SOUNDS like a good idea. When creating alerts, you want to make sure you know who is actually going to get the alert. It’s not cool to wake the application support team up at 2 a.m. for a non-critical “FYI” alert. Once you identify who those recipients are and talk to them about what they need to do their job better, patterns begin to emerge, and your grouping strategy (and therefore your custom properties) will follow from that.
Finally, consider the actual value of an alert before you go setting it up. Too often we're eager to show off how cool SolarWinds is: what whiz-bang metrics it can pull and show and trigger on. But ask yourself (or the business) what the problem would cost the company if you DIDN'T alert on it. You'd be amazed how often that alert or report that took 20 hours to create saves a single technician 5 whole minutes, while the alert you slammed together in 15 minutes saves the company thousands of dollars.
If you would like more of Leon’s advice, you can engage with him on thwack.