So Kong, it has been a while since you've been back on the SolarWinds set.
Yes, as everyone out there knows, there's been a lot of network-related features released lately. Hashtag: blame the network. I've let the rest of the geeks geek out.
Well, it's true. Between, let's see: NetPath, Network Insight, binary config backups, NBAR2, wireless NetFlow support, there has been a lot of network geeking going on the last few months. So, if you're here, does that mean that I can finally get back to my sysadmin roots? What are we going to cover today?
You know what? I'll let the THWACK community do the talking.
Wow, those are some heavy topics. Are you sure that we're up to it?
No problem. I called in a favor and asked Steven Hunt to help us walk the walk.
Okay, you got Steven, then I know we're going to be able to rock this one out. Welcome to SolarWinds Lab, sir.
I'm really happy to be here.
Let's see. So, you've been here just over a year. You've got the SAM 6.3 release under your belt. You've been going on tour with us around the US on the SolarWinds User Groups, the SWUGs. You've covered a lot of ground.
Yeah, it's been fantastic.
You know, guys, we've got a lot to cover. Let's kick this off. Howdy, folks, I'm Kong Yang.
I'm Steven Hunt.
And I'm Leon Adato. Welcome to SolarWinds Lab. We've got a lot of incredible stuff to cover today, things that have been brewing here at SolarWinds for a long time. We know you're probably going to have some questions, so make sure you ask them in the chat window that you see over there. If you don't see the chat window, it means you're not watching us live. To do that, head over to lab.solarwinds.com, where you can sign up for upcoming episodes, watch past shows, and even leave us comments about what you'd like to see us cover in the future. Steven, before we dig into the technical details, we have an opportunity right now to talk about the overall direction we have for server and application monitoring, certainly, but also for the entire systems management arena of tools that we've got. Can you walk us through that?
Sure. We have three guiding principles here at SolarWinds. The first is unexpected simplicity, and it's all about making our products as simple to use as possible.
The next is hybrid IT. That's all about making sure that IT admins who have environments both in their data center and in the cloud can still monitor them the exact same way that they normally would. The third is application-centric troubleshooting. It's all about the apps. We have to solve app problems, so we have to make sure we can troubleshoot app problems.
Excellent, okay. I think that's going to frame a lot of what we're seeing in future releases as we go forward. Wonderful. I have to admit, we have waited long enough. It is Linux time. Let me in there. We have our Linux box here. I just have to point out that it is running the SNMP polling method. I'm also mentioning that we have minimal information on this screen. You know, we've got the polling details and stuff, but there's nothing else there, and I mention that very specifically because of what we're about to do. To install the agent, I want to go to Settings, All Settings under the new environment, and then to Manage Agents. The next step is one that people tend to overlook because it's just a clickable link. Everyone asks, "Where's my Linux agent?" Here: Download Agent Software. You can install using wget, so the box just reaches out for it, or you can install using yum or apt-get, or whichever package manager you use, if you're going to put the package in another location or repository. We're going to use wget in this case. This box happens to be CentOS, not Ubuntu. I know that a lot of folks are used to me saying Ubuntu as often as I can during these episodes, but it's actually a CentOS box. All the connections are really just waiting to go. It generates the wget command and wraps it in a Bash shell. So I can highlight and copy, or I can just right-click and copy, or copy and copy, or copy, copy. Then, what you want to do is actually Telnet to the box. I have my Telnet session here. Paste that command right in, it does the install, and it's ready to go. That's it. And correct me if I'm wrong, but in most cases you don't even have to add the node into Orion first. If you install the agent before you've added the node to your inventory, it'll connect and be added automatically, automagically.
That's right; we should be able to see it registered on the Orion server.
Right, now, we haven't added this one in. So here, back on my home screen, you can see that we have Linux agents, and we have another one here that we've installed. I just want to open that one up. I think it's the next tab down. There we go; it's the Steve-CentOS demo. The reason I want to mention this is that the agent can be handy in a number of situations, and we're going to enumerate those later on. But the idea is that it's quick to install, it displays a lot of information that you certainly don't get with SNMP, and it doesn't take a lot. And using either the wget command or your own internal repository, you can deploy it out to all your boxes automatically.
What I love about it, going back to one of your principles about simplicity, is the fact that you can pick your Linux distro; it can generate the Bash code for you in there. You don't have to memorize, you don't have to keep copies of it. Talk about simplicity.
Right, we make it as simple as possible.
Right, and there are a few things under the hood that are actually very interesting about it, too, which is that it's doing double checks for you. If you pick the wrong distro, it's still going to double-check you on that. And I think, and again, correct me, it'll actually self-heal. If you say that it's CentOS and it's actually Ubuntu, it'll ask for the right download. We've tried to build as much intelligence into it as we can, so it's double-checking what you said, and then it will go out and get the right package if it can. And if it finds an agent already there, it's not going to install a second one, and, you know, things like that.
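To make the "double-check and self-heal" idea concrete, here is a minimal Python sketch of how an installer could compare the distro you picked against what the box itself reports in /etc/os-release, and skip installing if an agent is already present. This is an illustration under assumptions, not the Orion agent installer's actual logic; `plan_install`, `parse_os_release`, and the distro-to-package mapping are hypothetical.

```python
# Hypothetical sketch: cross-check the distro the user picked against
# /etc/os-release content, then pick a package format. Not the real installer.
PACKAGE_FORMAT = {"centos": "rpm", "rhel": "rpm", "ubuntu": "deb", "debian": "deb"}


def parse_os_release(text):
    """Parse KEY=value lines from os-release content, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def plan_install(selected, os_release_text, agent_present=False):
    """Return (distro, package_format), or None if an agent is already there."""
    if agent_present:
        return None  # never install a second agent
    detected = parse_os_release(os_release_text).get("ID", "").lower()
    # Trust what the box reports over what was selected ("self-healing").
    distro = detected if detected in PACKAGE_FORMAT else selected.lower()
    if distro not in PACKAGE_FORMAT:
        raise ValueError(f"unsupported distro: {distro!r}")
    return distro, PACKAGE_FORMAT[distro]
```

For example, picking "centos" on a box whose os-release says `ID=ubuntu` yields `("ubuntu", "deb")`: the detected value wins over the selection.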
Did we mention it's simple?
We might have mentioned that a couple of times. But yeah, it's simple. It's elegant and simple. So we've got that. Now, another thing that's really nice about it that I want to mention here is that if your Linux machines are running on a non-standard port, the agent is a wonderful way to get around that. If you've got a non-standard port for SNMP, not the usual one, you can actually configure which one you want right there. But there's one other thing on this screen that is just too cool to let go by, and that's this little option here: the SSH. Now, I have to tell you, I have spent some relatively significant hours trying to get the Telnet link that used to be there, the old tool's Telnet link, to automatically open a copy of PuTTY to Telnet to the... It was a little challenging to do back in the day. And now, what we have is that. And yes, I'm logging in as root because it's Lab and we can do things like that. Do not try this at home. So, you've got a login. You've got an SSH prompt right there from the screen. We always try, in Lab, to talk about how this facilitates your day-to-day troubleshooting in your workflow process. You're working along, everything's fine, and monitoring is that regular collection of data. Something passes a threshold, something happens, and you get an alert. That's the poke in the shoulder that says, "Hey, now you need to look at me." We're not building a monitoring tool where you have to hire eyeballs to stare at screens, waiting for something to turn red. You've gotten that poke in the shoulder. What's in that message? Well, the URL to this page. You're looking at the data, you're seeing what's on there, and then you're able to immediately come in and start the management process, the troubleshooting process.
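The "poke in the shoulder" workflow (regular collection, a threshold crossed, an alert that carries a link straight back to the node's page) can be sketched in a few lines of Python. This is a toy model under assumptions, not Orion's alerting engine: the `Sample` type, the threshold, and especially the `/nodes/<id>` URL pattern are hypothetical placeholders for your own console's details link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    node_id: int
    node_name: str
    metric: str
    value: float


def check_threshold(sample: Sample, threshold: float, base_url: str) -> Optional[str]:
    """Return an alert message if the sample crosses the threshold, else None."""
    if sample.value <= threshold:
        return None  # normal collection: nobody has to stare at a screen
    # Hypothetical details-page URL; substitute your web console's real link.
    url = f"{base_url}/nodes/{sample.node_id}"
    return (f"{sample.node_name}: {sample.metric} is {sample.value:g} "
            f"(threshold {threshold:g}). Details: {url}")
```

The point of the design is that the message is actionable: the recipient lands on the node's page, where the SSH option is one click away.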
And that goes to principle number three, app-centric, right? Because this is a very powerful way to manage the applications in there, and you're absolutely right. Wow, Leon, you've gone through the power of the Linux agent, adding another tool to sysadmins' toolsets. There's another thing that sysadmins have to deal with on a daily basis, right? That thing called high availability.
Indeed, actually, I'm going to step back here because the master of high availability should take the stage here.
So, the basic premise here is we're monitoring environments that need to be up, need to be highly available, and by extension, our monitoring solution needs to be up and always available as well.
And that is a huge hassle. That's actually a problem that I find a lot of monitoring engineers run into. They set up monitoring, and it's sort of a whatever, and they push really hard and get it in there, and then, all of a sudden, six months or a year later, people are saying, "How come monitoring was down?" "It can't be down for that long." And you're trying to manage patches and upgrades, whether it's Windows patches or the hot fixes that we're putting out. And, all of a sudden, the organization is telling you, "Oh no, we can't have a three-hour change window." "You've got to have everything up all the time." "It can't be down for that amount of time." Or it went down because there was a hiccup in the data center. That is an enormous hassle. But that's actually a compliment, because it tells you how useful and how beloved the monitoring has become.
Monitoring is mission critical. You don't have production unless you have monitoring as part of the solution.
So, with high availability now, we're giving another way to deal with that. Is that what I'm understanding?
That's correct, absolutely. So, let's take a look at how we would set that up in Orion. What I have here is a very basic setup. I have a main poller and I have a secondary server that I've installed the polling engine on. So it's very simple. We go to Settings, All Settings, and then we click on High Availability Deployment Summary. So I have a few servers that I can use to set up a high availability pool. So we'll click on SET UP HIGH AVAILABILITY POOL. Notice that we have two members for this pool, and then we have our pool name that we set, and then we give it an IP address.
A virtual IP?
A virtual IP address. And then we click CREATE POOL. And now we have our high availability pool created.
Awesome. The only thing harder than doing that was saying "high availability pool" 100 times fast.
Which is why we call it HA. Because it really is hard to say 10 times fast.
It's definitely a tongue twister.
But I just want to unwind this a little bit for everybody who's watching. So a high availability pool (see, I got it) is good because both servers are up and both are running. If I patch one and it has to go down and restart or whatever, the other one's going to automatically take over for it.
You can have more than just two boxes. If you really want, you can have several involved. And this is all, I want to say, a SolarWinds feature. It isn't based on Windows clustering or anything like that, right?
That's right. We have this built in to the Orion server. This works with your main pollers, it works with your additional pollers. It gives you the ability to set up each of those and ensure that the polling capability is up.
And, if we want, we can actually come in here. We can simulate a failover, Force Failover. And this is going to shift all of the polling happening from the main server over to the secondary server. And now it's up and running. We never had an outage.
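As a mental model of what the pool gives you, here is a toy Python sketch: several members, one virtual IP, exactly one active member polling, and a forced failover that moves polling to a standby while the virtual IP stays the same. It is an assumption-laden illustration of the concept, not how Orion implements HA; the class, member names, and IP address are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HAPool:
    """Toy model: one active member polls and answers on the virtual IP."""
    name: str
    virtual_ip: str
    members: List[str]
    active: str = field(init=False)

    def __post_init__(self):
        if len(self.members) < 2:
            raise ValueError("an HA pool needs at least two members")
        self.active = self.members[0]  # first member starts as active

    def standbys(self):
        return [m for m in self.members if m != self.active]

    def force_failover(self, target: Optional[str] = None) -> str:
        """Shift polling to a standby; clients keep using the same virtual IP."""
        candidates = self.standbys()
        target = target or candidates[0]
        if target not in candidates:
            raise ValueError(f"{target!r} is not a standby member of {self.name}")
        self.active = target
        return self.active
```

Because anything talking to the pool uses `virtual_ip` rather than a member's own address, a failover changes only `active`.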
That's fantastic. One of the things that I think a lot of our viewers are going to be thinking about, especially for the primary poller, is, again, patching. We've got SAM 6.3. It's not quite going to be there for this, but we were looking at the NPM 12 upgrades, and obviously we have more upgrades coming. Is the process we're advising that people patch the non-active member of the pool first, fail over to it, and then patch the other one? Is that what we're saying to do?
Correct. The intention here is that not only can you make sure that your environment's up and available, but when you have to go through planned work that would otherwise cause a production outage (upgrades are a perfect example), you can now do that without actually disrupting your monitoring environment.
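The patch order just described (patch the standby, fail over to it, then patch the old active member) can be written down as a plan and checked mechanically. This Python sketch is a hypothetical illustration of that ordering rule, not a SolarWinds upgrade tool; the step strings and member names are made up for the example.

```python
def rolling_upgrade_plan(active, standbys):
    """Patch standbys first, fail over to one of them, then patch the old active."""
    if not standbys:
        raise ValueError("need at least one standby to upgrade without downtime")
    steps = [f"patch {m}" for m in standbys]
    steps.append(f"fail over {active} -> {standbys[0]}")
    steps.append(f"patch {active}")
    return steps


def keeps_monitoring_up(plan, active):
    """Replay a plan; fail if it ever patches the member that is actively polling."""
    for step in plan:
        if step.startswith("fail over "):
            active = step.split(" -> ")[1]
        elif step.startswith("patch ") and step[len("patch "):] == active:
            return False  # this step would take monitoring down
    return True
```

Patching the active member first is exactly the plan the checker rejects, which is the scenario the chatters with "no downtime allowed" mandates are trying to avoid.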
The reason I mention that is that we just recently had the Lab episode on upgrades, un-mystified. And during the live chat (which, again, is over there), a couple of the chatters were saying, "My management, my administration, has just told me that I am not allowed to have any downtime anymore, and I don't know how to do upgrades." They're all excited about the upgrades. They know that they're stable and working really well. But now they can't get a window at all, so this would really serve them well.
So we've covered HA for failover, and that piece speaks to what you've talked about, Leon, and what he's written books about in his Monitoring 101 and 102 pieces and so forth: monitoring as a discipline. Because if you lose your monitoring, all of a sudden you lose that visibility. But Steven, there's another feature that enables another form of high availability for our end users, right?
Correct, that's our new centralized licensing. We've built this into the Orion console. It's really easy to get to. You go from Settings to All Settings, and then you scroll down and find your License Manager. What this replaces is the Win32 app that you would usually log into on the Orion server to add and activate your licenses. All of that's now done directly through the Orion UI. You can come in here and see all of your licenses and what type each one is (right now, we have these in evaluation), along with the expiration, or maintenance, date associated with them. You can add new licenses and activate existing licenses. You're also going to get updates directly through the License Manager. So no longer do you have to go get that new license and then add it to your Orion server. Now you just have it sitting in here, and it's going to get that new update.
Because we all love managing licenses.
Or don't love it at all. Yeah, this is going to be really wonderfully convenient and a much nicer way to deal with things. And it continues our trend of getting away from the 32-bit applets that you have to RDP to the server to use, and consolidating everything on that main portal, so that you can administer it that much more easily. I think everybody who's watching really appreciates it.
So, one of the buzzworthy words that I've written about, as part of the SOAR skill sets in the e-books I've put out, and part of what you're going to write about shortly, Leon, is this notion of automation. It allows us, as IT pros, to scale our environments and to handle the change and the delivery that we continuously have to handle. You're going to talk about something special, right?
I am, and although I, for some reason, mentally associate it with SAM 6.3, I do want to be clear that the features we're about to show were included in SAM 6.2.4.
Yeah, that's right. We actually snuck it into 6.2.4.
This is why it's important to watch the hot fixes and the releases, read them, see what's going on. Because every once in a while, Steve and his sneaky friends will just slide in something, and when you see what it is, you're going to realize how amazing it is. So I'm going to...
You want to take a look?
Jump over here. I'm going to elegantly pirouette over here.
Oh, you snuck in there like a...
Yeah, like an elephant. Okay, so [laughs] the first thing is, we've updated the way that discoveries can happen and this is not specific to SAM particularly, but it is pretty spectacular. So here, I want to go to— I'm going to start a new discovery so I'm just going to go to Network Discovery. And I'm not going to tell you what we're doing yet. I just want to work through it. So I'm going to add a new discovery now and start it off. We have that graphic that tells us where we're about to go. It gives you a mental map. I want to discover—oh, look at that, you can scan an Active Directory. How incredible is that? So that's exactly what I want to do. The domain controller is going to be swdev.local.
So while you're doing that, real quick, I want to mention why this is here. System administrators may know networks, but they really, really know their Active Directory. So to understand where the servers for a given application are in my environment, I typically go to my Active Directory to find them. That's essentially what we're doing here: we're allowing system administrators to find their environments for monitoring through their Active Directory.
Oh, I was just going to say, yeah, user rights, user privileges, access control to all these applications that we've talked about.
And so many organizations have provisioning processes that are focused on making sure a server is in Active Directory, in the right location, and that's what we're going to see here. It's important to leverage that. So, the first thing I'm going to do is uncheck all of these because, even though we have servers in other places, the only place I want to get them from is under Austin. Not Computers. I don't want to add all of those, because those are either temporary or they could be PCs, or what have you. I just want servers. So I'm going to get about 82. Actually, not about; I'm going to get exactly 82 servers in two organizational units. So I can import those servers. I'm going to hit Finish here. So now, instead of specifying a list of IP addresses, or a seed router, or a subnet range, this just gives us another option, and it is cumulative. You can do a seed router and a list of IP addresses and also Active Directory, where you just say: any computers that are in the OUs that I want. That means that you can have scans that are location specific. You can have a discovery profile that only scans for this remote location or that one, and picks up new machines when it finds them. You can do all sorts of things like that. You can tell I'm excited about it.
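The OU selection Leon just made ("only under Austin, not Computers") boils down to filtering computers by where their distinguished name sits in the directory tree, then combining that with any other discovery sources. Here is a minimal Python sketch of that idea, assuming plain DN strings as input; it is not Orion's discovery code, the OU names are hypothetical, and the DN parser is deliberately naive (it ignores escaped commas, which RFC 4514 allows).

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, leaf first.
    Naive: assumes no escaped commas inside values."""
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        pairs.append((attr.upper(), value))
    return pairs


def ou_path(dn):
    """OU chain from the domain root downward, e.g. ['Austin', 'Servers']."""
    return [value for attr, value in reversed(parse_dn(dn)) if attr == "OU"]


def select_computers(dns, selected_ous):
    """Keep computer names whose OU chain sits at or under a selected OU."""
    wanted = [tuple(path) for path in selected_ous]
    names = []
    for dn in dns:
        path = tuple(ou_path(dn))
        if any(path[: len(w)] == w for w in wanted):
            names.append(parse_dn(dn)[0][1])  # leaf RDN value, i.e. the CN
    return names


def discovery_targets(*sources):
    """Sources are cumulative: union them, keeping first-seen order."""
    return list(dict.fromkeys(host for source in sources for host in source))
```

Note that the default `CN=Computers` container is not an OU, so machines sitting there are never selected by an OU filter, which matches the "Not Computers" choice on screen.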
Yeah, granularity of scale. We can go up and down that stack.
I'm just going to skip right past agents; we're not worried about them in this particular case. And we're not going to worry about virtualization in this case, because I want to get to another really cool aspect along the way. I'm not going to worry about Windows credentials, believe it or not. Here, under Monitoring Settings, is another aspect that was added recently. Do I want to manually set up monitoring after the devices are discovered, or do I want to automatically monitor, meaning automatically select? So we took that last step that you used to have, where you pick the interfaces and everything, and we put it here. So I want to do this. This is what's called a monitoring profile. I'm going to define these monitoring settings for this discovery. Which ports? Which hardware? Well, I don't want unknown. I might want virtual, I might want access or trunk; you're going to have to decide that. You have some advanced options here where you can select interfaces that have certain keywords, or any of these other elements, and you can be more and more specific about your VLANs or about the interfaces that you're selecting. I'm going to leave it like this for right now. You can say which volume types you want. Of course, we don't want to monitor memory or RAM disks, or unknown, or virtual memory as a volume. I don't understand why we do this, but we do. And I'm...
Some people monitor it that way.
That's good. That's good. It always confuses me. I'm going to take off all the other— I just want my regular hard drives and I want Mount Points and NetworkDisk, we can leave those in there just for fun. Also, application monitoring. Again, as it's automatically discovering, do you want to automatically apply the AppInsight elements? I'll leave those on. There, now, nothing's happened because I haven't discovered yet. But now, when this discovery runs, it will automatically select based on those features.
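Conceptually, the monitoring profile is a set of allowlist rules evaluated against whatever discovery finds, so the selection step happens without a human picking checkboxes. A minimal Python sketch of that idea follows; the type labels (`"Fixed Disk"`, `"access"`, and so on) and function names are hypothetical stand-ins, not the exact strings or logic Orion uses.

```python
# Hypothetical labels for illustration; Orion's actual type names may differ.
INCLUDED_VOLUME_TYPES = {"Fixed Disk", "Mount Point", "Network Disk"}


def auto_select_volumes(volumes):
    """Names of (name, type) volumes the profile would auto-select."""
    return [name for name, vtype in volumes if vtype in INCLUDED_VOLUME_TYPES]


def auto_select_interfaces(interfaces, allowed_types, exclude_keywords=()):
    """Keep interfaces of allowed types whose names contain no excluded keyword."""
    return [
        name
        for name, itype in interfaces
        if itype in allowed_types
        and not any(k.lower() in name.lower() for k in exclude_keywords)
    ]
```

An allowlist is used here so that anything unexpected (RAM, virtual memory, unknown) is left out by default rather than monitored by accident.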
And now you don't have to do it after the fact. It does it right now.
It's going to do that. And more importantly— so there's my discovery, and we're going to call it Leon's amazing discovery. There we go. But now, we have the scheduling. Instead of doing a discovery once, I want to do a discovery, let's say, daily. Run this every day, or I can even go to Advanced, and say, I want to add a frequency here. We're going to call this a monthly. Every month I want to discover because this is a remote site, they don't have that many changes.
That makes sense, because if you're dealing with a constant influx of putting in infrastructure, from the virtualization side— we always deal with what we call shadow IT, or additional VMs that get provisioned. And then they go out there, they become zombie VMs and so forth. Discovery is very important in dealing with those potential holes that can show up in your infrastructure.
Yeah, as we start to see sprawl of new systems that come out, even in distributed sites, this allows us to be able to scan and find them whenever they arise.
Again, the idea here is that you can set up multiple discoveries. In larger environments, you can't go out and discover everything at once. You can segment things out, whether by subnet or by organizational unit. As you saw on the screen, what I was setting up was a discovery every month, on the first Monday of the month at 2:00 a.m. I can set an end date if it's only for a short period of time. So we've now added that frequency, and I can set that schedule. Now, I have a few things here. I have the ability to scan Active Directory. I have the ability to pre-select which things are in and which are out, so that when something is scanned, it's just automatically added. I have the ability to set that schedule with incredible granularity in terms of when it's going to run. All of that was added not so long ago. It's not part of 6.3; it's been there for a while. But maybe you haven't poked around discovery lately, and you may really want to take a look at it. But the automation doesn't stop there. This is all just one step on my way to world domination. What I really want to talk about now is the ability to apply application templates in an automated way. Now that these devices are in there, we saw the AppInsight option, but certainly I have my own custom templates, or we have the other templates, and we want to apply those. This is where things get really interesting. Now, I'm going to admit something about what we're about to do, because Steve gave me a hard time as we were brainstorming this: why don't we go full automatic? I'm going to take us the long way around the block, and I'm admitting we're going really the long way around the block to do this.
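The schedule Leon set up ("first Monday of the month at 2:00 a.m.", with an optional end date) is easy to get subtly wrong at month and year boundaries, so here is a small Python sketch of how such a next-run time can be computed. It illustrates the scheduling rule itself and is not Orion's scheduler; the function names and the `end` parameter are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Optional


def first_weekday_of_month(year, month, weekday):
    """Date of the first given weekday in a month (Python: Monday == 0)."""
    first = date(year, month, 1)
    return first + timedelta(days=(weekday - first.weekday()) % 7)


def next_run(after: datetime, hour=2, weekday=0, end: Optional[date] = None) -> Optional[datetime]:
    """Next 'first <weekday> of the month at <hour>:00' strictly after `after`."""
    day = first_weekday_of_month(after.year, after.month, weekday)
    run = datetime(day.year, day.month, day.day, hour)
    if run <= after:  # this month's slot has passed; move to next month
        year, month = (after.year + 1, 1) if after.month == 12 else (after.year, after.month + 1)
        day = first_weekday_of_month(year, month, weekday)
        run = datetime(day.year, day.month, day.day, hour)
    if end is not None and run.date() > end:
        return None  # the schedule's end date has passed
    return run
```

For instance, starting from noon on June 1, 2016, the next run is June 6, 2016 at 02:00; from December 31, 2016, it rolls over to January 2, 2017.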
But it's simply to show the power and what's capable, right?
Right. So I want to take this step by step. Here I am, looking at some of the custom properties that I've set up on my system, and the one I want to pay special attention to is distro. We're back to the land of Linux here. I've set up a custom property, a manual custom property, that says whether this particular device is running Ubuntu or CentOS or Red Hat, or what have you. It's a manual one, just for the sake of argument. I also have a group. The group is called Ubuntu_Servers, and it uses a custom query. If I edit that custom query, we can see it's a dynamic query, and what it's doing is querying for nodes where distro is set to Ubuntu. Nothing special about that; it could've been location, it could've been any other custom property, but that's the way this one works. So anything that has that custom property set to Ubuntu is automatically in the Ubuntu_Servers group. I also have a template, and the template is called Ubuntu. And it is applied to... oh, look at that, it's a group. I didn't apply it to a server; I applied it to a group, in this case a dynamic group. So now any device that is in that group gets this template applied. I don't have to say 'apply.' I don't have to do it on a schedule. I don't have to do anything. So if I go over to my Application Summary page, you can see that I have this 01_Ubuntu template, and it is applied to two devices at the moment. But now I'm going to go into Manage Nodes. Looking at the SNMP nodes, I missed one: I want ubuntu32. So I'm going to go in here and edit the properties. And again, we're going to imagine that the server team, the Linux server team, as part of their normal process, goes in and sets certain attributes about the device: whether it's production or dev or QA, whether it's in a DMZ location, whether it's high criticality or medium criticality. They might be setting any of those things, or they could be synchronized from a CMDB.
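The relationship Leon describes has two layers: a dynamic group is a rule evaluated against each node's custom properties, and a template assigned to a group applies to whoever is currently in it. This Python sketch models that relationship under assumptions; it does not reproduce Orion's group query syntax or data model, and the node names, the `GROUPS` rule table, and `TEMPLATE_ASSIGNMENTS` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    custom_props: dict = field(default_factory=dict)


# A dynamic group is a name plus a rule evaluated against each node.
GROUPS = {"Ubuntu_Servers": lambda n: n.custom_props.get("distro") == "Ubuntu"}
# Templates are assigned to groups, not to individual nodes.
TEMPLATE_ASSIGNMENTS = {"01_Ubuntu": ["Ubuntu_Servers"]}


def group_members(nodes, group):
    return [n.name for n in nodes if GROUPS[group](n)]


def effective_templates(nodes):
    """Map each template to the nodes it currently applies to via its groups."""
    result = {}
    for template, groups in TEMPLATE_ASSIGNMENTS.items():
        members = []
        for g in groups:
            for name in group_members(nodes, g):
                if name not in members:
                    members.append(name)
        result[template] = members
    return result
```

Nothing in this model "applies" a template to a node directly: setting `distro` to `"Ubuntu"` on a node is enough for it to show up under `01_Ubuntu` the next time membership is evaluated.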
It could be any method, as far as that goes. I'm going to go into distro, which is, right now, set to none. I'm going to set it to Ubuntu and I'm going to save.
So what this is simulating is if something happened, something automatic in the environment, you've now simulated that capability.
That is the ultimate expression of it. What I'm saying, also, is that it could be a manual process, although the best example would be one with no human intervention involved, elegant simplicity and elegant automation all at once. So that custom property has been set. If I go and look at the groups, take a look at my actual group here, and look at the members, I can now see that ubuntu32 is part of the group. And if I go to my Application Summary and hit refresh, what we should see, although it hasn't started polling yet, is that lab.ubuntu32 now has that template applied. I did not do that. At no time did my hands leave my wrists. So what you have the ability to do is, based on custom properties or any other element, have templates applied. And by the way, if I change or remove that custom property, for example on Tomcat, which isn't an Ubuntu server, that template will be removed from the device automatically. Which means that there's no more mass-applying, mass-unapplying, or editing of things.
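The "no more mass-applying, mass-unapplying" point follows from treating assignments as derived state: compare who matched the group rule before and after a property change, and the difference is exactly what gets applied or removed. This is a hypothetical Python sketch of that reconciliation, not Orion code; the node names and the rule are illustrative.

```python
def members(props_by_node, rule):
    """Nodes whose custom properties satisfy the group rule."""
    return {node for node, props in props_by_node.items() if rule(props)}


def reconcile(before, after, rule):
    """Return (applied, removed) template assignments caused by property changes."""
    old, new = members(before, rule), members(after, rule)
    return sorted(new - old), sorted(old - new)
```

Usage: with the rule `lambda p: p.get("distro") == "Ubuntu"`, setting ubuntu32's distro to Ubuntu and clearing it on a Tomcat box yields `(["lab.ubuntu32"], ["tomcat"])`: one assignment added, one removed, with no per-node editing.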
Yeah, a very powerful way to group tiered applications in there and subgroups within there. And you can do it automatically and group it on properties. In Leon's case, Linux-based. [Laughing]
Right, exactly. Because that's all I care about.
So real quick, I want to highlight again, those are some of the new features that were in 6.2.4 that some people might not have seen. Very specifically, that last one is assigning application templates to groups. Without that capability, you wouldn't be able to leverage that existing dynamic group that you had just created.
Right, so now let's talk through the ultimate— again, I took us really a long way around the block. Everything that we did was kind of manual about that, but there's a way to do this, which is really elegant. Talk us through this. We've got a chart that we can put up on the screen and just talk us through this.
All right. If this were done automatically, you would see the process go through these steps. First, a server would be provisioned into Active Directory, and it would automatically get added to that organizational unit. Then Orion scans that organizational unit, and from there, that node gets added into Orion. If any properties get set, whether from polled information, a query on the node name, or other information about the node itself, then the node automatically gets added to the dynamic group through the query. And because an application template has been assigned to that group, that node and that application are already being monitored, as we saw on the screen earlier.
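The chart's chain (provisioned into an OU, discovered by an OU scan, added as a node, matched by a dynamic group, monitored by the group's template) can be traced for a single server in a few lines. This is a simplified Python model under stated assumptions, not Orion's internals: OUs are plain strings, and `onboard`, the rule table, and the template map are hypothetical.

```python
def onboard(server, scanned_ous, group_rules, group_templates):
    """Follow one server through the automated chain; return templates applied."""
    # 1. Discovery only sees servers in the organizational units it scans.
    if server["ou"] not in scanned_ous:
        return []
    # 2. The discovered server becomes a node; its custom properties come along.
    props = server.get("props", {})
    # 3. Dynamic groups are evaluated against the node's properties.
    groups = [g for g, rule in group_rules.items() if rule(props)]
    # 4. Templates assigned to those groups now apply to the node.
    return sorted({t for g in groups for t in group_templates.get(g, [])})
```

A server that lands in an unscanned container (for example, the default Computers container) never enters the chain, which is why the OU design in Active Directory matters as much as the Orion configuration.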
And there are other events that can change it throughout the lifetime of that device. For example, again, I talked about changing the criticality of the box. Maybe it starts off as a sev one but then drops down to a sev three, and that's because the team manually changed it. But it could also be because you have integrated your CMDB with SolarWinds Orion through the SDK. And if Patrick were here, he would immediately talk about SWIS and SWQL, because it's his favorite thing to talk about. But you could integrate that and have those automatic updates happening. You could also have things change on their own. One of my favorite but unsung alert actions is to change a custom property. In response to an event or a trigger, such as detecting a value in a log file that you're monitoring, or a configuration value in a log file that you're monitoring, you could create an alert. But we talked about the poke in the shoulder; this isn't a poke in the shoulder saying, "Hey, go fix this." It is simply, "I detected something in a configuration log file on the box, and in response to that, I am now changing this custom property." And that custom property change causes this whole other cascade to happen as well. So you can really automate the life cycle of a device this way.
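To show what a "change a custom property" alert action looks like in isolation, here is a hypothetical Python sketch: a rule watches log lines, and on a match its only action is to update a property, which downstream dynamic groups and templates would then react to. The `tier = ...` log format, the `criticality` property, and the tier-to-severity mapping are all made-up assumptions; this is not Orion's alert engine or the SolarWinds SDK.

```python
import re

# Hypothetical rule: a config log line declaring the box's tier triggers an
# alert whose only action is to update the "criticality" custom property.
RULE = re.compile(r"^\s*tier\s*=\s*(production|qa|dev)\s*$", re.IGNORECASE)
CRITICALITY = {"production": "sev1", "qa": "sev2", "dev": "sev3"}


def on_log_line(node_props: dict, line: str) -> bool:
    """Change-custom-property action; return True if a property changed."""
    match = RULE.match(line)
    if not match:
        return False
    new_value = CRITICALITY[match.group(1).lower()]
    if node_props.get("criticality") == new_value:
        return False  # already set; nothing to cascade
    node_props["criticality"] = new_value  # groups/templates key off this value
    return True
```

Returning whether anything changed lets the caller avoid re-triggering the downstream cascade when the same log line is seen twice.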
See, what I love about this is you have a workflow for all these processes. And then the template and the combination of that gives you automation.
It's automatic monitoring, and it's all about the applications, which is app-centric troubleshooting.
You know, I've been watching these new features develop for a while, but it's still amazing to see the final version.
Believe me, it's great to see them finally get out in the wild and see how our customers are going to use them. Um, you realize we talked about other features besides Linux.
I think we should wrap this up before it gets too personal. For SolarWinds Lab, I'm Kong Yang.
And I'm Steven Hunt.
And that is Leon "Linux" Adato. Thanks for watching.