[Upbeat techno music]
Hello, everyone. I'm Kevin Sparenberg.
Hey, I'm Nikki Jennings.
And I'm Patrick Hubbard, and welcome to another episode of SolarWinds Lab. It's really great to have two special guests with us today. Thanks, Nikki and Kevin, for coming on.
You bet! Thanks for having us on.
Right, because we're going to be talking about the new features in SAM 6.2 that you guys didn't have time for in the previous episode when you showed the AppInsight view and the Orion agents.
Exactly, but first, a couple of introductions. You, of course, are... Man, it's really kind of complicated. You're a multi-product guru, and you're part of our product management team.
I like guru. I'm actually a former customer who went through and made the transition. I'm a former PowerShell enthusiast, network engineer, Exchange administrator, and general geek.
Yep. And, of course, I think most of you know Nikki Jennings, who's been with us for what, about six years now?
Six and a half years.
And Nikki, you're the Vice President of Products for the systems products group, which means you're basically responsible for every feature in SAM.
Pretty much. Actually, we have a great team of product managers, developers, QA. I mostly do a lot of cat herding.
That's true, and it also means that you're Alter Ego's boss.
I am Alter Ego's boss.
So then, you can tell him he has to come and be on the show.
I don't think so. Not even I can make that happen.
Well, I'm going to find a way to make that happen one day, and when it does, you're going to want to be with us. And, of course, you know how to be with us live. If you don't, then you don't see this chat window over here to the side, which should be right there, and you can chat with us live while we're doing a show. And of course, the way that you do that, is to come by our homepage and sign up for reminders, and Kevin, where is that?
That's at lab.solarwinds.com. So, we're just pointing at it now? There's no arm swooshing?
I think he retired it after realizing it was a lot of geek jazz hands with the whole crew.
Okay, there are two main reasons I asked Nikki and Kevin to come on the show today for the hands-on demos. First, you guys are both huge fans of SAM and really give some different perspectives on it that I think they're really going to enjoy, but the main reason is, you guys are both former customers of SolarWinds and actually know what it's like using it in the field.
Yeah, so that makes us and Alter Ego and Leon?
Yeah, pretty much you guys liked it so much you bought the company. Where was I? Oh yeah, so the main reason that they're here as customers is that you guys were senior admins, even, and used these products, in some cases some really early versions of them, in the real world. And the two products that we're getting the most questions about after our last episode are, of course, SAM ...
Server Application Monitor.
Thank you for untangling that. You guys keep asking us to make sure we use actual product names, and SRM…
Which is the Storage Resource Monitor. It's basically an all-new storage module for the Orion platform.
That's right. So what are we going to look at today?
Well, I'm going to demo the new AppInsight module in SAM 6.2, AppInsight for IIS. It's what you'd expect like the other AppInsight modules for SQL, for Exchange, but there are a couple of other tricks that we can make really cool, as it relates to polling data over the wide-area networks and in the DMZ.
Wow, that's true, cool.
And I'm going to show you how to use the dozens of new resources in SRM, talk about how it's different from STM, and how it integrates with the new features in CORE, like the web-based alerting.
Cool. So let's start with the new IIS dashboard, and then we'll dive into SRM.
Great. You guys get into that, and I'll be back in a bit.
Okay, Nikki, so what are you going to show us today?
Well, I was a huge fan of AppInsight for SQL and then Exchange, and with SAM 6.2, there's a new module for IIS.
And for you guys that don't know what AppInsight is— actually, I bet right now they are furiously typing away in the chat, being the first to explain what AppInsight is, because people seem to love it, and you guys know more about it than anyone on the planet. Basically, those are the app-specific dashboards in SAM.
It's actually more than just that. AppInsight modules contain advanced metric collection and special display resources used to monitor complex applications. They also use more advanced communication protocols when they're available.
Right, like WinRM instead of WMI or SQL-specific queries to get data that you just wouldn't get any other way.
Exactly, and in the case of IIS, how many times do you find yourself going to IIS Manager on a server to get even basic metrics for the websites and the application pools?
Oh, no. I just pull the big IIS logs down directly, and ...
I know for a fact that you do not.
Because I'm really efficient?
Yeah, you're lazy.
Yeah, okay, so I am efficient, right? So, okay. Well, walk us through some of the new features and some of the new goodies in AppInsight for IIS and a couple of the other things in SAM 6.2.
Yeah. We are, in fact, going to need that.
All right, so I have my lab demo server here that's running IIS, and I have AppInsight for IIS monitoring that.
On the summary page, we have some pretty cool statistics that we're collecting.
And this is for one server.
This is for this particular server, correct. And all this is out of the box, so it's very comprehensive. So I've got all my sites here. I have the associated application pools. I have configuration and performance information. So, for example here, I'm going to look at the Current Connections, and so I'm going to drill down in here to see the number of authenticated and unauthenticated connections to that particular site. And so it'll give you that total value, but what's great about this particular view is that you're getting the expert knowledge, so if you're not familiar with IIS and all the fun things to monitor within it, this will give you basically insight information into what it means— and then remediation tips, as well.
Because, I mean, I think we all know IIS, but then when you get right down to it and there's a crisis, it's always that one weird value that you've never really taken time to understand because it's always been fine. Kind of like with SQL Server, right? You think you know what all of those metrics are, and being able to actually see it listed, along with what usually causes it and what the typical remediation steps are, really helped me a bunch.
The other thing you showed here, just for a second... Go back up here a little bit. That was really cool, was— Back up here to the top for application sites.
I mean, how many times do you just log into the IIS console just to get that basic information, much less whether or not you're actually getting your CPU information?
How do you know which sites are running on that particular system and what the status of the application pools are?
All right, so, this resource is pretty cool, because it gives us the top X, or top 20, as we have it here, for IIS rendering. So here, if I actually expand this, I've got the verbs—so, post, get, or delete. It'll provide the actual request date, the time frame, and then the client IP that's doing the request. One cool thing that we've added is a linear graph. So this is the extension of the baseline and ‘thresholding.’
Right, and what I like about that, and as you've seen through this episode, there are a number of new widgets that have been added to a lot of these resources that we've never seen before. And what I like about that one is, it actually exposes— or, what would Rob say, surfaces the baseline information to make it visible right there. So, you don't have to go look at it, and then the other thing here was, I mean, you just clicked on it to expand it. But each one of these is an application, so you can actually get it for each app, not just which one's running slowly, and I think you were going to scroll down here a little bit more. I interrupted you a minute ago, but down here at the bottom of the page...
Yeah, so, this is another great one. IIS Average Network Traffic resource. So this provides insight into the traffic that's traversing over that particular server by site or by source. So I can select—I can have a view here to see the traffic that's hitting that site, or I can select non-IIS traffic to determine if something else is running on that particular server that's impacting my websites.
And the main thing that's really useful there is, how many times do you have to try to untangle one particular site that is just hammering everything else. You've got the noisy neighbor problem.
Well, this way, you're going to immediately be able to see that and then decide which one you are going to put on its own VM, or at least move it somewhere else, and then right down here, you also get the same thing for CPU and memory.
As well, again, by site.
Again, by site. Yeah, and so you can isolate it, and it will give you the comprehensive view, so you've got all of the CPU, the physical memory, virtual memory, right here in this, or, as you said, just isolate it to determine which sites are consuming resources.
And the tabs are for CPU, and then you can also do physical and virtual memory as well.
Just walk right through it. So here, we have the Processes and Services resource, so this will give you a list view and then walk you through each one by CPU load, physical, virtual memory, and the consumption.
Yeah, that's awesome.
All right, and then following that, I have the Event Log Message Details resource. The great thing about this, that I love, is that it's only going to show critical or warning events, so you don't have to sift through all the log details as you're trying to isolate and understand where that problem started.
Yeah, and that's actually coming out of the monitor here, so instead of piping all of those logs across the wire and parsing it on this side, you're only pulling out the ones that you are looking for, so it's also reducing the amount of data.
Exactly, so again, it's pulling all of that information about your IIS server into one single view, and allowing you to troubleshoot more quickly.
All right, so we have the environmental view, which is the Mini-Stack, and so this is a contextual mapping. Here I have the application, which is obviously IIS. As I mentioned previously, we have our transactions, so we're monitoring response time from different perspectives. I've got my server. It's running on a VirtualBox. It's connected to this particular host, and I can literally walk down this entire stack all the way to the storage array level.
And this time, I get to untangle you for a second, because we have been talking internally about stacks and AppStack, and you've heard us talking about it. And you'll remember going back almost a year ago, the first time that we showed the integration between STM, VMAN, and SAM, sort of all of it come together, so, we talk about it internally a lot.
So just remember, whenever we talk about stack, what we're really talking about is that joined view of all the resources that are related to a particular thing.
So it's the context of, this is my VM server and all the applications on it. This is my app and its transactions. This is my app, its VM, storage controllers, and all the way down to the spindles.
That's right, and again, the idea is from the application-centered view. So my application is having performance issues, where is it actually? Where do I start to troubleshoot those? How do I find the root cause?
All right. So now that we've done that, I'm going to actually walk into a site's details page, and again, just like any Orion product, the deeper you get, the more detailed the information you have.
Okay, but I've got to stop you right here.
Oh, I know this is awesome.
Yeah. Well, this is really awesome, but the first thing about this is, the number one thing that you do to log into the IIS console is to do what?
Restart a site or an app pool.
Exactly, and so right there in the upper left-hand corner, I can restart and stop, and I can also unmanage those as well. And this one, I get ‘unmanage’ as an option because it's a website.
If it were a…
Application pool, I would actually get, what do you get. You get restart and...
Another thing I want to say about this is that, if you don't want to have, or if you don't want to give all your administrators this functionality, you can change that— change the permissions on that, to not allow them to go in here and make changes to your production servers.
I did not actually know that.
Or production website.
That's really cool.
All right, so site details. I've got the URL, I've got the log file path, so as you are troubleshooting further and need to understand where to go get those logs, that information is quickly displayed there.
The bindings, yes, exactly. And you can actually launch and browse to that particular site from within this site. I have the connection information, and again, you can drill down into each of these. It's going to give you the expert detail level and remediation recommendations, providing there are any issues. I have request information here. And so on this page, I'm looking at the network IO, so I can see what's actually coming in and out of this particular server to understand any performance impacts as it relates to the network.
Isn't this just awesome?
All right, so one other cool feature here, is that we have the log size by file. So, how many times have your logs blown up and killed your server?
Never, especially in a VM. That has never, ever, ever happened.
I've never had to expand volumes.
So, we'll have this on all the details pages. And then, of course, just like anything else in Orion, you can alert and report on it, so you can take action before you have a down system.
Well, that's great. If we look at the folder path here, it's telling me where that file actually is.
Because half the time, when you run out of space, it's because you set up IIS, you think you got your log files configured off somewhere where you've got them, but it's, you know, you don't care so much about the latency—
…of the storage. And it's like, it's not RAID 10, it's just fine, and then you discover, no, it's actually on my system drive in the default location. No wonder I'm running out of space. So, not only am I running out of space, but was it where it was supposed to be in the first place?
Or even better, getting an alert and knowing where to go to actually take care of that, to make up, to create space for that volume.
That's a great point.
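To make the log-size idea concrete, here is a minimal Python sketch of checking a log folder against a size limit before it fills a volume. It is only an illustration of the concept, not how SAM collects this data; the function names, the file name, and the 1 KB limit are all made up for the example.

```python
import os
import tempfile

def log_sizes(folder):
    """Return {file name: size in bytes} for regular files directly inside folder."""
    sizes = {}
    for entry in os.scandir(folder):
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
    return sizes

def over_limit(sizes, limit_bytes):
    """True when the combined size of all files exceeds limit_bytes."""
    return sum(sizes.values()) > limit_bytes

# Demo against a throwaway folder holding one hypothetical 2 KB log file.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "u_ex150301.log"), "wb") as f:
        f.write(b"x" * 2048)
    sizes = log_sizes(folder)
    print(sizes, over_limit(sizes, limit_bytes=1024))
```

In practice you would point something like this at the folder path shown in the resource and alert on the result, which is the "take action before you have a down system" point made above.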
All right, so, we're back to the summary view for IIS on this particular server. I want to go into an application pool details page. So again, you have your remediation actions. You have Application Pool Details, Start Mode, very important to know: On Demand or Always Running. Lots of great data, but one of the things here that I wanted to point out is the Windows Process Activation Service.
These things can spin up and it's really hard to monitor all of them, and understand what the contention is on that system.
I think I might have wasted more time trying to untangle WAS than any one single thing, because it's really not included, even in the IIS GUI—
—Manager. It is available with PowerShell.
Mmhmm, that's right.
Aka, from our perspective, WinRM.
But unless you're sitting there constantly polling it, it really is a pain, and pulling it up here has really saved me a lot of time. I mean, I still kind of manage a couple of our demo servers on the outside, but I'm not going to tell you guys which one ... [Patrick coughs] Chat. But it's just really great to be able to get access to that.
Yes, exactly, so you have all that information and again, you can set up your alerts and your thresholds. All right, so, we are actually looking at the summary view for that particular IIS server.
Okay, so not the application.
We're back on the actual server. Now, again, I can navigate to all those— the resources for IIS, as well as transactions.
And there's the Mini-Stack view to remind you that this thing is a part of something.
There's the Mini-Stack! It's always there. So one of the cool things is that I'm actually polling via the agent. So if you're not sure what you're polling, then this is a great place to go and actually see all that information about that particular system, including how often you're polling it. Specifically for this one, we're polling by the agent. This server's located in a DMZ. I deployed that agent with secure, encrypted communication back to my main collector.
Well, and the other thing is, and this is where it's really important, and that's a great tip, especially in a DMZ: one, you're hoping to have only one ACL that you're opening for the management port.
But the other thing is, just like AppInsight for Exchange, where we're using WinRM to effectively do PowerShell to get this rich information.
If you're doing all of that WinRM over the wire, it's not going to be nearly as efficient as letting the agent do it.
It's the latency that— exactly, that's right.
Or, of course, if you don't want to install the agent and it happens to be close, you can still direct monitor it, just like you would before.
And I think it'll use the same WinRM setup agent that the IIS— the AppInsight for Exchange uses, right?
It does, yes.
So you make it monitored, and then you click on add AppInsight for IIS, and it'll walk you through the process.
Through the deployment. That's right.
It's super simple.
Yeah, that's really awesome.
All right, cool.
Okay, Nikki, there's one thing I want to ask you about. I know what this is. They have not seen this before, because this is new to SAM 6.2.
Yeah, it's the Interface Downtime resource. So, we are polling this information via the agent, and this will give us information on what the status of that interface is. So if you are an MSP, and you're providing SLAs that you have to meet for your customers, this is a great resource and reporting mechanism to do that. Or if you're a customer and you want to know if your MSP or ISP is meeting their SLAs, this is an awesome resource for that as well.
It's really nice when you call support and ask about service quality and you have a chart. There's just no way that they can wiggle out of that.
It's a beautiful thing.
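As a rough illustration of how downtime records turn into an SLA figure, the following Python sketch computes availability over a reporting window from a list of outage intervals. It assumes non-overlapping outages; the function name, dates, and the single 43-minute outage are invented for the example and are not taken from the Interface Downtime resource itself.

```python
from datetime import datetime, timedelta

def availability_percent(window_start, window_end, outages):
    """Percentage of the window during which the interface was up.

    outages is a list of (down_at, up_at) datetime pairs, assumed not to overlap.
    """
    window = (window_end - window_start).total_seconds()
    down = 0.0
    for down_at, up_at in outages:
        # Clip each outage to the reporting window before counting it.
        start = max(down_at, window_start)
        end = min(up_at, window_end)
        if end > start:
            down += (end - start).total_seconds()
    return 100.0 * (window - down) / window

# Hypothetical 30-day window with one outage of 43 minutes 12 seconds.
start = datetime(2015, 3, 1)
end = start + timedelta(days=30)
outages = [(datetime(2015, 3, 10, 2, 0), datetime(2015, 3, 10, 2, 43, 12))]
print(round(availability_percent(start, end, outages), 3))
```

With these made-up numbers the result lands on 99.9 percent, which is the kind of figure you would hold up against a "three nines" commitment.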
And of course, there's a little thing here you guys might have detected, is that this is not using NPM. Normally, interface-polling details would be a part of NPM. This is now included with SAM.
It's polling that either through the direct polling or through the agent method.
So you're going to be able to get that interface that's supporting this application now, not just for this, but any of the apps in SAM, as a part of this release.
That's right. You're not restricted to SNMP; so again, you can do that through the agent for this particular instance.
Okay. So, I'm used to AppInsight for Exchange and SQL Server. So, how's this one really different?
Why do you insist on asking leading rhetorical questions like that when you know the answer?
I don't know. It's a habit? I like to annoy guests with it? [Patrick laughs]
You be nice, especially to Kevin.
I will, when he comes on.
Orion agents, this is why this is different. Here, pull up that chart you made for the last episode. Where's IIS usually running?
In the DMZ.
Exactly, right here. So if you combine the two in AppInsight's Super Detail module for IIS with an agent using... Using... Come on, Patrick, this is awful.
Right, SWIS, on port 17777. You get both way more detailed monitoring and single-port access to make DMZ ACLs easier to manage.
That's right, and it applies to both servers in your colo, remote, cloud--wherever they happen to be--and it also makes them a lot more efficient on the wire by reducing the number of verbose queries that you're executing, especially like WinRM, for example, when you're actually doing, you know, essentially PowerShell Remote, that can eat up a lot. Plus, it's short-term unreachability tolerant, and you don't get the gaps in data or false alerts due to temporary lapses in reachability.
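One way to picture the "short-term unreachability tolerant" behavior is a store-and-forward buffer: samples are collected locally and delivered once the link is back, so a brief outage leaves no gap. The Python sketch below is a toy model of that idea under that assumption; the class and its names are hypothetical and say nothing about how the Orion agent is actually implemented.

```python
from collections import deque

class BufferingAgent:
    """Toy agent: collects samples locally and forwards them when the link is up."""

    def __init__(self, max_buffered=1000):
        # If the buffer ever fills, the oldest samples are dropped first.
        self.pending = deque(maxlen=max_buffered)
        self.delivered = []

    def collect(self, sample, link_up):
        self.pending.append(sample)
        if link_up:
            self.flush()

    def flush(self):
        # Deliver buffered samples in the order they were collected.
        while self.pending:
            self.delivered.append(self.pending.popleft())

agent = BufferingAgent()
# The link is down for the two middle polls, then comes back.
for tick, link_up in enumerate([True, False, False, True]):
    agent.collect({"tick": tick}, link_up)
print([s["tick"] for s in agent.delivered])
```

All four samples arrive in order even though two were taken while the link was down, which is the "no gaps in data" property described above.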
And he's back!
I stop flapping my gums for two seconds, and everybody freaks out.
No, your object status went into warn when you exceeded the unreachable counter-threshold.
So now, you've calculated a threshold for when I don't talk enough?
Of course not. It's done with auto baselining. [Patrick laughs]
All right, well, thanks Nikki so much for showing us all of that, and let's do this. There are a couple more features that are SAM 6.2 proper features that we didn't get a chance to talk about, so will you come back in a minute after Kevin and I talk about SRM?
Storage Resource Monitor.
Okay, Kevin, let's talk about SRM for a second.
The most obvious question, and I know the one that you guys are out there asking right now, is why build a brand-new module for the Orion platform instead of just extending the existing Storage Manager product?
Well, there are a couple of reasons, but first and foremost is that customers requested it. When people want it, we try to deliver it, and they were asking for storage on Orion.
Just a little bit. I do believe more than one of you posted something about that on THWACK.
Yeah. So, the other main reason is, although Storage Manager is very, very mature, it's a great product, and it does exactly what it's supposed to do, it's really not designed for application-centric stuff. It's designed mostly for reporting and baseline troubleshooting. It's LUN-centric and not app-centric.
That's a really great point, because you can monitor LUNs all day long, and as long as you don't have degraded application performance, I don't want to say it's an academic exercise. You're still doing capacity planning and a bunch of other things.
But, it's not that kind of critical, "oh no, why did this stop working?" until the application really begins to suffer.
Yeah, and that's the gap we're trying to bridge. When an application does get into trouble, it can be tricky to isolate storage as actually being the problem. But since we now have storage as part of this entire application stack, you can go in and notice that storage is a problem. Not only that, you can see whether it turns out to be your root cause, quick and easy, using the Mini-Stack.
Right, and it also shows up, you know, with the Mini-Stack whether it's a part of that application. You get right to it. So let me ask you this. So, here's an example. You're just looking at a LUN.
Right? And you can measure utilization, capacity exhaustion, all day long, but the other thing is, other than going to unreachable, it really, from the STM perspective, doesn't have a state, per se.
Yeah, that's pretty much it. But if you actually make the storage application-aware, along with the monitoring system that looks into all of that, you can find LUNs that are in a warning or critical state, so you know when things start degrading before they just go offline. Right, so that's a big part of that: now you're actually getting state for storage components, like true state, because it's based on the state of the application and connected resources. And I think that actually goes for all of the other components, right? So controllers, volumes, and modules, which didn't really have status before either.
So, all of those components, you can determine what their state is.
So okay, but... Okay, there's one other thing. I think Rob talked about it for NPM 11.5, and I know in the last episode there was also some demo of capacity planning.
Resource planning. Both for SAM and also for NPM, but I was thinking about that the other day. That's something that's coming from CORE, right?
And almost always, NPM and SAM drove— Cool features in CORE, but in this case, what good is capacity planning going to be on an interface? I'm not saying it's not good ...
But compared to everything else, it's not that great. So then, I was thinking about it, and I realized that SRM, the Storage Resource Monitor, actually drove this in CORE.
Yeah, this entire portion was built and delivered for SRM specifically and then CORE, so NPM and SAM get to leverage those settings.
So, ha-ha, guys. You didn't invent it here for CORE. That's really great. Okay, so you want to take a look at this?
Sure, we'll go ahead and dive in. So I'm really excited about a lot of the things that the new Storage Resource Monitor gives us. One of the things is that, out of the box, we support many of the devices that our customers currently have. So, we are supporting most of the Dells, the EMCs, and the NetApps, and something I want to call attention to here is that we support NetApp in 7-Mode, or classic mode, and we also support NetApp in cluster mode, so C-Mode.
Right, and that's a big difference, because the original STM product doesn't do that, so that's an upgrade right off the bat.
Okay. So, there's ways to dig down into all of this information. We'll just take the NetApp cluster mode as an example. So, you can dig down through the virtual servers, through the volumes within those, all the way down to the individual volumes, and then separately, look at the storage pools, the aggregates, the LUNs associated with those aggregates and volumes, and they map backwards through the storage stack.
That's awesome. And it's pretty evenly normalized for all these different vendors, so you get one way of looking at this.
Yep. One of the real things we tried to stress in the product is that the statistics you're looking at here are universal across all device types.
Okay, walk me through some of the popups here, the hover-overs. Those are really great.
If we just pick one of the resources here to float over, you can go ahead and see that the VServer is in an OK status. It's running. NetApp can get the IP information. Not really running a lot of aggressive IOPS, but you do get to see the NAS capacity and the used capacity, which is also replicated over here in the All Usable Capacity Summary.
Right, and then if you drill, there might be a lot of them, obviously.
You could drill into any one of those individually and see the details.
So, if we jump back out to the EqualLogic site that we already have many things on, you can pick out, array one LUN, and you can see all of them. Now, the one thing we do here, because there can be many, many, many, is clip it. So, if you actually need to see the three additional ones that aren't in this view, hit on that, and it'll expand down for you. The mechanism that we use to actually gather this information really depends on the vendor. So, for these EqualLogics, we can actually get this information through SNMP.
Whereas for EMC, we have to leverage an SMI-S provider.
Which is basically a small program installed on another machine somewhere that proxies connections and reporting back from the EMC storage devices.
Okay, so that's something that hasn't really changed. There are always going to be vendors where you can connect directly to the system, and others where you actually have to go through a proxy, or there's a particular API. So that piece of it, I mean, if you're still using that provider, you're still going to have to use it.
So then, how do you untangle what you're going to need by a vendor?
Well, that can be a little tricky. For anyone who's actually dealt with storage from multiple vendors at the same time, we've tried to streamline all of that with our new Add Storage Object wizard.
So to get to that, all you do is go to your main settings page, and in the top, with Getting Started with Orion, it actually has an Add Storage Device option. Just like you would add a node in the past, although it's a little more difficult.
These are just so great. It's funny, because the first time that I was looking at this, you guys weren't done with this yet. And it was amazing to see them finally get the final art here, because I thought, okay, it doesn't matter, you can give me text, but to actually show the pictures of how this is organized is really, really helpful.
Yeah. So, most of them either talk directly over SNMP, or directly to some type of API that runs on the device itself.
That's not always the case, though. And if we just, for an example, pick EMC, you can see here that we actually spell out what you need to put in there.
You're going to need SMI-S somewhere.
You need SMI-S somewhere, and you'll have to configure that, but if you don't know how to configure that, because let's say you're more a network admin or you're more an application admin…
What do you mean more?
I mean, you are!
Everyone wears hats.
Listen, I know that there are guys out there who specialize in storage. I have met a few of them, they are amazing, they are deliberate and careful, but the fact is, every time I've had to mess with it, I have not understood everything about certain— Wait, I didn't say that out loud. I have not known as much as I wished...
You heard him, you heard him.
…That I knew about storage. And I have had to go figure it out.
So I'm way less likely to drop, I don't know, a LUN for the finance team this way.
Well, I'd advise against that in the general scheme of things. But one of the things here, because of the extra layer of complexity that can come in with something like an SMI-S provider, since it's not running on-board and you can't just talk directly to it, is that we actually gave you help links throughout this entire add wizard. So if you get to, let's say, for this one, you get to an EMC VNX, and you're not 100% sure you've got it configured right, just use this as a checklist.
And they're specific to each one, so you know where to start.
Wow, that's great.
So, after you've taken the time, and added your storage device, you can actually go in here and jump immediately to the Summary, Performance Dashboard, or Capacity Dashboard. They're drastically different views of all of your information rolled up into one. We're going to start with the Performance Dashboard, because more than likely, if you have a problem with an application, it's not always due to capacity. More often, it's due to performance problems.
So, that being said, you can see here, in one quick view, the number one problems: sorted by latency first, and then by IOPS, showing how much traffic and how much data is going across into these particular storage objects. And determine if, well, maybe this one for our Tokyo Hyper-V Cluster 2. Well, it's got some really high latency. I don't want to put anything there that needs a quick IO.
In other words, don't put SQL in this.
Yeah, probably not.
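The ordering described for the Performance Dashboard, worst latency first and then busiest by IOPS, can be sketched in a few lines of Python. The volume names other than the Tokyo Hyper-V cluster and all of the numbers are made up for illustration; this is not the query the dashboard itself runs.

```python
# Hypothetical storage objects with made-up latency and IOPS figures.
volumes = [
    {"name": "Austin SQL Data", "latency_ms": 4.0, "iops": 1800},
    {"name": "Tokyo Hyper-V Cluster 2", "latency_ms": 38.5, "iops": 950},
    {"name": "London File Share", "latency_ms": 4.0, "iops": 2400},
]

# Sort descending by latency, breaking ties by descending IOPS.
ranked = sorted(volumes, key=lambda v: (-v["latency_ms"], -v["iops"]))

for v in ranked:
    print(f'{v["name"]}: {v["latency_ms"]} ms, {v["iops"]} IOPS')
```

The high-latency volume floats to the top, so it is the first thing you see when deciding where not to place a latency-sensitive workload.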
And the other thing I noticed, just sort of as an aside. I just noticed it looking at this, and it's going to seem like a small thing, but if you look down here under the NAS Volumes by Performance storage pool, the little storage pool icons here just remind me that I am not looking at the old STM integration module, right? Because we were giving you things like IOPS and some of the other performance data about your storage arrays. It was integrated through SWIS, pulled up directly into a lot of the apps, but we didn't have the AppStack View yet, and that data was being polled by STM and then pulled in. In this case, these are just regular native Orion resources. You can drop them onto any view that you want, storage or otherwise. If they're details resources, you're going to put them onto a details page like this. So just looking at this icon, it's hard for me to realize just how much capability there is in this version. Now, a lot of it, though, was because you inherited a lot of capability from STM. What's the version number for this?
This is 6.0.
So we're launching the version 6…
We are launching 6.0.
Of a brand new product.
Because it's also released as 6.0 of—
Which is now called what?
Storage Resource Monitor and the Profiler.
Storage Resource Monitor and the Profiler, so you basically have SRM.
And SRM Profiler.
That's really—that's great. Okay. So, just so I understand, you still get, if you buy SRM, you get what is now SRM Profiler, which is the old STM product in addition to SRM?
You are absolutely correct.
If you are a current STM customer, you will get SRM for free.
So basically, they're the same thing, and you get both versions, and you are probably going to use this if you're already using the Orion platform.
But if for some reason you need STM, it's still available to you.
And they can be both run simultaneously.
Oh, that's just great.
Capacity is also something you really want to monitor with your storage environment, because although it may not be the root cause of general bad performance, it can take systems down pretty quickly.
Yeah, that'll be one of the first things that... It seems like an obvious thing, by the time you troubleshoot down to it. Oh, I ran out of space on... But how many other things do you end up looking for first to figure out why an app is down or running slowly?
And the other problem with that is, a lot of times, you can take most of these storage objects and actually oversubscribe them, which means you provision a thin LUN, about a couple of hundred gigs, and you start filling it up. Then you provision another one, another one, and then you've overprovisioned it, and you don't realize it, until all of a sudden, it's down and gone.
That's absolutely true.
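To put rough numbers on the oversubscription problem, here is a minimal Python sketch. The pool size and LUN sizes are invented for illustration, and the function name is hypothetical; it only shows the arithmetic, not anything SRM itself does.

```python
def oversubscription_ratio(pool_capacity_gb, provisioned_lun_sizes_gb):
    """Ratio of capacity promised to thin LUNs versus physical pool capacity."""
    if pool_capacity_gb <= 0:
        raise ValueError("pool capacity must be positive")
    return sum(provisioned_lun_sizes_gb) / pool_capacity_gb


# Assumed example: a 500 GB pool backing three 200 GB thin LUNs.
pool_capacity_gb = 500
thin_luns_gb = [200, 200, 200]

ratio = oversubscription_ratio(pool_capacity_gb, thin_luns_gb)
print(f"Provisioned {sum(thin_luns_gb)} GB on a {pool_capacity_gb} GB pool: {ratio:.1f}x")

if ratio > 1.0:
    # More space has been promised than physically exists.
    print("Pool is oversubscribed; it can fill up before any single LUN looks full.")
```

Each LUN on its own looks fine here, which is exactly why the problem goes unnoticed until the pool itself runs out.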
Yep. So, we actually have the spark charts built into all of these resources. You can cycle through all of them.
And let's just say that again.
I have said spark chart before when I was talking about the combined stack chart view, sort of the tape chart view, in AppInsight modules before.
The ticker tape one.
The ticker tape one.
That is a spark chart.
And it's just great, because that spark chart is another one of the widgets that we've added in SRM that's now going to be available in CORE, so I'm looking forward to seeing that all over the place.
Sure. So, it's actually a really great little feature. You can go in; you can dig into any individual one. Obviously, both sides are clickable, if you want to dig down into them. It does also tell you whether it happens to be a NAS volume you happen to be provisioning, versus a complete array or any of the other information that's really interesting for that.
For a noisy neighbor, that is so handy.
Absolutely, especially if you have something for the array side. You start seeing capacity maxing out on this, then you know that, well, probably all the LUNs underneath of that particular storage array are going to get close to their max as well, and of course, all of this can be alerted on... Like we see on the right-hand side. These are all alerts that come from the CORE, from the Manage Alerts from the web UI.
Yep. So the big thing you get from Storage Resource Monitor is that bottom portion of your environment view, or the AppStack view. So, one of the things that I really like about this, that I would use it for frequently if I was in this is, I'm a storage admin, and let's say I'm going through, and I've got some work I need to do on a specific storage array. So I'll pick this one out of hand, and then one quick refresh, and I know every single thing that is affected in my environment top to bottom by that particular storage device.
If there was only some way to isolate that view. There's a lot of grayed-out stuff. What would I want to do?
Well, thankfully enough, we've actually thought about that, so if you actually go all the way back to the top, we've got the little eyeball, and we call it Spotlight. So we've hit Spotlight, and everything else that's been kind of grayed out just drops off.
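The idea of picking one storage array and seeing everything it affects is, at heart, a walk over a dependency graph. A minimal Python sketch follows, assuming a hypothetical `DEPENDENTS` map with invented element names; it is not how AppStack stores or queries relationships.

```python
from collections import deque

# Hypothetical map: each element lists the elements that depend on it directly.
DEPENDENTS = {
    "array-1": ["lun-1", "lun-2"],
    "lun-1": ["datastore-1"],
    "lun-2": ["datastore-2"],
    "datastore-1": ["vm-web", "vm-sql"],
    "datastore-2": ["vm-file"],
    "vm-web": ["app-iis"],
    "vm-sql": ["app-sql"],
    "vm-file": [],
    "array-2": ["lun-3"],
    "lun-3": [],
}


def affected_by(element, dependents):
    """Breadth-first walk returning everything that depends on `element`."""
    seen = set()
    queue = deque([element])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


impacted = affected_by("array-1", DEPENDENTS)
print(sorted(impacted))
```

Everything outside the returned set is what a Spotlight-style filter would drop from the view.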
Oh, but if only there was a way that I could save that, and some way that I can open that view directly.
That's something that customers have also asked for, so we've got this layout. And if we really like it, we can go up here, and we can save it as a new layout, name it, and then every time we come back here, we can pop to that one.
Okay, so in defense of what I said before about not being a storage geek…
I could almost be perceived as a storage geek if I could just use that for inventory management in my storage resources.
Because then somebody asks you a question about storage, and, oh, you know, man? I can actually tell you exactly which elements are a part of that application.
Absolutely. And one of the things that I like is really for requests for change. If you've got a big change coming in, you've got a major firmware edition coming up, you've got to update this, you've got to update that, it's going to take down devices, and sometimes mapping those back from the bottom all the way to the top is very, very difficult, but the Environment View saves you.
How awesome is it, because you know what's being affected, to be that person? Normally it's sort of that oracle of knowledge admin who you say, I'm going to down server x, y, z, and they kind of lean around the side of their desk and say... That's going to knock off these four applications. You wonder how they know. This would actually be a way to do that, so you can kind of secretly lean out there and say... Here's an inventory of exactly what's going to be affected. Maybe you want to actually create a change management window for that.
Yeah, more than likely. A lot of people will always go in and say, oh, well, it's not going to affect any production apps. This'll let you prove that, one way or another. So there's one thing I have to show you, and that happens to be when you dig down all the way down to the LUN Details View. You can actually get a lot of performance metrics specific to this LUN. A lot of that information is represented right here on the main page. These ticker tape charts, like we mentioned before, are... We've got your total IOPS through here, but sometimes, you don't need total IOPS. Sometimes you need to be a little more specific.
I clicked it. [Patrick gasps]
So now, we have the read versus your writes.
Oh, I just love this, because, okay, this, again, is a brand new resource, and not only that, it's a new type of chart, driven into CORE by SRM.
So here, if you... You know, you hovered over it before, so it's just like the ticker charts in... AppInsight, for example, right?
But they've added the... Scalable charts on the bottom that go all the way across and then this Expando... You can actually walk through that one at a time, because half the time, I want to see what my read performance is for IOPS, but I don't care about everything else. But it's nice to look at throughput next to that.
Really, really handy. I don't have to look at 13 things all at once, but I can expand just the areas I want to see. Oh wait, you want to look at read versus write IOPS?
There's actually a better resource.
Well, let's see.
So, all you do is click on the RAID details from the previous page ...
So, we get the RAID Details View.
We get the RAID Details View; we get the Latency Histogram across, which is actually really nice to let you know whether or not you're hitting latency all the way across that particular RAID group.
Color-coded bars by performance—is that new?
Yes, it is. [Patrick laughs] And also, it'll actually tell you the number of LUNs on there, so if these ones are showing problems, you've got—oh, 11 of them are calling out for information. This one, you only got the one that's coming, or two that are coming up. Same thing applies at this level. You can break out the IOPS into the reads versus the writes, but let's say you really just want read versus write. So from the LUN Details page, I can click all the way up to the Array Details. Just click on the name of the array, and it takes you out to a view similar to this. Now, one of the things we've added to this is this large Array Status selection here, which will actually break down with float overs and various points, so you've got all the hover-over information used in collecting all of the other portions of the applications.
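A latency histogram of the kind mentioned above, with a count of LUNs per band, can be sketched in Python as follows. The bucket boundaries, LUN names, and latency values are all assumptions made for this example; the real resource may group things differently.

```python
from collections import Counter

# Assumed latency bands in milliseconds: (low, high, label).
BUCKETS = [
    (0, 10, "0-10 ms"),
    (10, 20, "10-20 ms"),
    (20, 50, "20-50 ms"),
    (50, float("inf"), "50+ ms"),
]


def bucket_label(latency_ms):
    """Return the label of the band that contains latency_ms."""
    for low, high, label in BUCKETS:
        if low <= latency_ms < high:
            return label
    raise ValueError("latency must be non-negative")


def latency_histogram(lun_latencies_ms):
    """Count how many LUNs fall into each latency band."""
    counts = Counter(bucket_label(v) for v in lun_latencies_ms.values())
    return {label: counts.get(label, 0) for _, _, label in BUCKETS}


# Invented sample data.
sample = {"lun-1": 3.2, "lun-2": 8.9, "lun-3": 14.0, "lun-4": 22.5, "lun-5": 61.0, "lun-6": 9.99}
histogram = latency_histogram(sample)
for label, count in histogram.items():
    print(f"{label}: {count} LUN(s)")
```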
Okay, the SAM team is going to have to figure out what to do with this. Because right now, this is the only place we're using this new resource. But what I love about this is, and it's—you know, when we looked at this yesterday, it was actually color-coded because it was in a warn state. You get a color-coded numerical value, color-coded bars with hover-over information. You get the percentage written out, and you get the bar, so there's like 20 different pieces of information with a time zone selector or a time window selector, all in that one resource. That one is really, really handy, because how many times do you go right down to that array, and it's the very first thing you need to do is just: how is this thing performing? What is the total IOPS that I'm getting? How much throughput do I have? And what does it look like over time?
Yep. And the one thing you did forget to mention is, you actually get the floating icons for critical or warning states, which can float over and then dig directly into the alert for it.
Ah, even better. And then of course over here on the side, Mini-Stack again, because now I'm just wondering which applications are affected by that and which VMs, so I can see that right there.
Yeah, so you know if you're having a major problem down at the storage level, it's going to affect all of these particular layers. But let's say you're actually interested in your reads versus your writes, because sometimes, depending on how you build your storage, one's better than the other. Well, if you're actually curious, you can look at them together on the same chart broken down by whatever hour selection you want, and, just like the other charts, scale it out as much as you want, dig into whatever detail level you want.
God, that's just great. All right, so the question then is ...
I think this is pretty cool. As an Orion person, I am most likely to want this on my Orion installs, as opposed to having it in STM. Nothing against STM storage. It's a great product. But ... We've been talking about the new web-based alerts; I am assuming that we're going to inherit a lot from the new alerting engine.
We inherit everything from the alerting engine. All of the storage alerts are built on the new alerting engine.
How many new alerts come out of the box?
That's really great. And they're not just for volumes and LUNs. They're also for what?
They're also for performance on all of those, and they're also for the array level. They're for the provider level— so if, as we talked about earlier, if you have that problem talking to EMC through their provider, you can actually be alerted when that particular provider goes down.
Okay, that's handy. And then, I guess the last thing is, you're also getting status. We talked about that before, so now you are actually getting real status information that you can do status-based alerting on any of those components.
That's a huge new enhancement, and I wish we had time to dig in there. I promise we're not dodging that. It works just like everything else, because all of these features in the web-based alerting engine that we showed in the episode before last with Rob ... This was designed to support this, and it's the reason we spent two years, almost, working on the new alerting engine.
Yeah. Because it needed to be able to encapsulate all this functionality, knowing that we were also going to do SRM. SRM was a big driver for that, because it really extended it, and there was no way that the old advanced alert engine was going to be able to support it.
So going back to the, you know, do we start with something new, or do we just expand what we had? It was just time for us to come up with that engine.
Yeah, that's great. And then the last thing, of course, is going to be what?
Reporting, as it always is.
Yes, because you need to occasionally do inventory, volume utilization, consolidation reports for managers, and…
Yep. And one of the greatest things about this is, because you have to do that thing for various manager levels, you deal with your CIO, and you deal with the storage administrators, and possibly you're dealing with application owners, people that just run an app on a server, and that's their entire purview. You can set these up using the advanced reporting engine and actually send them scheduled reports, with either links to the pages or actual attachments, and they can actually go through and look at storage for their particular windows.
And when you generate those reports, because you have application awareness, it's easier to generate the components for a particular application or a service that you're offering— not just, hey, I've got a spreadsheet, and I went and decided on my own that these storage elements are actually supporting these applications. You can go to an app owner, for once, which, unfortunately, app owners seem to have all the juice when it comes to budget, right? [Kevin laughs] You can actually go to that person and say, your app performance is being affected by the performance of your storage, and maybe we want to talk about SSD, that's something that's going to be a long conversation, or it could be a long conversation.
It could be a long conversation. But now they have the ammo to go in and defend it one way or the other.
And you can actually show performance maybe of another application that has transitioned, to give them some expectation of what they're going to get when they spend the money to upgrade their storage.
Wow, that's just great.
Hey, that was great! Thanks, Kevin.
Oh, it's no problem. I'm glad you didn't try to cram all of that into the last episode along with AppStack.
That would have been a lot.
Yeah. Customers have been asking for an Orion-based storage monitor for literally years, so there's a lot of features to cover.
Yeah, and it's basically everything that STM, which you're probably familiar with and which is now called SRM Profiler, could do, plus a bunch of other new app-centric features that STM just couldn't do.
Yeah, that's correct. I feel like I ate up the entire episode, but there are a couple of new SAM resources you wanted to cover before we wrapped, right?
A couple that I wanted to cover. These will only take a second, but lots of you just asked for them, so let's take a look.
All right, make some room.
All right, so, real quickly here, I just wanted to show you guys this because it's pretty cool. So we're on the summary page for AppInsight for IIS. I'm going to select the Transactions tab, and that's going to take me over to a view. So we're using Web Performance Monitor. So, it's better, deeper integration, so I'm able to have an understanding of how my users are impacted from various locations.
Okay, but the thing that I like about this is normally, you would go up to the web tab, and then you'd go find that server or find the transactions that were associated with the server, you get the exact same information, but here, you're not getting more than you had. You just don't have to go anywhere to find it.
You don't have to piecemeal it together.
That's right. And then you've got the Mini-Stack View right there on the top, so that it's also obvious that the servers that are involved all the way down to the ...
That's right. The web performance transactions will be populated in the Mini-Stack.
Yeah, that's awesome.
So lastly, real quick, I'm going to go into the new web-alerting engine in the Orion core platform and talk about this real quickly, because I think it's just super awesome. So, we have an alert configuration here, and essentially, what this is doing is restarting the application pool when it goes into a down state.
What, you'd want to actually do that?
Yeah! Save some time. [Patrick and Kevin laugh] Unless you like to restart application pools all the time.
So I'm going to go into the Trigger Action here, and this is where you define the application pool Restart Failed IIS Applications.
And you can also obviously parse through your variables here, so when you get a notification that an application pool went down, and a restart was actually executed ...
Yes, I'm not sure how much Leon is actually correct that that verbose messaging is all his. And then the last thing in there is, yeah, you've got the Start Application Pool happening right after the notification. So this one, if you were going to search for it, is AppInsight for IIS Application Pool Restart, right?
Okay. And you just clicked on it and drilled in here, but yeah.
That's right. Right, so you set up the alert, and you're on your way, and it saves some time.
That's just really great. Yeah, it's so funny. Those new alert actions that are a part of web-based alerting, we all get to... all the modules get to inherit it.
I know Rob thinks that it's all his, but no. I mean, it's definitely, I can see why they spent a couple years working on it. Thank you very much for that.
There's one other thing that I wanted to give as an example of customization, and Kevin, don't hate me on this. I think you guys are really going to get a kick out of this. So let me show you what this actually looks like here. And now, we showed you the capacity planning before. This is actually a capacity planning summary page that Kevin built on here. It does not come out of the box, but he built this in, what, like five minutes?
Five minutes, yeah.
And I wanted to show this to you so that you can be thinking about how you can reuse those capacity-planning resources, since that's something new that comes with this release. You know, here we've got when basically we're going to run out of space for CPU. You know, when we're going to go in a warning state, when we're going to be critical, when we're going to be at capacity by server, top tens, like we would normally get, and then right below it. Wow, we're going to see the same thing for memory. When we're going to run out of disk. Oh, man, we got a lot that we're just running out of right now.
Things you need to work on now!
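The warning, critical, and at-capacity dates on a page like this boil down to projecting a usage trend forward. The transcript does not say how the product forecasts, so the Python sketch below simply assumes steady linear growth; the function name, thresholds, and numbers are all hypothetical.

```python
def days_until(threshold_pct, current_pct, growth_pct_per_day):
    """Days until usage reaches threshold_pct, assuming linear growth.

    Returns 0.0 if the threshold is already crossed and None if usage
    is flat or shrinking (the threshold is never reached on this trend).
    """
    if current_pct >= threshold_pct:
        return 0.0
    if growth_pct_per_day <= 0:
        return None
    return (threshold_pct - current_pct) / growth_pct_per_day


# Assumed example: a volume at 70% used, growing half a percent per day.
current = 70.0
growth = 0.5
thresholds = [("warning", 80.0), ("critical", 90.0), ("capacity", 100.0)]

forecast = {name: days_until(limit, current, growth) for name, limit in thresholds}
for name, days in forecast.items():
    print(f"{name}: {days} days")
```

The same function works for CPU, memory, disk, or interface utilization, which mirrors how one resource is reused four times on the page.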
Yeah, I think we need to head down there and swap out some disk. And then our interface capacity, right? But the cool thing here is that all of these are the exact same resource, and you went in, basically dropped four of them on, or did one and then copied it four times ...
Just did one, and then just hit duplicate, duplicate, duplicate.
Yeah, and then you just selected which one of the metrics to include. I mean, you could throw them all on there at once. But you actually went ahead and just said, give me a table view with all of that, and then you now ended up with all four of those on the same page. Now then, the other thing, and this is where he's going to hate me, is over here on this side, Building a Custom View. This looks like it should be in the product. I have a feeling you're probably working on that. [Nikki laughs] But what this thing really is, is he was working with a group where he wanted to make it easy for them to add them, so this is HTML, and I don't know if you exactly cut and pasted this out of the Help resource, but I think it's something like that. And believe it or not, this is just a custom HTML resource with those elements added, so the cool thing about this is, if you were working with someone on your team who maybe isn't an expert in building views, or maybe they're a manager or a non-technical resource, you can actually, essentially create how to work with the data that you're giving them, right there in the view, and throw it in as a custom HTML resource. But anyway.
Oh, Nikki, one more thing.
Not unlike the Windows service stop default alert, if you actually go in and turn on the default alerts for app pools stopped or websites stopped, they will automatically restart those, which, 90% of the time, is what you want. But if you're in the middle of doing maintenance ...
Your best deal is actually to flag those nodes, or the applications under them, as unmanaged.
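The behavior discussed here, notify and then restart a downed application pool unless it has been flagged as unmanaged for maintenance, can be sketched as plain decision logic in Python. The `AppPool` type and the action strings are hypothetical stand-ins; this does not call or represent the actual Orion alerting engine.

```python
from dataclasses import dataclass


@dataclass
class AppPool:
    # Hypothetical record for this sketch.
    name: str
    state: str       # "up" or "down"
    unmanaged: bool  # True while the pool (or its node) is under maintenance


def actions_for(pool):
    """Return the trigger actions to run, in order, for one application pool."""
    if pool.unmanaged or pool.state != "down":
        # Healthy pools and pools under maintenance are left alone.
        return []
    return [
        f"notify: application pool {pool.name} is down, restarting",
        f"start application pool {pool.name}",
    ]


print(actions_for(AppPool("DefaultAppPool", "down", unmanaged=False)))
print(actions_for(AppPool("DefaultAppPool", "down", unmanaged=True)))
```

The second call returns nothing, which is the point of unmanaging before maintenance: the pool stays down until you bring it back yourself.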
That is an excellent point. Thank you. Well, I think that's it for today.
Yeah, and you know, the funny thing is, the episode before, when Rob and Leon were doing the demo of the new web-based alert manager. I wanted to jump in and actually point out that start app pool action, because there are now so many different actions that have been added, and it's just really, really great to see that. So, thanks again for the demo.
Kevin, can you think of anything else you want to do?
No, I'm good. My only note is to check out the SRM forum on THWACK, if you have questions and want to let us know how it works for you.
Cool. And I mean that's such a great point. So much of what we do, and especially this show, is driven entirely by feedback, and THWACK is the best way to get it to us. Now, of course, the best way to get it to us is to swing by lab.solarwinds.com. Give us feedback about the shows, tell us what you want to see in upcoming episodes, and you can also sign up for reminders about upcoming episodes. And then make sure that you're here for our live chat over here to the side of the window. I think that's about it.
All right, well let's close this sucker out.
All right, I'm Patrick Hubbard.
And I'm Kevin Sparenberg.
And I'm Nikki Jennings. Thanks for watching SolarWinds Lab. [upbeat techno music]