
My take on a crisis dashboard: don't overdo it, know what's changing

It's fun to MONITOR ALL THE THINGS, right? Right. Always. Yes. Please give me more things to monitor. There's a trick to translating all this data into something useful, though, and as monitoring gurus we excel at this. SolarWinds provides the dashboarding tools we need to summarize important data quickly and efficiently, and during this global work-from-home experiment there are a handful of key metrics that we operations folks are keen to keep an eye on. Every organization will face this challenge in one way or another, and the data that's important to you is more than likely different from the data that's important to me. Let's explore mine, and then you tell me how it compares to yours.

My organization owns and operates schools in 9 states, and last week Ohio was the first of them to announce it would close its schools statewide. This more or less confirmed our suspicion that we would soon be shutting down operations in our school buildings. We had already discussed the company’s pandemic plan earlier in the week and decided to expand the subnet for our remote VPN users: we have approximately 6,000 employees, and the pool was only a /23 (510 usable addresses). This ended up being a really, really good decision.
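For anyone sizing a VPN pool the same way, the standard IPv4 arithmetic below shows why a /23 falls short. It's a minimal sketch that assumes every employee could be connected at once (the post doesn't say that) and ignores any addresses the VPN appliance itself might reserve; the helper names are hypothetical. It uses only Python's standard `ipaddress` module.

```python
import ipaddress


def usable_hosts(prefix_len: int) -> int:
    """Usable IPv4 host addresses in a subnet of this size (network and broadcast excluded)."""
    # strict=False lets any base address be used; only the prefix length matters here.
    net = ipaddress.ip_network(f"10.0.0.0/{prefix_len}", strict=False)
    return net.num_addresses - 2


def smallest_prefix_for(clients: int) -> int:
    """Longest prefix (i.e. smallest subnet) with at least `clients` usable addresses."""
    for prefix_len in range(30, 0, -1):
        if usable_hosts(prefix_len) >= clients:
            return prefix_len
    raise ValueError("more clients than IPv4 can address")


EMPLOYEES = 6_000  # from the post; assumes all could be connected at once

print(usable_hosts(23))                # 510
print(usable_hosts(20))                # 4094
print(smallest_prefix_for(EMPLOYEES))  # 19
```

Under that worst-case assumption, a /19 (8,190 usable addresses) is the smallest subnet that fits; the post doesn't say which size was actually chosen.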

Our network is a big hub and spoke, and any users from the spokes who want to work remotely must connect to the VPN via the hub. This means that files pulled from spokes over the VPN consume bandwidth in the datacenter. Lucky for us, we use split tunneling on our remote-user VPNs, so only traffic bound for internal networks crosses the hub, which takes significant strain off it. Regardless, we put a plan in place to double the bandwidth, but leadership wanted to justify it first. You’re telling me there’s a chance to have a gig at our core and all I need to do is send someone a chart? Cue SolarWinds.
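To make the split-tunnel point concrete, here is a minimal sketch of the routing decision a split-tunnel client makes: destinations inside the tunneled prefixes go over the VPN to the hub, and everything else (cloud collaboration tools, for example) goes straight out the user's home connection. The prefixes and addresses are hypothetical placeholders, not the author's network, and this models the concept rather than any vendor's client code.

```python
import ipaddress

# Hypothetical split-tunnel list: the internal prefixes reachable through the hub.
# Illustrative RFC 1918 ranges only, not the author's real addressing.
TUNNELED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def uses_tunnel(destination: str) -> bool:
    """True if traffic to `destination` would be sent over the VPN to the hub."""
    addr = ipaddress.ip_address(destination)
    return any(addr in net for net in TUNNELED_NETWORKS)


# 203.0.113.10 is a documentation address standing in for a cloud service.
for dest in ["10.20.30.40", "172.20.1.5", "203.0.113.10"]:
    path = "VPN to hub" if uses_tunnel(dest) else "direct to internet"
    print(f"{dest}: {path}")
```

Only the first two destinations load the hub's circuits; the cloud-bound traffic never touches the datacenter.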

As for collaboration, when people ask what tools we are using, my answer is always “yes”. The organization hasn’t mandated a collaboration tool, so it’s a bit hectic. In the last week alone I’ve seen meetings via Skype for Business, Slack, Teams, Zoom, Google Hangouts, and Adobe Connect. It’s safe to say there will be a push to consolidate once this whole thing blows over; however, this doesn’t affect me at all, since everyone is using these tools from home. There isn’t really any monitoring I can provide that would be useful.

Luckily, that’s where the story ends for me. Dozens of others in IT are scrambling to put together e-learning initiatives with Google Classroom, deploy Chromebook agents for off-site content filtering, support ten different collaboration platforms, integrate new applications, and much more. All of this content is hosted in the cloud, and there isn’t much I can do from a monitoring perspective to provide insight into these services. Our monitoring watches the datacenter/main office (which share a building and infrastructure) and our schools (which are now empty). The rest of the services the company uses are “business as usual” and aren’t affected by the current situation, so there’s no need to watch them any more closely than we already do. Our monitoring has always focused on the schools and the datacenter, so my dashboard features the datacenter, and only the services that will be affected by the sudden turn of events. The metrics we’re focusing on are:

  • Total concurrent VPN users (Cisco ASA, via a Universal Device Poller (UnDP) on OID 1.3.6.1.4.1.9.9.392.1.3.1)
  • Total concurrent connections through the core ASA
  • Average/peak bandwidth on both datacenter circuits
  • CPU/memory on core networking hardware
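Bandwidth figures like the average/peak numbers above are usually derived from interface octet counters (for example the 64-bit `ifHCInOctets`/`ifHCOutOctets` from IF-MIB) polled at intervals. The sketch below shows that arithmetic under stated assumptions: the sample values, the 300-second polling interval, and the 500 Mbps circuit capacity are all hypothetical, and this is not a claim about how SolarWinds computes its figures internally.

```python
from __future__ import annotations

from dataclasses import dataclass

COUNTER64_MODULUS = 2**64  # 64-bit counters wrap back to 0 after 2**64 - 1


@dataclass
class Sample:
    timestamp: float  # seconds since some epoch
    octets: int       # raw interface octet counter value


def rates_bps(samples: list[Sample]) -> list[float]:
    """Throughput in bits per second for each interval between consecutive samples."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        # Modulo handles a single counter wrap between polls.
        delta = (cur.octets - prev.octets) % COUNTER64_MODULUS
        rates.append(delta * 8 / (cur.timestamp - prev.timestamp))
    return rates


def avg_and_peak(samples: list[Sample], capacity_bps: float) -> tuple[float, float, float, float]:
    """Time-weighted average and peak throughput, in bps and as % of capacity."""
    rates = rates_bps(samples)
    intervals = [cur.timestamp - prev.timestamp for prev, cur in zip(samples, samples[1:])]
    elapsed = samples[-1].timestamp - samples[0].timestamp
    avg = sum(r * dt for r, dt in zip(rates, intervals)) / elapsed
    peak = max(rates)
    return avg, peak, 100 * avg / capacity_bps, 100 * peak / capacity_bps


# Hypothetical polls of one circuit every 300 seconds.
samples = [
    Sample(0.0, 0),
    Sample(300.0, 7_500_000_000),
    Sample(600.0, 22_500_000_000),
    Sample(900.0, 33_750_000_000),
]
CAPACITY_BPS = 500_000_000  # hypothetical 500 Mbps circuit

avg, peak, avg_pct, peak_pct = avg_and_peak(samples, CAPACITY_BPS)
print(f"avg {avg / 1e6:.0f} Mbps ({avg_pct:.0f}%), peak {peak / 1e6:.0f} Mbps ({peak_pct:.0f}%)")
# prints: avg 300 Mbps (60%), peak 400 Mbps (80%)
```

The average is weighted by interval length, so uneven polling doesn't skew it. The peak, however, is only as fine-grained as the polling interval: a 5-minute sample smooths over shorter bursts.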

[Screenshot: Crisis Dashboard]


P.S. Kudos to  and his post about custom colors in PerfStack.
