I think I can speak for many people in IT when I say that when crises happen (COVID-19, large-scale weather emergencies, power outages, etc.) there's always an added level of stress. Your carefully crafted business continuity plans are put to the test and you hope that your infrastructure can handle the burden. Keeping people informed is crucial when operating in crisis mode.
Recently, Microsoft posted a great article on how to build a crisis management site. I love this idea, but it's not really an IT resource, is it? It's great for policies and updates, but it doesn't give me the information I need to keep doing my job effectively in stressful times.
If you are anything like me, you're busy working the issues and don't have time for a million questions. We always say that a picture is worth a thousand words. So with that in mind what's a good dashboard worth? 6.023 x 10^23 words?
In my previous role, these are the resources I think would have been best on a dashboard for my company:
Now, this is just what I came up with on the fly and it can't/won't apply to everyone. So my question to you is this: What makes a good dashboard for your company when you are in crisis mode? This is a discussion, not a think-piece, so do not hold back on the comments, and if you can share pictures, I (and the community) would love to see them.
Our network is a big hub and spoke, and any users from the spokes who want to work remotely must VPN in via the hub. This means that files pulled from spokes over the VPN consume bandwidth in the datacenter. Our top metrics are:
- Total VPN users
- Avg/Peak bandwidth on datacenter circuits
- CPU/mem on core networking hardware
Nothing fancy at all. Simple but effective. Luckily for our organization (K-8 education) a vast majority of our resources are on the internet, so there isn't a whole lot to monitor. We're not hosting our own collab services and our buildings are empty...
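For anyone who wants to sanity-check the avg/peak bandwidth numbers outside their monitoring tool, here is a minimal Python sketch. It assumes you already have timestamped readings of a cumulative octet counter (for example IF-MIB's ifHCInOctets) and know the circuit speed. The `Sample` class and the function names are made up for illustration and are not part of any product.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds since some fixed starting point
    octets: int       # cumulative byte counter read from the interface


def utilization_percent(samples, link_bps):
    """Return per-interval utilization (%) from cumulative octet counters."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        elapsed = cur.timestamp - prev.timestamp
        delta = cur.octets - prev.octets
        if elapsed <= 0 or delta < 0:
            # Skip out-of-order samples and counter resets.
            continue
        bits_per_second = delta * 8 / elapsed
        rates.append(bits_per_second / link_bps * 100)
    return rates


def avg_and_peak(rates):
    """Return (average, peak) utilization, or (None, None) with no data."""
    if not rates:
        return None, None
    return sum(rates) / len(rates), max(rates)
```

Feeding it one-minute samples from a datacenter circuit gives the same two numbers the bullet list above calls out. Skipping negative deltas is a simplification; a real poller would handle counter wrap explicitly.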
I'm curious - are your users using split-tunnel VPN or a tunnel-all VPN approach? Would that change what you are working on?
We're using split tunnel, thank goodness. If we weren't, the datacenter would be overrun before 8am, and it would have expedited our order for more bandwidth into the building. It would also likely mean we had made some conscious decision to monitor our users way more than we currently do, which would probably mean I'd be interested in some kind of flow data for context on what the traffic is.
Love this dashboard! Is this custom built or something OOB? I am particularly interested in the AnyConnect chart you have here. Are you reporting on Active Sessions from your ASAs? If you have more than 1 ASA, did you have to create a custom poller to report a total? I know it's a busy time for all of us, so drop a line whenever you free up. No rush.
We're currently focusing our pandemic response monitoring on active VPN/remote users and our WAN circuit utilization.
1. Active VPN/remote user monitoring. We're a Palo Alto shop using the GlobalProtect VPN client, so we monitor our GP gateways with OID 1.3.6.1.4.1.25461.2.1.2.5.1.3 (panGPGWUtilizationActiveTunnels). We use it to create graphs of the overall total count, and we also break the graphs down by operational region using a custom property.
2. WAN circuits are selected using another custom property applied to the interface that the provider is connected to. Then we filter the gadget with SWQL: i.CustomProperties.WAN_Monitor_Port = '1'
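For anyone wondering how the overall and per-region totals come together when there is more than one gateway, here is a rough Python sketch of the arithmetic. It assumes the active tunnel count has already been polled from each gateway and that each gateway carries a region label (the custom property). The gateway names, regions, and function names are hypothetical.

```python
from collections import defaultdict


def total_by_region(readings):
    """Sum active tunnels per region.

    readings: iterable of (gateway_name, region, active_tunnels) tuples.
    """
    totals = defaultdict(int)
    for _gateway, region, active in readings:
        totals[region] += active
    return dict(totals)


def overall_total(readings):
    """Sum active tunnels across every gateway."""
    return sum(active for _gateway, _region, active in readings)


# Hypothetical polled values, one tuple per gateway.
readings = [
    ("gw-east-1", "East", 120),
    ("gw-east-2", "East", 80),
    ("gw-west-1", "West", 45),
]
print(total_by_region(readings))
print(overall_total(readings))
```

The same grouping idea applies to any per-device session counter, whatever the vendor.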
We're evolving the views/dashboard as we run into items that we need to keep a closer eye on, since we've never been through a global issue like this. We're also hitting new high-water marks daily and going into uncharted territory. Nothing is normal any more.
Here's our home page view. BTW, our SolarWinds environment only monitors the actual network, no servers other than SolarWinds itself.
Let me know if you have any questions.
Stay healthy. Ken
We are also focused on monitoring our VPN nodes, WAN routers and circuits, and active user connections.
I wanted to create an additional chart with current user connections instead of a table, but couldn't find the correct OID for Check Point VPN (it's not 22.214.171.124.4.1.26126.96.36.199.3). So I just went for 1.3.6.1.4.1.2620.500.9000 and counted the entries.
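The "walk the table and count the entries" workaround can be sketched in Python roughly like this. It assumes you already have the list of OIDs returned by a walk of the table, and that the table follows the usual SNMP layout of table.1.column.index. The function name and the example OIDs (enterprise number 99999) are made-up placeholders, not real Check Point values.

```python
def count_table_rows(table_oid, walked_oids):
    """Count rows in an SNMP table from the OIDs returned by a walk.

    Assumes the conventional layout <table>.1.<column>.<row index>.
    """
    prefix = table_oid.strip(".") + ".1."
    rows_per_column = {}
    for oid in walked_oids:
        oid = oid.strip(".")
        if not oid.startswith(prefix):
            continue  # belongs to some other table
        column, _, index = oid[len(prefix):].partition(".")
        if index:
            rows_per_column.setdefault(column, set()).add(index)
    if not rows_per_column:
        return 0
    # Every column should list the same rows; take the fullest one.
    return max(len(indexes) for indexes in rows_per_column.values())


# Placeholder walk output: two rows, two columns, plus one unrelated OID.
walked = [
    "1.3.6.1.4.1.99999.1.1.1.10",
    "1.3.6.1.4.1.99999.1.1.1.11",
    "1.3.6.1.4.1.99999.1.1.2.10",
    "1.3.6.1.4.1.99999.1.1.2.11",
    "1.3.6.1.4.1.99999.2.1.1.5",
]
print(count_table_rows("1.3.6.1.4.1.99999.1", walked))
```

Counting rows this way gives a single number that can be charted over time instead of shown as a table.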
I've been working on a multi-tab dashboard (Summary, Leadership, Virtualization, WAN, & our Line of Business Stacks) with my own view set to NOC rotation most of the time. I've found new ways of manipulating Orion's tools and had a few head-scratching moments, but I'm thrilled to have a spot that's more than the default view.
I'm still working out how to do custom components for perf monitoring as well as glean active/disconnected session perf counters, but so far I've got these up and running (sorry for all the empty fields at the moment, but infosec, right?)
This has been a very difficult time. I work for 911, so each and every one of our dashboards is so very critical! There are only 14 people responsible for the day-to-day operations; there are 8 of us in IT.
I monitor the storage. Would you believe that I have both the old VMAN Appliance running along with the new VMAN upgrade integrated with Orion? I loved the VMAN Appliance dashboard so much that I will continue to run it side by side with Orion as long as I can!
I watch the NetApp storage repository with both SolarWinds and OnCommand Unified Manager.
I monitor the Public Safety Answering Points (PSAP) - 911 phone systems located throughout the County of El Paso.
We have such a unique community, where the First Responders are all using the same phone, application and radio system, which allows us to service the community much more efficiently than those communities that have multiple distributed systems.
I monitor Time Skew on all the clients and servers running the application; the application, phone and radio must always have the same time due to legal ramifications.
Alerting goes to all 8 of us in IT; it took quite some time to tune the alerts so that we were not inundated with useless alerts. I am quite proud of the deployment as we are now proactive rather than reactive! I run SAM, NPM, NetFlow, DPA, Config Manager, VMAN, IPAM, Web Help Desk, and DameWare! THANK YOU SOLARWINDS FOR MAKING MY JOB FUN AND SO MUCH EASIER!
Every day is critical and now we are jumping hoops to accommodate the Emergency Operations Center as they hunker down to address the pandemic crossing the country.
Here are a couple of my favorite views:
911 Phone Network: (It's not a Pentagram!)