Crisis Dashboards

I think I can speak for many people in IT in saying that when crises happen (COVID-19, large-scale weather emergencies, power outages, etc.) there's always an added level of stress.  Your carefully crafted business continuity plans are put to the test and you hope that your infrastructure can handle the burden.  Keeping people informed is crucial when operating in crisis mode.

Recently, Microsoft posted a great article on how to build a crisis management site.  I love this idea, but it's not really an IT resource, is it?  It's great for policies and updates, but it doesn't give me the information that I need to continue to do my job effectively in stressful times.

If you are anything like me, you're busy working the issues and don't have time for a million questions.  We always say that a picture is worth a thousand words.  So with that in mind what's a good dashboard worth?  6.023 x 10^23 words?

In my previous role, these are the resources I think would have been most useful on a dashboard for my company:

  • Map with WAN Links
  • Current VPN user count
  • Number of bad VPN authentications
  • Current Citrix sessions count
  • Internet circuit utilization
  • Some specific NetFlow for my Internet links
  • Some load balancer metrics

Now, this is just what I came up with on the fly and it can't/won't apply to everyone.  So my question to you is this: What makes a good dashboard for your company when you are in crisis mode?  This is a discussion, not a think-piece, so do not hold back on the comments, and if you can share pictures, I (and the community) would love to see them.

  • Our network is a big hub and spoke, and any users from the spokes who want to work remotely must VPN in via the hub. This means that files pulled from spokes over the VPN consume bandwidth in the datacenter. Our top metrics are:

    - Total VPN users

    - Avg/Peak bandwidth on datacenter circuits

    - CPU/mem on core networking hardware

    nickzourdos_0-1584467371840.png

    Nothing fancy at all. Simple but effective. Luckily for our organization (K-8 education) a vast majority of our resources are on the internet, so there isn't a whole lot to monitor. We're not hosting our own collab services and our buildings are empty... 
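For anyone who wants to reproduce the avg/peak bandwidth metric outside a monitoring tool, here is a minimal sketch. It assumes you already have periodic samples of an interface octet counter (for example from SNMP polling); the sample values, the 1 Gbps circuit speed, and the function names are all made up for illustration and are not taken from the dashboard above.

```python
from statistics import mean


def bps_series(samples):
    """Turn (timestamp_seconds, octet_counter) samples into bits-per-second rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or c1 < c0:
            # Skip out-of-order samples and counter resets rather than guess.
            continue
        rates.append((c1 - c0) * 8 / elapsed)
    return rates


def avg_peak_utilization(samples, circuit_bps):
    """Return (average %, peak %) utilization for a circuit of the given speed."""
    rates = bps_series(samples)
    if not rates:
        return (0.0, 0.0)
    return (mean(rates) / circuit_bps * 100, max(rates) / circuit_bps * 100)


# Hypothetical 5-minute polls of an inbound octet counter on a 1 Gbps circuit.
SAMPLES = [(0, 0), (300, 7_500_000_000), (600, 30_000_000_000), (900, 45_000_000_000)]

avg_pct, peak_pct = avg_peak_utilization(SAMPLES, circuit_bps=1_000_000_000)
print(f"avg {avg_pct:.1f}%  peak {peak_pct:.1f}%")
```

Showing both numbers matters in a crisis: the average tells you whether the circuit is generally adequate, while the peak tells you how close you came to saturating it.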

  • We're currently focusing our pandemic response monitoring on active VPN/remote users and our WAN circuit utilization.

    1. Active VPN/remote user monitoring. We're a Palo Alto shop using the GlobalProtect VPN client, so we monitor our GP gateway with OID 1.3.6.1.4.1.25461.2.1.2.5.1.3 (panGPGWUtilizationActiveTunnels). We use it to create graphs of the total count overall, and we also break the graphs down by operational region using a custom property.

    Active Tunnels.png

    2. WAN circuits are selected using another custom property applied to the interface that the provider is connected to. Then we filter the gadget with the SWQL condition i.CustomProperties.WAN_Monitor_Port = '1'

    custom property.png

    WAN Circuits.png
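The two steps above (rolling up active tunnels by region, then selecting WAN circuits by a custom property flag) can be sketched in a few lines of Python. This is only an illustration of the logic, not how Orion implements it: the gateway names, regions, interface captions, and numbers are hypothetical, and only the WAN_Monitor_Port property name comes from the post.

```python
from collections import defaultdict

# Hypothetical poll results: one record per GlobalProtect gateway, with the
# region taken from a custom property and the active tunnel count from SNMP.
GATEWAYS = [
    {"name": "gw-east-1", "region": "East", "active_tunnels": 420},
    {"name": "gw-east-2", "region": "East", "active_tunnels": 380},
    {"name": "gw-west-1", "region": "West", "active_tunnels": 510},
]

# Hypothetical interfaces; "custom" stands in for interface custom properties.
INTERFACES = [
    {"caption": "rtr1 Gi0/0 (ISP A)", "custom": {"WAN_Monitor_Port": "1"}, "in_pct": 71.0, "out_pct": 35.0},
    {"caption": "rtr1 Gi0/1 (LAN)", "custom": {}, "in_pct": 12.0, "out_pct": 9.0},
    {"caption": "rtr2 Gi0/0 (ISP B)", "custom": {"WAN_Monitor_Port": "1"}, "in_pct": 44.0, "out_pct": 88.0},
]


def tunnels_by_region(gateways):
    """Sum active tunnels per region."""
    totals = defaultdict(int)
    for gw in gateways:
        totals[gw["region"]] += gw["active_tunnels"]
    return dict(totals)


def wan_circuits(interfaces):
    """Keep only interfaces flagged WAN_Monitor_Port = '1', busiest first."""
    flagged = [i for i in interfaces if i["custom"].get("WAN_Monitor_Port") == "1"]
    return sorted(flagged, key=lambda i: max(i["in_pct"], i["out_pct"]), reverse=True)


regions = tunnels_by_region(GATEWAYS)
print("total tunnels:", sum(regions.values()), "by region:", regions)
for iface in wan_circuits(INTERFACES):
    print(iface["caption"], max(iface["in_pct"], iface["out_pct"]))
```

Tagging the provider-facing interface with a flag keeps the dashboard filter stable even when circuits are added or renamed, which is presumably why a custom property was used instead of matching on interface names.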

    We're evolving the views/dashboard as we run into items that we need to keep a closer eye on, since we have never been through a global issue like this. We're also hitting new high-water marks daily and going into uncharted territory. Nothing is normal anymore.

    Here's our home page view. BTW, our SolarWinds environment only monitors the actual network, no servers other than SolarWinds itself.

    2020-03-17_13-52-42.png




    Let me know if you have any questions..

    Stay Healthy..  Ken

  • I'm curious - are your users using split-tunnel VPN or a tunnel-all VPN approach?  Would that change what you are working on?

  • We are also focused on monitoring our VPN nodes, WAN routers and circuits, and active user connections.

    I wanted to create an additional chart with current user connections instead of a table, but couldn't find the correct OID for Check Point VPN (it's not 1.3.6.1.4.1.2620.1.1.25.3). So I just went for 1.3.6.1.4.1.2620.500.9000 and counted the entries.

    jpaluch_0-1584471774051.png
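The "count the entries" workaround can be sketched as follows. This assumes the walk of the table returns one row per connected user, which is what the post implies but is not confirmed here. The column sub-identifiers (.1.1, .1.2) and the sample values are placeholders, not the real layout of the Check Point MIB.

```python
TABLE_OID = "1.3.6.1.4.1.2620.500.9000"  # OID from the post

# Hypothetical walk output as (oid, value) pairs. Two made-up columns with
# three rows each; the sub-identifiers are placeholders.
WALK = [
    (TABLE_OID + ".1.1.1", "10.0.0.5"),
    (TABLE_OID + ".1.1.2", "10.0.0.6"),
    (TABLE_OID + ".1.1.10", "10.0.0.7"),
    (TABLE_OID + ".1.2.1", "alice"),
    (TABLE_OID + ".1.2.2", "bob"),
    (TABLE_OID + ".1.2.10", "carol"),
]


def count_table_rows(walk, column_oid):
    """Count instances of one column in a walk, which equals the row count."""
    # The trailing dot stops column .1.1 from also matching .1.10, .1.11, etc.
    prefix = column_oid.rstrip(".") + "."
    return sum(1 for oid, _ in walk if oid.startswith(prefix))


current_users = count_table_rows(WALK, TABLE_OID + ".1.1")
print("current VPN users:", current_users)
```

Counting a single column rather than every returned entry avoids multiplying the user count by the number of columns in the table.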

  • We're using split tunnel, thank goodness. If we weren't, the datacenter would be overrun before 8am and it would have expedited our order for more bandwidth into the building. It would also likely mean we had made some conscious decision to monitor our users way more than we currently do, which would probably mean I'd be interested in some kind of flow data for context on what the traffic is.

  • I have only been with the new company for a few weeks and need to get with those who maintain Orion to see how I can assist.

    Paul

  • Hey Nick.

    Love this dashboard!  Is this custom built or something OOB?  I am particularly interested in the AnyConnect chart you have here.  Are you reporting on Active Sessions from your ASAs?  If you have more than one ASA, did you have to create a custom poller to report a total?  I know it's a busy time for all of us, so drop a line whenever you free up.  No rush.

    Thank you.

  • I've been working on a multi-tab dashboard (Summary, Leadership, Virtualization, WAN, & our Line of Business Stacks) with my own view set to NOC rotation most of the time. I've found new ways of manipulating Orion's tools and had a few head-scratching moments, but I am thrilled to have a spot that's more than the default view.

    I'm still working out how to do custom components for perf monitoring as well as glean active/disconnected session perf counters but so far I've got these up and running (sorry for all the empty fields at the moment..but infosec right?)

    Snag_28b9d53e.png

    Snag_28b9d406.png

  • This has been a very difficult time. I work for 911, so each and every one of our dashboards is so very critical! There are only 14 people responsible for the day-to-day operations; there are 8 of us in IT.

    I monitor the storage. Would you believe that I have both the old VMAN Appliance running along with the new VMAN upgrade integrated with Orion? I loved the VMAN Appliance dashboard so much that I will continue to run it side by side with Orion as long as I can!

    I watch the NetApp storage repository with both SolarWinds and OnCommand Unified Manager.

    I monitor the Public Safety Answering Points (PSAP) -  911 phone systems located throughout the County of El Paso.

    We have such a unique community where the First Responders are all using the same phone, application and radio system, which allows us to service the community much more efficiently than those communities that have multiple distributed systems.

    I monitor Time Skew on all the clients and servers running the application; the application, phone and radio must always have the same time due to legal ramifications.
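A time skew check like this boils down to comparing each host's clock against a reference and flagging anything outside a tolerance. The sketch below shows only that comparison; the host names, the timestamps, and the one-second tolerance are assumptions for illustration, and collecting the clock readings themselves (for example through a monitoring agent) is out of scope.

```python
from datetime import datetime, timedelta, timezone


def skewed_hosts(reference, host_clocks, tolerance=timedelta(seconds=1)):
    """Return {host: skew_seconds} for hosts whose clock is outside the tolerance.

    Positive skew means the host clock is ahead of the reference.
    """
    limit = tolerance.total_seconds()
    flagged = {}
    for host, clock in host_clocks.items():
        skew = (clock - reference).total_seconds()
        if abs(skew) > limit:
            flagged[host] = skew
    return flagged


# Hypothetical readings taken at (nominally) the same instant.
REFERENCE = datetime(2020, 3, 19, 12, 0, 0, tzinfo=timezone.utc)
HOST_CLOCKS = {
    "cad-server": REFERENCE + timedelta(milliseconds=200),
    "phone-gw": REFERENCE - timedelta(seconds=3),
    "radio-console": REFERENCE + timedelta(seconds=5),
}

alerts = skewed_hosts(REFERENCE, HOST_CLOCKS)
print(alerts)
```

Using timezone-aware timestamps throughout avoids false alarms caused by hosts reporting in different local time zones.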

    Alerting goes to all 8 of us in IT; it took quite some time to tune the alerts so that we were not inundated with useless alerts.  I am quite proud of the deployment as we are now proactive rather than reactive!  I run SAM, NPM, NetFlow, DPA, Config Manager, VMAN, IPAM, Web Help Desk, and DameWare!  THANK YOU SOLARWINDS FOR MAKING MY JOB FUN AND SO MUCH EASIER!

    Every day is critical, and now we are jumping through hoops to accommodate the Emergency Operations Center as they hunker down to address the pandemic crossing the country.

    Here are a couple of my favorite views:

    911 Phone Network: (It's not a Pentagram!)

    zennifer_0-1584640663722.png

    Data Network:

    zennifer_1-1584640733600.png

  • Sure that's not a pentagram... 

    And this isn't a Chupacabra

    220px-Chupacabra_(artist's_rendition).jpg