Advice on Managing and Monitoring Complex Networks

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

Here is an interesting article from Federal Technology Insider on monitoring complex networks, as presented by one of our customers during our Federal User Group meeting earlier this year.

When it comes to monitoring and managing complex IT networks, it would seem that the ideal solution would be to build custom tools uniquely suited for the environment.

Or is it?

As David A. Richards, Senior Technical Manager EOSS/GuardNet, one of the largest Department of Defense (DoD) networks, shared recently at the SolarWinds Federal User Group, “When it comes to ensuring the safety, security, and continuous operation of GuardNet, it’s vitally important that we be able to customize our tools to achieve organizational objectives and make actionable decisions.”

His team found themselves in a position where they were inundated with information and couldn’t find the proverbial needle in the haystack. The opportunity to break out of the cycle of information overload, analysis paralysis, and circular discussions came in the form of network management and monitoring tools from SolarWinds.

While the assumption might be that ‘out-of-the-box’ tools wouldn’t be able to cope with the rigors of a complex environment like GuardNet, the tools came with a significant strategic advantage: native customization capabilities.

Starting with a customer-generated architecture diagram, Richards and his team were able to rebuild the network not only to meet today’s needs but also to prepare for the additional demands that will come as the DoD rolls out the Joint Regional Security Stacks (JRSS). The JRSS will add more nodes on a global scale and will also require compliance with new DISA security standards that will apply across the DoD, including GuardNet.

So what advice does Richards have for other government IT leaders who are responsible for complex networks?

  • Stop thinking in terms of single devices. Start thinking of the network as an ecosystem and identify dependencies within the ecosystem.
  • Use monitoring tools to help visualize the network. Draw a map, color code it, and share it.
  • Identify patterns of failure and recurring problem areas, overlay them on the map, and target those areas for remediation.
  • Move from a technical diagram to a format that communicates business value, so you can secure funding for additional network monitoring tools that help automate routine tasks such as load balancing and patch updating.
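
As a rough illustration of the first and third points, the network can be modeled as a dependency graph and outage records can be counted per node to surface recurring problem areas. This is only a minimal sketch: the device names, the `DEPENDS_ON` mapping, the outage log, and both function names are hypothetical, and nothing here reflects how GuardNet or any SolarWinds product is actually built.

```python
from collections import Counter, defaultdict, deque

# Hypothetical topology: each device maps to the devices it depends on.
DEPENDS_ON = {
    "core-router": [],
    "dist-switch-a": ["core-router"],
    "dist-switch-b": ["core-router"],
    "app-server-1": ["dist-switch-a"],
    "app-server-2": ["dist-switch-b"],
}

# Hypothetical outage history: one entry per recorded failure.
OUTAGE_LOG = ["dist-switch-a", "app-server-2", "dist-switch-a", "dist-switch-a"]


def downstream_of(node, depends_on):
    """Return every device that directly or indirectly depends on `node`."""
    # Invert the mapping so we can walk from a device to its dependents.
    dependents = defaultdict(list)
    for child, parents in depends_on.items():
        for parent in parents:
            dependents[parent].append(child)

    # Breadth-first walk over the inverted graph.
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def recurring_failures(outage_log, threshold=2):
    """Return devices that failed at least `threshold` times, with counts."""
    counts = Counter(outage_log)
    return {node: n for node, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    print(sorted(downstream_of("dist-switch-a", DEPENDS_ON)))
    print(recurring_failures(OUTAGE_LOG))
```

In this toy data, a failure of `dist-switch-a` affects `app-server-1`, and `dist-switch-a` is also the only device that fails repeatedly, which makes it the obvious candidate to highlight on the map and target for remediation.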

These lessons were invaluable to Richards and his team during last year’s hurricane crisis in Puerto Rico. Following Hurricane Maria, sites in Puerto Rico could no longer monitor GuardNet. Richards and his team ensured that sites on the mainland were able to add that workload. While the initial step was just to see what parts of the network and devices were up or down, they were able to quickly access credentials and add specific device monitoring and management to help ensure continuity of operations.

As Richards shared, “The ability to create a regional view of the situation in a very short period of time gave better insight into areas of most damage and criticality and got us on the right track to normal operations much more quickly than anyone anticipated.”
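
A regional view of this kind can be thought of as a rollup of up/down status by region, sorted so the worst-hit areas appear first. The sketch below assumes a simple list of `(region, is_up)` pairs; the region labels, sample statuses, and the `regional_summary` function are invented for illustration and are not taken from the GuardNet deployment or from any SolarWinds API.

```python
from collections import defaultdict


def regional_summary(device_statuses):
    """Summarize up/down status per region, worst-affected region first.

    `device_statuses` is an iterable of (region, is_up) pairs.
    Returns a list of (region, down_count, total_count, down_fraction).
    """
    totals = defaultdict(lambda: {"up": 0, "down": 0})
    for region, is_up in device_statuses:
        totals[region]["up" if is_up else "down"] += 1

    rows = []
    for region, counts in totals.items():
        total = counts["up"] + counts["down"]
        rows.append((region, counts["down"], total, counts["down"] / total))

    # Highest fraction of devices down comes first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical poll results for two made-up regions.
    sample = [
        ("region-a", True),
        ("region-a", False),
        ("region-b", False),
    ]
    for region, down, total, fraction in regional_summary(sample):
        print(f"{region}: {down}/{total} down ({fraction:.0%})")
```

Sorting by the fraction of devices down is one possible way to rank regions; weighting by device criticality would be a natural refinement.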

Learn about solutions that offer network management and monitoring in any circumstance here.

Find the full article on Government Technology Insider.

The SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.

7 Comments

Finding that needle in the haystack is a challenge for us.  I'm our SolarWinds SME and am stuck between that rock and that hard place where I'm promoting the single pane of glass as an efficient way to see everything with one front end that aggregates info from multiple (SolarWinds) tools, while simultaneously experiencing slow performance and inaccurate data from our SW environment.

Problem #1 is NPM doesn't behave as quickly as our users demand--they've become accustomed to "Google-like" performance; they expect SolarWinds to be as fast as Google, but we don't spend the millions (or billions?) of dollars that Google does on infrastructure.

It's not good when tools like my UDT take eight seconds or longer to display data.

Problem #2 is that UDT's data is missing or obviously incorrect.  Try getting support from staff and making efficient progress when we see that kind of glaring mistake.

This means we're researching alternate stand-alone tools that cannot be integrated into a single pane of glass.  That is less efficient at presenting the whole picture quickly, and it won't help us reduce MTTI or assign responsibility to the appropriate team more efficiently.  It means more teams will be working simultaneously (or worse--sequentially) to diagnose the problems and pass responsibility to the next team down the line that might be able to fix the issues.

Instead, I'm being told that other best-of-breed products (which are snappy fast and which provide accurate information) will take up the slack we experience with slow NPM load times and erroneous information presented by UDT.

I can't argue with this philosophy when I see bad data reported and when NPM is sluggish.

Happily, SolarWinds has set up several meetings with us to analyze our deployment and make suggestions or corrections to our environment.

Here's hoping my users will soon experience accurate data and snappy web pages so I don't have to start building a new collection of stand-alone products to replace what SW's suite is supposed to be doing for us.

If things improve, maybe we'll be able to leverage SolarWinds to quickly find that needle in the haystack.  I don't like to think of not having one source of truth for all things monitoring, nor of having to branch out into places other than Thwack to get helpful input and answers.  A great example of something new to me from Thwack--at no charge--is this cool custom HTML written by mblackburn and shared with us via wluther:

[Image: pastedImage_1.png]

It can be found here:  Using Your Custom HTML Resource To Properly Display SWQL Query Results

I want to keep this kind of coolness going in my organization!  It's the kind of thing that this article talks about--color-coded monitoring tools that display patterns which help us troubleshoot and identify major sources of unhappiness--and correct them.

Level 10

Thanks rschroeder for the feedback.

Level 14

Thanks for the article. 

Level 20

Yep, RMF has been making all information systems on DoD networks a lot more complicated recently.

Level 13

Good Article. My only issue is with the color coding bit but that's because I'm color blind.

Level 13

Thanks. Always interesting to hear what's going on in that space.

MVP

Good article

About the Author
Paul Parker is a 25-year information technology industry veteran and an expert in government. He leads SolarWinds’ efforts to help public sector customers manage the security and performance of their systems by using technology. Parker most recently served as vice president of engineering at Infoblox’s federal division. Before that, he served in C-level or senior management positions at Ward Solutions, Eagle Alliance, and Dynamics Research Corp.