Strategies for Scaling the World’s Largest Networks

By Joe Kim, SolarWinds Chief Technology Officer

It can be truly astounding to think about the scale of today’s largest government networks, which are growing larger and more complex every day.

As a public sector IT pro, you may find managing this growing behemoth an impossible challenge. Ever-increasing numbers of network devices, servers, and applications give you less leeway for downtime, hiccups, or problems of any sort.

There is a range of strategies that government IT pros can employ to support network growth and scalability while helping to ensure that all architectural and infrastructural requirements are met and that system failover scenarios are accounted for.

As the IT environment expands, it becomes more important for monitoring and management systems to scale with it. Most monitoring systems are built from the following elements, each with its own requirements and scaling challenges:

  • A server that hosts the monitoring product and polls for status and performance
  • A database where the polled information is stored for historical data access and reporting
  • A web console for software management, data visualization, and reporting

Within this environment, three primary variables will affect a system’s scalability:

  1. Infrastructure size: The number of monitored elements (where an element is defined as a single, identifiable node, interface, or volume), or the number of servers and applications that can be monitored.
  2. Polling frequency: The interval at which the monitoring system polls for information. For example, statistics collected every few seconds instead of every minute will make the system work harder, and requirements will increase.
  3. Simultaneous users: The number of users accessing the monitoring system at the same time.
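
To make the first two variables concrete, here is a rough back-of-the-envelope sketch in Python. It assumes every monitored element is polled once per interval and ignores retries and per-poll cost. The function name and the figure of 12,000 elements are hypothetical and are not taken from any particular monitoring product.

```python
def polls_per_second(elements: int, interval_seconds: float) -> float:
    """Average number of polls the system must complete each second.

    Assumes each element is polled exactly once per interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return elements / interval_seconds


# Hypothetical environment: 12,000 monitored elements.
every_minute = polls_per_second(12_000, 60)       # polled once a minute
every_ten_seconds = polls_per_second(12_000, 10)  # polled every 10 seconds

print(every_minute, every_ten_seconds)
```

Under these assumptions, shortening the interval from 60 seconds to 10 seconds multiplies the polling work by six for the same infrastructure, which is why polling frequency weighs so heavily on server and database sizing.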

Those are the basics of understanding the feasibility of scalability. Now, let’s move on to ways to manage that environment.

A command center is particularly well suited to agencies with multiple regions or sites, where the number of nodes to be monitored in each region warrants localized data collection and storage. It works well for regional teams that are responsible for their own environments and require autonomy over their monitoring platform. While the systems are segregated between regions, all data can still be accessed from the centrally located console.

Additional scalability tips

There are several additional strategies that will help manage an agency’s growing infrastructure:

Add polling engines: Distributing the polling load for the monitoring system among multiple servers will provide scalability for large networks.
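
As an illustration only, the sketch below shows one simple way polling load could be spread across several engines: a greedy assignment that gives each node to whichever engine currently carries the fewest elements. This is an assumed balancing strategy, not a description of how any specific product assigns nodes, and the node names, engine names, and element counts are hypothetical.

```python
import heapq


def assign_nodes(node_elements: dict[str, int], engines: list[str]) -> dict[str, list[str]]:
    """Assign each node to the polling engine with the lightest load so far.

    node_elements maps a node name to its number of monitored elements.
    """
    if not engines:
        raise ValueError("at least one polling engine is required")

    # Min-heap of (current element count, engine name).
    heap = [(0, name) for name in engines]
    heapq.heapify(heap)
    assignment: dict[str, list[str]] = {name: [] for name in engines}

    # Place the largest nodes first so the totals end up more even.
    ordered = sorted(node_elements.items(), key=lambda kv: (-kv[1], kv[0]))
    for node, count in ordered:
        load, engine = heapq.heappop(heap)
        assignment[engine].append(node)
        heapq.heappush(heap, (load + count, engine))
    return assignment


# Hypothetical nodes and their element counts.
nodes = {"core-sw": 48, "edge-rtr": 24, "db-srv": 12, "web-srv": 12}
result = assign_nodes(nodes, ["poller-a", "poller-b"])
print(result)
```

In this example each engine ends up responsible for 48 elements. In practice, placement is often driven by region or network segment as well as raw load.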

Add web servers: Additional web servers can help support increasing numbers of concurrent monitoring sessions, helping to ensure that more users have uninterrupted web access to network monitoring software.

Add a failover server: To help ensure the monitoring system is always available, install a failover mechanism that will switch monitoring system operation to a secondary server if the primary server should fail.
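
The decision at the heart of a failover mechanism can be sketched in a few lines of Python. This is a minimal illustration that assumes the primary server reports a periodic heartbeat and that a fixed timeout marks it as failed. The function name, the 30-second timeout, and the timestamps are hypothetical; real failover products also handle data synchronization and failback.

```python
def choose_active(last_heartbeat: float, now: float, timeout: float = 30.0) -> str:
    """Return which server should run monitoring.

    The primary stays active while its most recent heartbeat is no older
    than `timeout` seconds; otherwise the secondary takes over.
    """
    if now - last_heartbeat <= timeout:
        return "primary"
    return "secondary"


# Hypothetical timestamps, in seconds.
print(choose_active(last_heartbeat=100.0, now=110.0))  # heartbeat is 10 s old
print(choose_active(last_heartbeat=100.0, now=200.0))  # heartbeat is 100 s old
```

A shorter timeout detects failures sooner but raises the chance of switching over during a brief network hiccup, so the value is a trade-off to tune for each environment.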

Agency networks will certainly get large. It's the nature of an increasingly technically driven government. While it may seem overwhelming, implementing these few tactics will help IT managers embrace the growth and ultimately realize its value.

Find the full article on Government Computer News.
