CYA - Cover Your Assets (By Securing Them)

I'm not aware of an antivirus product for network operating systems, but in many ways, our routers and switches are just as vulnerable as a desktop computer. So, why don't we all protect them in the same way as our compute assets? In this post, I'll look at some basic tenets of securing the network infrastructure that underpins the entire business.

Authentication, authorization, and accounting (AAA)

Network devices intentionally leave themselves open to user access, so controlling who can get past the login prompt (authentication) is a key part of securing devices. Once logged in, it's important to control what a user can do (authorization). Ideally, what the user does should also be logged (accounting).

Local accounts are bad, mkay?

Local accounts (those created on the device itself) should be limited solely to backup credentials that allow access when the regular authentication service is unavailable. The password should be complex and changed regularly. In highly secure networks, access to the password should be restricted (kind of a "break glass for password" concept). Local accounts don't automatically disable themselves when an employee leaves, and far too often, I've seen accounts still active on devices for users who left the company years ago, with some of those accessible from the internet. Don't do it.

Use a centralized authentication service

If local accounts are bad, then the alternative is to use an authentication service like RADIUS or TACACS. Ideally, those services should, in turn, defer authentication to the company's existing authentication service, which in most cases, is Microsoft Active Directory (AD) or a similar LDAP service. This not only makes it easier to manage who has access in one place, but by using things like AD groups, it's possible to determine not just who is allowed to authenticate successfully, but what access rights they will have once logged in. The final, perhaps obvious, benefit is that it's only necessary to grant a user access in one place (AD), and they are implicitly granted access to all network devices.

The term process

A term (termination) process defines the list of steps to be taken when an employee leaves the company. While many of the steps relate to HR and payroll, the network team should also have a well-defined term process to help ensure that after a network employee leaves, things such as local fall back admin passwords are changed, or perhaps SNMP read/write strings are changed. The term process should also include disabling the employee's Active Directory account, which will also lock them out of all network devices because we're using an authentication service that authenticates against AD. It's magic! This is a particularly important process to have when an employee is terminated by the company, or may for any other reason be disgruntled.

Principal of least privilege

One of the basic security tenets is the principal of least privilege, which in basic terms, says Don't give people access to things unless they actually need it; default to giving no access at all. The same applies to network device logins, where users should be mapped to the privileged group that allows them to meet their (job) goals, while not granting permissions to do anything for which they are not authorized. For example, an NOC team might need read-only access to all devices to run show commands, but they likely should not be making configuration changes. If that's the case, one should ensure that the NOC AD group is mapped to have only read-only privileges.

Command authorization

Command authorization is a long-standing security feature of Cisco's TACACS+, and while sometimes painful to configure, it can allow granular control of issued commands. It's often possible to configure command filtering within the network OS configuration, often by defining privilege levels or user classes at which a command can be issued, and using RADIUS or TACACS to map the user to that group or user class at login. One company I worked for created a "staging" account on Juniper devices, which allowed the user to enter configuration mode and enter commands, and allowed the user to run commit check to validate the configuration's validity, but did not allow an actual commit to make the changes active on the device. This provided a safe environment in which to validate proposed changes without ever having the risk of the user forgetting to add check to their commit statement. Juniper users: tell me I'm not the only one who ever did that, right?

Command accounting

This one is simple: log everything that happens on a device. More than once in the past, we have found the root cause of an outage by checking the command logs on a device and confirming that, contrary to the claimed innocence of the engineer concerned, they actually did log in and make a change (without change control either, naturally). In the wild, I see command accounting configured on network devices far less often than I would have expected, but it's an important part of a secure network infrastructure.

Network time protocol (NTP)

It's great to have logs, but if the timestamps aren't accurate, it's very difficult to align events from different devices to analyze a problem. Every device should be using NTP to ensure that they have an accurate clock to use. Additionally, I advise choosing one time zone for all devices—servers included—and sticking to it. Configuring each device with its local time zone sounds like a good idea until, again, you're trying to put those logs together, and suddenly it's a huge pain. Typically, I lean towards UTC (Coordinated Universal Time, despite the letters being in the wrong order), mainly because it does not implement summer time (daylight savings time), so it's consistent all year round.

Encrypt all the things

Don't allow telnet to the device if you can use SSH instead. Don't run an HTTP server on the device if you can run HTTPS instead. Basically, if it's possible to avoid using an unencrypted protocol, that's the right choice. Don't just enable the encrypted protocol; go back and disable the unencrypted one. If you can run SSHv2 instead of SSHv1, you know what to do.

Password all the protocols

Not all protocols implement passwords perfectly, with some treating them more like SNMP strings. Nonetheless, consider using passwords (preferably using something like MD5) on any network protocols that support it, e.g., OSPF, BGP, EIGRP, NTP, VRRP, HSRP.

Change defaults

If I catch you with SNMP strings of public and private, I'm going to send you straight to the principal's office for a stern talking to. Seriously, this is so common and so stupid. It's worth scanning servers as well for this; quite often, if SNMP is running on a server, it's running the defaults.

Control access sources

Use the network operating system's features to control who can connect to them in the first place. This may take the form of a simple access list (e.g., a vty access-class in Cisco speak) or could fall within a wider Control Plane Policing (CoPP) policy, where the control for any protocol can be implemented. Access Control Lists (ACLs) aren't in themselves secure, but it's another step to overcome for any bad actor wishing to illicitly connect to the devices. If there are bastion management devices (aka jump boxes), perhaps make only those devices able to connect. Restrict from where SNMP commands can be issued. This all applies doubly for any internet-facing devices, where such protections are crucial. Don't allow management connections to a network device on an interface with a public IP. Basically, protect yourself at the IP layer as well by using passwords and AAA.

Ideally, all devices would be managed using their dedicated management ports, accessed through a separate management network. However, not everybody has the funding to build an out-of-band management network, and many are reliant on in-band access.

Define security standards and audit yer stuff

It's really worth creating a standard security policy (with reference configurations) for the network devices, and then periodically auditing the devices against it. If a device goes out of compliance is that a mistake or did somebody intentionally weaken the device security posture? Either way, just because a configuration was implemented once, it would be risky to assume it had remained in place from then on, so a regular check is worthwhile.

Remember why

Why are we doing all of this? The business runs over the network. If the network is impacted by a bad actor, the business can be impacted in turn. These steps are one part of a layered security plan; by protecting the underlying infrastructure, we help maintain the availability of the applications. Remember the security CIA triad —Confidentiality, Integrity, and Availability? The steps I have outlined above—and much more that I can think of—help maintain network availability and ensure that the network is not compromised. This means that we have a higher level of trust that the data we entrust to the network transport is not being siphoned off or altered in transit.

What steps do you take to keep your infrastructure secure?

Thwack - Symbolize TM, R, and C