Monitoring Central

2 Posts authored by: Dez Employee

Keeping a network up and running is a full-time job, sometimes a full-time job for several team members! But it doesn’t have to feel like a fire drill every day. Managing a network shouldn’t be entirely reactive. There are steps you can take and processes you can put in place to help reduce some of the top causes of network outages and minimize any downtime.

 

1. The Problem: Human Element

The dreaded “fat finger.” You’ve heard the stories. You may have done it yourself, or been the one working frantically late into the night or over a weekend to try to recover from someone else’s mistake. If you’re really unlucky (like some poor employee at Amazon® last spring), the repercussions can be massive. No one needs that kind of stress.


The Protection:
First, make sure only the appropriate people have access to make changes. Have an approval system built in. And, since even the best of us can make mistakes, ensure you have a system that allows you to roll back changes just in case.

 

2. The Problem: Security Breaches

Network security is becoming more and more critical every day. People trying to break the system get better, and privacy needs for users gets higher. There are many critical elements to trying to keep your network secure, and it’s important not to miss any. It doesn’t do any good to deadbolt your door when your window is wide open.

The Protection:

Protect your devices from unauthorized changes. Monitor configurations so you can be alerted to any changes, see exactly what was changed, and know what login ID was used to make the change. Also, you should be regularly auditing your device configurations for vulnerabilities. Whether you have custom policies defined for your organization or need to comply with HIPAA, DISA STIG, SOX, or other industry standards, continuously monitoring your devices to help ensure your network stays compliant is one way to help.

 

3. The Problem: Lack of Routine Maintenance

Over time, networks can become messy and disorganized if there aren’t standards in place, increasing both the risk of errors and the time needed to resolve them.

 

The Protection:

Network standardization simplifies and focuses your infrastructure, allowing you to become more disciplined with routines and expectations. Naming conventions, standard MOTD banners, and interface names are just a few things you can do to help troubleshoot and keep a balance within your team and devices, allowing for better management and less human error.

 

4. The Problem: Hardware Failures

It’s not if hardware will fail, but when. Are you ready to make a speedy recovery? When a device unexpectedly goes down, it can have a big impact, depending on which device it is and what redundancies you have in place.

 

The Protection:

Ensure that you can quickly recover devices or bring a replacement online by having device configurations automatically backed up so you can quickly bring new devices online.

 

5. The Problem: Firmware Issues / Faults in the Devices

When you support hundreds of devices, required firmware updates can be tedious, and executing commands over and over increases the risk of error.

 

The Protection:

With network automation, you can easily manage rapid change across complex networks. Bulk deploy configurations to ensure accuracy and speed up deployment times.

 

Increase your uptime and reduce the challenges of keeping your network running smoothly so you can focus on other projects. With SolarWinds® Network Configuration Manager, you can bulk deploy configuration changes or firmware updates, manage approvals, revert to previous configurations, audit for compliance, and run remediation scripts. Take action today to reduce these five causes of network outages.

We just can't have anything nice, now can we?  Oh, well. We knew there would be new vulnerabilities and ransomware attacks in 2018. However, this time hardware is the culprit, and patching is not going to be a cure-all for the situation. Consider yourself warned: expect more slowdowns in 2018.

 

Stop and think about this for a second: as the days progress, we are literally learning how much this new vulnerability impacts us. Anyone who says they have the full solution is not being honest with you or themselves. What I would like to do is help you to see how you can use the tools you likely already have to make you more aware of past, present, and future vulnerabilities and threats. That said, let's move on to the importance of using SolarWinds tools to do just that.

 

SolarWinds® Patch Manager will allow you to update your Windows® machines to their Microsoft® patches. If you are currently using this product, you should already be scheduling and looking for these. I discovered that there can be some issues with third-party Windows antivirus or you might get the BSOD. Read more here, because the awesome chart helps clarify these issues and how to prevent them from happening to you.

 

Further, Patch Manager will allow you to schedule and report on your Windows devices regarding updates. The reporting is key to showcase your compliance and, in this case, start your baseline. Plus, just because you update your devices does not mean you are 100% in the clear. Updating your third-party packages is an added bonus with Patch Manager, a fact that is often overlooked though desperately needed.     

 

SolarWinds® Server & Application Monitoring (SAM) will help you validate your business, yourself, and your vendor support for any degradation that patching may have on your applications. This is something you will want to have in place as soon as possible. It allows you to see any anomalies that may present themselves to your applications after the patching is applied. And because SAM is multi-vendor, you’ll be able to address even broad-scale hardware issues. The avid SAM users among you will likely know even more tricks for using the software, and I encourage you to share your knowledge in the comments to help us all be more aware in terms of application-centric monitoring.

 

SolarWinds® Network Configuration Manager (NCM) comes helps when there are firmware upgrades\updates that need to be applied to impacted network devices. It also helps you to roll these out. There is a compliance reporting function built into NCM that will assist with audits automatically. Remember, this incident is ongoing, which makes NCM’s ability to import very helpful. In fact, you can plug into firmware vulnerability warnings provided by the National Institute of Standards and Technology (NIST). This puts you even further ahead of future vulnerabilities.

 

SolarWinds® Network Performance Monitor (NPM) is all about the baseline. If you have ever been to one of our SWUGs, you have heard me preach endlessly about baselines and their extreme importance. However, I understand that sometimes you need black and white in front of you to truly understand this. The mindset I’m currently following regarding this vulnerability looks something like this:

 

  1. Patched and we have our checkbox
  2. Monitoring our application performances
  3. Ready for updates to needed network devices
  4. Monitoring the common vulnerabilities database
  5. Waiting for any anomaly that may present its ugly face (my favorite)

 

 

We can now show that we have implemented the patching to put a Band Aid® on the issues that could present themselves. However, as I’ve already mentioned, this is not a full fix. A hardware option would be the best solution, but is obviously not available to billions of devices at this time. YOU ARE THE THE FIRST RESPONDER!

 

Using NPM in combination with the other tools that I have outlined allows you to verify the patching and the results. Also, if there are ticks or drops or spikes that do NOT match your current baseline, you can share that solid reporting and documentation with your vendor to work out the possible issue, which makes you part of the solution. Is there anything better than working at the edge of technological advancements to create countermeasures to vulnerabilities? NO. The answer is a solid NO.

 

If you don’t already have it in place, set up threshold alerting and monitoring on critical devices that are housing your applications. That helps ensure that you are alerted to anything out of the ordinary, allowing you to get things back on track. It also shows your team and other departments that you are fully invested in the integrity of application uptime and performance. Also, if you have DevOps, you really need the documentation and baselines to prove that perhaps the performance issue is not the in-house application, but an actual patching issue. That, right there, can save a lot of unneeded cycles through rabbit holes.

 

Please let me know if you have additional ways to protect and help through these beginning stages of 2018 vulnerabilities. The ideas we share could literally help the many of you who act as a one-person army fighting your way to the top!

 

Thank you all for your eyes,

~Dez~

 

In case you’d like more information on any of the products mentioned above, check these out:

 

SolarWinds® Patch Manager

SolarWinds® Server & Application Monitor

SolarWinds® Network Performance Monitor

SolarWinds® Network Configuration Manager

 

Other resources:

 

https://www.pcworld.com/article/3245606/security/intel-x86-cpu-kernel-bug-faq-how-it-affects-pc-mac.html

https://www.nytimes.com/2018/01/03/business/computer-flaws.html

 

Check out our Security and Compliance LinkedIn® Showcase Page for ideas on how to socialize this content: https://www.linkedin.com/showcase/solarwinds-security-and-compliance/

Follow our Federal LinkedIn page to stay current on federal events and announcements: https://www.linkedin.com/showcase/4799311/

SolarWinds uses cookies on its websites to make your online experience easier and better. By using our website, you consent to our use of cookies. For more information on cookies, see our cookie policy.