Four Ways Federal IT Pros Can Help Their Networks Thrive

Level 11

Regardless of which new technologies federal network administrators adopt, they will always need dependable, consistent, and highly available solutions that keep their networks running -- constantly.

Sadly, that’s not always the reality.

Last year's survey of federal IT professionals by my company, SolarWinds, indicated that performance and availability issues continue to plague federal IT managers. More than 90 percent of survey respondents claimed that end-users in their organizations were negatively impacted by a performance or availability issue with business-critical technology over the past year, and nearly 30 percent of respondents claimed these issues occurred at least six times.

What can IT pros do about this?

Simplify

Don’t worry about deploying everything in one fell swoop. Instead, take a piecemeal approach. Focus on a single implementation and make sure that particular piece of technology absolutely shines. The trick to this strategy is keeping the big picture in mind as the individual pieces of technology are deployed.

Monitor

Network monitoring is a must. To do it properly, start with a baseline diagnostic that assesses the overall network performance, including availability and average response times. Once this baseline is established, look for anomalies, including configuration changes that other users may have made to the network. Find the changes, identify who made them, and factor their impact into the performance data as you identify problems and keep the network running.
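
As a rough illustration of the baseline-then-anomaly idea, here is a minimal Python sketch. It assumes response times are collected in milliseconds and treats anything more than three standard deviations above the baseline mean as an anomaly. The function names, the sample values, and the threshold are all hypothetical choices made for this example; they are not taken from any particular monitoring product.

```python
from statistics import mean, stdev

def build_baseline(samples_ms):
    """Summarize historical response times (ms) as (mean, standard deviation)."""
    return mean(samples_ms), stdev(samples_ms)

def find_anomalies(baseline, new_samples_ms, threshold=3.0):
    """Return samples more than `threshold` standard deviations above the baseline mean."""
    avg, sd = baseline
    if sd == 0:
        # A perfectly flat baseline: any slower sample counts as an anomaly.
        return [s for s in new_samples_ms if s > avg]
    return [s for s in new_samples_ms if (s - avg) / sd > threshold]

# Hypothetical response times gathered during a quiet, known-good period.
history = [20, 22, 19, 21, 20, 23, 18, 21]
baseline = build_baseline(history)

# New measurements, one of which is far outside the normal range.
anomalies = find_anomalies(baseline, [21, 24, 95, 20])
```

A fixed three-sigma threshold is only a starting point; real networks usually need baselines per time of day or per device.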

Plan

Make no mistake: errors will happen, and it’s important to have a plan in place when things go south. That plan should comprise three facets: technology, people, and process.

First, a well-defined technology plan outlines how to best handle the different components of the network infrastructure, including monitoring and building in redundancies. That means having a backup for equipment that’s core to an agency’s network traffic.

Second, make sure the IT staff includes several people who share the same skillset and expertise. What happens if a key resource is out sick or leaves the organization? All of that expertise is gone, leaving a very big knowledge gap that will be hard to fill.

Third, develop a process that allows for rollbacks to prior configurations. That’s an important failsafe in case of a serious network error.
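
One way to picture a rollback process is a simple version history that records each configuration along with who changed it. The sketch below is only an illustration under that assumption; the class names, the author field, and the sample configuration text are hypothetical and do not represent any specific configuration management tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfigVersion:
    text: str    # full device configuration as plain text
    author: str  # who made the change

class ConfigHistory:
    """Keeps every applied configuration so the previous one can be restored."""

    def __init__(self, initial_text, author="baseline"):
        self._versions = [ConfigVersion(initial_text, author)]

    @property
    def current(self):
        return self._versions[-1]

    def apply(self, new_text, author):
        """Record a new configuration as the current version."""
        self._versions.append(ConfigVersion(new_text, author))

    def rollback(self):
        """Discard the latest version and return the prior one.

        The initial version is never discarded.
        """
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current

# Hypothetical usage: a change is applied, causes trouble, and is rolled back.
history = ConfigHistory("hostname core-sw1\nvlan 10")
history.apply("hostname core-sw1\nvlan 10\nvlan 20", author="jdoe")
restored = history.rollback()
```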

Interact

IT professionals need to understand organizational objectives to accomplish their own goals, which include optimizing and securing a consistently dependable network. Doing that is not just about technology. It also requires the ability to communicate freely with colleagues and agency leadership so that everyone is working toward the same goals.

CIOs must build a culture that is barrier-free and allows for regular interaction with other business leaders outside the technology realm. After all, isn’t that network or database that the IT staff manages directly tied to agency performance?

Having everything run perfectly all the time is an impossible dream. However, six nines of uptime is certainly achievable. All it takes is a little bit of simplification and planning, and a whole lot of technology and teamwork.

Find the full article on GCN.

Interested in this year’s cyber security survey? Go here.

12 Comments
MVP

Good post.

It really mirrors what we face out here...but in the end, IT is IT is IT. We all have the same basic challenges. The federal level has more, beginning with the fact that they work at the speed of Government...thus some changes are less likely to happen on a timely basis.

Level 14

Great write up.  I agree with all four points.  Jfrazier, "work at the speed of Government."  So very true.

Always looking for the niche markets. 

This post made me think--and react to its content.

A top IT leader once advised me "Don't add new technologies or systems until what you have in place is fixed and running without errors.  Once you have your current systems running smoothly, only then can you afford to add more tasks, hardware, technologies, etc. to your plate."

Good advice.  It's not always practical, and in retrospect it's hard not to react with a "Help me practice what you preach" though.  But we all have our challenges, and the concept of not expanding until things are running smoothly is a good one.  In a large organization, and without the appropriate amount of staff, this can be quite a challenge to accept--and enforce.  Imagine telling a few thousand employees--and many more customers--you are implementing a new-technology-freeze just until we get things working more smoothly.  A month, three months, a year . . . whatever it takes.

I think you'd find that people would push back harder and harder the longer the new tech is postponed.  Yet that pushback should be apparent to management, and should help all parties involved better understand the challenges you're facing.  Which should result in more help, more professional services, and ultimately a better environment in which to work.  Hopefully standardizing can be a big part of that change.

This week I spoke with some networking peers about standardizing their environment, and I learned some of their frustrations.  Their service provider did not standardize on equipment and configurations, which results in a lot of extra down time.  Standardizing on the fewest different network platforms results in improved up time, decreased troubleshooting time, and improved security and customer satisfaction.  The story I heard from my peers was that they visited four remote WAN sites in one day as a favor for their WAN Service Provider, to install a new AdTran device at each site to improve the WAN services.  They found a totally different deployment at every site.

My first reaction was "Do you really have time for that?  Isn't that the Service Provider's responsibility?"  Then I learned the provider hadn't been getting to the job for a year.  My friends took it upon themselves to help get the ball rolling, and when they visited each site they discovered the Provider had not properly pre-configured the WAN to accept the new hardware, and had differing termination hardware in each location, and each location--although identical in service and region and environment--had completely unique setups and brands of boxes to attach to.

Each site that was visited had extended down time because the Service Provider claimed they were all preconfigured properly, but my peers discovered each location needed a very different configuration due to having different hardware and setups.  Standardization would have improved my peers' business uptime, and reduced the number of problems their service provider was creating for itself.

And that uptime counts against each business's success.  I strive for Five 9's.  When I read above that "Six 9's" is achievable, I was  . . .  shocked?  Astonished?  Dismayed?

I look at Five 9's as a little over five minutes of down time in a year (about 316 seconds).  That makes Six 9's about 32 seconds of down time--over an entire year.  I have close to 50,000 active IP addresses across five states, supported through almost three hundred network rooms, relying on technologies from T1 to asymmetrical DSL to 40 Gig Fiber into MPLS clouds.
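
As a rough check on these figures, the down time that a given number of nines allows per year can be worked out directly. The short Python sketch below assumes an average year of 365.25 days and counts every second of the year as in-scope; the function name is just an illustrative choice.

```python
# Average year length in seconds, assuming 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def downtime_budget_seconds(nines):
    """Allowed down time per year, in seconds, for an availability of `nines` nines.

    For example, nines=5 means 99.999% availability, so the unavailable
    fraction of the year is 10 ** -5.
    """
    return SECONDS_PER_YEAR * 10 ** -nines

five_nines = downtime_budget_seconds(5)  # roughly 316 seconds, a bit over 5 minutes
six_nines = downtime_budget_seconds(6)   # roughly 32 seconds
```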

Even if each of my sites had dual-supervisor L3 chassis switches with Enterprise Licenses on all of them (so I could route MPLS and BGP in a hitless ISSU environment), I still could not achieve Six 9's.

What would Six 9's require?

  • No site could ever have any WAN or LAN problems
  • That would mean each site requires two or more WAN feeds via different providers.  And with many sites having fewer than ten employees, that's not an expense those sites can accept.
  • I'd also need every rural power co-operative to have dual power runs to each of my sites
  • Every WAN provider would be required to use unique services (no sharing services with a common provider upstream, like Verizon or AT&T)
  • All WAN service lines must be laid in different trenches--so a single backhoe couldn't take out both fibers at once

The resilience and redundancy for Six 9's . . . I don't know how that's possible in a practical and affordable method; at least, not in the environment I support today.  I'd love to hear more about that topic.  Maybe "Six 9's" has no upgrades?  Or maybe there are never power or WAN service provider problems, and no faulty network hardware or buggy network code?

But still, this was an interesting Geek Speak article--thanks for sharing it!

MVP

Once had an IT director promise 5 "9's" to the business.  Between vendor software bugs (application, SAN switch, etc.), telco issues, and a variety of "mishaps", it was blown in the first quarter.  That year we didn't even make 4 "9's".  While those numbers are great to strive for and can be attained...it is not easy or cheap.  It also takes a serious commitment to standardization and change management.

Level 14

Performance and availability issues...say it isn't so.  I agree with all your points.  So true.

Level 9

Good plan.

It's true that Change Control--on all parties' sides--can seriously improve up time. 

And Six 9's might be achievable if everyone agrees that it can't be an absolute measurement of all down time.  Instead, everyone must agree that service lost due to planned / scheduled maintenance isn't counted against the Six 9's.

I'll qualify that slightly--if no end-users are affected by the scheduled maintenance windows, then the down time can be exempted from inclusion in the Six 9's calculation.  However, if planned down time DOES affect users, then that should be included (in my opinion) against the down time record.
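
To make that counting rule concrete, here is a small Python sketch. It assumes each outage is logged with its duration, whether it was planned, and whether end-users were affected; only planned outages that affected no end-users are exempted. The record fields, function names, and sample outages are hypothetical and used purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Outage:
    seconds: float        # how long service was down
    planned: bool         # True for scheduled maintenance
    users_affected: bool  # True if any end-user lost service

def counted_downtime(outages):
    """Sum down time, exempting only planned outages that affected no end-users."""
    return sum(
        o.seconds
        for o in outages
        if not (o.planned and not o.users_affected)
    )

def availability(outages, period_seconds):
    """Fraction of the period that counts as 'up' under the rule above."""
    return 1 - counted_downtime(outages) / period_seconds

# Hypothetical year of outages.
outages = [
    Outage(seconds=3600, planned=True, users_affected=False),  # exempt
    Outage(seconds=20, planned=False, users_affected=True),    # counted
    Outage(seconds=600, planned=True, users_affected=True),    # counted
]
total = counted_downtime(outages)
```

Under this rule the hour-long maintenance window is ignored, while the planned outage that did affect users still counts against the record.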

Level 14

Well said.

Level 10

Nice read, agreed on all points.

Level 21

I couldn't agree more, especially with the "simplify" part.  As IT people we often get carried away with the technology and fail to keep things simple, when in almost all cases keeping things as simple as possible is best.

Level 20

It's easy to see those 9's in node details.