
Geek Speak


The story so far:


  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)
  3. It's Not Always The Network! Or is it? Part 3 -- by Tom Hollingsworth (networkingnerd)


The holidays are approaching, but that doesn't mean a break for the network team. Here's the fourth installment of the story, by Tom Hollingsworth (networkingnerd).


The View From Above: James (CEO)


I'm really starting to see a turnaround in IT. Ever since I put Amanda in charge of the network, I'm seeing faster responses to issues and happier people internally. Things aren't being put on the back burner until we yell loud enough to get them resolved. I just wish we could get the rest of the organization to understand that.


Just today, I got a call from someone claiming that the network was running slow again when they tried to access one of their applications. I'm starting to think that "the network is slow" is just code to get my attention after the unfortunate situation with Paul. I decided to try to do a little investigation of my own. I asked this app owner if this had always been a problem. It turns out that it started a week ago. I really don't want to push this off on Amanda, but a couple of my senior IT managers are on vacation and I don't have anyone else I can trust. But I know she's going to get to the bottom of it.



The View From The Trenches: Amanda (Sr Network Manager)


Well, that should have been expected. At least James was calm and polite. He even told me that he'd asked some questions about the problem and got some information for me. I might just make a good tech out of the CEO after all!


James told me that he needed my help because some of the other guys had vacation time they had to use. I know that we're on a strict change freeze right now, so I'm not sure who's getting adventurous. I hope I don't have to yell at someone else's junior admin. I decided I needed to do some work to get to the bottom of this. The app in question should be pretty responsive. I figured I'd start with the most basic of troubleshooting: a simple ping. Here's what I found:


icmp_seq=0 time=359.377 ms
icmp_seq=1 time=255.485 ms
icmp_seq=2 time=256.968 ms
icmp_seq=3 time=253.409 ms
icmp_seq=4 time=254.238 ms


Those are terrible response times! It's like the server is on the other side of the world. I pinged other routers and devices inside the network to make sure their response times were within reason. A quick check of other servers confirmed that their response times were in the single-digit milliseconds, not even close to the bad app's. With response times that high, I was almost certain that something was wrong. Time to make a phone call.
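For readers who want to repeat Amanda's sanity check on their own ping output, here is a minimal Python sketch that pulls the `time=` values out of the lines above and compares the average against a baseline. The 10 ms baseline is an assumption standing in for the "single digits" the other servers returned, and the function names are illustrative only.

```python
import re
from statistics import mean

# The ping output from the story, copied verbatim.
PING_OUTPUT = """\
icmp_seq=0 time=359.377 ms
icmp_seq=1 time=255.485 ms
icmp_seq=2 time=256.968 ms
icmp_seq=3 time=253.409 ms
icmp_seq=4 time=254.238 ms
"""

# Assumed baseline: healthy servers answered in single-digit milliseconds.
BASELINE_MS = 10.0


def parse_rtts(text: str) -> list[float]:
    """Extract round-trip times (in ms) from ping output lines."""
    return [float(m.group(1)) for m in re.finditer(r"time=([\d.]+) ms", text)]


def summarize(rtts: list[float], baseline_ms: float) -> dict:
    """Summarize RTTs and flag them if the average exceeds the baseline."""
    avg = mean(rtts)
    return {
        "min": min(rtts),
        "avg": round(avg, 3),
        "max": max(rtts),
        "over_baseline": avg > baseline_ms,
    }


if __name__ == "__main__":
    print(summarize(parse_rtts(PING_OUTPUT), BASELINE_MS))
```

Comparing against a baseline rather than eyeballing a single probe matters here: the first reply is noticeably slower than the rest, but even the fastest reply is more than 25 times the assumed baseline.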


Brett answered when I called the server team. I remember we brought him on board about three months ago. He's a bit green, but I was told he's a quick learner. I hope someone taught him how to troubleshoot slow servers. Our conversation started off as well as expected. I told him what I found and that the ping time was abnormal. He said he'd check on it and call me back. I decided to go to lunch and then check in on him when I got finished. That should give him enough time to get a diagnosis. After all, it's not like the whole network was down this time, right?


I got back from lunch and checked in on Brett The New Guy. When I walked in, he was massaging his temples behind a row of monitors. When I asked what was up, he sighed heavily and replied, "I don't know for sure. I've been trying to get into the server ever since you called. I can communicate with vCenter, but trying to console into the server takes forever. It just keeps timing out."


I told Brett that the high ping time probably means that the session setup is taking forever. Any lost packets just make the problem worse. I started talking through things at Brett's desk. Could it be something simple? What about the other virtual machines on that host? Are they all having the same problem?


Brett shrugged his shoulders. His response: "I'm not sure? How do I find out where they are?"


I stepped around to his side of the desk and found a veritable mess. Due to the way the VM clusters were set up, there was no way of immediately telling which physical host contained which machines. They were just haphazardly thrown into resource pools named after comic book characters. It looked like this app server belonged to "XMansion" but there were a lot of other servers under "AsteroidM". I rolled my eyes at the fact that my network team had strict guidelines about naming things so we could find them at a glance, yet the server team could get away with this. I reminded myself that Brett wasn't to blame and kept digging.


It took us nearly an hour before we even found the server. In El Paso, TX. I didn't even know we had an office in El Paso. Brett was able to get his management client to connect to the server in El Paso and saw that it contained exactly one VM - The Problem App Server. We looked at what was going on and figured that it would work better if we moved it back to the home office where it belonged. I called James to let him know we fixed the problem and that he should check with the department head. James told me to close the ticket in the system since the problem was fixed.


I hung up Brett's phone. Brett spun his chair back to his wall of monitors and put a pair of headphones on his head. I could hear some electronic music blaring away at high volume. I tapped Brett on the shoulder and told him, "We're not done yet. We need to find out why that server was halfway across the country."


Brett stopped his music and we dug into the problem. I told Brett to take lots of notes along the way. As we unwound the issues, I could see the haphazard documentation and architecture of the server farm was going to be a bigger problem to solve down the road. This was just the one thing that pointed it all out to us.


So, how does a wayward VM wind up in the middle of Texas? It turns out that the app was one of the first ones ever virtualized. It had been running on an old server that was part of a resource pool called "SavageLand". That pool only had two members: the home server for the app and the other member of the high availability pair. That HA partner used to be here in the HQ, but when the satellite office in El Paso was opened, someone decided to send the HA server down there to get things up and running. Servers had been upgraded and moved around since then, but no one documented what had happened. The VMs just kept running. When something would happen to a physical server, HA allowed the machines to move and keep working.


The logs showed that last week, the home server for the app had a power failure. It rebooted about ten minutes later. HA decided to send the app server to the other HA partner in El Paso. The high latency was being caused by a traffic trombone. The network traffic was going to El Paso, but the resources the server needed to access were back here at the HQ. So the server had to send traffic over the link between the two offices, listen for the response, and then send it back over the link. Traffic kept bouncing back and forth between the two offices, which saturated the link. I was shocked that the link was even fast enough to support the failover at all. According to Brett's training manuals, it barely met the minimum. We were both amused that failing the server over to the backup caused more problems than just waiting for the old server to come back up.
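A back-of-the-envelope model makes it clear why the trombone hurts so much. The Python sketch below assumes each user request costs one round trip to the app server plus one round trip per sequential call the server makes to resources at HQ, and it ignores processing time and link saturation. The 1 ms LAN figure and the count of 10 back-end calls are hypothetical; the 255 ms WAN figure is borrowed from Amanda's ping results.

```python
def request_latency_ms(backend_round_trips: int, client_rtt_ms: float,
                       backend_rtt_ms: float) -> float:
    """Estimate the latency of one user request (simplified model).

    The user pays one round trip to reach the app server, plus one round
    trip for each sequential call the server makes to its back-end resources.
    """
    return client_rtt_ms + backend_round_trips * backend_rtt_ms


LAN_RTT_MS = 1.0    # assumed round-trip time inside HQ
WAN_RTT_MS = 255.0  # roughly the ping time measured to the VM in El Paso
CALLS = 10          # hypothetical number of sequential back-end calls per request

# App server at HQ: the user and every back-end call stay on the LAN.
at_hq = request_latency_ms(CALLS, LAN_RTT_MS, LAN_RTT_MS)
# App server in El Paso: the user and every back-end call cross the WAN.
in_el_paso = request_latency_ms(CALLS, WAN_RTT_MS, WAN_RTT_MS)

if __name__ == "__main__":
    print(f"at HQ: {at_hq:.0f} ms, in El Paso: {in_el_paso:.0f} ms")
```

Because the back-end calls are sequential, each one pays the full WAN round trip, so the penalty grows with the number of calls rather than staying at a single extra hop.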


Brett didn't know enough about the environment to know all of this. And he didn't know how to find the answers. I made a mental note to talk to James about this at the next department meeting after everyone was back from vacation. I hoped they had some kind of documentation for that whole mess. Because if they didn't, I was pretty sure I knew where I could find something to help them out.


To be continued...

Wow, can you believe it? 2016 is almost over, the holidays are here, and I didn’t even get you anything! It’s been a bit of a wild rollercoaster of a year, through consolidation, commoditization, and collaboration!


I’m sure you have some absolute favorite trends or notable things that have occurred throughout 2016. Here are some that have been particularly recurring trends throughout the year.



  • Companies going private, such as SolarWinds (closed in February) and DellEMC (closed in September)
  • Companies buying other companies and consolidating the industry, like Avago buying Broadcom (closed Q1), Brocade buying Ruckus (closed Q3), and Broadcom buying Brocade (initiated in October)
  • Or companies divesting assets, like Dell selling off SonicWall and Quest, and Broadcom selling off Brocade’s IP division



Alright, so that’s some of the rollercoaster, or at least a small snapshot of it. Only time will tell what impact those decisions will have on practitioners like you and me (I promise some of them will be GREAT and some of them, not so much!)


But what else, what else?! Some items I’ve very recently discussed include.



The net-net benefit of all three is that we will continue to see better technology, deeper investment, and ultimately (potentially) lower costs!


On the subject of flash, if you haven’t been tracking it, density profiles have been insane this year alone, and that trend is only continuing with further adoption and better price economics from technologies like NVMe. I particularly love this image, as it reflects the shrinking footprint of the data center alongside our inevitable need for more.


(Image: Moore's Law of Storage)



This is hardly everything that happened in 2016, but these are particular items that are close to my heart, and to my infrastructure. I will also give a hearty congratulations to this being the 16th official “Year of VDI,” a title we continue to grant even as VDI continues to fall short of its promises.


With 2016 closing quickly, there are a few areas you’ll want to watch in 2017!


  • Look for Flash Storage to get even cheaper, and even denser
  • Look to see even more competition in the Cloud space from Microsoft Azure, Amazon AWS and Google GCP
  • Look for containers to become something you MIGHT actually use on a regular basis, and more rationally than in the very obscure use cases promoted within organizations
  • Look for vendors to provide more of their applications and objects as containers (EMC did this with ESRS, their Secure Remote Support)
  • Obviously 2017 WILL be the Year of VDI… so be sure to bake a cake
  • And strangely, with the exception of pricing economics driving adoption of 10GigE+ and Wave 2 wireless, we’ll see a lot more of the same as we saw this year, maybe even some retraction in hardware innovation
  • Oh and don’t forget, more automation, more DevOps, more “better, easier, smarter”


But enough about me and my predictions: what were some of your favorite and notable trends of 2016, and what are you looking forward to seeing in 2017?


And if I don’t get a chance to… Happy Holidays and a Happy New Year to y’all!

After the network perimeter is locked down, servers are patched, and password policies enforced, end-users themselves are the first line of defense in IT security. They are often the target for a variety of attack vectors making them the first step of triage when a security incident is suspected. Security awareness training, which should be a part of any serious IT security program, should be based in common sense, but what security professionals consider common sense isn’t necessarily common sense for the average end-user.


In order to solve this problem and get everyone on the same page, end-users need the awareness, knowledge, and tools to recognize and prevent security threats from turning into security breaches. To that end, a good security awareness program should be guided by these three basic principles:


First, security awareness is a matter of culture.


Security awareness training should seek to change or create a culture of awareness in an organization. This means different things to different security professionals, but the basic idea is that everyone in the organization should have a common notion of what good security looks like. This doesn’t mean that end-users know how to spot suspicious malformed packets coming into a firewall, but it does mean that it’s part of company culture to be suspicious of email messages from unknown sources or even from known sources but with unusual text.


The concerns of the organization’s security professionals need to become part of the organization's culture. This isn’t a technical endeavor but a desire to create a heightened awareness of security concerns among end-users. They don’t need to know about multi-tenant data segmentation or versions of PHP, but they should have an underlying concern for a secure environment. This is definitely somewhat ambiguous and subjective, but this is awareness.


Second, security awareness training should empower end-users with knowledge.


After a culture of security awareness has been established, end-users need to know what to actually look for. A solid security awareness program will train end-users on what current attacks look like and what to do when facing one. This may be done simply with weekly email newsletters or required quarterly training sessions.


End-users need to actually learn why it’s not good to plug a USB stick found in the parking lot into their computer, and users need to get a good feel for what phishing emails look like. They should know that they can hover over a suspicious link and sometimes see the actual hidden URL, and they should know that even that can be faked.
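The hover check boils down to comparing the address a link claims to point to with the address it actually points to. As a training aid, here is a minimal Python sketch, using only the standard library's `html.parser` and `urllib.parse`, that flags links in an HTML email body whose visible text names a different host than the real target. It is a hypothetical heuristic, not a mail filter: it ignores links with plain labels like "Click here," and it cannot catch look-alike domains where the text and the target are both fake.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag in an HTML body."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html: str) -> list:
    """Return (visible text, real target) for links whose text names a different host."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    suspicious = []
    for href, text in parser.links:
        # Only compare when the visible text itself looks like a URL or host name.
        if not text or " " in text or "." not in text:
            continue
        shown = urlparse(text if "://" in text else "http://" + text).hostname
        actual = urlparse(href).hostname
        if shown != actual:
            suspicious.append((text, href))
    return suspicious


if __name__ == "__main__":
    body = (
        '<p>Please verify your account at '
        '<a href="http://login.example.net/verify">www.example-bank.com</a>.</p>'
    )
    print(mismatched_links(body))
```

The example domains are placeholders. A demo like this can be useful in a training session precisely because it also shows the limits of hovering.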


Ultimately, they need to know what threats look like. The culture of awareness makes them concerned, and knowledge gives them the ability to identify actual problems in the real world.


Third, security awareness training is concerned with changing behavior.


The whole point here is that end-users take action when there is suspicion of malicious activity. Security awareness training is useless if no one takes action and actually acts like the first line of defense they really are (or can be).


A good security awareness program starts with culture, empowers end-users with knowledge, and seeks to change behavior. This means making significant effort to provide end-users with clear directions for what to do when encountering a suspected security incident. Telling users to simply “create a ticket with the helpdesk” is just not enough. End-users need clear direction as to what they can actually do in the moment when they are dealing with an issue. This is where the whole “first line of defense” becomes a reality and not just a corporate platitude.


For example, what should end-users actually do (or not do) when they receive a suspected phishing email? The directions don’t need to be complicated, but they need to exist and be communicated clearly and regularly to the entire organization.


Security awareness training is the most cost-effective part of a security program in that it doesn’t require purchasing millions of dollars of appliances and software licenses. There is a significant time investment, but the return on investment is huge if done properly. A strong security awareness training program needs to be based in common sense, change culture, empower end-users with knowledge, and change behavior.


(image courtesy of Marvel)


...I learned from "Doctor Strange"

(This is part 3 of a 4-part series. You can find part 1 here and part 2 here)


Withhold judgment and give respect when seeking answers

Standing outside the door to Kamar Taj, having just been saved from muggers, Strange is still glib and sarcastic about the nature of the environment he is in. Mordo stops him and says,


"I was in your place, once. I, too, was disrespectful. So might I offer you some advice? Forget everything that you think you know."


Recently, I was involved in a discussion about monitoring containers. I said,  "Maybe I'm being naive, but it seems like we already solved this problem. Only it was 2001 and we called it LPARs running on AIX." There was some nervous laughter, a few old-timers got the joke, and the rest of the group explained how containers were completely different, and all that old stuff wouldn't apply.


I wrote about this a year ago ("Respect Your Elders") and that sentiment still holds true. If you are not willing to give respect and credence to older ideas (if not older IT pros), then you are going to insult a lot of people, miss a lot of solutions, and spend a lot of extra time fixing old problems all over again.


Redundancy is your friend

In the movie, we discover that the world is protected from mystical threats by three Sanctum Sanctorums, located in London, Hong Kong, and New York. When London falls, the world is still protected by the other two. Only after Hong Kong falls can the world be overwhelmed by hostile forces.


The message to us in IT is clear: failover systems, disaster recovery plans, high availability solutions, and the rest are all good things.
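A common back-of-the-envelope way to reason about redundant sites is to assume they fail independently, in which case the service stays up as long as at least one site is up: availability = 1 - (1 - p)^N for N sites that are each available with probability p. The Python sketch below uses a hypothetical 99% per-site availability. The independence assumption is the weak point in practice, since shared links, power, or configuration can take out several sites at once.

```python
def redundant_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up."""
    return 1 - (1 - site_availability) ** sites


if __name__ == "__main__":
    # Hypothetical: each site is up 99% of the time, and failures are independent.
    for n in (1, 2, 3):
        print(f"{n} site(s): {redundant_availability(0.99, n):.6f}")
```

Under those assumptions, one site gives 99%, two give 99.99%, and three give 99.9999%.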


To say any more about this would be redundant.


Find a teacher and trust them to lead you

Stephen Strange travels to Kathmandu, to the mystical school of Kamar Taj, and meets the Ancient One. His mind is opened to the existence of magic in the world, and he begs to be accepted as a student. The Ancient One then guides Strange in his journey to master the mystical arts, monitoring his progress and helping him avoid pitfalls along the way. Later, she rebukes him by saying, "When you came here you begged me to teach you. Now I'm told you question every lesson and prefer to study on your own."


The corresponding lesson for us in IT is that many of us tend to fall into the trap of solitary study. We find our education in the form of online blog posts, web-based tutorials, and PDFs. But there is something to be said for having a teacher, a mentor who understands you: where you started, where you'd like to go, how you learn best, and what your shortcomings are. If you are learning a single skill, self-directed learning is a great way to go. But when you are thinking about your career, it's worth taking the time to find a trusted advisor and stick with them. They will often see things in you that you cannot see in yourself.


Be comfortable with confusion

At one point in the story, Strange complains, "This doesn't make any sense!" The Ancient One replies, "Not everything does. Not everything has to." The point in the movie is that Strange has to let go of his need for things to make sense before he engages with them. Sometimes it needs to be enough to know that something simply is, regardless of how. Or that something works a particular way, irrespective of why.


"Yes, but now I know how it works," is what I say after I've burned hours de-constructing a perfectly working system. It's not that the education wasn't important, it's that it may not have been important at that moment. When our need for things to make sense impedes our ability to get on with our daily work, that's when we need to take a step back and remember that not everything has to make sense to us now, and that inevitably, some things in IT will never make sense to us.


When events pull you a certain direction, take a moment and listen

In the middle of a fight, Strange reaches for an axe hanging on the wall, only to have his semi-sentient cloak pull him toward a different wall. Despite repeated attempts to get the weapon, the cloak insistently pulls him away, until Strange finally realizes that the cloak is trying to tell him about an artifact that would restrain, rather than harm, his opponent. (For comic book geeks, those were a more down-to-earth version of the Crimson Bands of Cytorrak).


Despite our best-laid plans and deepest desires, sometimes life pushes us in a different direction. This isn't limited to our career plans. Sometimes you believe the best solution lies with a particular coding technique, or even a specific language. Or with your chosen hardware platform, a trusted vendor, or even a specific software package.


And yet, despite your rock-solid belief that this is the best and truest way to achieve your goal, you can't seem to get it done.


In those moments, it's useful to look around and see where events are pushing you. What is over there? Is it something useful?


Even if others label it useless, be proud of the knowledge you have

During surgery, the anesthesiologist quizzes Doctor Strange on his musical knowledge, asking him to identify Chuck Mangione's hit, "Feels So Good." Later on, in an aside that goes by too fast for many in the audience, Strange tells his colleague that he traveled to Kathmandu. She asks "Like the Bob Seger song?" He responds, "Beautiful Loser album, 1975, A-side, third cut? Yes. In Nepal."


No, having this knowledge didn't help our hero save the day, but it was still a tangible part of who he was. Strange is a gifted doctor, an unapologetically arrogant ass, a talented sorcerer... and an unashamed music geek.


We in IT have to remember that we are also whole people. We're not just storage engineers or SysAdmins or pen testers or white hat hackers. We have other aspects of our lives that are important to us, even if they aren't central or even relevant to the plot of our story. They provide richness and depth of character. We shouldn't lose sight of that, and we shouldn't ignore our need for hobbies, interests, and non-IT outlets in our life.


Did you find your own lesson when watching the movie? Discuss it with me in the comments below. And keep an eye out for part 4, coming next week.

It would be really easy to just post this link to Sam Harris’ TED talk and say “Discuss!” Sam Harris: Can we build AI without losing control over it?


But for you, busy people, let me distill some of Sam’s points and add a few of my own.

Sam does a brilliant job of pointing out that we’re not as worried about the impact of artificial intelligence as we should be.


“Death by science fiction is fun. Death by famine is not fun.”


If we knew we would all die in 50 years from a global famine, we’d do a heck of a lot to stop it. Sam is concerned that there’s a risk to humans once artificial intelligence surpasses us, and it will; it’s only a matter of time.


"Worrying about AI safety is like worrying about over population on Mars."


So, we’re using a time frame as an excuse? That we shouldn’t worry our pretty little heads about it because it won’t occur in our lifetime? In half my lifetime, I’ve gone from having an Amstrad CPC 6128 running DOS to now carrying the Internet in my pocket. Also, I have kids and hopefully one day grandkids, so I’m a little worried for them.


Information processing is the source of intelligence. And we wouldn't consider for a moment the option that we'd ever stop improving our technology. We will continue to improve our intelligent machines until they are more intelligent than we are, and they will continue to improve themselves.


Elon Musk’s OpenAI group released Universe this week, providing a way for machines to play games and use websites like humans do. That's not a big deal if you’re not worried about the PC beating you at GTA V. It's a slightly bigger deal if you are a travel agent and the machines can now use comparison websites and book the cheapest fare without you. And while you’d hopefully have a moral compass screening what you’d do online, do the machines have one? Could they get cheeky and ship their enemies glitter, or something more sinister?


Robert "Uncle Bob" Martin (author of The Clean Coder and other books), sets out 10 rules for software developers that he calls "The Scribe's Oath". One of those rules is you will not write code that does harm. But the issue isn't that a human will write code that shuts down a city's water treatment plant. The issue is that we're writing code that constructs deep learning neural networks, allowing machines to make decisions by themselves. We're enabling them to become harmful on their own, if we're not able to code a sense of morals, ethics and values into them.


Then we get into the ethical debates. If there are only two outcomes for an incident with a self-driving car, one that preserves the life of the driver and one that preserves the life of a pedestrian or another car's driver, which one should the machine choose? Do we instil a human-like self-preservation/survival instinct?


Is this all the fault of (or a challenge for) the software developer? How does this apply to systems administrators & systems architects?


We've talked about autonomic computing before. If we are configuring scripted and self-healing systems, are we adding to the resilience of the machines, and will this ultimately be detrimental to us? How outlandish does that seem right now, though: that we'd enable machines to be so self-preserving that they won't even die if we want them to? We've even laughed in these comments about whether the machines will let us pull the power plug on them. Death by science fiction is funny. But the machines can now detect when we are lying, because we built them to be able to do that. Ooops.


Technosociologist Zeynep Tufekci says “We cannot outsource our responsibilities to machines. We must hold on ever tighter to human values and human ethics.” Does this mean we need to draw a line about what machines will do and what they won’t do? Are we conscious in our AI developments about how much power and decision-making control we are enabling the machines to have, without the need for a human set of eyeballs and approval first? Are we building this kind of safety mechanism in? With AI developments scattered among different companies with no regulation, how do we know that advances in this technology are all ethically and morally good? Or are we just hoping they are, relying on the good nature of the human programmers?


Ethics in AI has come up a few times in the comments of my articles so far. Should we genuinely be more worried about this than we are? Let me know what you think.



Updating your Active Directory schema is something that needs to be done from time to time, whether we like it or not. It is done either to support domain controllers running a new version of Windows Server or because an AD-integrated application such as Exchange, Skype for Business, or SCCM requires the update. Regardless of the reason, the mere mention of an Active Directory (AD) schema update makes administrators cringe. The dread of the schema update is mostly due to the fact that this is an update that cannot be undone. There is no uninstall button that allows you to reverse your changes. Things get even more complicated if you have AD-integrated applications or third-party applications that have also extended your schema.


Active Directory is like a beating heart


For those not sure what Active Directory is: it is a database of objects that represent users, computers, groups, etc. in your network, and it is also used for authentication and authorization. The schema is the component of Active Directory that defines all of those objects through classes and attributes. The schema differs between versions of Active Directory Domain Services; for instance, it is different between AD 2003, AD 2008, and AD 2012. When you introduce a new domain controller running a newer version of the OS, you will need to update your schema.
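Each schema version is recorded in the `objectVersion` attribute of the schema naming context (`CN=Schema,CN=Configuration,DC=...`). The Python sketch below only maps an `objectVersion` value you have already read (for example with an LDAP browser or PowerShell) to the Windows Server release whose schema it corresponds to, and checks whether promoting a domain controller of a given release would require a schema update first. The function names are illustrative, and the table stops at Windows Server 2016.

```python
# objectVersion of the schema naming context for each Windows Server release.
SCHEMA_VERSIONS = {
    13: "Windows 2000 Server",
    30: "Windows Server 2003",
    31: "Windows Server 2003 R2",
    44: "Windows Server 2008",
    47: "Windows Server 2008 R2",
    56: "Windows Server 2012",
    69: "Windows Server 2012 R2",
    87: "Windows Server 2016",
}

# Reverse lookup: release name -> objectVersion that release requires.
RELEASE_TO_VERSION = {name: version for version, name in SCHEMA_VERSIONS.items()}


def schema_release(object_version: int) -> str:
    """Name the release whose schema matches an objectVersion value."""
    return SCHEMA_VERSIONS.get(object_version, f"unknown ({object_version})")


def needs_schema_update(current_version: int, new_dc_release: str) -> bool:
    """True if promoting a DC of `new_dc_release` requires extending the schema first.

    Raises KeyError for releases that are not in the table.
    """
    return current_version < RELEASE_TO_VERSION[new_dc_release]


if __name__ == "__main__":
    print(schema_release(47))
    print(needs_schema_update(47, "Windows Server 2012 R2"))
```

Schema extensions added by applications such as Exchange are tracked separately and are not reflected in this value.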



I sometimes refer to AD as the heart of the network. The flow of the network, your enterprise objects, passes through this beating heart, and if it has a brief hiccup or slows down, it can affect the overall function of your network. Users not being able to log in to their computers can have a major impact on the business, and lost productivity costs money. A heart that isn't working can be almost paralyzing for some businesses.


Upgrade all things NOW!


When a schema update is mentioned, most administrators tend to delay the upgrade until they feel it is “safe”. However, Microsoft's push to release new products every 18-24 months has introduced a rethinking of sorts. In an effort to reduce the fear and increase upgrades, Microsoft has made schema updates a little less painful and sometimes almost transparent. With each new release, the process gets simpler to deploy and update.


With Windows Server 2012, Microsoft simplified the upgrade process. The functions of adprep /forestprep and adprep /domainprep have been wrapped into the Active Directory Domain Services role installation process, so the schema update happens in the course of a few clicks of Next. You can still run the commands manually if you want to be old school.



Schema updates are now required for almost every Exchange Service Pack or major Cumulative Update (CU). The same can be said for other Microsoft applications such as Skype for Business and SCCM. Microsoft has made it so easy that in some cases, such as installing a CU for Exchange 2013, the schema update process is built into the application update itself. As long as the account running the Exchange update has the appropriate permissions to update the AD schema, the update is easy and seamless.

I think the fear of schema updates has decreased somewhat in the past several years, as administrators have had to do them more often and Microsoft keeps making the process easier. If you have third-party applications that extend your schema, however, updates may not be as pain-free. As with any upgrade or update, you should always plan accordingly and test as much as possible, even the simple point-and-click ones.

I'm in Orlando this week for SQL Live as well as the Orlando SWUG meeting. With three sessions to deliver at SQL Live, working the booth there, and the session at the SWUG it is going to be a busy week. If you are in the Orlando area I hope you can stop by and say hello.


Anyway, here's a bunch of links I found on the Intertubz that you may find interesting, enjoy!


Canadian Money May Contain Animal Fat, Bank Of Canada Confirms

Gives new meaning to "put your money where your mouth is", because it's so tasty!


Everything announced at AWS re:Invent 2016

Oh, yeah, AWS re:Invent was this past week, in case you didn't know. Here's the list of everything they announced. I like the idea of Snowmobile, wonderful marketing gimmick. And now I know why Amazon operates with such thin margins, so they can afford things like Snowmobile.


Apple is reportedly using drones to beat Google Maps

Well, if the drones use Apple Maps, this might take a while.


34 tips to boost iPhone & iPad battery life

After fighting with applications slowly consuming the memory on my iMac this week I figured I would share some tips on battery life for your iPhone. Yes, I am assuming you have an iPhone, because I know you don't have a Windows Phone, but many of the tips work for Android as well.


Five Things You Need to Know About the U.K.’s Mass Surveillance Law

In case you had not heard, the UK government has made it legal to do some questionable data collection.


NYPD plans to expand Smart car fleet to replace scooters

Not only are these cars adorable to look at, but I'm guessing they are a stepping stone to autonomous patrolling and data collection.


Big Data Poised to Get Much Bigger in 2017

All I want for Christmas is data, of course. Let's just hope it doesn't sit out and ROT all year.


Am I in the Christmas spirit yet? You be the judge:




In previous weeks, we talked about application-aware monitoring, perspective monitoring and agent/responder meshes to get a decentralized view of how our network is functioning.


With our traditional network monitoring system (NMS), we have a device-level and interface-level view. That's becoming less and less true as modern software breaks the mould of tradition, but it's still the core of its functionality. Now we have added perspective monitoring and possibly some agent/responder monitoring to the mix. How do we correlate these so that we have a single source of meaningful information?


Maybe We Don't?


Describing the use of the phrase "Single Pane of Glass" (SPoG) in product presentations as "overused" is an understatement. The idea of bringing everything back to a single view has been the holy grail of product interface design for some time. This makes sense, as long as all of that information is relevant to what we need at the time. With our traditional NMS, that SPoG is usually the dashboard that tells us whether the network is operating at baseline levels or not.


Perspective monitoring and agent/responder meshes can gather a lot more data on what's going on in the network as a whole. We have the option of feeding all that directly into the NMS, but is that where we're going to get the best perspective?


Data Visualization


We're living in a world of big data. The more we get, the less likely it becomes that we will be able to consume it in a meaningful way. Historically, we have searched for the relevant information in our network and filtered out what isn't immediately relevant. Big data is teaching us that it's all relevant, at least when taken as a whole.


Enter log aggregators and data visualization systems. Most of the information that we're getting from our decentralized tools can be captured in such a way as to feed these systems. Instead of just feeding it into the NMS, we have the option of collecting all of this data into custom visuals. These can give us a single view not only of where the network is experiencing chronic problems, but of where we need to adjust our baselines.


Whether we're looking at Elastic Stack, Splunk, Tableau, or other tools, the potential to capture the gestalt of our network's data and present it usefully is worthwhile.




What if there's something in all this that indicates unacceptable performance or a failure? Yes, that should raise alerts in our NMS.


This isn't an either/or thing. It's a complementary approach. There's no reason why the data from our various agents and probes can't feed both. Depending on the tool that's used, the information can even be forwarded directly from the visualizer, simplifying the collection process.
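To make the "feed both" idea concrete, here is a minimal Python sketch. It assumes a hypothetical probe record carrying a single latency metric, and a baseline defined as the historical mean plus three standard deviations. Every result goes to the visualization store; only out-of-baseline results also become NMS alerts. The names (ProbeResult, route, and the list-based sinks) are illustrative stand-ins, not any product's API.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class ProbeResult:
    # Hypothetical record produced by an agent/responder probe
    source: str
    target: str
    latency_ms: float


def outside_baseline(value, history, k=3.0):
    """True if value exceeds the baseline mean by more than k standard deviations."""
    if len(history) < 2:
        return False  # stdev needs at least two samples
    return value > mean(history) + k * stdev(history)


def route(result, history, visual_sink, nms_alerts):
    """Send every result to the visualizer; raise an NMS alert only when out of baseline."""
    visual_sink.append(result)
    if outside_baseline(result.latency_ms, history):
        nms_alerts.append(
            f"{result.source} -> {result.target}: {result.latency_ms} ms is outside baseline"
        )
```

In a real deployment the two lists would be replaced by whatever ingestion path your visualizer and NMS actually accept.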


The Whisper in the Wires


Depending on what we’re looking for, there’s more than one tool for the job. Traditionally, we’re observing the network for metrics that fall outside of our baselines, particularly when those have catastrophic impact to operations. This is essential for timely response to immediate problems. Moving forward, we also want a bird’s eye view of how our applications and links are behaving, which may require a more flexible tool.


Has anyone else looked at implementing data visualization tools to complement their NMS dashboards?

The Rolling Stones once wrote a song about how time waits for no one, but the inverse is also true today. These days, no one waits for time; certainly not government personnel who depend on speedy networks to deliver mission-critical applications and data.


Fortunately, agency administrators can employ deep packet-level analysis to ensure the efficiency of their networks and applications. Packet-level analysis involves capturing and inspecting packets that flow between client and server devices. This inspection can provide useful information about overall network performance, including traffic and application response times, while fortifying network security.


Before we get into how this works, let’s take a minute to go back to the concept of time – specifically, network response time (NRT), also known as network path latency. NRT measures the amount of time required for a packet to travel across a network path from sender to receiver. When latencies occur, application performance can be adversely impacted.


Some applications are more prone to latency issues, and even lower bandwidth applications aren’t completely immune. End-users commonly think that these problems are the result of a “slow network,” but it could be the application itself, the network, or a combination of both.


Packet analysis can help identify whether the application or network is at fault. Managers can make this determination by calculating and analyzing both application and network response time. This allows them to attack the root of the problem.
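One common way to separate the two is sketched below in Python under stated assumptions. With the capture point near the client, the TCP handshake (SYN to SYN-ACK) approximates network response time, because the server application does no work during the handshake. The delay between the request and the first response data, minus that network time, then approximates application response time. The simplified Packet record is hypothetical; a real tool would read timestamps and flags from a capture file.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ts: float         # capture timestamp in seconds
    direction: str    # "c2s" (client to server) or "s2c" (server to client)
    flags: str        # simplified TCP flags, e.g. "S", "SA", "A", "PA"
    payload_len: int  # bytes of TCP payload


def response_times(packets):
    """Estimate (nrt_ms, art_ms) for one TCP conversation captured near the client."""
    syn = next(p for p in packets if p.direction == "c2s" and p.flags == "S")
    synack = next(p for p in packets if p.direction == "s2c" and p.flags == "SA")
    # First client packet carrying data is treated as the request
    request = next(p for p in packets if p.direction == "c2s" and p.payload_len > 0)
    # First server packet carrying data after the request is the response
    response = next(
        p for p in packets
        if p.direction == "s2c" and p.payload_len > 0 and p.ts > request.ts
    )
    nrt = synack.ts - syn.ts                   # network round trip only
    art = (response.ts - request.ts) - nrt     # remove the network share of the wait
    return round(nrt * 1000, 3), round(art * 1000, 3)
```

A large ART with a small NRT points at the application or server; the reverse points at the network path. If the capture is taken next to the server instead, the SYN-ACK to ACK gap gives the network round trip and the request-to-response gap needs no correction.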


They can also use analysis to calculate how much traffic is using their networks at any given time. This is critically important for two reasons: first, it allows administrators to better plan for spikes in traffic, and second, it can help them identify abnormal traffic and data usage patterns that may indicate potential security threats.
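A minimal sketch of the traffic-volume side, assuming packets have already been reduced to (timestamp in seconds, size in bytes) pairs. Bytes are summed into fixed intervals, and any interval well above the median is flagged for a closer look. The 60-second interval and the 3x-median rule are arbitrary assumptions, not recommended thresholds.

```python
from collections import defaultdict
from statistics import median


def bytes_per_interval(packets, interval_s=60):
    """Sum packet sizes into fixed buckets keyed by each bucket's start time (seconds)."""
    buckets = defaultdict(int)
    for ts, size in packets:
        start = int(ts // interval_s) * interval_s
        buckets[start] += size
    return dict(sorted(buckets.items()))


def flag_spikes(buckets, factor=3.0):
    """Return bucket start times whose volume exceeds `factor` x the median bucket."""
    if not buckets:
        return []
    typical = median(buckets.values())
    return [start for start, volume in buckets.items() if volume > factor * typical]
```

Flagged intervals are only candidates: a spike might be a scheduled backup rather than a threat, which is why a baseline of normal patterns matters.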


Additionally, administrators can identify which applications are generating the most traffic. Packets can be captured and analyzed to determine data volume and transactions, among other things. This can help managers identify applications and data usage that may be putting a strain on their networks.
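Ranking applications by volume can be sketched in a similar way. This assumes flow records of (server port, bytes) and a hypothetical port-to-application table; commercial tools classify traffic far more deeply than port numbers, so treat this as an illustration only.

```python
from collections import Counter

# Hypothetical port-to-application table; real tools classify far more deeply.
PORT_APPS = {53: "dns", 443: "https", 445: "smb", 1433: "sql-server"}


def top_applications(flows, n=3):
    """Rank applications by total bytes, given (server_port, byte_count) flow records."""
    volume = Counter()
    for port, nbytes in flows:
        # Unknown ports are grouped under a generic "port-N" label
        volume[PORT_APPS.get(port, f"port-{port}")] += nbytes
    return volume.most_common(n)
```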


The challenge is that packet-level analysis has traditionally been either too difficult or too expensive to manage. There’s a powerful, free, open source tool called Wireshark, but it can be difficult to wrangle for those who aren't familiar with it. Many proprietary tools are full-featured and easier to use, but expensive.


The good news is that some standard network monitoring tools now include packet analysis as another key feature. That makes sense, because packet analysis can play an important – and very precise – role in making sure that networks continue to run efficiently. As a result, federal IT administrators now have more options to reach deep into their packets and honor the words that Mick Jagger once sang: “Hours are like diamonds. Don’t let them waste.”


Find the full article on our partner DLT’s blog, TechnicallySpeaking.

This is the last of a 3-part series, which is itself a longer version of a talk I give at conferences and conventions.

You can find part 1 here, and you can find part 2 here.

Now that I'm wrapping it up, I would love to hear your thoughts, suggestions, and ideas in the comments below!


In the last two sections of this series, I made a case for WHY unplugging should be important to us as IT Professionals, and I began to dig into specific examples of HOW we can make unplugging work for us. What follows are some additional techniques you can adapt for your own use, as well as some ways to frame your time away so that you avoid the FUD that can come with trying something new and potentially different from what our colleagues are doing.


Perspective is Key

Along with planning, another key to successfully disconnecting is to develop a healthy perspective.


Try this for the next few days: Note how you are contacted during real emergencies (and how often those emergencies actually happen).


It's easy to fall into the trap of answering every call, jumping screens at the sound of a bell or tweet, checking our phone at two-minute intervals, and so on, when NOTHING is actually that important or urgent.


Develop an awareness of how often the things you check turn out to be nothing, or at least nothing important.


Change the way you think about notifications. Mentally re-label them interruptions and then see which matter. Pay attention to the interruptions. That's where you lose control of your life.



If someone really needed you or needed to tell you something, they wouldn't do it in a random tweet. They wouldn't tag you in a photo. They probably wouldn't even send it as a group text. When people want you to know something, they use a very direct method and TELL you.


So once again, take a deep breath. Learn to reassure yourself that you aren't going to miss anything important. Honest.


Prioritization is Key

For people like me, going offline is pretty much an all or nothing deal. As I said earlier, if it has an on switch, it's off limits for me and my family.


But that doesn't have to be the case. You can choose levels of connectivity as long as they don't get the best of you.


A good example of this is your phone. Most now support an ultra, super-duper power saving mode, which has the unintended benefit of turning off everything except... you know... the phone part. With one swipe you can prioritize direct phone calls while eliminating all the distractions that smartphones represent. You can also set different applications manually to interrupt – I mean notify – you or not, so that you only receive the interruptions that matter.


As long as we're talking about prioritization, let's talk about getting work done. Despite your nagging suspicion to the contrary, your technology was not protecting you from the Honey Do list. It was just pushing the items on your list to the point where you had to work on them later in the day or week, and at a time when you are even less happy about it than you would have been otherwise.


Use your unplugged time to prioritize some of the IRL tasks that are dragging you down. I know it sounds counterintuitive, but it is actually easier to get back to work when you know the gutters are clean.


As challenging as it sounds, you might also need to prioritize who you get together with on your day off the grid. Don't purposely get involved with friends who spend their weekends gaming, live-tweeting, etc. There's nothing wrong with those things, of course, but you're not really offline if you keep telling your buddy, "Tweet this for me, okay?”


Yes, this may change who you associate with and when. But don't try to be offline when everyone else around you is online. That's like going on a diet and forcing your friends to eat vegan chili cheese dogs.


But What About...

Hopefully this has gotten you thinking about how to plan for a day away from the interwebz. But there's still that annoying issue of work. Despite claims of supporting work-life balance, we who have been in IT for more than 15 minutes understand that those claims go out the window when the order entry system goes down.


The answer lies partly with prioritization. If you've made your schedule clear (as suggested earlier) and the NOC still contacts you, you'll need to make a judgement call about how or if you respond.


Spoiler Alert: Always opt for keeping a steady paycheck.


Speaking of which, on-call is one of those harsh realities of IT life that mangle, if not outright destroy, work-life balance. It's hard to plan anything when a wayward email, text, or ticket forces you to go running for the nearest keyboard.



If you are one of those people who is on-call every day of the year around the clock, I have very little advice to help you go offline, and honestly you have bigger fish to fry, because that kind of rat race gets old fast.


On the other hand, I have a ton of experience coordinating rotating on-call with offline. Now, I don't want you to think that I've negotiated this upfront on every job I've held. I have had managers who respected my religious schedule and worked around it, and others who looked me in the eye and said my religion was my problem to solve. Here's what I've learned from both experiences:


First, the solution will ultimately rest with your coworkers. Not with your manager and certainly not with HR. If you can work out an equitable solution with the team first, and then bring it to management as a done deal, you're likely home free.


Second, nobody in the history of IT has ever said they loved an on-call schedule, and everyone wants more options. YOU, dear reader, represent those options. In exchange for your desired offline time, you can offer to trade with coworkers and cover their shifts. You wouldn't believe how effective this is until you try it. In a few rare cases, I've had to sweeten the deal with two-for-one sales ("I'll take your Sunday and Monday for every Saturday of mine"), but usually just swapping one day for another is more than enough. Another trick is to take a coworker's entire on-call week in exchange for them covering the same number of your offline days during your on-call rotation.


Yet another trick: My kids' school schedule is extremely non-standard. They have school on Sunday and don't get days off for Christmas, Thanksgiving, or most of the other major national holidays. So I can graciously offer to cover prime-time days like Thanksgiving in exchange for coworkers taking my time off. In essence, I'm leveraging time when my family isn’t going to be home anyway.


The lesson here is that if you have that kind of flexibility, use it to your advantage.


But what about perception? If you unplug regularly, won't people notice and judge you?


First, don't overthink it. When people get wind of what you are doing, you're more likely to receive kudos than criticism, and more than a few wistful comments along the lines of, “I wish I could do that."


Second, if you followed my suggestions about communicating and prioritizing - the right people knew about your plans AND you remained flexible in the face of an actual crisis - then there really shouldn't be any question. In fact, you will have done more than most IT folks ever do when they walk out the doors.


So that leaves the issue of using your evenings and weekends to get ahead with technology, so you can be the miracle worker when Monday rolls around.


While I understand the truth of this comic:



I'll put a stake in the ground and say that few - if any - people saved their job, won the bonus, or got the promotion because they consistently used personal time to get work done. And for those few who did, I'd argue that long term it wasn't worth it for all the reasons discussed at the beginning of this essay.


It's also important to point out that managers, departments, or companies that require this level of work and commitment are usually dangerously toxic. If you find yourself in that situation, moving on will do your long-term happiness and career a favor, even if your bank account isn't happy in the short term.


To sum up: Learning to disconnect regularly and for a meaningful amount of time offers benefits to your physical health, your peace of mind, and even your career; and there are no insurmountable challenges in doing so, regardless of your business sector, years of experience, or discipline within IT.


The choice is yours. At the start of this series, I dared you to just sit and read this article without flipping over to check your phone, email, twitter feed, etc. Now, if you made it to the end of these essays without checking those blinking interrupti... I mean notifications, then you have my heartfelt gratitude as well as my sincere respect.


If you couldn’t make it this far, you might want to think about why that is, and whether you are okay with that outcome. Maybe this is an opportunity to grow, both as an IT professional and as a person.


That's it folks! I hope you have gained a few insights that you didn't already have, and that you'll take a shot at making it work for you. Let me know your thoughts in the comments below.

This week we kicked off the December Writing Challenge, and the response has been incredible. Not just in volume of people who have read it (over 800 views) or commented (60 people and counting - all of whom will get 200 THWACK points for each of their contributions!), but also in the quality of the responses. And that's what I wanted to share today.


First, if you are having trouble finding your way, the links to each post are:


So here are some of the things that leapt out at me over the last 3 days:


Day 0: Prepare

First, I have to admit this one was a sneaky trick on my part, since it came out on Nov 30 and caught many people unprepared (#SeeWhatIDidThere?). Nevertheless, a few of you refused to be left out.


KMSigma pointed out:

"IT is (typically) an interrupt-driven job.  Sure, you have general job duties, but most are superseded by the high priority email from the director, the alert that there is a failing drive in a server, the person standing in your cube asking for the TPS report, the CIO stating that they just bought the newest wiz-bang that they saw at the trade show and you need to implement it immediately.  Regardless of what is causing the interruptions, your "normal" daily duties are typically defined by the those same interruptions.


So, how can you plan for interruptions?  Short answer is that you can't, but you can attempt to mitigate them."


Meanwhile, sparda963 noticed the connection to the old Boy Scout motto, and said:

"Instead of keeping rope, emergency food, matches, water filter, and other endless supplies within reasonable reach I keep things like tools, utilities, scanners and the such around."


Finally (for this day), zero cool noticed (not incorrectly) that

"Preparing for a days work in IT is like preparing for trench warfare.  You need to be a tech warrior and have a good plan of attack on how to communicate with EUs and prioritize their requests (demands). "


Moving to Day One (Learn), the highlights included one commenter who spoke for many when he said

"The ability to learn is in my opinion one of the greatest tools in the kit for today’s IT professional. It is the ability to adapt and change to a world that is nowhere near static.  It is the skill to not just master a task but understand the concept as well."


Many others pointed out that learning is an active process that we have to be engaged with, not passively consume. And also that, as rschroeder commented,

"The day I stop learning is the day I die."


There were so many other amazing insights into how, why, and what to learn that you really should check them out.


But that brings us to today's word: Act.

miseri captured the essence of what many others said in the quote

"I don't trust words, I trust actions."


tinmann0715 was even able to honor the thoughts of his high school principal (even if he wasn't able to appreciate them at the time), who shared the motto:

"If it is to be it is up to me!"


And bleggett continued what is becoming a burgeoning Word Challenge trend to put thoughts into haiku with:

"Alerts that tell us

Charts that show us what we seek

Think before you act."


All in all there were some incredible ideas and personal stories shared. I appreciate everyone taking time out of their busy lives to share a piece of themselves in this way.


In the coming weeks, the "lead" article will come from other Head Geeks as well as folks from across the spectrum of the SolarWinds corporate community: members of the video team, editorial staff, product management, and more will share their stories, feelings, and reactions to each day's prompt.


Until next week...

Well hey everybody, I hope the Thanksgiving holiday was kind to all of you. I had originally planned to discuss more DevOps with y’all this week; however, a more pressing matter came to mind in my sick and weakened state of stomach flu!


Lately we’ve been discussing ransomware, but more importantly, I’ve been seeing an even greater incidence of ransomware affecting individuals and businesses. Worse, when it hits a business, it often causes a lot of collateral damage (for example, encrypting a finance share that the infected user had only cursory access to).


KnowBe4 has a pretty decent infographic on ransomware, which I’m tossing in here, and I’m curious what y’all have been seeing in this regard.

Do you find this to be true? Are you seeing an increased incidence, a decrease, or roughly the same?




Some real hard-and-fast takeaways I’ve seen from those who aspire to mitigate ransomware attacks are to implement:


  • Stable and sturdy firewalls
  • Email filtering scanning file contents and blocking attachments
  • Comprehensive antivirus on the workstation
  • Protected Antivirus on the servers


Yet all too often I see all of this investment in trying to ‘stop’ it from happening, with very little left for handling cleanup should it hit the environment. Basically, that means having some kind of backup/restore mechanism to restore files SHOULD you be infected.


Some of the top ways I’ve personally seen ransomware wreak havoc in an environment include:

  • Using a work laptop on an untrusted wireless network
  • Phishing / Ransomware emails which have links instead of files and opening those links
  • Opening a “trusted” file off-net and then having it infect the environment when connected
  • Zero Day Malware through Java/JavaScript/Flash/Wordpress hacks (etc)


As IT practitioners, not only do we have to do our daily jobs, but we also have to keep the lights on for the business, focus on innovating in the environment, and keep up with the needs of the business. Worst of all, when things go bad (and few things are as bad as ransomware attacking and targeting an environment), we have to deal with it on a massive scale! Maybe we’re lucky and we DO have backups, we DO have file redirection so we can restore from VSS shadow copies, and we can detect encryption in flight and stop it before it takes effect. But that’s a lot of “maybe” from end to end in any business, plus all of the home devices that may be in play.
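On the "detect encryption in flight" point: one heuristic that some detection tools use is byte entropy, since encrypted data looks nearly random (close to 8 bits per byte) while most ordinary documents do not. The Python sketch below illustrates that heuristic only; it is not a ransomware defense. The 7.5 threshold is an assumption, and compressed formats such as ZIP or JPEG also score high, so on its own it will produce false positives.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating bytes yields integer byte values
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    # Assumed threshold: encrypted output approaches 8 bits/byte, plain text is far lower.
    # Compressed files also score high, so treat a hit as a signal, not proof.
    return shannon_entropy(data) >= threshold
```

A monitoring agent might, for example, sample the first few kilobytes of files as they are rewritten and alert when many files that used to be low-entropy suddenly cross the threshold.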


There was a time when viruses would break out in a network and require time and effort to clean up, but for the most part they were a minor annoyance. Worms would break out, and as long as we blocked whatever the zero-day trigger was, we could stop them from recurring. And while APTs and the like are more targeted threats, they were rare enough that they didn’t occupy our days as a whole. But ransomware gave thieves a way to monetize their activities, which gives them an incentive to infiltrate and infect our networks. I’m sure you’ve seen the ransomware that now offers a help desk to assist victims with paying?



It’s definitely a crazy world we live in, one which leaves us with only more work to do on a daily basis: a constant effort to fend off and fight against. This threat has been growing at a constant pace and is spreading to infect Windows, Mac, AND Linux.


What about your experiences, do you have any attack vectors for Ransomware you’d like to share, or other ways you were able to fend them off?  

Software-Defined WAN (SD-WAN) is easily the most mature flavor of SDN. Consider how many large organizations have already deployed some sort of SD-WAN solution in recent years. It’s common to hear of organizations migrating dozens or even thousands of their sites to an entirely SD-WAN infrastructure, which suggests that SD-WAN is no longer an interesting startup technology but part of the networking mainstream.


The reason is clear to me. SD-WAN provides immediate benefits to a business’s bottom line, so from a business perspective, SD-WAN just makes sense. SD-WAN technology reduces complexity, improves performance, and greatly reduces the cost of an organization’s WAN infrastructure. The technology offers the ability to replace super-expensive private MPLS circuits with cheap broadband without sacrificing quality and reliability. Each vendor does this somewhat differently, but the benefits to the business are so palpable that the technology really is an easy sell.


The quality of the public internet has improved greatly over the last few years, so being able to tap into that resource and somehow retain a high quality link to branch offices and cloud applications is very tempting for cost-conscious CIOs. How can we leverage cheap internet connections like basic broadband, LTE and cheap cable yet maintain a high-quality user experience?


Bye-bye private circuits.


This is the most compelling reason to use this technology. Ultimately, it boils down to getting rid of private circuits. MPLS links can cost thousands of dollars per month each, so if an SD-WAN solution can dramatically cut costs, provide fault tolerance, and retain a quality experience, the value lies in going with all public internet connections.


Vendors run their software on proprietary appliances that make intelligent path decisions and negotiate with remote end devices to provide a variety of benefits. Some offer the ability to aggregate dissimilar internet connections such as broadband and LTE, some tout the ability to provide granular QoS over the public internet, and some solutions offer the ability to fail over from one primary public connection to another public connection without negatively affecting very sensitive traffic such as voice or streaming video. Also, keep in mind that this is an overlay technology, which means your SD-WAN transport is completely independent of the ISP.
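As a rough illustration of what "intelligent path decisions" can mean, here is a Python sketch of per-class path selection, assuming each appliance already measures latency, jitter, and loss on every link. The SLA thresholds and field names are hypothetical; each vendor implements this differently, and this is not any vendor's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str            # e.g. "broadband" or "lte"
    latency_ms: float
    jitter_ms: float
    loss_pct: float
    up: bool = True


# Hypothetical per-class SLA thresholds; real products define their own policies.
SLA = {
    "voice": {"latency_ms": 150.0, "jitter_ms": 30.0, "loss_pct": 1.0},
    "bulk": {"latency_ms": 400.0, "jitter_ms": 100.0, "loss_pct": 5.0},
}


def meets_sla(path, sla):
    return (path.latency_ms <= sla["latency_ms"]
            and path.jitter_ms <= sla["jitter_ms"]
            and path.loss_pct <= sla["loss_pct"])


def choose_path(paths, traffic_class):
    """Pick a link for one traffic class, or None if every link is down."""
    healthy = [p for p in paths if p.up]
    if not healthy:
        return None
    compliant = [p for p in healthy if meets_sla(p, SLA[traffic_class])]
    if compliant:
        # Among links that meet the SLA, prefer the lowest latency.
        return min(compliant, key=lambda p: p.latency_ms)
    # No link meets the SLA: fall back to the least lossy, then fastest, link.
    return min(healthy, key=lambda p: (p.loss_pct, p.latency_ms))
```

For instance, a broadband link at 30 ms with 2% loss and an LTE link at 70 ms with 0.2% loss would send voice over LTE (broadband fails the assumed 1% loss limit) while bulk traffic stays on broadband.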


Sweet. No more 3-year contracts with a monolith service provider.


Most SD-WAN vendors offer some, if not all, of these features, and some are going a step further by offering their solution as a managed service. Think about it: if your company is already paying some large ISP thousands per month for Ethernet handoffs into their MPLS cloud, what’s the difference with an SD-WAN managed service handing off a combination of Ethernet, LTE, etc. interfaces into their SD-WAN infrastructure?


Especially for small and medium-sized multi-site businesses, the initial cost of switching from managed MPLS to a dramatically cheaper managed SD-WAN provider is nothing compared to the savings from dropping private circuits over only a few years.
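The break-even arithmetic behind that claim is simple enough to sketch: divide the one-time switching cost by the monthly savings and round up. The function name is illustrative, and any figures you plug in should come from your own quotes.

```python
import math


def breakeven_months(switch_cost, mpls_monthly, sdwan_monthly):
    """Months until cumulative savings cover a one-time switching cost.

    Returns None when SD-WAN is not actually cheaper per month.
    """
    monthly_saving = mpls_monthly - sdwan_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(switch_cost / monthly_saving)
```

For example, with a hypothetical $30,000 switching cost and monthly bills of $5,000 (MPLS) versus $2,000 (SD-WAN), breakeven_months(30000, 5000, 2000) returns 10.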


For organizations such as high-transaction financial firms that want to manage their entire WAN infrastructure themselves and require almost perfect, lossless connectivity, SD-WAN may be a harder sell, but for most businesses it’s a no-brainer.


Picture a retail company with many locations such as a clothing store, bank, or chain restaurant that needs simple connectivity to payment processing applications, files, and authentication servers. These types of networks would benefit tremendously from SD-WAN because new branch locations can be brought online very quickly, very easily, and much more inexpensively than when using traditional private circuits. Not only that, but organizations wouldn’t be locked into a particular ISP anymore.


This is mainstream technology now, and it’s something to consider seriously when thinking about designing your next WAN infrastructure. It’s cheaper, easier to deploy, and easier to switch ISPs. That’s the real value of SD-WAN and why even huge organizations are switching to this technology in droves.


(image courtesy of Marvel)


...I learned from "Doctor Strange"

(This is part 2 of a 4-part series. You can find part 1 here)


When fate robs you of your skills, you can always seek others

The catalyst for the whole story was an accident that damaged Strange's hands beyond repair, or at least beyond his ability to ever hold a scalpel again.


The corollary for IT pros happens when we lose a technology. Maybe the software vendor is bought out and the new owner stops developing the old tools. Maybe your company moves in a different direction. Or maybe the tool you know best simply becomes obsolete. Whatever the reason, we IT professionals have to be ready to, in the words of my colleague Thomas LaRock (sqlrockstar), learn to pivot.


The interesting thing is that, very much like Stephen Strange, most of the time when we are asked (or forced) to pivot, we find we are able to achieve results and move our career forward in ways we couldn't have imagined previously.


Leverage the tools you have to learn new tools

One of the smaller jokes in the movie is when Wong the librarian asks, "How's your Sanskrit?" Strange glibly responds, "I'm fluent in Google Translate.” (Side note: Google translates Tamil, Telugu, Bengali, Gujarati, Kannada, and Sindhi among other Indic languages. But Sanskrit is not yet on the list).


The lesson for us in IT is that often you can leverage one tool (or the knowledge you gained in one tool) to learn another tool. Maybe the menuing system is similar. Maybe there are complementary feature sets. Or maybe knowing one solution gives you insight into the super-class of tools that both tools belong to. Or maybe it's as simple as having a subnet calculator or TFTP server that lets you get the simple jobs done faster.
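For the subnet calculator case, Python's standard ipaddress module already does the heavy lifting; a minimal IPv4-only sketch follows. The function name and the fields it returns are illustrative choices.

```python
import ipaddress


def subnet_summary(cidr: str) -> dict:
    """Summarize an IPv4 subnet the way a simple subnet calculator would."""
    net = ipaddress.ip_network(cidr, strict=False)  # accept host bits, e.g. .77/26
    if net.prefixlen >= 31:
        # /31 point-to-point links (RFC 3021) and /32 host routes have no
        # reserved network/broadcast addresses
        first, last = net.network_address, net.broadcast_address
        usable = net.num_addresses
    else:
        first, last = net.network_address + 1, net.broadcast_address - 1
        usable = net.num_addresses - 2
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "netmask": str(net.netmask),
        "usable_hosts": usable,
        "first_host": str(first),
        "last_host": str(last),
    }
```

For example, subnet_summary("192.168.10.77/26") reports network 192.168.10.64, broadcast 192.168.10.127, netmask 255.255.255.192, and 62 usable hosts from .65 to .126.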


There’s no substitute for hard work

It's important to note that Strange does, in fact, learn to read Sanskrit. He puts in the work so that he isn't reliant on the tool forever. In fact, Strange is rarely shown already knowing things. Most of the time, he's learning, adapting, and most frequently just struggling to keep up. But at the same time, the movie shows him putting in enormous amounts of work. He rips through books at a fearsome rate. He learns to project his astral form so that he can stretch out his sleeping hours and continue to read, absorb, and increase his base of knowledge. Obviously, he also has natural gifts, and tools, but he doesn't rest on either of those.


In IT, there really is no better way to succeed than to put in the work. Read the manual. Test the assumption. Write some sample code. Build a test network (even if you do it completely virtually). Join a forum and ask some questions.


Experience, creativity, and powerful tools help save the day

At the climax of the movie, Strange defeats the Dread Dormammu, lord of the dark dimension, in a most curious way: He creates a temporal loop that only he can break, locking Dormammu and himself into an endless repetition of the same moment in time. Faced with the prospect of his own personal Groundhog Day, Dormammu agrees to leave the Earth alone. The interesting thing is that, by all accounts, Strange isn't the strongest sorcerer in the world. Nor is he the most experienced. He has a spark of creativity and a few natural gifts, but that's about it.


Anyone in IT should be all too familiar with this narrative. A willingness to use the tools at hand, along with some personal sacrifice to get the job done, is often how the day is saved. In the movie, the tool at hand was the Eye of Agamotto. In real life, the small but powerful tool is often monitoring, which provides key insights and metrics that help us cut straight to the heart of the problem with little effort or time wasted.


Ask people who’ve stood in your shoes how they moved forward

In the course of his therapy, Stephen Strange is referred to the case of Jonathan Pangborn, a man who suffered an irreparable spinal cord injury, but whom Strange finds one day playing basketball with his buddies. Telling Pangborn that he is trying to find his own way back from an impossible setback, Strange begs him to explain how he did it. This is what sets the hero on the path toward the mystical stronghold in Kathmandu.


In IT, we run up against seemingly impossible situations all the time. Sometimes we muscle through and figure it out. Sometimes we just slap together a kludgy workaround. But sometimes we find someone who has had the exact same problem, and solved it! We need to remember that many in our ranks have stood where we stand and solved what we hope to solve. There’s no need to struggle to re-invent an already existing solution. But to benefit from others' experience, we have to ASK.


That's where being part of a community, such as Stack Exchange or THWACK©, can pay off. I’m not talking about registering an account and then asking questions only when you get stuck. I mean joining the community, really getting involved, reading articles, completing surveys, adding comments, answering questions, and, yes, asking your own as they come up.


Even broken things can help you find your way

On his way to the mystical school of Kamar Taj, Doctor Strange is accosted by muggers and ordered to give up his watch. Even though he is rescued from what appears to be a brutal beating, his watch isn't so lucky. It's only later that we realize there's an inscription on the back that reads, "Only time will tell how much I love you,” indicating that the watch is from Christina, one of the few people Strange has made a personal connection with.


While the joke "Even a broken clock is right twice a day" comes to mind, the lesson I'm thinking of is a little deeper. In IT, we often overlook broken things in favor of systems and solutions that run reliably, whether the broken thing is code that doesn't compile, a software feature that doesn't work as advertised, or hardware that's burnt out. And generally speaking, that's not a bad choice.


But our broken things can still teach us a lot. I've rarely learned anything from a server that ran like clockwork for months on end. But I've learned a lot about pin-outs, soldering, testing, timing, memory registers, and more when I've tried to get an old boat anchor working again.


Sometimes that knowledge transferred. Sometimes it didn't. But even when it didn't, the work grounded me in the reality of the craft of IT and gave me a sense of accomplishment and direction.


Did you find your own lesson when watching the movie? Discuss it with me in the comments below. And keep an eye out for parts 3-4, coming in the following weeks.

If you haven't read the earlier posts, here's a chance to catch up on the story so far:


  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)


Now that you're up to speed with the chaotic lives of the two characters whose jobs we are following, here's the third installment of the story, by Tom Hollingsworth (networkingnerd).



The View From Above: James (CEO)


I got another call about the network today. This time, our accounting department told us that their End of Year closeout was taking much too long. They have one of those expensive systems that scans in a lot of our paperwork and uploads it to the servers. I wasn't sure if the whole thing was going to be worth it, but we managed to pay for it with the savings from no longer renting warehouse space to store huge file boxes full of old paper records. That's why I agreed to sign off on it.


It worked great last year, but this time around I'm hearing nothing but complaints. This whole process was designed to speed things up and make everyone's job easier. Now I have to deal with the CFO telling me that our reports are going to be late and that the shareholders and the SEC are going to be furious. And I also have to hear comments in the hallways about how the network team still isn't doing their job. I know that Amanda has done a lot recently to help fix things, but if this doesn't get worked out soon, the end of the year isn't going to be a good time for anyone.



The View From The Trenches: Amanda (Sr Network Manager)


Fresh off my recent issues with the service provider in Austin, I was hoping the rest of the year would go smoothly. Then I got a hotline phone call from James. Apparently the network was to blame for the end of year reporting issues that the accounting department was running into. I knew this was a big deal; I'd sat in on the meetings about the records scanning program before I took over the network manager role. The arguments about the cost of that thing made me glad I worked in this department. And now it was my fault the thing wasn't working? Time to get to the bottom of this.


I fired up SolarWinds NPM and started checking the devices used by the accounting department. Thankfully, there weren't very many switches to look at. NPM told me that everything was running at peak performance: all the links to the servers were green, as was the connection between the network and the storage arrays. I was sure that any network misconfiguration would have shown up as a red flag here and given me my answer, but, alas, the network wasn't the problem. I could run a report right now to show James that the network was innocent this time.


I stopped short, though. Proving that it wasn't the network was not the issue; the issue was that the scanning program wasn't working properly. I knew that if the problem turned out to be someone else's, that person was going to be on the receiving end of one of those conference room conversations that got my predecessor Paul fired. I knew that I had the talent to help get this problem fixed and to help someone keep their job before the holidays.


So, if the network wasn't the problem, then what about the storage array? I called one of the storage admins, Mike, and asked him about the performance on the array. Did anything change recently? Was the firmware updated? Or out of date? I went through my standard troubleshooting questions for network problems. The answers didn't fill me with a lot of confidence.


Mike knew his arrays fairly well. He knew what kind they were and how to access their management interfaces. But when I started asking about firmware levels or the layout of the storage, Mike's answers became less certain. He said he thought some of the other admins might have been doing something, but he didn't know for sure. And he didn't know if there was a way to find out.


As if by magic, the answer appeared in my inbox. SolarWinds emailed me about a free trial of their Storage Resource Monitor (SRM) product. I couldn't believe it! I told Mike about it and asked him if he'd ever tried it. He told me that he had never even heard of it. Given my luck with NPM and keeping the network running, I told Mike we needed to give this a shot.


Mike and I were able to install SRM alongside NPM with no issues. We gave it the addresses of the storage arrays that the accounting data was stored on and let it start collecting information. It only took five minutes before I heard Mike growling on the other end of the phone. He was looking at the same dashboard I was. I asked him what he was seeing and he started explaining things.


It turned out that someone had migrated a huge amount of data onto the high-performance storage tier. Mike told me that this data should have been sitting in the near-line tier instead. The data in the performance tier was using up resources that the accounting department needed for their scanned documents. Because the scanned data was being written to the slower near-line storage instead, the slowdown looked like a network problem, when in fact the storage array wasn't working the way it should.


I heard Mike cup his hand over the phone receiver and start asking some pointed questions in the background. No one said anything until Mike pointed out the exact date and time the data had been moved into the performance tier. It turned out that one of the other departments wanted to get their reports done early this year and had talked one of the other storage admins into moving their data into the faster tier so their reports would finish sooner. All that extra data had crowded out the accounting department's workload. Now Mike was informing the admin that the data would be moved back ASAP and that the admin would be calling the accounting department to apologize for the delay.


Mike told me that he'd take care of talking to James and telling him it wasn't the network. I thanked him for his work and went on with the rest of my day. Not only was it not the network (again), but we found the real problem with some help from SolarWinds.


I wouldn't have given it another thought, but Mike emailed me about a week later with an update. He had kept the SRM trial running even after we used it to diagnose the accounting department issue. The capacity planning tool alerted Mike that, at the current rate of consumption, the array would run out of space in about six more weeks. Mike had already figured out that he needed to buy another array to migrate data to, and now he knew he needed a slightly bigger one. He used the tool to project consumption for the next two years and was able to convince James to buy a bigger array with more than enough room. It's time to convert that SRM trial into a purchase, I think; it's great value, and I'm sure Mike will be only too happy to pay.
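For the curious, an "about six more weeks" estimate like the one Mike received can come from something as simple as fitting a straight line to recent used-capacity samples and extrapolating to the array's total capacity. The sketch below assumes linear growth and evenly spaced samples. The function name `days_until_full` and the sample numbers are hypothetical, and this is not how SRM itself computes its forecasts.

```python
from __future__ import annotations


def days_until_full(used_tb: list[float], capacity_tb: float,
                    interval_days: float = 1.0) -> float | None:
    """Hypothetical time-to-full estimate, assuming linear growth.

    Fits used capacity against time with ordinary least squares, then
    returns the number of days after the last sample until the fitted
    line reaches capacity_tb. Returns None if usage is not growing.
    """
    n = len(used_tb)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")

    # Sample times in days, assuming evenly spaced measurements.
    xs = [i * interval_days for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(used_tb) / n

    # Least-squares slope (TB per day) and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_tb))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or shrinking usage: no projected fill date
    intercept = mean_y - slope * mean_x

    # Time at which the fitted line hits capacity, relative to the last sample.
    t_full = (capacity_tb - intercept) / slope
    return max(0.0, t_full - xs[-1])


if __name__ == "__main__":
    # Illustrative numbers only: a 125 TB array growing about 1 TB per day.
    print(days_until_full([80.0, 81.0, 82.0, 83.0], capacity_tb=125.0))
```

With these made-up samples, the fitted line reaches 125 TB 42 days after the last reading, which is six weeks. A real capacity planner would likely account for seasonality and bursts (such as end-of-year scanning), which a straight-line fit ignores.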



