
Geek Speak


I've had the opportunity over the past couple of years to work with a large customer of mine on a refresh of their entire infrastructure. Network management tools were one of the last pieces to be addressed, as the emphasis had been on legacy hardware first and the direction for management tools had not been established. This mini-series will highlight this company's journey: the problems solved, the insights gained, and the unresolved issues that still need addressing in the future. Hopefully this will help other companies or individuals going through the same process. Topics will include discovery around the types of tools, how they are being used, who uses them and for what purpose, their fit within the organization, and lastly what more they leave to be desired.

Blog Series

One Company's Journey Out of Darkness, Part I: What Tools Do We Have?

One Company's Journey Out of Darkness, Part II: What Tools Should We Have?

One Company's Journey Out of Darkness, Part III: Justification of the Tools

One Company's Journey Out of Darkness, Part IV: Who Should Use the Tools?

One Company's Journey Out of Darkness, Part V: Seeing the Light

One Company's Journey Out of Darkness, Part VI: Looking Forward

If you've followed the series this far, you've seen a progression through a series of tools being rolled out. My hope is that this last post in the series spawns some discussion around tools that are needed in the market and the features or functionality they should provide. These are the top three things we are looking at next.


Event Correlation

The organization acquired Splunk to correlate events happening at the machine level throughout the organization, but this is far from fully implemented and will likely be the next big focus. The goal is to integrate everything from clients to manufacturing equipment to networking gear to surface information that will help the business run better, experience fewer outages and issues, and improve security. Machine data is being collected to catch errors in the manufacturing process as early as possible. This early error detection allows for on-the-fly identification of faulty machinery and enables quicker response times, which decreases the amount of bad product and waste, improving overall profitability. I still believe there is much more to be gained here in terms of user experience, proactive notifications, etc.

Software Defined X

The organization is looking to continue its move into the software-defined world for networking, compute, storage, etc. These offerings vary greatly, and the decision to go down a specific path shouldn't be taken lightly. In our case we are looking to simplify network management across a very large organization, and to do so in a way that enables not only IT workflows but those of other business units as well. This will likely be OpenFlow-based and start with the R&D use cases. Organizationally, IT has now put standards in place requiring that all future equipment support OpenFlow as part of the SDN readiness initiative.

Software-defined storage is another area of interest, as it reduces the dependency on any one particular hardware type and allows for ease of provisioning anywhere. The ideal use case again is the R&D teams as they develop new products. The products likely to lead here are those that are pure software and open, though evaluation has not really begun in this area yet.

DevOps on Demand

IT getting a handle on the infrastructure needed to support R&D teams was only the beginning of the desired end state. One of the loftiest goals is to create an on-demand lab environment that provides compute, storage, and network resources in a secure fashion, along with intelligent request monitoring and departmental bill-back. We've been looking into Puppet Labs, Chef, and others but do not have a firm answer here yet. This is a relatively new space for me personally, and I would be very interested in further discussion around how people have been successful here.


Thank you all for your participation throughout this blog series.  Your input is what makes this valuable to me and increases learning opportunities for anyone reading.

Recently we covered what it means to configure server monitoring correctly, and the steps we can take to ensure that the information we get alerted on is useful and meaningful. We learned that improper configuration leads to support teams ignoring their alerts, and system monitoring becomes noise. Application monitoring isn't any different, and what your organization sets up for it is likely to be completely different from what was done for your server monitoring. In this article we will focus on monitoring Microsoft Exchange on-premises, and what should be considered when choosing and configuring a monitoring tool to ensure that your organizational email is functioning smoothly to support your business.

Email Monitoring Gone Wrong

In the early days of server monitoring it wasn't unusual for system administrators to spend months configuring their server monitoring tool for their applications. With some applications, this approach may be completely appropriate, but with Microsoft Exchange I have found that server monitoring tools typically are not enough to successfully monitor your organizational email system, even if the tool comes with a "package" specifically for monitoring email. The issue is that, by default, these tools will either alert on too much or too little, never giving your application owner exactly what they need.

Corrective Measures

So how can your business ensure that email monitoring is set up correctly, and that the information received from the tool is useful? It really comes down to a few simple things.

  • Evaluate several Exchange monitoring tools, and choose the one that best suits your Exchange needs. In most cases this will not be the same as your server monitoring tool.
  • Implementation of Exchange monitoring should be a project with a dedicated resource.
  • The tool should be simple and not time-consuming to configure. It should NOT take six months to properly monitor your email system.
  • Choose a tool that monitors Active Directory too. Exchange depends heavily on Active Directory and DNS, so Active Directory health is also vital.
  • Make sure you can easily monitor your primary email functionality. This includes mail flow testing, your Exchange DAG, the DAG witness, Exchange databases, ActiveSync, Exchange Web Services, and any additional email functionality that is important to your organization.
  • Ensure that the tool you select has robust reporting. This saves you from scripting your own reports and allows for better historical trending of email information. These reports should include things such as mail flow SLAs, large mailboxes, abandoned mailboxes, top mail senders, public folder data, distribution lists, and more.

This approach will ensure that your email system remains functional, and that you are alerted before a real issue occurs, not after the system has gone down.

Concluding Thoughts

Implementing the correct tool set for Microsoft Exchange monitoring is vital to ensuring the functionality and stability of email for your business. This is often not the same tool used for server monitoring, and it should include robust reporting options to ensure your SLAs are being met and that email remains functional for your business purposes.

Helllllllo Geek Speak Friends,


It is my pleasure to inform our awesome community that Geek Speak has been nominated for the 2015 Bytes that Rock Awards in the Best Software Blog category. With that said, I want to thank all of our amazing contributors and the thwack community. All of you provide amazing content, from blogs to in-depth discussion. The Head Geeks and Ambassadors work very hard to crank out the content you enjoy so much.

Kong Yang - @kong.yang

Patrick Hubbard - @patrick.hubbard

Leon Adato - @adatole

Thomas LaRock - @sqlrockstar

Show them some love and Vote today!

The winner will be announced in December, so stay tuned.




Yet another tip from your friendly neighborhood dev.

By Corey Adler (ironman84), professional software developer

(With many thanks to my boss, John Ours.)


I knew it would eventually come to this. I knew I would inevitably have to deal with this issue, but I was hoping to put it off a bit longer. The thing is, I painted myself into a corner with my last post.


Well, here goes:


Always, always, FRAKKING ALWAYS declare your variables.


I don’t care if you’re using a language that allows for dynamic typing.




Why am I so passionate about this? It probably has to do with the fact that I've had to look at and fix code in script files where variables were not declared. Those were some of the most difficult tasks of my career, simply because it was harder for me to keep track of everything going on. It might be easy and simple for you to describe what each of your variables is doing, but what about the next person who takes a look at it? As a programmer you always have to think not just about what you're doing right now, but about what might happen days, months, or even years from now.


So what about someone, like yourself, who is not a professional programmer and is just playing around with the code? Wouldn't that argument work in your favor (since no one else will be working on it)? NOPE! Because that next person could easily be you, years later, looking back at some piece of code. Sadly, you can't even assume that you, who wrote the code, will still have some idea what that code does when you go back to look at it.



The best thing for you to do is to assume that the next person to look at it won’t have any idea about what’s going on and needs you to help explain things. The reason you should declare the variables is to help with the self-documentation of your code for future use. It’s a great practice to get into, and no professional experience is needed.


That’s as far as styling goes, though. The primary reason to always declare your variables (and why you shouldn’t just declare them all at the top of your file) is a concept called implicit scope: the idea that you should declare each variable only at the point where you first need it. The benefits of this approach are twofold.

First, it reduces the number of variables in your program that only appear in certain contained blocks of code. For example, say you have a variable that you only use inside a for-loop. Instead of that variable taking up space (both in your file and in memory) for the whole length of the program, it is declared and used only in that for-loop. No one looking at the code needs to worry about other locations using that variable, since it’s clearly contained within a specific block of code.

Second, it makes it easier to debug your code should the need arise. When you see a variable declared right before its first use, you know that it hasn’t been used previously. If you haven’t declared your variables, though, or you’ve declared them all at the top of the file, someone (including you) who looks at the code later will need to check whether that variable is used anywhere else prior to that location, which can waste a lot of time and be a drain on one’s memory.
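Here's a small sketch of the for-loop case, using the ES2015 let and const keywords (which make this containment explicit, rather than just a convention):

```javascript
function sumOfSquares(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    // "squared" is declared at its first use, inside the loop body,
    // so a reader knows immediately it can't appear anywhere else.
    const squared = values[i] * values[i];
    total += squared;
  }
  // "squared" and "i" are already out of scope here; with let/const,
  // referencing them would be a ReferenceError rather than a stale value.
  return total;
}
```

With var-only code the same discipline is worth following by convention, since var is function-scoped and won't enforce the boundary for you.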


A third reason to declare your variables involves something in programming called coercion: automatically converting one of the operands of an operation to another type at runtime. Here’s an example of implicit type coercion, using JavaScript:

[ ] == 0; // result is true
0 == "0"; // result is true

So why, exactly, is coercion a bad thing? First of all, it requires each variable that you create to take a larger chunk of memory than would normally be required. In statically typed languages like C#, each variable type is allocated a certain fixed amount of memory, corresponding to the size of the maximum value for that type. With coercion, all variables are treated the same and are allocated the same amount of memory, sometimes far more than is necessary. All of this can add up in your application, potentially causing it to run slower.


A fourth reason, and probably the most understandable one for people not as well versed in coding, is the insanely weird behavior that can occur through coercion. Take this extreme example that I found online. Insert the following into the address bar of your Web browser (and re-type the “javascript:” at the front if your browser automatically removes it when copy-pasted):

javascript:alert(([]+[][[]])[!![]+!![]]+([]+[][[]])[!![]+!![]+!![]+!![]+!![]]+(typeof (![]+![]))[!![]+!![]]+([]+[][[]])[!![]+!![]+!![]+!![]+!![]]+([]+!![])[![]+![]]+([]+!![])[![]+!![]]+([]+[][[]])[!![]+!![]+!![]+!![]+!![]])
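If you're curious how an expression like that can spell out letters at all, here's a decomposition of a few of its building blocks (my own annotation, not part of the original trick):

```javascript
// Everything is built from coercing empty arrays and booleans:
var u = [] + [][[]];   // [][[]] is undefined, so "" + undefined is "undefined"
var t = [] + !![];     // !![] is true, so "" + true is the string "true"
var two = !![] + !![]; // true + true coerces both to 1, giving the number 2

var d = u[two];                     // "undefined" indexed at 2 is "d"
var m = (typeof (![] + ![]))[two];  // ![] + ![] is 0, typeof 0 is "number", so "m"
var first = t[0];                   // "true"[0] is "t"
```

Indexing into the strings "undefined", "true", "false", and "number" with numbers built from coerced booleans is enough to assemble arbitrary words, which is all the alert() payload above is doing.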


The result is an alert box containing the name of the person who wrote the original post. All of this is only possible through coercion. Let’s take another, simpler example of the weird behavior that can occur when you allow coercion to happen:

var a = "1"

var b = 2

var c = a + b


In a case like this, what would you expect the value of c to be? If you think it should be 3, then you’re a victim of coercion. The same goes if you think it will be “3” (the string value). In actuality, the value of c is the string “12”. Adding a + sign in front of a (as in “+a + b”) will give you back 3 (as a number, not a string). Or how about the following examples:

16 == [16]       

16 == [1,6]      

"1,6" == [1,6]  


So what do you think? The first one could be true, because both contain 16, but it could be false since one is an integer and the other is an array. The second one doesn’t look right, but maybe that’s how an array of integers is expressed when printed to the screen. Or the third one, which also doesn’t look right, but could be right for the same reason the second could be. The truth, in order, is true, false, and… true. WHAT THE HECK IS UP WITH THAT? It turns out that this is how JavaScript translates an array into a string, and the comparison comes back true because, after conversion, the two values match.
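The mechanics behind those answers can be verified directly. When == compares a number or string with an object, JavaScript first converts the object to a primitive, and for arrays that conversion is effectively join(","):

```javascript
// Arrays on one side of == are first converted to strings via join(",").
var first = (16 == [16]);       // [16]  -> "16"  -> the number 16, so true
var second = (16 == [1, 6]);    // [1,6] -> "1,6" -> NaN, and 16 == NaN is false
var third = ("1,6" == [1, 6]);  // [1,6] -> "1,6", a straight string match: true

// The same machinery explains the earlier a + b example:
var a = "1";
var b = 2;
var c = a + b;   // "+" with a string operand concatenates: the string "12"
var d = +a + b;  // unary "+" converts "1" to the number 1 first: 3
```

None of this is guesswork on the engine's part; it is a fixed (if surprising) set of conversion rules, which is exactly why leaning on it makes code hard to read.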


One final reason for avoiding both coercion and the itch to not declare your variables involves programming architecture. Programming languages typically rely either on interpreters that translate the code as it runs, or on compilers (as in Java™) that do the same ahead of time. What happens when your code is run is entirely dependent on those interpreters or compilers. Those, in turn, can be highly dependent on the architecture of the computer the code runs on. Slight changes in either the compiler or the architecture could drastically change the result of the program, even though it’s still the same code being run. What happens if, at some point down the line, someone makes an adjustment to the coercion system? How much code would that affect? Considering the ever-changing nature of the IT world we live in, that’s not a trivial concern. It’s also not a remote possibility, with JavaScript due to receive the ES2015 upgrade soon. What will your code do when it gets upgraded?


So there you have it: ALWAYS DECLARE YOUR VARIABLES, EVEN IN DYNAMICALLY TYPED LANGUAGES. In the words of the Governator:



No, wait, that’s not right. Ah, here it is:



I’m finished ranting. Let me know what other topics you think I should touch on in the comments below, or feel free to message me on thwack®. Until next time, I wish you good coding.


After months of rolling out new tools and provisioning the right levels of access, we started to see positive changes within the organization.


Growing Pains

Some growing pains were to be expected, and this was certainly no exception. Breaking bad habits developed over time is a challenge, but the team worked to hold each other accountable and began to build the tools into their daily routines. New procedures for rolling out equipment included integration with the monitoring tools and testing to ensure data was being logged and reported properly. The team made a concerted effort to ensure that previously deployed devices were populated into the system, and spent some time clearing out retired devices. Deployments weren't perfect at first and a few steps were skipped, so the team developed deployment and decommission checklists to help ensure the proper steps were being followed. Some of the checklist items were things you would expect: IP addressing, SNMP strings, AAA configuration, change control submission, etc. Others were somewhat less obvious: placing inventory tags on devices, recording serial numbers, and so on. We also noticed that communication between team members started to change, as discussions now began from a place in which individuals were better informed.

Reducing the Shadow

After the growing-pains period, we were pleased to see the tools becoming part of everyday activities for the core teams. The increased knowledge led to some interesting discussions around optimizing locations for specific purposes and helped shed light on regular pain points within the organization. For this particular customer, the R&D teams have "labs" all over the place, which can place undue stress on the network infrastructure. The "shadow IT" that had been an issue before could now be better understood. In turn, IT offered to manage that infrastructure in exchange for giving those teams what they wanted. This became a win-win for both groups and has fundamentally changed the business for the better. In my opinion, this is the single best change the company experienced: reducing the role of shadow IT and migrating those services to the official IT infrastructure group created far better awareness and supportability. As an added benefit, budgets are being realigned, with additional funding shifted to IT, which has taken on this increased role. There is definitely still some learning to be done here, but the progress thus far has been great.

Training for Adoption

Adoption seemed slow for the help desk and some of the ancillary teams who weren't used to these tools, and we wanted to better understand why. After working with the staff, it became apparent that although some operational training had been done, training for adoption had not. A well-designed training-for-adoption strategy can make the difference between success and failure of a new workflow or technology change. The process isn't just about providing users with technical knowledge, but rather about building buy-in, ensuring efficiency, and creating business alignment. It is important to evaluate how the technology initiative will help improve your organization. Part of the strategy should include an evaluation plan to measure results against those organizational outcomes, such as efficiency, collaboration, and customer satisfaction (whether internal business units or outward-facing customers).

The following are tips that my company lives by to help ensure that users embrace new technology to advance the organization:

Communicate the big-picture goals in relevant terms. To senior management or technology leaders, the need for new technology may be self-evident; to end users, the change can seem arbitrary. All stakeholders share common interests, such as improving efficiency or patient care, yet users may still resist a new workflow system unless the project team can illustrate how the system will help them better serve patients and save time.

Invest properly in planning and resources for user adoption. If an organization is making a significant investment in new systems, investing in the end-user experience is imperative to fully realize the value of the technology. However, training for user adoption often is an afterthought in major technology project planning. Furthermore, it is easy to underestimate the hours required for communications, workshops and working sessions.

Anticipate cultural barriers to adoption. Training should be customized to your corporate culture. In some organizations, for instance, time-strapped users may assume that they can learn new technology “on the fly.” Others rely on online training as a foundation for in-person instruction. Administrators may face competing mandates from management, while users may have concerns about coverage while they are attending training. A strong project sponsor and operational champions can help anticipate and overcome these barriers, and advise on the training formats that will be most effective.

Provide training timed to technology implementation. Another common mistake is to provide generic training long before users actually experience the new system, or in the midst of go-live, when things become chaotic. Both scenarios pose challenges. Train too early and, by the time you go live, users forget how they are supposed to use the technology and may be inclined to use it as little as possible. If you wait for go-live, staff may be overwhelmed by their fears and anxieties, and may have already developed resistance to change. The ideal approach will depend on each facility’s context and dependencies. However, staggering training, delivering complex training based on scenarios, addressing fears in advance, and allowing for practice time are all key success factors.

Provide customized training based on real-life scenarios. Bridging the gap between the technology and the user experience is a critical dimension of training, and one that some technology vendors tend to overlook in favor of training around features and functionality. Train with real-life scenarios, incorporating the various technologies integrated into the “day in the life” of an end user or staff member. By focusing on real-world practice, this comprehensive training helps overcome the “fear of the new” as users realize the benefits of the new technology.

Create thoughtful metrics around adoption. Another hiccup in effective adoption occurs when companies do not have realistic metrics, evaluation, and remediation plans. Without these tools, how do you ensure training goals are met—and, perhaps more importantly, correct processes when they are not? Recommend an ongoing evaluation plan that covers go-live as well as one to six months out.

Don’t ignore post-implementation planning. Contrary to popular perception, training and adoption do not end when the new system goes live. In fact, training professionals find that post-implementation support is an important area for ensuring ongoing user adoption.

We’ve talked about the changes that Microsoft has brought to Windows, Office, and Office 365. That’s a fair amount of new and updated features for any technology company, across their major platforms. But the changes don’t stop there. Let’s take a look at some of the other innovations that have been announced:


Office Lens: Though not a new product, Office Lens was born on the Windows Phone platform and has now been released for iOS and Android. It snaps a photo of a business card, whiteboard scribble, or printed document, cleans it up, saves it to Word, PDF, and/or OneDrive, and handles OCR to text. In my opinion, this is ‘the best Microsoft product that nobody knows about’.


Azure AD Domain Services: While we’re used to hearing that Azure runs Active Directory, in reality it’s a special edition of AD that doesn’t have the full feature set. This has restricted the use of Microsoft’s cloud platform, as the authentication mechanisms used by many applications just weren’t supported. Enter Azure AD Domain Services, with support now for LDAP, NTLM, Kerberos, and AD domain join. Group Policy is now also available for managing Azure virtual machines. Does this mean you can turn off your on-premises domain controllers? Not just yet, sorry. Application authentication has priority while this product is still in preview, but watch this space.


Universal Apps: This is arguably Microsoft’s boldest move, but the jury’s still out on whether it will pay off. Universal Windows Platform Apps are based on one API set so developers can code for Windows 10 across multiple devices. The app store is the same whether you’re viewing it for the PC, Windows Phone or your Xbox. The goal is to make it easier for apps to go to market and to roll out updates without having to create device-specific versions. Big goal, but the app developers will need to get on board before it’s a success.


Continuum: When your phone needs to be a full computer, it can be, if it’s a compatible Windows 10 Mobile device. We’re not talking just full-screening here. Continuum optimises for the enlarged display device, supports mouse and keyboard actions, and still lets you pick up your phone to answer calls and send texts. It’s a little mind-blowing. But if you’re struggling to find a real-world use for it, presenters may never need to carry a laptop again. In fact, we’re seeing an increasing number of Microsoft staff run presentations this way. Note: requires a wired or wireless dongle or dock accessory. Watch Bryan Roper demo Continuum at the Microsoft Windows 10 Devices event.


Surface Book: The tablet that replaces your laptop is a laptop that’s a tablet. Yes, for the first time in history, Microsoft has released its own laptop. The amount of innovation in this product is staggering, from the processor chip to the dynamic fulcrum hinge. It’s light and fast and beautiful. The screen comes off to become a tablet, and the pen has four interchangeable tips. Now I just have to wait for my existing laptop to die. Oops, was that a glass of water?


HoloLens: The demos are amazing. HoloLens is a high-definition holographic computer in a fully contained headset. It’s not meant to be worn walking down the street, but it opens up a huge range of applications, from gaming to design to medical imaging. Oh, and the holograms are environment-aware, so the bad guys will crawl over your furniture. The developer kit is now available for purchase, but my kids just want to play holographic Minecraft. See the gaming demos here: Microsoft HoloLens Minecraft E3 2015 - Minecraft Holographic (Demo) - YouTube


Some of these seem a little more far-fetched than others, especially in terms of enterprise adoption. They are strong markers of a new Microsoft, though, one that’s not content just pushing out new versions of its existing platforms.

What do you think? Is there anything in this list on your radar? Is Microsoft focusing on the wrong things, or are you pleasantly surprised at how the future might look for the company now that Satya Nadella is in charge?




P.S. Please don’t tell me you are rolling out Surface Books ... unless you have a vacancy on your team.


Throughout this series I've been advocating the formation of a tools team, whether it is a formalized group of people or just another hat that some of the IT team wears. This team's task is to maximize the impact of the tools the organization has chosen to invest in, and understanding who is using each tool is a critical component of that success. One of the most expensive tools that organizations invest in is their main network monitoring system. This expense may be the CapEx spent obtaining the tool or the sweat equity put in by someone building out an open source offering, but either way these dashboards require significant effort to put in place and demand effective use by the IT organization. Most of IT can benefit from these tools in one way or another, so having role-based access control (RBAC) on these platforms is important so that access can be granted in a secure way.

Network Performance Monitoring

NPM aspects of a network management tool should be accessible by most, if not all, teams, although some may never opt to actually use them. Outside of the typical network team, the server team should be aware of typical throughput, interface utilization, error rates, etc., so that it can be proactive in remediating issues. Examples where this has proven useful include troubleshooting backup-related WAN congestion and usage spikes around antivirus updates in a large network. In both of these cases, the server team was able to provide insight into the configuration of the applications and options to help remedy the issue in unison with the network management team. Specific roles benefiting from this access include server admins, security admins, WAN admins, and desktop support.

Deep Packet Inspection/Quality of Experience Monitoring

One of the newer additions to NMS platforms over the years has been DPI and its use in shedding light on the QoE for end users. Visibility into application response time can benefit the server team and help them be proactive in managing compute loads or improving capacity. Traps based on QoE variances can help teams responsible for specific servers or applications provide better service to business units. Specific roles benefiting from this access include server admins, security admins, and desktop or mobile support.

Wireless Network Monitoring

Wireless has outpaced the wired access layer as the primary means of network connectivity. Multiple teams benefit from monitoring the airspace, ranging from security to the help desk and mobile support teams. In organizations supporting large guest networks (health care, universities, hotels, etc.), the performance of your wireless network is critical to the public perception of your brand. Wireless network monitoring now even appeals to customer service or marketing teams, and extending it to non-IT teams can improve overall communication and satisfaction with the solutions. For teams with wireless voice handsets, telecom will benefit from access to wireless monitoring. In health care, there is a trend toward developing a dedicated mobile team, as these devices are critical to the quality of care. These mobile teams should be considered advanced users of wireless monitoring.

IP Address Management (IPAM)

IPAM is an amazing tool in organizations that have grown organically over the years. Using my customer as a reference, they had numerous /16 networks in use around the world, yet many of these were disjointed. This disjointed IP addressing strategy leads to challenges from an IP planning standpoint, especially for any new office, subnet, or DMZ. I'd advocate read-only access for help desk and mobile support teams and expanded access for server and network teams. Awareness of an IPAM solution can reduce outages due to human error, and it provides a great visual reference for the state of organization (or lack thereof) in a company's addressing scheme.
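To make the planning problem concrete, here is a minimal sketch, using Python's standard `ipaddress` module and entirely made-up allocations, of the two checks an IPAM tool automates: spotting overlapping assignments and finding a free subnet for a new office or DMZ:

```python
import ipaddress

# Hypothetical allocations, as might be pulled from an IPAM export
allocated = [
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("10.1.0.0/25"),  # overlaps the first /24
]

# Flag overlapping allocations -- the "disjointed" addressing problem
overlaps = [
    (a, b)
    for i, a in enumerate(allocated)
    for b in allocated[i + 1:]
    if a.overlaps(b)
]
print(overlaps)

# Find the next free /24 inside a parent /16 for a new office or DMZ
parent = ipaddress.ip_network("10.1.0.0/16")
free = next(
    s for s in parent.subnets(new_prefix=24)
    if not any(s.overlaps(a) for a in allocated)
)
print(free)  # 10.1.2.0/24
```

A spreadsheet can answer neither question reliably once the address space spans dozens of sites, which is exactly where an IPAM product earns its keep.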


I personally do not advocate handing out read-only access to anyone who happens to be interested in these tools; the information they hold should be kept secure, as it could provide the seeds for a well-planned attack. Each individual given access should be made aware that these tools are a job aid and carry a burden of responsibility. I've also worked with organizations looking for very complex RBAC for their management tools; unless you have an extremely good reason, I'd shy away from this as well, as the added complexity generally offers very little.


As organizations roll out network management software and extend it to a number of teams, they begin to gain insights that weren't visible before. These additional insights enable the business to make better decisions, recognize more challenges and inefficiencies, and so on.


For this customer, one of the areas in which we were able to vastly improve visibility involved the facilities team. This manufacturing site has its own power station and water plant, among other things, to ensure that manufacturing is never disrupted. In working on other projects with the team, it became obvious that the plant facilities team was in the dark about network maintenance. This team would mobilize into "outage mode" whenever the network was undergoing maintenance. After spending time with this team and understanding why they had to react the way they do, we were able to extend a specific set of tools to them that would make them aware of any outages, give them insight into when and why certain devices were offline, and provide visibility into when the network would come back online. This increased awareness of their needs, combined with the additional visibility from network tools, has significantly reduced the average cost of an outage and solved some communication challenges between various teams. We were also able to give them a dashboard that helps discern between network- and application-level issues.

This is a brief example of how we can all start to build the case for network management tools in a business-relevant way. Justifying these tools has to be about the business rather than simply viewing red/yellow/green status or how hard a specific server is working. A diverse team can explain the total business impact better than any single team could. For admins looking to justify these tools, look for some of these business-impacting advantages:

Reduced Downtime

We always seem to look at this as network downtime; however, as the example above shows, there are other downtime issues to be aware of, and all of them can impact the business. Expanding the scope of network-related issues can increase the perceived value of any networking tool. Faster time to resolution through the added visibility is a key contributor to reduced downtime, and tools that allow you to be proactive have a very positive effect as well.


Fewer Escalations

This seems rather self-explanatory; however, enabling the help desk to be more self-sufficient through these tools can reduce the percentage of escalated tickets. Escalated tickets typically carry a hefty price and also keep the escalations team from working on other issues.

Establish and Maintain Service Level Agreements

Many organizations talk about SLAs and expect them from their carriers, but how many offer SLAs to their own company? I'd argue very few do, and it is something that would benefit the organization as a whole. An organization that sees IT as an asset will typically be willing to invest more in that group. As network admins, we need to make sure we are providing value to the company; predictable response and resolution times are a good start.

Impact on Staff

Unplanned outages are a massive drain on resources; from help desk to admins to executives, everyone is on edge. They also often carry the financial impacts of overtime, consulting fees, and the like, in addition to intangibles such as disrupted work/life balance.

Can server monitoring be configured in a way that makes it effective? Is there such a thing as a monitoring project gone right? In my experience, it is rare that a team gets what it wants out of its monitoring solution, but rest assured it is possible with the right level of staffing and effort.


Monitoring Project Gone Right

As many of us know, server monitoring is very important to ensure that our business systems do not fail and that our users are able to do their jobs whenever they need to. When we are supporting hundreds, or even thousands, of servers in our enterprises, it would be impossible to do this manually; the right underlying system is the key to success. When we are handed a pager (yes, there was a time when we all had pagers), we want to know that the information that comes through is real and actionable. Throughout my entire career, I have worked only one place that I feel did monitoring really well. I was not worn down by being woken up for pages that were not actionable when I was on call; I could be certain that if my pager went off in the middle of the night, it was for a real reason.

Steps to Success

So what is the recipe for successful monitoring of your servers? Let’s take a look at how this can be done.

  • Make sure this is a real project with dedicated infrastructure resources.  This will not only allow for development of skill-sets, it will ensure that the project will be completed on a schedule.
  • Put together a Playbook which serves multiple purposes:
    • Provides a detailed list of the server monitoring thresholds and commitments for your servers
      • Document any exceptions to the standard thresholds defined
    • Limit the number of core application services monitored to reduce complexity
    • Allows your application owners to determine which software “services” they will want monitored 
    • Allows the application owner to decide what action should be taken if a service fails (i.e. page application owner, restart service, page during business hours only)
  • Make sure you are transparent and work with ALL of IT.  This project requires input from all application owners to ensure that the server monitoring team puts it together properly.
  • Revisit the playbook on a predefined interval to ensure that the correct system monitoring and actionable response is still in place.
  • Refer to “Server Monitoring from the Real World Part 1” for some additional thoughts on this topic.

This may sound like a lot of work, but ensuring that every monitored service and threshold has an actionable response is imperative to long-term success. In the end, this approach will significantly reduce the effort and resources required to ensure that monitoring is everything your business needs to run smoothly.
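A playbook like the one above is ultimately just data plus agreed actions. As a minimal sketch (every host name, service, threshold, and action label here is invented for illustration), it might look like:

```python
# A monitoring "playbook" as data: defaults, documented exceptions, and
# per-service failure actions, so every page is actionable by design.
PLAYBOOK = {
    "defaults": {"cpu_pct": 90, "disk_free_pct": 10},
    "exceptions": {
        # batch servers legitimately run hot overnight
        "batch01": {"cpu_pct": 98},
    },
    "services": {
        "order-api": {"on_failure": "page_owner"},
        "report-gen": {"on_failure": "restart_then_page_business_hours"},
    },
}

def threshold_for(host, metric):
    """Resolve a host's threshold, honoring documented exceptions."""
    return PLAYBOOK["exceptions"].get(host, {}).get(
        metric, PLAYBOOK["defaults"][metric]
    )

def should_page(host, metric, value):
    """True only when the reading breaches the agreed threshold."""
    limit = threshold_for(host, metric)
    if metric == "disk_free_pct":  # lower is worse for free space
        return value < limit
    return value > limit

print(should_page("web01", "cpu_pct", 95))    # over the default 90 -> True
print(should_page("batch01", "cpu_pct", 95))  # exception allows 98 -> False
```

Keeping the playbook in a reviewable, versioned format like this also makes the periodic revisit in the steps above a concrete diff rather than a meeting from memory.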


Concluding Thoughts

System monitoring done correctly is important for both the business and the engineers on your team.  When it is set up correctly with actionable responses, your team will not “tune out” their pages, and the quality of service provided to the business will be stellar.  Server and application uptime will also be at their best.

If you think you have a good handle on what’s available in the Cloud, think again. Not that I doubt your knowledge for one minute, but what I am sure of is the rapid pace of change in Cloud services, especially from Microsoft. The Cloud that you investigated 12 months or 6 months ago might now have service offerings that you’re not aware of. Nothing demonstrates this increased pace of technology change as well as Office 365 does. So let’s look at some of the new Office 365 features that have been introduced in the last 12 months. Because it’s a whole lot more than just hosted Exchange and SharePoint.


Delve – Search on steroids. Traditionally, if you wanted to find some information, you’d go to the place where that kind of data was kept and use that product’s search function (e.g. within Outlook or File Explorer). Delve instead sits inside an Office 365 tenant and analyses all of the data you have access to in one search pane. Information is beautifully presented, whether it was in your mail file, a shared mailbox you have access to, or a SharePoint document library. This breaks information out of silos and is especially handy for highlighting information provided by colleagues and stored in places you may not have thought to look (as long as you have access).


Sway – A new way to tell your story. Also available as a standalone product, Sway is now integrated into Office 365, appearing in the app picker. Labelled a ‘digital storytelling app’, some have wondered whether it will bring an end to death by PowerPoint. Sadly, it lacks the ‘clicker’ integration for delivering a live presentation. What does shine are the automated information layouts that bring beautiful ‘readability’ without you needing to be a website developer.


Power BI – Display and query your data like never before. OK, so this one’s a little older than 12 months and is also available as a standalone product. The Office 365 power comes from using this powerful tool to display stunning representations of your data in real time inside your SharePoint Online portal. Natural language query lets you filter and redisplay that data. You really need to see it to believe it, but imagine your sales team seeing live sales data plotted in 3D columns on a map of the country and being able to drill down to find the best-selling city for widget X versus widget Y … all without touching a spreadsheet or ERP system.


Office 365 Groups – Groups, but not as you know them. This is truly an Office 365 innovation, not available anywhere else. They’re not security groups, and they aren’t mailing groups. Office 365 Groups bring a collection of team resources together in one place for you to access (emails, shared calendar, shared documents, etc.). The most brilliant part is that a new member of the ‘team’ (Group) gets access to all of the historical data, including previous emails sent amongst the group. This works well for team members who don’t all work in the same office, but it is only available to users within your Office 365 tenant.


Office 365 Planner – Team task management. This is the newest baby, not available in even First Release tenants until next quarter. Office 365 Planner is what you get if Project and Pinterest had a baby. It’s not designed for complicated projects, but it provides a great team space for task creation, allocation and updates.


Mobile Device Management – BYOD fans are going to love this. While Mobile Device Management has been an Enterprise offering in the past, now an Office 365 subscriber can take advantage of some of these capabilities. Under the hood it will look strangely like Windows Intune (because it is). Bringing it to Office 365 makes it affordable for more organisations, but the killer feature is the ability to selectively wipe enrolled devices. So when staff have an Office 365 license on their phone, not only can you enforce passcodes if you wish, but you can delete all synced files and emails if the phone is lost, without wiping out the family photos in their camera roll.


Additional administration and protection features. Office 365 now lets administrators configure self-service password reset for users who have a secondary validation method (e.g. a mobile number or alternative email address). Custom branding is also available, to make people feel like they are logging onto one of the company’s systems. And data loss prevention has now extended beyond email and is available in SharePoint Online and OneDrive for Business, suggesting or blocking information from being shared outside of your organization.


Any surprises in that list? Did it change your perception of Office 365? Can you see a use for any of the new features within your own organization?


One thing’s for sure, this is just a taste of how agile Microsoft is being with this product suite. We’re going to see many more new announcements over the next 12 months and the Office Blog is one of the best places to keep informed.



I'm really excited to be heading out to Columbus, Ohio for DevOps Days on November 18 and 19. In fact, SolarWinds is proud to be a gold sponsor of the event this year. Which means we have a few spare tickets for the event.


SO... if you want to come hang out for a couple of days with the coolest DevOps people in the Midwest, all you need is a selfie!


Post a picture of yourself using your favorite SolarWinds tool in the comments to this message (not in a new thread, not in some other conversation; right here in this message). The first 3 responses win themselves a ticket.


Remember, this must be YOU using YOUR COPY of a SolarWinds tool. You aren't going to win by pulling up the demo site or with fancy Photoshop hacks. (I won't be fooled, but I will DEFINITELY be amused.)


Good luck, and remember: pics or it never happened!


If you missed Part One of this series, you can find it here.


If you’re not prepared for the future of networking, you’re already behind.


That may sound harsh, but it’s true. Given the speed at which technology evolves compared to the rate most of us typically evolve in terms of our skillsets, there’s no time to waste in preparing ourselves to manage and monitor the networks of tomorrow. Yes, this is a bit of a daunting proposition considering the fact that some of us are still trying to catch up with today’s essentials of network monitoring and management, but the reality is that they’re not really mutually exclusive, are they?


In part one of this series, I outlined how the networks of today have evolved from those of yesteryear, and what today’s new essentials of network monitoring and management are as a consequence. If you paid careful attention, you will likely have picked up on how the lessons from the past that I described helped shape those new essentials.


Similarly, today’s essentials will help shape those of tomorrow. Thus, as I said, getting better at leveraging today’s essentials of network monitoring and managing is not mutually exclusive from preparing for the networks of tomorrow.


Before delving into what the next generation of network monitoring and management will look like, it’s important to first explore what the next generation of networking will look like.


On the Horizon


Above all else, one thing is for certain: We networking professionals should expect tomorrow’s technology to create more complex networks resulting in even more complex problems to solve. With that in mind, here are the top networking trends that are likely to shape the networks of the future:


Networks growing in all directions

Fitbits, tablets, phablets and applications galore. The explosion of IoT, BYOD, BYOA and BYO-everything else is upon us. With this trend still in its infancy, the future of connected devices and applications will be defined not only by the quantity of connected devices, but also by the quality of their connections and the bandwidth they consume.


But it goes beyond the (seeming) “toys” that users bring into the environment. More and more, commodity devices such as HVAC infrastructure, environmental systems such as lighting, and security devices use bandwidth (cellular or Wi-Fi) to communicate outbound and receive updates and instructions inbound. Companies are using, or planning to use, IoT devices to track product, employees, and equipment.


The explosion of devices which consume or produce data WILL, not might, create a potentially disruptive explosion in bandwidth consumption, security concerns, and monitoring and management requirements.


IPv6 Now… or sooner

ARIN reports that it has now depleted its IPv4 free pool. Meanwhile, IPv6 is enabled by default and is therefore creating challenges for IT professionals, even if they put off their own IPv6 decisions. (Check out this article on VPNs’ insecurity and another on how to mitigate IPv6 attack attempts.) The upshot of all this is that IPv6 is a reality today. You need to learn about it and be ready for the inevitable moment when switching over is no longer an option, but a requirement.
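One small, low-risk way to start learning is to audit which of your endpoints already resolve over IPv6. A sketch using Python's standard `socket` module (in a real audit you would loop over your own hostnames; the literals below are just a self-contained demonstration):

```python
import socket

def address_families(host, port=443):
    """Return the IP families ('IPv4'/'IPv6') a host resolves to."""
    fams = set()
    for family, *_ in socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP):
        fams.add("IPv6" if family == socket.AF_INET6 else "IPv4")
    return fams

# Literal addresses resolve without DNS; swap in your own service
# hostnames and flag anything that comes back IPv4-only.
print(address_families("::1"))        # {'IPv6'}
print(address_families("127.0.0.1"))  # {'IPv4'}
```

The hosts that show up IPv4-only are the same ones that will need attention when switching over stops being optional.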


SDN, NFV, and IPv6 will become the mainstream

Software defined networking (SDN) and network function virtualization (NFV) are just in their infancy and should be expected to become mainstream in the next five to seven years. With SDN and virtualization creating new opportunities for hybrid infrastructure, a serious look at adoption of these technologies is becoming more and more important.


So long WAN Optimization, Hello ISPs

There are a number of reasons WAN optimization technology is being, and will continue to be, kicked to the curb with ever greater fervency. With bandwidth increases outpacing the ability of CPUs and custom hardware to perform deep inspection and optimization, and with ISPs helping to circumvent the cost and complexity associated with WAN accelerators, WAN optimization will only see the light of tomorrow in unique use cases where the rewards outweigh the risks. As most of us will admit, WAN accelerators are expensive and complicated, making ISPs more and more attractive; their future living inside our networks is certainly bright.


Farewell L4 Firewalling

With the mass of applications and services moving towards web-based deployment, using Layer 4 (L4) firewalls to block these services entirely will not be tolerated. A firewall incapable of performing deep packet analysis and understanding the nature of the traffic at the Layer 7 (L7), or the application layer, will not satisfy the level of granularity and flexibility that most network administrators should offer their users. On this front, change is clearly inevitable for us network professionals, whether it means added network complexity and adapting to new infrastructures or simply letting withering technologies go.


Preparing to Manage the Networks of Tomorrow 


So, what can we do to prepare to monitor and manage the networks of tomorrow? Consider the following:


Understand the “who, what, why and where” of IoT, BYOD and BYOA

Connected devices cannot be ignored. According to 451 Research, mobile Internet of Things (IoT) and machine-to-machine (M2M) connections will increase to 908 million in just five years, compared to 252 million just last year. This staggering statistic should prompt you to start creating a plan of action for managing nearly four times the number of devices infiltrating your networks today.


Your strategy can either aim to manage these devices within the network or set an organizational policy to regulate the traffic altogether. As nonprofit IT trade association CompTIA noted in a recent survey, many companies are trying to implement partial or even zero-BYOD policies to regulate security and bandwidth issues. Even though policies may seem like an easy fix, curbing all of tomorrow’s BYOD/BYOA is nearly impossible. As such, you will have to understand your network device traffic in incremental metrics in order to optimize and secure it. Even more, you will need to understand network segments that aren’t in your direct control, like the tablets, phablets, and Fitbits, to properly isolate issues.


Know the ins and outs of the new mainstream

As stated earlier, SDN, NFV and IPv6 will become the new mainstream. We can start preparing for these technologies’ future takeovers by taking a hybrid approach to our infrastructures today. This will put us ahead of the game with an understanding of how these technologies work, the new complexities they create and how they will ultimately affect configuration management and troubleshooting ahead of mainstream deployment.


Start Comparison Shopping Now

Go through the exercise of evaluating ISPs, virtualized network options, and other on-the-horizon technologies, even if you don’t intend to switch right now; it will help you nail down your particular requirements. Knowing that a vendor has, or works with, technology you don’t need now but might later, such as IPv6, can and should influence your decision.


Brick In, Brick Out

Taking on new technologies can feel overwhelming to those of us with “boots on the ground,” because the new tech often becomes one more mouth to feed, so to speak. As much as possible, look for ways that new additions will not just enhance, but replace the old guard. Maybe your new real-time deep packet inspection won’t completely replace L4 firewalls, but if it can reduce your reliance on them significantly, while at the same time increasing insight and the ability to respond intelligently to issues, then the net result should be a better day for you. If you don’t do this, then more often than not, new technology will indeed simply increase workload and do little else. This is also a great measuring stick for identifying new technologies whose time may not yet have truly come, at least not for your organization.


At a more basic level, if you have to replace three broken devices and you realize that the newer equipment is far more manageable or has more useful features, consider replacing the entire fleet of old technology even if it hasn’t fallen apart yet. The benefits of consistency often far outweigh the initial pain of sticker shock.


To conclude this series, my opening statement from part one merits repeating: learn from the past, live in the present and prepare for the future. The evolution of networking waits for no one. Don’t be left behind.

Learn from the past, live in the present and prepare for the future.

While this may sound like it belongs on a high school guidance counselor’s wall, these are words to live by, especially in IT. They apply perhaps to no other infrastructure element better than the network. After all, the network has long been a foundational building block of IT; it’s even more important today than it was in the days of SAGE and ARPANET, and its importance will only continue to grow while the network simultaneously becomes more complex.

For those of us charged with maintaining the network, it’s valuable to take a step back and examine the evolution of the network. Doing so helps us take an inventory of lessons learned—or the lessons we should have learned; determine what today’s essentials of monitoring and managing networks are; and finally, turn an eye to the future to begin preparing now for what’s on the horizon.

Learn from the Past

Think back to the time before the luxuries of Wi-Fi, the proliferation of virtualization, and today’s cloud computing.

The network used to be defined by a mostly wired, physical entity controlled by routers and switches. Business connections were based on T1 and ISDN, and Internet connectivity was always backhauled through the data center. Each network device was a piece of company-owned hardware, and applications operated on well-defined ports and protocols. VoIP was used infrequently, and anywhere connectivity—if even a thing—was provided by the low-quality bandwidth of cell-based Internet access.

With this yesteryear in mind, consider the following lessons we all (should) have learned that still apply today:

It Has to Work

Where better to start than with a throwback to RFC 1925, “The Twelve Networking Truths”? It’s just as true today as it was in 1996: if your network doesn’t actually work, then all the fancy hardware is for naught. Anything that impacts the ability of your network to work should be suspect.

The Shortest Distance Between Two Points is Still a Straight Line

Wired or wireless, MPLS, EIGRP, or OSPF, your job as a network engineer is still fundamentally to create the conditions where the path between the provider of information, usually a server, and the consumer of that information, usually a PC, is as near to a straight line as possible. When you forget that and get caught up in quality-of-service maps, automated functions, and fault tolerance, you’ve lost your way.

An Unconfigured Switch is Better than the Wizard

It was a long-standing truth that running the configuration wizard on a switch was the fastest way to break it, whereas just unboxing it and plugging it in would work fine. Wizards are a fantastic convenience and come in all forms, but if you don’t know what the wizard is making convenient, you are heading for trouble.

What is Not Explicitly Permitted is Forbidden

No, this policy is not fun, and it won’t make you popular. It will actually create ongoing work for you. But there is honestly no other way to run your network. If espousing this policy will get you fired, then the truth is you’re going to get fired one way or another; you might as well be able to pack your self-respect and professional ethics into the box along with your potted fern and stapler when the axe falls. Because otherwise, that huge security breach is on you.

Live in the Present

Now let’s fast forward and consider the network of present day.

Wireless is becoming ubiquitous—it’s even overtaking wired networks in many instances—and the number of devices wirelessly connecting to the network is exploding (think Internet of Things). It doesn’t end there, though—networks are growing in all directions. Some network devices are even virtualized, resulting in a complex amalgam of the physical, the virtual and the Internet. Business connections are DSL/cable and Ethernet services, and increased use of cloud services is stretching Internet capacity at remote sites, not to mention opening security and policy issues since it’s not all backhauled through the data center. BYOD, BYOA, tablets and smartphones are prevalent and are creating bandwidth capacity and security issues. Application visibility based on port and protocol is largely impossible due to applications tunneling via HTTP/HTTPS. VoIP is common, also imposing higher demands on network bandwidth, and LTE provides high-quality anywhere connectivity.

Are you nostalgic for the days of networking yore yet? The complexity of today’s networking environment underscores that while lessons of the past are still important, a new set of network monitoring and management essentials is necessary to meet the challenges of today’s network administration head on. These new essentials include:

Network Mapping

While perhaps a bit back-to-basics, and also suitable as a lesson we all should have learned by now, network mapping, and the subsequent understanding of management and monitoring needs, has never been more essential than it is today, given the complexity of modern networks and network traffic. Moving ahead without a plan, without knowing the reality on the ground, is a sure way to make the wrong monitoring choices based on assumptions and guesswork.

Wireless Management

The growth of wireless networks presents new problems: ensuring adequate signal strength, and keeping the proliferation and physical mobility of devices (potentially hundreds of thousands of network-connected devices, few of which are stationary and many of which may not be owned by the company) from getting out of hand. What’s needed are tools such as wireless heat maps, user device tracking, alerts on over-subscribed access points, and device IP address management.

Application Firewalls

When it comes to surviving the Internet of Things, you first must understand that all of the “things” connect to the cloud. Because they’re not coordinating with a controller on the LAN, each device incurs a full conversation load, burdening the WAN and every element in a network. And worse, many of these devices prefer IPv6, meaning you’ll have more pressure to dual-stack all of those components. Application firewalls can untangle device conversations, get IP address management under control and help prepare for IPv6. They can also classify and segment device traffic; implement effective quality of service to ensure that critical business traffic has headroom; and of course, monitor flow.

Capacity Planning

Nobody plans for not growing; it’s just that sometimes infrastructure doesn’t read the plan we’ve so carefully laid out. You need to integrate capacity forecasting tools, configuration management, and web-based reporting to be able to predict scale and growth. There’s the oft-quoted statistic that 70 percent of network outages come from unexpected network configuration changes. Admins have to avoid the Jurassic Park effect: unexpected outages that, in hindsight, were clearly predictable are the bane of any IT manager’s existence. “How did we not know about and respond to this?” is a question nobody wants to have to answer.
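The forecasting piece doesn't have to start sophisticated. A minimal sketch (the utilization numbers are hypothetical) fits a linear trend to monthly link utilization and projects when it crosses a saturation threshold:

```python
def linear_fit(ys):
    """Least-squares slope and intercept for evenly spaced samples."""
    n = len(ys)
    xs = range(n)
    mean_x, mean_y = (n - 1) / 2, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x

def months_until(ys, threshold):
    """Months from the latest sample until the trend crosses the threshold."""
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None  # no growth: threshold never reached on this trend
    return (threshold - intercept) / slope - (len(ys) - 1)

# Last six months of link utilization (% of capacity), invented numbers
utilization = [40, 44, 47, 52, 55, 60]
print(round(months_until(utilization, 80), 1))  # ~5.2 months of headroom
```

Real tools layer seasonality and confidence intervals on top of this, but even a straight-line projection turns "we ran out of bandwidth" from a Jurassic Park surprise into a budget line item.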

Application Performance Insight

Many network engineers have joked that the network would be stable if it weren’t for the end users. While it’s an amusing thought, it ignores the universal truth of IT: everything we do is because of, and for, end users. The whole point of having a network is to run the business applications end users need to do their jobs. Face it, applications are king. Technologies such as deep packet inspection, or packet-level analysis, can help you ensure the network is not the source of application performance problems.

Prepare for the Future

Now that we’ve covered the evolution of the network from past to present—and identified lessons we can learn from the network of yesterday and what the new essentials of monitoring and managing today’s network are—we can prepare for the future. So, stay tuned for part two in this series to explore what the future holds for the evolution of the network.

(COMING SOON) Read “Blueprint: Evolution of the Network - Part Two.”


IT's a MAAD World

Posted by kong.yang Employee Nov 3, 2015

This post originally appeared on SolarWinds Content HUB.


All around me are familiar faces

Worn out places, worn out faces

Bright and early for the daily races

Going nowhere, going nowhere...


And I find it kind of funny

I find it kind of sad

The dreams in which I'm dying are the best I've ever had

I find it hard to tell you,

I find it hard to take

When people run in circles it's a very, very

Mad world, mad world

AWS re:Invent 2015 reminds me of the lyrics from Roland Orzabal’s “Mad World.” The first verse represents traditional Enterprise IT as it struggles to transform and enable continuous service delivery and continuous service integration. The second verse encompasses the conversation IT operations is having with itself about removing tech inertia and adopting the DevOps culture, as well as the conversation it is having with developers as IT professionals try to learn and live agile and lean.

The disruption from highly available, easy-to-use, easy-to-scale cloud services has IT organizations running in circles to change themselves while trying to harness that change into business value. It’s as if IT is becoming a mad world; but it doesn’t have to be, as long as you’re MAAD and not literally mad. Whether you are an IT professional, a DevOps engineer, or an application developer, you can never be MAAD enough in this mad world, this age of instant applications. And by MAAD, I mean monitoring as a discipline.

So why leverage monitoring as a discipline in the age of instant apps? SolarWinds Developer Evangelist Dave Josephsen said it best in his IT Briefcase article: “Teams with the know-how to embrace metrics-driven development and scale their monitoring into their codebase, will spend less time mired…and more time building and running world-class systems that scale.” But not so fast, you say, because you’re in IT ops and not a developer. Okay, no problem. As my friend and fellow SolarWinds Head Geek, Thomas LaRock, so eloquently puts it, you need to learn to pivot. And when you do, embrace the discipline you’ve already matured your career with: monitoring.

Monitoring is the ideal discipline to bridge the gap from your premises to your clouds at your scale. I think of monitoring as a set of eight skills:

  1. Discovery – show me what’s going on.
  2. Alerting – tell me when something broke or is going bad.
  3. Remediation – fix the problem.
  4. Troubleshooting – find the root cause.
  5. Security – govern and control the data, app, and stack planes.
  6. Optimization – run more efficiently and effectively.
  7. Automation – scale it.
  8. Reporting – show and tell to the management teams/business units.
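To make skill #2 concrete, here is a minimal sketch of an alerting check. The metric names and threshold values are invented for illustration and are not from any SolarWinds product.

```python
# Compare sampled metrics against thresholds and report only genuine breaches.
THRESHOLDS = {"cpu_pct": 90.0, "mem_pct": 85.0, "disk_free_gb": 10.0}

def check_server(metrics):
    """Return a list of alert strings for metrics that breach a threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue  # a discovery gap: this metric isn't collected here
        # Free disk space alerts when it falls BELOW the limit; the
        # utilization metrics alert when they rise above it.
        breached = value < limit if name == "disk_free_gb" else value > limit
        if breached:
            alerts.append(f"{name}={value} breached limit {limit}")
    return alerts
```

Each returned string would feed the remediation and troubleshooting skills that follow in the list: tell me what broke, then fix it and find out why.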


The first four skills (DART framework) are covered in detail in a SolarWinds eBook that focused on virtualization. The last four skills will be covered in another SolarWinds eBook later this year or early next year. These skills apply to any IT professional, especially one looking to enable hybrid IT service models. Below is a figure of the DART framework:



Traditional IT organizations are embracing transformation, as evidenced by AWS’s continued simplification of cloud services for enterprises to consume. Many organizations still face internal resistance to change and to the rate of change associated with continuous delivery and continuous integration. At the same time, the disdain for IT professionals from the DevOps purists at THE cloud conference is still palpable. Some of it may be deserved after years of IT roadblocks in the guise of rigor and discipline. Whatever the case, continuous service delivery and continuous service integration are the new realities for Enterprise IT. Dev is the new black.

So IT professionals, take ownership of your premises, your clouds, and your scale with monitoring as a discipline. It’s definitely not all quiet on the cloudy fronts. The storms of continuous change are brewing and IT professionals need to stay ahead of the game. If you’re in the calm, the storm is already upon your organization and disruption is about to be forced upon you.

I’ll end with words from a highly distinguished monitoring engineer who’s always on the leading edge of tech, Adrian Cockcroft. Adrian says that the CIO (and in turn their IT professionals) has three key goals:

  1. Align IT with the business
  2. Develop products faster
  3. Try not to get breached

That all three goals can be achieved with monitoring as a discipline is just utter maad-ness!

Is your server monitoring actually effective? Or has it been configured in a way that produces white noise you’ve come to ignore? Server monitoring is imperative to ensuring that your organization functions optimally and minimizes the number of unanticipated outages.


Monitoring Project Gone Wrong

Many years ago, when I started with a new company, I was handed a corporate “flip phone.” This phone was also my pager. The first time I was on call, I expected to be alerted only when there was an issue. WRONG! I was alerted for every little thing, day and night. When I wasn’t the primary on-call person, I quickly learned to ignore my device, and when I was on call, I was guaranteed to come down with some form of illness before the end of the week, worn down from checking every little message on my pager all night long. Being the new member of the team, I observed at first, but soon enough I’d had enough. Something had to change, so we met as a team to figure out what we could do. We were all ready for some real and useful alerting.


Corrective Measures

When monitoring has gone wrong and the server monitoring needs to change, what can be done? After that incident, it became very important to pull together a small team to spearhead the initiative and get the job done right.


Here is a set of recommendations for turning monitoring configured wrong into monitoring done right.

  • Determine which areas of server monitoring are most important to infrastructure success, then remove the remaining unnecessary monitoring.  For example, key areas to monitor would be free disk space, CPU, memory, network traffic, and core server services.
  • Evaluate your thresholds in those areas defined as primary, and modify the thresholds according to your environment.  Oftentimes the defaults set up in monitoring tools can be used as guidelines, but they usually need modification for your infrastructure.  Even whether a server is physical or virtual can change the thresholds required for monitoring.
  • Once evaluation is complete, adjust the thresholds for these settings according to the needs of your organization.
  • Stop and evaluate what is left after these settings have been adjusted.
  • Repeat the process until alerting is clean and only fires when something is genuinely wrong.
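The threshold-evaluation step above can be sketched as deriving each server’s threshold from its own recent baseline rather than a tool default. The headroom multiplier, cap, and sample values here are invented for illustration.

```python
# Threshold = baseline mean + headroom * standard deviation, capped at a
# ceiling, so a box that idles at 20% CPU alerts sooner than one that
# routinely runs at 70%.
import statistics

def baseline_threshold(samples, headroom=3.0, ceiling=95.0):
    """Derive an alert threshold from recent metric samples."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    return min(mean + headroom * stdev, ceiling)

quiet_server = [18, 22, 20, 19, 21]   # idles near 20% CPU
busy_server = [68, 72, 70, 74, 66]    # routinely near 70% CPU

# The quiet server gets a much tighter threshold than the busy one.
print(baseline_threshold(quiet_server))
print(baseline_threshold(busy_server))
```

Applying one default to both servers would either page constantly on the busy box or miss real trouble on the quiet one, which is exactly the white-noise problem described above.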


As the process is repeated, the exceptions will stand out more and can be implemented more easily.  Exceptions can come in the form of resources spiking during overnight backups, applications that inherently require exceptions due to the nature of their memory usage (e.g., SQL Server or Microsoft Exchange), or something as simple as monitoring different server services depending on the installed application.  Continual refinement and repetition of the process ensure that your 3 a.m. infrastructure pages are real and require attention.
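One way to implement the backup-window exception just described is to suppress alerts that fire inside a known window. The server names, metric names, and window times below are made up for the sketch.

```python
# Suppress pages that fall inside a server's known maintenance window,
# so only genuine 3 a.m. incidents get through.
from datetime import time

# Per-server exception windows: (start, end), end exclusive, same-day only.
BACKUP_WINDOWS = {
    "db01": (time(1, 0), time(4, 0)),   # nightly SQL backup, 01:00-04:00
}

def should_page(server, metric, fired_at):
    """Return False when the alert is expected load inside a known window."""
    window = BACKUP_WINDOWS.get(server)
    if window and metric in ("cpu_pct", "disk_io"):
        start, end = window
        if start <= fired_at < end:
            return False  # expected backup load, not an incident
    return True
```

Note the suppression is scoped to both the server and the metrics the backup actually stresses; a memory alert on the same box during the window still pages, because it is not explained by the exception.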

Concluding Thoughts

Server monitoring isn’t one size fits all, and these projects are often large and time-consuming.  Environment stability is critical to business success.  Poorly implemented server monitoring damages the reputation of IT, so spending the appropriate amount of time ensuring the stability of your infrastructure is priceless.
