A Lap around AppStack

Posted by joeld Jan 30, 2015

What is AppStack?


AppStack is a new technology that brings multiple SolarWinds products together in a new innovative way. AppStack provides visibility to the different infrastructure elements that are supporting an application deployed in production, but most importantly how they are related to each other.


For instance, a typical multi-tier application will be deployed on multiple virtual machines located on one or more hypervisors and accessing one or more datastores. Each of those components uses some level of storage that can either be direct or network attached.



AppStack allows an application owner or IT operator to quickly identify the key infrastructure elements that might be impacting application delivery.


The Orion Platform


Let’s take a quick look at how SolarWinds products are architected in the first place. SolarWinds has a large portfolio of products that can be separated into two main categories. The first category contains all the products primarily focused on monitoring various parts of the IT infrastructure and run on top of the Orion Platform. The second category is composed of all the other tools that are not running on top of Orion. AppStack is related to the products belonging to the first category, such as Server & Application Monitor (SAM), Virtualization Manager (VMan), and Storage Resource Monitor (SRM).


The Orion platform provides a core set of services that are used by the various products that run on top of it. Some of the most important services exposed by the platform are alerting, reporting, security, UI framework, data acquisition, and data storage, as shown in the following diagram:



The products that run on top of Orion are plug-ins that take advantage of those services and are sold individually.


How does AppStack work?


AppStack is not a new product by itself, but rather, it extends the capabilities of the Orion platform in various ways, allowing all the products running on top of Orion to take advantage of those new features.


One of the key enablers of AppStack is the information model that is maintained by the Orion platform. In the background, Orion maintains a very rich representation of how the entities monitored by the various products relate to each other. This information model is maintained by the Information Service, which exposes a set of APIs (SOAP and RESTful) to allow easy access to this information. The information model has been a key foundation of each individual product for several years and is now being taken to another level by AppStack.


As the following diagram shows, the metadata model is extended and populated by the different products installed on top of Orion. By default, Orion ships with a very minimal model. Its sole purpose is to serve as the foundation of the product models. One key aspect of the model is that it does not require different products to know about each other, but still allows them to extend each other as they are purchased and installed by our users over time. For instance, Server & Application Monitor does not need to know about our Virtualization Product in order for the Application entity to be related to the Virtual Machine entity.





Having a good set of entities is a good start, but the key to success is to ensure that the relationships between those entities are also captured as part of the model. AppStack exploits the presence of those relationships in the information model to tie together the different pieces of infrastructure that support an application.

In the AppStack dashboard, every time the user clicks on one of the entities displayed, behind the scenes AppStack retrieves the list of entities that are directly or indirectly related to that selected entity and highlights them in the user-interface. This provides a very intuitive way of troubleshooting application delivery issues.
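The "directly or indirectly related" lookup described above is, at heart, a graph traversal. Here is a minimal sketch in Python, assuming the relationships are available as simple pairs of entity names. The entity names and the `related_entities` helper are made up for illustration and are not part of the Orion API:

```python
from collections import defaultdict, deque

# Hypothetical relationships, written as (entity, entity) pairs.
# In a real deployment the model is populated by the installed products.
RELATIONSHIPS = [
    ("app:Payroll", "vm:web01"),
    ("app:Payroll", "vm:db01"),
    ("vm:web01", "host:esx-a"),
    ("vm:db01", "host:esx-b"),
    ("host:esx-a", "datastore:ds1"),
    ("host:esx-b", "datastore:ds1"),
    ("app:Intranet", "vm:web02"),
    ("vm:web02", "host:esx-c"),
]


def related_entities(selected, relationships):
    """Return every entity reachable from `selected`, directly or indirectly."""
    neighbors = defaultdict(set)
    for a, b in relationships:
        # Assumption: relationships are followed in both directions.
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = {selected}
    queue = deque([selected])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(selected)  # the selected entity is not "related to itself"
    return seen


if __name__ == "__main__":
    for entity in sorted(related_entities("app:Payroll", RELATIONSHIPS)):
        print(entity)
```

Selecting `app:Payroll` in this toy model reaches its two virtual machines, their hosts, and the shared datastore, while the unrelated Intranet stack is left out. A real implementation would likely be more selective about which relationship types and directions it follows.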


An important point to mention is that all of this happens without any prior configuration by the user.


This blog post gave you a quick overview of what AppStack is and how it works behind the scenes. If you have any questions, don’t hesitate to reach out to me on Twitter at @jdolisy.

Dare to Thrive is the theme at VMware Partner Exchange (PEX) 2015. And that theme is reflected by the BIG online event, the new beginning, and the PEX experience.



First off, there is VMware's Online Launch Event on February 2nd. Tech buzzwords like software defined data centers (SDDC), software defined networking (SDN), software defined storage (SDS), containers such as Docker, Kubernetes, CoreOS Rocket, and App Volumes aka cloud-native apps, and hybrid cloud will almost surely be used. VMware's been talking about and demoing some of these technologies for years. It'll be interesting to see how they extend their portfolio now that they have an update to their big fundamental, VMware vSphere.

New Beginning

Back for our 3rd season with a new performance-centric focus is So Say SMEs in Virtualization and Cloud with Todd Muirhead, Sr. Staff Engineer in Performance R&D at VMware and me. We're just two virtualizing, cloud sizing, and performance optimizing subject matter experts (SMEs), who happen to enjoy sports.



VMware PEX is more signal and less noise than VMworld and aims to provide VMware partners with the means to succeed. I'm expecting better engagements and more personalized experiences from the following:

  • New Hands-on-Labs - An absolute must if attending any VMware event.
  • NDA sessions and product briefings including invites to the vCloud and the Software-defined Storage events.
  • Solutions Exchange conversations with attendees and fellow VMware partners.
  • Community events like #vBacon and #vExpert tweet-up.


In closing

I look forward to catching up with friends and making new connections. Follow my updates on Twitter with hash tags: #VMwarePEX #SolarWinds. If you want to meet up with me at VMware PEX to discuss Virtualization Manager or any other SolarWinds products, just send me a message on Twitter - @kongyang.

OR: "Don’t just sit there, DO something!"


If you have used a monitoring tool for any length of time, you are certainly comfortable setting up new devices like servers, routers, switches and the like. Adding sub-elements (disks, interfaces, and the like) is probably a snap for you. There’s a good chance you’ve set up your fair share of reports and data exports. Alerts? Pssshhhh! It’s a walk in the park, right?


But what do you DO with those alerts?


If you are like most IT Professionals who use a monitoring tool, you probably set up an email, or a text message to a cellphone, or, if you are especially ambitious, an automated ticket in whatever incident system your company uses. And once that’s set up, you call it a day.


Your monitoring will detect an error, a notification will be sent out, a human will be launched into action of some kind, and the problem will (eventually) be resolved.


But why? Why disturb a living, breathing, working (or worse – sleeping) person if a computer could do something about the situation?


The fact is that many alerts have a simple response which can often be automated, and doing so (taking that automatic action) can save hours of human time.


Here are some examples of direct action you can take:


| A monitor triggers when            | Have automation do this                |
|------------------------------------|----------------------------------------|
| A service is down                  | Attempt to restart the service         |
| A disk is over xx% full            | Clear the standard TEMP folders        |
| An IP address conflict is detected | Shut down the port of the newer device |


If the action is not successful, most monitoring systems will trigger a secondary action (that email, text message, or ticket I mentioned earlier) after a second wait time. (Pro Tip: If your monitoring solution doesn’t support this, it may be time to re-think your monitoring solution).


At worst, your alert will be delayed by a few minutes. BUT it will be delayed by having done (instantly) what the human technician was going to do once they logged in, so in a sense the situation is more than a few minutes ahead of where it would be if you had let the human process proceed as normal.
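The "try the fix first, then escalate" flow described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how any particular monitoring product implements it. The `handle_alert` function and its `check`, `remediate`, and `escalate` callbacks are names I made up for the example:

```python
import time


def handle_alert(check, remediate, escalate, wait_seconds=0, sleep=time.sleep):
    """Try an automatic fix first; involve a human only if it fails.

    check()     -> True while the problem is still present
    remediate() -> attempts the automatic fix (e.g. restart a service)
    escalate()  -> sends the email, text message, or ticket
    """
    if not check():
        return "ok"
    remediate()
    sleep(wait_seconds)  # the second wait time before re-checking
    if check():
        escalate()
        return "escalated"
    return "auto-resolved"


if __name__ == "__main__":
    # Toy stand-in for a real service: the "restart" simply flips a flag.
    state = {"service_up": False}
    outcome = handle_alert(
        check=lambda: not state["service_up"],
        remediate=lambda: state.update(service_up=True),
        escalate=lambda: print("paging a human"),
    )
    print(outcome)
```

In the toy run the restart succeeds, so nobody gets paged. If the second check still failed, the human would be notified after `wait_seconds`, with the obvious first step already done.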


But that’s not all. Another action you can take is to gather information. Many monitoring tools will allow you to collect additional information at the time of the alert, and then “inject” it into the alert. For example:


| A monitor triggers when                             | Have automation do this                                                                          |
|-----------------------------------------------------|--------------------------------------------------------------------------------------------------|
| CPU utilization is over xx%                         | Get the top 10 processes, sorted by CPU usage                                                    |
| RAM utilization is over xx%                         | Get the top 10 processes, sorted by RAM usage                                                    |
| A VM is using more than xx% of the host resources   | Include the VM name in the message                                                               |
| Disk is over xx% full (after clearing temp folders) | Scan disk for top 10 files, sorted by size, that have been added or updated in the last 24 hours |
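To make the last row concrete, here is a rough Python sketch of the "top 10 files changed in the last 24 hours" scan and of injecting the result into the alert text. It uses only the standard library. The function names and the message layout are my own assumptions, and a real monitoring tool would run something like this through its own script or action mechanism:

```python
import os
import time


def largest_recent_files(root, limit=10, max_age_seconds=24 * 60 * 60, now=None):
    """Return up to `limit` (size, path) pairs for files under `root`
    modified within the last `max_age_seconds`, largest first."""
    now = time.time() if now is None else now
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if now - info.st_mtime <= max_age_seconds:
                found.append((info.st_size, path))
    found.sort(reverse=True)
    return found[:limit]


def enrich_alert(message, root):
    """Append the file list to the alert text before it is sent."""
    lines = [message, "Largest files changed in the last 24 hours:"]
    for size, path in largest_recent_files(root):
        lines.append(f"  {size:>12,d} bytes  {path}")
    return "\n".join(lines)
```

Calling `enrich_alert("Disk over threshold", "/var/log")` (the path is just an example) returns the original message followed by the file list, which is the extra information the technician would otherwise have to log in and collect by hand.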


Sounds lovely, but is this really going to impact the bottom line?


For a previous client, I implemented nothing more sophisticated than the disk actions (clearing the Temp drive, and alerting after another 15 minutes if the disks were still full) and adding the top 10 processes to the high CPU alert.


The results were anywhere from 30% to 70% fewer alerts compared to the same month in the previous year. In real numbers, this translated to anywhere from 43 to 175 fewer alerts per month. In addition, the support staff saw the results and responded FASTER to the remaining alerts because they knew the pre-actions had already been done.


The CPU alerts obviously didn’t reduce, but once again we saw the support staff response improve, since the ticket now included information about what specifically was going wrong. In one case, the client was able to go back to a vendor and request a patch because they were able to finally prove a long-standing issue with the software.


As virtualization and falling costs (coupled, thankfully, with expanding budgets) push the growth of IT environments, the need to leverage monitoring to ensure the stability of computing environments becomes ever more obvious. Less obvious, but just as critical (and valuable), is the need to ensure that the human cost of that monitoring remains low by leveraging automation.


NOTE: This is a continuation of my Cost of Monitoring series. The first installment can be found here: The Cost of (not) Monitoring.

Part two is posted here: The Cost of Monitoring - with the wrong tool (part 1 of 2) and here: The Cost of Monitoring - with the wrong tool (part 2 of 2)

You may have noticed a trend these past few weeks with my recent posts, such as:

Logging; Without a complete picture, what’s the point?

Troubleshooting vs Compliance Security; Logging without borders?

Are you Practicing Security Theater in IT


Alright, sure, you’ve noticed a trend. I’ve been talking a lot about logging, the importance of an audit trail, and the overwhelming importance of security, while also arguing that security is more a masquerade ball of promises than a set of guarantees. But what does this have to do with Taylor Swift?


(Note: This is the real Taylor Swift and not the Infosec Focused Taylor Swift @SwiftOnSecurity)


Every organization has customers; that’s how we do business, and the internet is no different. Say you’re a Twitter or an Instagram (which, considering Facebook owns Instagram, we may as well call Facebook). In the case of Twitter, out of your 284 million users, one of them (Taylor Swift) is your fourth-largest account. If that customer’s account were compromised, information leaked, and relationships tarnished, would that look good for your business? Hardly.


But what can we do about things like this? Twitter recently ‘bolstered’ their security by quietly introducing a two-factor authentication model, which didn’t even make a blip on the horizon. But you might be saying: I don’t use Twitter, or I don’t care about Twitter, or my business doesn’t rely upon it. We won’t go into the number of faux pas in the very recent past, where rogue or accidental tweets horribly tarnished brands. Instead, how about something a little closer to home …


Hey, did you realize that JPMC (you know, Chase bank) suffered a compromise of 83 million accounts back in July of 2014? I’m a huge Chase user, and strangely I didn’t get any kind of ‘notification’ from them on this… (although I now receive daily fake Chase bank spam messages...)


Now what I have to ask you is this: if organizations with multi-million dollar security budgets like Chase, with tens of millions of customers, and ‘designed for online’ organizations like Twitter, with hundreds of millions of customers, cannot protect our data and our information, or protect us from compromise, what can we do?


I’ll tell you a few things I think we can do:


- We can start getting serious about our security: the security of our systems, of our customers’ data, and of our own data.

- We can deploy active systems like IPSs and intelligent firewalls for East-West and North-South traffic, rather than relying on perimeter security that is port or protocol based.

- We can ensure that if we do have extra security measures, be it two-factor or certificate-based authentication, they are actually used by our customers, employees, and assets.

- We can stop practicing security theater, treating security as an afterthought and living by the check-box.



This is definitely an iterative list, one which will grow as we discover more about our environments and ourselves. 

What are some things you're seeing people do to tackle and protect against threats like these?

I’d love your feedback and contributions so we can all grow by finding better ways to handle threats like these.

We’ve all had that situation where a member of your Web Apps team complains that an intranet site is slow, or an important manager says that they can’t use the internal expenses system hosted on a web server. But what does everyone tend to do in these situations? I’d say that most of us would open our monitoring tool of choice and look at some website perfmon stats, or some response time stats from a tool that can fetch them, but sometimes these tools do not show us the user experience.


If you have a widely dispersed company, with employees sitting in other parts of the world accessing an internal site hosted in the UK, how do we tell one person’s experience from another’s and measure the differences?


I’d like you to think about these scenarios and let me know how you approach the reactive situations. I would also like to know if any of you are proactively measuring website performance and user experience, and how you are doing this.
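As a starting point for the discussion, here is a very small Python sketch of one way to compare response times across locations: time a full page fetch from a machine at each site, then summarise the samples per location. The function names are mine and the approach is an assumption for illustration. It measures raw fetch time only, not browser rendering, so it approximates rather than captures the real user experience:

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=10, opener=urllib.request.urlopen,
                 clock=time.perf_counter):
    """Return the seconds taken to fetch `url` and read the full body."""
    start = clock()
    with opener(url, timeout=timeout) as response:
        response.read()
    return clock() - start


def summarize(samples_by_location):
    """Map each location to its median and worst response time in seconds."""
    return {
        location: {"median": statistics.median(samples), "worst": max(samples)}
        for location, samples in samples_by_location.items()
        if samples  # skip locations that produced no measurements
    }
```

Run `time_request` on a schedule from each office, collect the numbers somewhere central, and `summarize` gives you a like-for-like view of, say, the UK office against an overseas one.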


Indeed, some of you may be using SolarWinds products to do this, but nevertheless I’d like to know what software you use and whether it has proven successful at your companies.


If you don’t bother to monitor the performance, speeds and feeds of your intranet or internet website, then I’d like to know why not.


Don’t be shy and get those comments in.

Quite a few pundits including myself have mentioned that 2015 will be the year of hybrid cloud. But what exactly is hybrid cloud? Ask one hundred IT Pros and you’ll probably get one hundred differing answers. Check out some of the comments on the hybrid cloud post by our thwack Ambassador, amitace, aka @amitpanchal76 on Twitter.


One way to think of hybrid cloud is that some IT services will be fulfilled and supported by the in-house IT department and some IT services will be fulfilled and supported by someone outside of the in-house IT department. These IT services can be infrastructure, platforms, and software. And they can be requested and consumed at any given time from any given place. The preferable means of consumption is via a self-service portal with an associated marketplace, which aggregates all the IT services. Companies that have already adopted hybrid cloud strategies include Coca-Cola and Anthem.


Hybrid cloud matters to IT admins, even if your company currently has no plans for IT services fulfilled and supported by someone outside of your IT department, mainly because it will be great for your IT career. Think of it as an investment in yourself. The IT learning curve remains unchanged: (1) understand the technologies, (2) develop and hone the requisite skills, and (3) maximize your utility to the business.


It’s never a bad idea to associate yourself with a technology trend that has research and development, plus capital investment legs. The last time I saw a technology trend similar to hybrid cloud was when virtualization arrived. So ask yourself: do you want a new IT career path? If so, hybrid cloud will provide you plenty of opportunities in 2015. And that’s why it matters.

Microsoft is preparing to showcase Windows 10: The Next Chapter this week. VMware is running an online launch event during the week of February 2nd, which happens to coincide with VMware Partner Exchange 2015. What does this mean for IT pros? In the near term, probably not much beyond understanding and testing out new features in controlled environments. Longer term, it means that IT pros will have to decide on if and when they upgrade their current environments to the latest releases.




What are your thoughts and expectations of the Microsoft Windows 10 announcement and VMware vSphere’s Big Launch announcement? How will these product launches affect your IT administration mindset, especially around monitoring, troubleshooting, and reporting? Please share your point-of-view in the comments below.


The Microsoft Windows 10 event can be viewed on January 21st at 11AM CST here: Windows 10.

Register for the VMware Big Online Launch event on February 2nd at 3-4PM CST here: VMware Online Launch Event.

This year and the previous, we have heard from various cloud providers that 2015 could be the year to take on a Hybrid Cloud approach in your organisation. With all that being said, we know that Cloud models exist in many guises, and the various models give organisations a variety of choice in their deployment methodology. Stretching workloads out to the Cloud brings varying security considerations, as well as the need to understand the SLAs, in order to reassure your business that production workloads are available.


I’m thinking that most shops are using Hybrid for Test / Dev initially but I’d love to hear how you are using it in your organisation or what investigations you have kicked off to see if this model can benefit your company.


What would you need to do in order to convince the executives in your company that Hybrid Cloud could be a good fit for your organisation? I can see that the value to the developer community is high, and bursting workloads into a cloud could be attractive when you have limited resources available, but are you faced with red tape when it comes to deployment?


If your business is an international business, do you find that you face more challenges when you start to seek hybrid cloud opportunities?


Finally, for those of you that are actively using a hybrid cloud setup, let me ask you this. Do you feel that your management has enough understanding of the topology and the implications of disasters to a single or multiple servers?

A while ago, I bumped into one of my ex-colleagues who now works as a traveling network consultant for a network services company. Among the various topics we discussed over a cup of coffee, photography and networking took the center-stage. My ex-colleague, being always on the move and meeting various clients, also takes great pictures with his favorite smartphone. What he really likes is the geo-tagging feature that documents which pictures he took at a specific place or list of places. He also compares this nifty feature with his network mapping tool that has been his best friend in this line of work—discovering, documenting, analyzing, and troubleshooting networks!


Ping, tracert, and ipconfig are still the most used commands when understanding interconnectivity. But, in this day and age when everything is delivered in an instant, using a manual method of mapping a network, be it simple or complex, looks ‘stone-age’, and can be ruthlessly time consuming.


The usual suspects – still a headache!


He then went on to explain that studying and documenting network infrastructure for his clients, especially in a time crunch, is often extremely difficult. The most common reasons for this were that the clients:

  • Didn’t know the importance of network documentation
  • Didn’t have their network mapped and documented
  • Used a network diagram they had on Visio® that didn’t match the current setup, or was out-of-date
  • Had their network mapped in multiple formats—making it extremely difficult to collate, validate, and standardize into one master document
  • Had huge gaps in network documentation—probably, an administrator left, and the person that filled-in didn’t continue with the documentation


If the above is true for a small, static network, it isn’t much of a pain. However, it becomes a major headache when a company has to merge into another network or is planning for network expansion.


For example, one of the clients that my ex-colleague worked with had a 300 node network, but the Visio® map that was provided by the client had documented only 120 of them, and was 7 months old. Since my ex-colleague had access to the network mapping software on his laptop, all he had to do was to enter the IP address range, scan and discover the network, customize the maps, and then export them to Visio® and PNG, as the client wanted both formats.


So, what will give you a head start if you want to accomplish network documentation and mapping, the easy way?


Automate discovery – that’s exactly where you must start!


Network mapping is a three-step process in itself:

  1. Network discovery – Manual or automated—knowing what’s where
  2. Mapping – knowing how devices are connected
  3. Customization, presentation, and reporting—the way you like it


Network consultants, like my ex-colleague, would appreciate an automatic network discovery and diagram software that can automatically discover all devices in the network and map through a variety of discovery methods like SNMP, CDP, ICMP, WMI, or any other standards. For the purpose of documentation, labeling and presentation, the maps can be exported to a variety of formats, including Visio, PNG, and PDF. Exporting the maps to Visio can become handy, especially if the network engineer wants to play with the map and bring it up to his or her liking.
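To give a feel for the discovery step, here is a bare-bones Python sketch of "enter an IP address range and scan it", plus a check for devices that are on the network but missing from the old diagram. It is an illustration only. The `probe` callback stands in for whatever discovery method is used (an ICMP ping, an SNMP query, and so on), and all function names are hypothetical rather than taken from any mapping product:

```python
import ipaddress


def expand_range(first, last):
    """Return every IPv4 address from `first` to `last`, inclusive."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    if end < start:
        raise ValueError("last address must not be lower than first")
    return [str(ipaddress.IPv4Address(n)) for n in range(start, end + 1)]


def discover(first, last, probe):
    """Return the addresses in the range for which `probe(address)` is true.

    `probe` is supplied by the caller; this sketch does not send any traffic
    on its own.
    """
    return [addr for addr in expand_range(first, last) if probe(addr)]


def undocumented(discovered, documented):
    """Devices found on the network but missing from the existing diagram."""
    return sorted(set(discovered) - set(documented), key=ipaddress.IPv4Address)
```

Feeding the discovered list and the addresses from an out-of-date diagram into `undocumented` yields exactly the kind of gap described earlier, where the map covered 120 nodes of a 300 node network.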


The last question I had for my ex-colleague was “Has mapping with whiteboards and paper become a resource drain? Should we still be doing things the same way we did a decade ago and save costs here and there? Or is it time for more companies to automate network mapping and diagramming?” Of course he believed that network consultants and engineers cannot afford to be laid back, and automating network mapping and documentation is the way to go.


How about you? If you still think whiteboard/paper or any other manual mapping still cuts it, give us a holler.

Or how SolarWinds saved a company $1 million a year, monitored more than we ever expected, and made me look like a miracle worker.

(Continued from Part 1)


The company wanted to expand monitoring with their current solution to every single device in the enterprise (roughly 10,000 systems), a task that would have required an additional $2 million in extra licenses, and would have increased yearly maintenance to $1 million.


I saw it as my first order of business to stop using this expensive-to-own, expensive-to-maintain tool like peanut butter, spreading it across every surface of the enterprise. For those tasks that needed to be everywhere (ping, simple hardware monitoring, etc.) we needed a less expensive tool that could scale without breaking the bank.


We evaluated 6 alternatives from both open-source and traditional vendors. We found that just one came in with a solution under $800,000: SolarWinds. Not only that, but in most cases, two SolarWinds modules (Network Performance Monitor and Server & Application Monitor) covered more features than the competition. But the most interesting part was the price tag: Year-one purchase price would be less than 10% of the cost to implement a solution with the incumbent vendor. Maintenance would be about 6% of the predicted cost if we kept the current solution.
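To put rough numbers on those percentages, here is a quick back-of-the-envelope calculation in Python. One loud assumption: I am reading "the cost to implement a solution with the incumbent vendor" as the $2 million license figure mentioned above. If the real baseline included other costs, the year-one ceiling would move accordingly:

```python
# Figures quoted earlier in the post.
INCUMBENT_EXPANSION_LICENSES = 2_000_000  # additional licenses
INCUMBENT_YEARLY_MAINTENANCE = 1_000_000  # predicted yearly maintenance

# Assumption: "less than 10%" is measured against the license figure above.
year_one_ceiling = INCUMBENT_EXPANSION_LICENSES * 10 // 100

# "About 6%" of the predicted yearly maintenance.
yearly_maintenance = INCUMBENT_YEARLY_MAINTENANCE * 6 // 100
yearly_maintenance_saving = INCUMBENT_YEARLY_MAINTENANCE - yearly_maintenance

print(f"Year-one purchase price: under ${year_one_ceiling:,}")
print(f"Yearly maintenance:      about ${yearly_maintenance:,}")
print(f"Yearly saving:           about ${yearly_maintenance_saving:,}")
```

On that reading, maintenance alone drops by roughly $940,000 a year, which is where the "about $1 million a year" in the subtitle comes from.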


These numbers caught the attention of the CIO and the purchasing team. The monitoring group was asked repeatedly if we had somehow evaluated the wrong tools, if we were ordering the wrong number of licenses (we were ordering "unlimited" so that was not the case), if we misunderstood the vendor's quote. They were unable to comprehend how the SolarWinds quote was so low. We finally had to level with them: "Maybe this is normal pricing, and you've just gotten used to being over-charged all this time?"


Once we were able to assure everyone that this was indeed the real deal, the CIO did what companies stuck in a dollar-auction mentality never can: she announced that it was time to walk away. Because the licensing terms of the old tool were so restrictive, we would transition to SolarWinds for all "foundational" monitoring (availability, hardware, and application) and look for other tools to use in niche areas, be it real-time user experience monitoring, closed-architecture applications, or other special purposes.


The installation and migration was performed not by a team of contractors over 18 months (which is what the previous tool had required), but by existing staff, in parallel with their daily duties on the old system, in just 6 months.


On the day we turned monitoring on, it was doing more than the old system: availability and hardware for 10,000 systems, plus WAN link monitoring and more. In the first 3 months our team was able to develop monitors that application and server owners had been requesting for years.


Monitoring has become an extremely specialized space. You can’t monitor your virtualization infrastructure the same way you monitor a router. Load balancers require a different set of metrics than a storage array. Trying to force a tool which either wasn’t designed for the task or which hasn’t kept up with the times is not just a matter of “making do with what you have” and calling it “being budget conscious.” In fact, doing so can (and for this customer DID) cost the business an order of magnitude more, as well as not providing the functionality needed.

We’ve all seen Security Theater in action before. We walk into a building where they interrogate our identification and ask for proof of blood samples or DNA, yet the loading dock is unmonitored. Or we’re asked for numerous forms of identification over the phone and are still allowed to slide even without that information. Or, the best example: we get the complete shakedown at the airport while passing through the security checkpoints, while so many other entry points go unchecked and unmonitored.


But we’re not here to discuss physical security, no. We’re here to discuss the security theater that we see in IT organizations every day. Security theater hides behind many masks, such as those in our previous discussions, “TROUBLESHOOTING VS COMPLIANCE SECURITY; LOGGING WITHOUT BORDERS?” and “LOGGING; WITHOUT A COMPLETE PICTURE, WHAT’S THE POINT?”. For some organizations these are merely checkboxes, but you have to ask yourself: are you safe if you don’t know who’s going through which doors at which times, and if that information is not logged? That was the importance of the discussion of knowing what is logged, how deeply you interrogate those logs, and how long you retain that information.


A lot of us focus on network availability and network resiliency, and thus on the need to restrict and restrain what information we collect. Yet just like any building, call center, or airport, if all of the points of entry are not secured or at least monitored, it’s all a ruse based upon hope and trust instead of truly securing the facility.


Protecting the security of your organization isn’t limited to knowing what is being logged, where it is being logged, and how long it is being retained, and equally knowing what information isn’t being collected or logged. It covers numerous areas, a few of which are listed below (this is not an exhaustive list):


- Monitoring of vendor controlled and embedded devices

   Did you know that embedded devices which sit on your network which you technically have no ‘control’ over (think GE, heart rate monitors and MRI machines in hospitals for example) still need to be patched and monitored, even though you don’t really have that ‘insight’ into them?

- East West and North South traffic

   With threats constantly on the rise, we’ve done an okay job of watching traffic as it enters the Edge and allowing/disallowing it as it traverses North to South, but once the traffic is already in the network, most organizations have neither visibility into nor any real awareness of what is going on. This has become the greatest threat vector to get a grip upon.

- Actionable Threat Intelligence

   In the event of a breach, how quickly can you identify and mitigate it? Thought of a different way, if you have an Intrusion Detection System (IDS) and a breach does occur, what can you do about it? Knowing a criminal is inside your house does nothing if you don’t have any way to handle it once breached.
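As a small illustration of the East-West versus North-South distinction above, here is a Python sketch that labels a flow by whether both ends sit inside your own address space. The network ranges and function names are placeholders I chose for the example; substitute your own internal networks:

```python
import ipaddress

# Hypothetical internal address space; replace with your own networks.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_internal(address):
    """True if the address falls inside one of the internal networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in INTERNAL_NETWORKS)


def classify_flow(source, destination):
    """Label a flow by which side of the edge each endpoint is on."""
    inside = (is_internal(source), is_internal(destination))
    if all(inside):
        return "east-west"    # never crosses the edge
    if any(inside):
        return "north-south"  # crosses the edge in one direction or the other
    return "external"         # neither end is ours
```

Run against flow records, a classifier like this at least tells you how much of your traffic never touches the perimeter, and is therefore invisible to controls that only watch the edge.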



It’s a shame there is no one panacea to resolve these kinds of threats, risks and dangers which plague our environments every single day.

What are some other areas you see IT Departments practicing security theater?   Do these examples resonate with you, and what do you think we can do about it?

What a wonderful week of virtualization goodness! It started out with a webinar covering three skills (monitoring, troubleshooting, and reporting) that every virtualization admin needs to hone and combine with one tool to become the master of their virtualized universe. The recorded webinar including short presentation, demo, and Q&A is below. And it starts at the demo portion.



Next, I was humbled and honored to be formally announced as the Virtualization Head Geek including my own presser. The icing on top of the cake was hosting the Tech Field Day crew and the delegates for Virtualization Field Day 4 at the SolarWinds campus. Absolutely loved the conversations. Below is a side-by-side P2V of the Virtualization Field Day 4 folks.




[UPDATED VFD4 delegate's posts]:

| VFD4 Delegate | Blog Post                                                |
|---------------|----------------------------------------------------------|
| Amit Panchal  | How SolarWinds aims to offer a simple perspective - VFD4 |


Virtually yours,


The New Head Geek of Virtualization

or “It’s not just the cost of buying the puppy, it’s the cost of feeding the puppy that adds up”


This is a story I've told out loud many times, on conference calls, and around water coolers. But I've never written it down fully until now. This is the story of how using the wrong tool for the job can cost a company so much money it boggles the mind. It's a story I've witnessed more than once in my career, and heard anecdotally from colleagues over a dozen times.


Before I go into the details, I want to offer my thoughts on how companies get into this situation.


The discipline of monitoring has existed since that first server came online and someone wanted to know if it was "still up." And sophisticated tools to perform monitoring have been around for over two decades, often being implemented in a "for the first time" manner at most companies. Some of it has to do with inexperience. For example, either the monitoring team is young/new and hasn't experienced monitoring at other companies, or the company itself is new and has just grown to the point where it needs it. Or there's been sufficient turn-over, such that the people on the job now are so removed from those that implemented the previous system, that for all intents and purposes the solution or situation at hand is effectively "new."


In those cases, organizations end up buying the wrong tool because they simply don't have the experience to know what the right one is. Or more to the point…the right ONES. Because monitoring in all but the smallest organizations is a heterogeneous affair. There is no one-stop shop, no one-size-fits-all solution.


But that's only part of it. In many cases, the cost of monitoring has bloated beyond all reason due to the effect known as "a dollar auction". Simply put, the barrier to using better tools is the unwillingness to walk away from all the money sunk into purchasing, deploying, developing, and maintaining the first.


And that leads me back to my story. A company hired me to improve their monitoring. Five years earlier, they had invested in a monitoring solution from one of the "big three" solution providers. Implementing that solution took 18 months and 5 contractors (at a cost of $1 million in contractor costs, plus $1.5 million for the actual software and hardware). After that, a team of 9 employees supported the solution: setting up new monitors and alerts, installing patches, and just keeping the toolset up and running. Aside from the staff cost, the company paid about $800,000 a year in maintenance.


With this solution they were able to monitor most of the 6,000 servers in the environment (a blend of Windows, Unix, Linux, and AS400 systems), and they could perform up/down (ping) monitoring for the 4,000 network devices. But they encountered serious limitations monitoring network hardware, non-routable interfaces, and other elements.


Meanwhile, the server and application monitoring inventory (the actual monitors, reports, triggers, and scripts) showed signs of extreme "bloat." They had over 7,000 individual monitoring situations, and around 3,000 alert triggers.


This was the first company where the monitoring and network teams weren't practically best friends, and even the server monitoring was showing signs of strain. Some applications weren't well-monitored, either because the team was unfamiliar with them or because the tool couldn't get the data needed.


Part of the problem, as I mentioned earlier, was that the company had invested a lot in the tool, and wanted to "get their money's worth." So they attempted to implement it everywhere, even in situations where it was less than optimal. Because it was shoehorned into awkward situations, the monitoring team spent inordinate amounts of time not only making it fit, but keeping it from breaking.


KEY IDEA: You don't get your money's worth out of an expensive tool by putting it into as many places as you can, thereby making it more expensive. You get your money's worth by using each tool in a way that maximizes the things it does well and avoids the things it does not do well.


NOTE: This is a continuation of my Cost of Monitoring series. The first installment can be found here: The Cost of (not) Monitoring

Stay tuned for part 2, which I will post on January 20th to see how we resolved this situation.


edit LJA20150116: forgot to include the link to explain a dollar auction.

Last week we had a very interesting discussion around Logging and its importance and effectiveness in an organization (re: LOGGING; WITHOUT A COMPLETE PICTURE, WHAT’S THE POINT? )


Two common themes emerged from the conversation: being able to see the land through the sea of logging information (let’s call it Lighthousing), and being able to determine what to collect and filter out the ‘fluff’ so you don’t drown in a sea of logs (we’ll call that Drowning :)).


When it comes to troubleshooting, the risk of drowning is all too apparent. We as IT Administrators, Operations Support and Troubleshooters gain value from *having* the data there, but we also require the ability to see things from 10,000 feet as well as from an inch away. That said, we don’t *always* need to be able to see an inch worth of data while 10,000 feet above. So the ability to index, analyze and correlate that data, as opposed to reading logs in a serial fashion, is more important than ever. For example, in that last discussion commenter bspencer63 made note of intelligent proactive SIEM solutions which help with log filtering, grouping, alerting and more; solutions like these are not only useful, they’re essentially required if we’re going to get anywhere in this vast sea of information we’ll continue to be drowning in.


Why will we be drowning in this information, though? Because as much as there is an inclination to scale back what information and logs we send to a SIEM in an effort to filter out the noise (trying to diminish the sea of information so we can see land more easily), regulatory compliance is asking us to collect more logs, from even more disparate devices than we collect from today, at an even greater level of detail than we’re used to consuming. Consider going from collecting hundreds of thousands of logs a second today to hundreds of millions of logs a second tomorrow. Being able to consume and ingest that information is the responsibility of our SIEMs or the solutions we implement; being able to interpret and analyze that information falls on us, the practitioners.


Whether we need to collect logs is a given: we have to, whether we want to or not. But the ability to use that information, to filter on what is relevant to the task at hand, and to be proactive and keep ahead of problems before the end users see them has never been more important than now.


What kind of tools  and methods are you using to wade through the sea of information in your organizations today?  What do you find particularly effective and overwhelmingly ineffective?

The holidays are over and your work force is back at their desks, blaming the IT team for whatever isn’t working or is slow. While most application performance issues can be blamed on the application itself, there can be other factors too, like an edge or core device behaving badly, server faults, or just another low bandwidth issue. And there are times when even the good old FTP fails too and you have no idea what’s wrong.


Here is a quick list of things to check for when you are at a dead end:

IP conflicts: Unfortunately, you are not always notified by your system in the event of an IP conflict. If the conflicting device is a rogue network device or an OS that cannot resolve an IP conflict by itself, the result would be intermittent connectivity. This happens because all devices with the same IP respond to ARP requests sent by a switching or routing device. So, during a data transfer, some of the conversation packets will go to one device while a few packets will go to the other device resulting in intermittent connectivity.


Solution? Use a ‘user device’ tracking tool or an IP conflict detection tool.
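As a rough sketch of what such a conflict detection tool does under the hood, the Python below takes a list of (IP, MAC) observations, such as entries pulled from switch or router ARP tables, and reports any IP address claimed by more than one MAC address. The function name, input format, and sample addresses are all hypothetical and not taken from any particular product.

```python
from collections import defaultdict

def find_ip_conflicts(observations):
    """Return {ip: sorted list of MACs} for every IP seen with more than one MAC.

    `observations` is an iterable of (ip, mac) pairs, e.g. parsed from ARP
    table dumps. This input format is an assumption made for illustration.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # normalize case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Hypothetical sample data: 10.0.0.5 is answered by two different devices.
sample = [
    ("10.0.0.5", "00:11:22:33:44:55"),
    ("10.0.0.5", "66:77:88:99:AA:BB"),
    ("10.0.0.6", "00:11:22:33:44:66"),
    ("10.0.0.6", "00:11:22:33:44:66"),
]
conflicts = find_ip_conflicts(sample)
print(conflicts)
```

A real tool would also have to account for legitimate cases where one IP moves between MACs, such as failover pairs, so treat a hit as a lead to investigate and not as proof of a conflict.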

MTU: This is the largest possible size for a PDU that the communication layer can pass forward, and it is set to 1500 bytes for Ethernet version 2 networks (the often-quoted 1518 bytes is the maximum frame size, which includes the Ethernet header and FCS). When a router receives a packet larger than the MTU of the outgoing interface, it will either fragment the packet or, if the DF (Don’t Fragment) bit has been set, drop it and send an ICMP error back to the transmitter saying the packet is too large. If your application chooses to ignore the error, or your network somewhere blocks the ICMP error sent by the router, the application will continue to send large PDUs, thereby impacting performance. The issue is usually seen in scenarios where a VPN is involved, because the encapsulation overhead pushes the packet size beyond the 1500-byte MTU.


Solution? Use ping or traceroute to find the MTU your router interface can forward and set that MTU on your device. And don’t forget to make sure that the ICMP error messages about MTU are not being blocked anywhere in your network.
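To make the ping approach concrete, here is a small Python sketch of the arithmetic and the search involved. It assumes IPv4 with no IP options, so a ping payload of N bytes produces a packet of N + 28 bytes (20-byte IP header plus 8-byte ICMP header). The `packet_fits` callback is a hypothetical stand-in for actually sending a probe with the DF bit set (for example `ping -M do -s <payload>` on Linux or `ping -f -l <payload>` on Windows) and checking whether a reply comes back.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header

def ping_payload_for_mtu(mtu):
    """ICMP echo payload size that yields an IPv4 packet of exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def find_path_mtu(packet_fits, low=576, high=1500):
    """Binary-search the largest packet size in [low, high] that gets through.

    `packet_fits(size)` is a hypothetical callback that returns True when a
    DF-flagged packet of `size` bytes reaches the destination.
    """
    if not packet_fits(low):
        return None  # even the smallest probe failed; something else is wrong
    while low < high:
        mid = (low + high + 1) // 2
        if packet_fits(mid):
            low = mid       # mid got through, try larger
        else:
            high = mid - 1  # mid was too big, try smaller
    return low

# Simulated path whose narrowest hop (say, a VPN tunnel) passes at most 1400 bytes.
simulated_path = lambda size: size <= 1400
path_mtu = find_path_mtu(simulated_path)
print(path_mtu, ping_payload_for_mtu(path_mtu))
```

Keep in mind that a probe can also fail because ICMP is filtered or the host is down, so a real script would need to tell "too big" apart from "no answer at all".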

Auto-Negotiation or Duplex mismatch: This one can be controversial but still, here we go. While there are network admins who always hard set the speed and duplex on each interface of their networking devices, there are others who believe that auto-negotiation issues are a myth and will never happen. In reality, auto-negotiation can fail, but when? When the cabling is bad, when the devices in question are obsolete or cheap, or simply when one of the devices is set for auto-negotiation and the other is forced. Hard setting the duplex on an interface can also cause an issue when two connected devices are set to different duplexes. The end result is an impact on performance because of packet retransmissions and a high number of errors on the affected ports.


Solution? Check for errors and retransmissions with an NMS or use auto-negotiation on all your devices. And don’t forget, when it comes to Gigabit Ethernet, auto-negotiation must be used.
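The "one side auto, the other forced" case is easier to reason about with a tiny model. The Python sketch below is a deliberate simplification: it assumes both ports support full duplex, and that a port set to auto which gets no negotiation signal from a forced peer falls back to half duplex, which is the usual behavior on 10/100 links. The function names are made up for illustration.

```python
def effective_duplex(local, peer):
    """Duplex the local port ends up using.

    `local` and `peer` are each 'auto', 'full', or 'half'.
    Simplified model: both ports are assumed capable of full duplex.
    """
    if local != "auto":
        return local   # a forced port uses exactly what was configured
    if peer == "auto":
        return "full"  # both negotiate and agree on the best common mode
    return "half"      # no negotiation partner: fall back to half duplex

def has_duplex_mismatch(a, b):
    """True when the two ends of a link settle on different duplex modes."""
    return effective_duplex(a, b) != effective_duplex(b, a)

for pair in [("auto", "auto"), ("auto", "full"), ("auto", "half"),
             ("full", "full"), ("full", "half")]:
    print(pair, "mismatch" if has_duplex_mismatch(*pair) else "ok")
```

Under this model the dangerous combination is auto on one end and forced full on the other, which is why mixing the two approaches on the same link tends to cause more trouble than picking either one consistently.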

TCP windowing: When there is a slow connectivity issue, the first step for many organizations is to throw expensive bandwidth at the problem. But there is the TCP window size that many admins forget about. While window scaling is an available solution, some routers and firewalls do not properly implement TCP window scaling, thereby causing a user's Internet connection to malfunction intermittently for a few minutes. When transferring large files between two systems, if the connection is slower than what it should be or intermittent, it could be an issue with low TCP window size on the receiving system.


Solution? As stated, most systems do support TCP window scaling, but when things slow down and you don’t know what is wrong, make sure that TCP window scaling is functioning properly or try increasing the TCP receive buffer. Again, make use of an NMS tool for troubleshooting.
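To see why more bandwidth doesn't help when the window is the bottleneck, it helps to run the numbers. A single TCP connection can have at most one window of data in flight per round trip, so its throughput is capped at window size divided by round-trip time. The Python sketch below works that out; the 100 ms round trip and 100 Mbps link are hypothetical example values.

```python
def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on single-connection TCP throughput, in bits per second."""
    return window_bytes * 8 * 1000 / rtt_ms

def required_window_bytes(bandwidth_bps, rtt_ms):
    """Window needed to keep a link full (the bandwidth-delay product), in bytes."""
    return bandwidth_bps * rtt_ms / 8000

# Without window scaling, the TCP window tops out at 65,535 bytes.
unscaled_cap = max_throughput_bps(65_535, rtt_ms=100)
needed = required_window_bytes(100_000_000, rtt_ms=100)

print(f"Cap with an unscaled window: {unscaled_cap / 1e6:.2f} Mbps")
print(f"Window needed to fill 100 Mbps: {needed:,.0f} bytes")
```

In this example the connection tops out at a little over 5 Mbps no matter how fat the pipe is, and filling the link would take a window of about 1.25 MB, which is only possible when window scaling works end to end.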

Flow control: The flow control mechanism allows an overloaded Ethernet device to send ‘pause’ frames to other devices that are sending data to it. Without flow control, the overloaded device drops packets causing a major performance impact. But when it comes to the backbone or network core, flow control can cause congestion in areas that would have otherwise transmitted without issues. For example, a switch sends a pause frame to a transmitting device because a switch port is unable to match the transmitter’s speed. When the pause frame is received by the transmitting device, it pauses its transmission for a few milliseconds. But what also happens is that the traffic to all other switch ports that have the bandwidth to handle the speed is paused as well.


Solution? Use flow control on computers, but don’t have your switches send out pause frames. Instead, implement QoS in the backbone to prioritize packets based on their criticality. You can find flow control best practices from this whitepaper here.

Right QoS: And that brings us to QoS. Network admins use QoS because it can prioritize business applications and drop unwanted traffic. But a few network admins overdo QoS by using it for every type of traffic that passes through the device. This can result in a few business applications performing well all of the time, while a few other applications continue to act up most of the time as well.


Solution? Use QoS only when it is absolutely necessary. You do not have to set priority or a QoS action for all traffic that passes through your network. Prioritize what is important and set best effort or queuing for everything else. Assign bandwidth for very critical applications whose data delivery is important to business continuity.


It’s important to understand the various reasons for data delivery failure, even the uncommon ones. By doing so, you will have a better idea of where to search when issues arise. If you’ve faced data delivery problems, where did your issue stem from and how did you resolve it?

You’d have to be shut off from the I.T. world to not realise that startups are making a big impression on the virtualisation and storage industries. In a way this has been likened to a breath of fresh air, as the dominance of companies such as EMC, HP and NetApp has always steered customers toward a big enterprise leader for their virtual workloads.


The storage market represents several billion dollars and continues to grow year on year. We I.T. folk have always been steered by our organisations, or indeed by allegiance to a particular vendor, through the politics that come with our I.T. shops, so where will this disruption end up? Indeed, we could see some of these niche players bought up by larger companies in another 1-2 years, but what does the future hold?


How many of you work at Enterprise companies and have entertained such startup companies? I’m talking about companies such as Nutanix, Simplivity, Nimble, VMTurbo, etc.


If you work for an SMB then I’d like to hear your thoughts on why you chose a particular startup over an established magic quadrant leader. Surely this can’t be a capital based decision, so I’m keen to hear from you and contribute to what I see as something that needs some active discussion.

What does a wireless thermometer have in common with ping? Both can keep a business from losing cash.


One of the ways businesses stay in business is by keeping a tight rein on costs. So, it should come as no surprise that convincing executives to allocate budget money toward IT monitoring software can be a challenge. To the average executive—and let’s be honest, the mid-level manager listening to the technical ramblings of an excited but fiscally vague IT Pro—monitoring seems like a pure sunk cost with no possibility of return.


However, IT pros know this couldn’t be further from the truth. All that’s required to help others understand why, is to answer a single question: how much will not monitoring cost?


Case in point: recently, a 300-bed hospital considered implementing a $5,000 automated temperature monitoring system for the freezers where the hospital's supply of food was stored. The system would have saved staff time by measuring the current temperature in each of the coolers and freezers, and sending notifications if the temperature was out of acceptable range.


Hospital administration declined, deeming the solution too expensive just to know that a freezer was five degrees too cold. Needless to say, one of the staff members eventually left the door to the main cooler open, which caused the compressor to run all evening until it failed completely. The next morning, staff arrived only to find all the food in that cooler had spoiled. Recovering from this failure required emergency food orders, extra staff, repair services and a lot of overtime.


The total cost of the outage came to a cool $1 million—200 times more than the cost of the monitoring system deemed to be “too expensive.” This kind of scenario, where a small upfront investment could have prevented costly problems down the road, should sound hauntingly familiar to IT pros.


With this example in mind, it behooves us as IT professionals to be able to explain—in clear terms that non-technical staff can understand—what is intuitively obvious to those of us in the trenches: the cost of not monitoring is often far greater than the tools that could help us avoid failures in the first place.


Convincing non-IT staff of the need for monitoring tools after a critical system failure is probably a little easier, as outages tend to remain fresh in people’s minds for a long time. But just how can IT pros make the case for monitoring without first experiencing an actual IT resource failure? Or, if an organization has experienced a failure with a particular system, how can IT pros make the case for purchasing monitoring tools to protect other mission critical systems?


It really comes down to identifying the potential costs of a failure. Every management team feels differently; what leadership at one organization feels is catastrophic, others might simply consider the cost of doing business. Therefore, IT pros need to highlight costs that are eminently avoidable. Some things to consider are:


  • The ultimate end result of a problem if it goes undetected
  • The amount of time a particular failure could go unreported
  • The amount of time it would take to fix the system as a result of a failure
  • Regular hourly staff cost for the system in question
  • Emergency and overtime staff cost for the system in question
  • Planned vendor maintenance costs versus emergency vendor repair costs
  • Lost sales or other income per hour if the system in question is unavailable


To understand how all this fits together, consider the simple example of a hard drive failure on a primary email server.


To begin with, no self-respecting IT pro would be caught dead without some form of fault-tolerance for a critical system such as email. So, in this example, let’s say a mirrored drive was in place, but it failed a couple days prior to the second drive’s failure. Since there was no monitoring solution in place, nobody noticed, effectively making it a single drive system.


The end result is that the system would crash. You would think an email system crash would be immediately noticeable, but email clients like Outlook do a great job of offline caching, so it can actually take a while before anyone notices. In this example, let’s say it takes 30 minutes.


Recovering from a hard drive failure takes time unless there are spare parts immediately on hand and some kind of instant recovery option. Let’s estimate that replacing the drive itself takes about an hour, and restoring from backup takes another hour. However, this is a vendor repair. That’s either a four hour lead time or one hour for emergency service.


Now let’s look at the costs. Let’s say regular staff time is $53 per hour while overtime is $75 per hour. Standard vendor repair is free, but remember that four hour lead time. Emergency vendor repair is $150 per hour with a two hour minimum.


This means email will be offline for between three and a half and six and a half hours, with a cost of between $106 and $450. This may not seem like a big deal. However, that is the cost of just one drive failure. Consider a company that experiences 350 drive failures a year (something I have personally witnessed). Now we’re talking about between $37,000 and $157,000 per year, not counting company revenue lost while email is down and productivity plummets as a result.


Now, of course, drives fail whether they are monitored or not. However, in the above example, catching the first drive failure, replacing it at a convenient time and avoiding both the outage and the time spent performing data recovery could save between $18,500 and almost $140,000 over the course of a year.
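For anyone who wants to check the math or plug in their own rates, here is the drive failure example worked through in Python. The rates, lead times, and failure count come straight from the example above. Two things are assumptions on my part: that staff are billed for the two hours of replace-and-restore work, and that with monitoring in place a failed mirror costs one hour of regular staff time to swap, which is what the savings figures above imply.

```python
# Figures taken from the example above
REGULAR_RATE = 53            # $/hour, regular staff time
OVERTIME_RATE = 75           # $/hour, overtime staff time
EMERGENCY_VENDOR_RATE = 150  # $/hour, emergency vendor repair
EMERGENCY_MIN_HOURS = 2      # vendor's two hour minimum
DETECT_HOURS = 0.5           # time before anyone notices the outage
REPAIR_HOURS = 2             # one hour to replace the drive, one to restore
FAILURES_PER_YEAR = 350

def outage(lead_hours, staff_rate, vendor_cost):
    """Return (downtime in hours, cost in dollars) for one unmonitored failure."""
    downtime = DETECT_HOURS + lead_hours + REPAIR_HOURS
    cost = REPAIR_HOURS * staff_rate + vendor_cost
    return downtime, cost

standard = outage(lead_hours=4, staff_rate=REGULAR_RATE, vendor_cost=0)
emergency = outage(lead_hours=1, staff_rate=OVERTIME_RATE,
                   vendor_cost=EMERGENCY_VENDOR_RATE * EMERGENCY_MIN_HOURS)

# Assumption (not stated outright in the article): with monitoring, the first
# failed drive is replaced at a convenient time for one hour of regular pay.
monitored_cost = 1 * REGULAR_RATE

annual_low = standard[1] * FAILURES_PER_YEAR
annual_high = emergency[1] * FAILURES_PER_YEAR
annual_monitored = monitored_cost * FAILURES_PER_YEAR

print(f"Downtime per failure: {emergency[0]} to {standard[0]} hours")
print(f"Cost per failure: ${standard[1]} to ${emergency[1]}")
print(f"Annual cost unmonitored: ${annual_low:,} to ${annual_high:,}")
print(f"Annual savings with monitoring: ${annual_low - annual_monitored:,} "
      f"to ${annual_high - annual_monitored:,}")
```

Swap in your own hourly rates, vendor terms, and failure counts to get a first-pass number for your environment.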


It’s important to go through a similar exercise for all mission critical systems in the IT environment—including email, CRM and Web services—combined with different types of outages, such as disk failure, application crashes and network failure.


To avoid becoming overwhelmed, prioritize. Take a hard look at the IT environment and honestly assess what systems are rock-solid and which are a bit shakier. Also, leverage other team members where necessary by asking them how long it takes to identify when their systems are offline, and how long it takes to bring them back up.


This process may seem tedious, but all too often it’s what it takes to help non-IT executives and other decision makers understand that proper monitoring is crucial, and that the cost of not monitoring can far exceed that of doing so. Simply put: speak their language, which is the language of money.


BONUS: To help you get started, I've uploaded a spreadsheet that collects this information and does some simple calculations: Monitoring_Cost_Estimator.xlsx


Note: This article originally appeared on InfoTech Spotlight. Click here to read that version.

I’m a huge fan of AWS. Not that AWS (Amazon Web Services), even though I do like that AWS and it exemplifies this AWS. This AWS is all about agility, web-scale, and simplicity. These are core tenets of IT transformation, also known as the Consumerization of IT. To keep up with this AWS and that AWS, IT admins need to continually learn and re-learn new technologies and skills. Say hello to complexities!


So how does one manage these ever-changing complexities while continuing to grow their IT career? The answer is quite simple – tools. IT tools are the bridge between technologies, skills, and utility to the business. Delivering a successful business ops utility will allow IT admins to enjoy a long and rewarding career. So, tools become integral to an IT pro’s career outlay.


Ideally, the tool would be simple and intuitive to deploy and use, but it would also be powerful enough to be used by an IT pro as they scale and grow their career. For instance, virtualization admins may have a tool that they use to monitor their VMware vSphere environment and another to monitor their Microsoft Hyper-V environment. What if there was one tool that monitored both? Next, when something breaks in their virtualized IT environment, virtualization admins may have different tools to troubleshoot and resolve the issue. Again, what if there was one tool that has alerting capabilities with drill-down and guided remediation capabilities? Finally, virtualization admins probably use spreadsheets, documents, and slide decks to report status in support of both IT and business ops. What if there was one tool that has reporting capabilities from the deepest technical details to the most holistic summary? What if I told you that there is one tool that does all of these things for virtualization admins?


Join me and Chris Paap as we cover Virtualization Manager - The One Tool for virtualization admins to hone and own your IT skills. We will provide practical examples of monitoring, troubleshooting, and reporting so that you can become the Master of your Virtualized Universe!


Webcast: Three skills to become the Master of your Virtualized Universe

Date: Tuesday, January 13, 2015
Time: 11:00 a.m. – 12:00 p.m. CST
3 Skills to Become the Master of Your Virtualized Universe

Just the other day (Okay, it was a few weeks ago) I was having a discussion about logging with a “small” Fortune 50 company.  Their problem was… They wanted a more intelligent way to analyze the information they are logging so they could help troubleshoot or understand problems in their environment easier.  This is obviously a capability we all would love, intelligence out of our data collection, systems and event log subsystems. 


Oh, but logging intelligence doesn’t come without its challenges. You tell me if you experience some of the same challenges they expressed, because this really throws a wrench into the works.


- Only collecting logs from some systems not every single one of them

- Not collecting Windows Event Logs, Syslog, or detailed logging from every server or device

- Inability to ingest the information of the existing logs which are being collected

- Unable to keep long collections of information in accord with compliance due to lack of allocated storage


Now let’s not even bring compliance or regulatory requirements into this, because imagine the above challenges at scale, and then retention over the course of 7 to 10 years depending upon whose “rules” you need to follow.
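To get a feel for what retention at scale means for storage, here is a back-of-the-envelope sizing sketch in Python. The event rate and average event size are hypothetical placeholders, and the math ignores compression, indexing overhead, and leap days, so treat the result as an order-of-magnitude estimate only.

```python
SECONDS_PER_DAY = 86_400

def retention_bytes(events_per_second, avg_event_bytes, retention_years):
    """Raw bytes needed to keep every event for the full retention period."""
    per_day = events_per_second * avg_event_bytes * SECONDS_PER_DAY
    return per_day * 365 * retention_years

# Hypothetical inputs: 100,000 events/second at 300 bytes each, kept 7 years.
total = retention_bytes(events_per_second=100_000,
                        avg_event_bytes=300,
                        retention_years=7)

print(f"Per day: {100_000 * 300 * SECONDS_PER_DAY / 1e12:.3f} TB")
print(f"Over 7 years: {total / 1e15:.2f} PB")
```

Even with generous compression, numbers like these make it clear why a lack of allocated storage ends up dictating what gets logged.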


You might be asking yourself, just as I was while we were discussing this: if you’re unable to collect the data fast enough, without enough space to store it for a long enough duration, from an incomplete picture of your entire infrastructure… what’s the point? I mean, what if we were trying to do more than merely troubleshoot a problem and had to react or respond to a breach, which seems all the rage these days?


With breaches like the ones which are making all the news, having some elements of intelligence to analyze, interpret and act upon the data would be ideal. However, without a complete picture of the environment, or with only selective logging, we are left with an incomplete ability to react and respond to incidents.


The challenges we all face when it comes to logging collection can be paramount to a successfully defended and understood infrastructure.


Are there other challenges you see organizations face?

Do you find logging to be more of a ‘set it and forget it’ never to look at unless troubleshooting or responding to an incident?

I know it’s difficult to ask these questions without implicitly exposing your environment by saying, “Yes we have an incomplete logging solution” which is why it can be a sensitive topic to discuss.


What are your thoughts, is this off the mark and these issues are few and far between? I’d love to hear your thoughts on this matter.

My thought provoking article is around the management of storage in the datacenter. IT managers and Storage Admins have long had decisions to make in the areas of capacity planning and performance, but now that we are being encouraged to think of the software defined datacenter, why should we still be thinking about storage?


Virtualisation is teaching us to forget about the building blocks and focus on the apps but if you still manage on premise storage then you must have a vested interest in the upkeep of these building blocks. In essence, we are being encouraged to think outside the box and to not worry about where IOPS and latency are required and let the dynamic models of automation handle this.


Storage comes in all shapes and sizes, from DAS and NAS all the way up to enterprise based flash systems, but are you guys meant to be able to distinguish where one type of system is favoured over another? Sometimes you don’t know the application requirements until everything is in production. How do you then go back and adjust your model?

How do staff continue to forecast capacity growth with cloud and virtualized deployments, since the nature of these elastic deployments is to grow and shrink depending on business requirements? For the people that have to manage dissimilar hardware, what tools are you using?


I’d like to hear your thoughts on what you think and will reply to any comments to see what interesting discussions arise.


Another Look Back

Posted by Leon Adato Employee Jan 5, 2015

Many of us are getting back to work after a few weeks of relative quiet (or for those of us with large families, at least a week of quiet from work-related issues). And as we're catching up on emails and updating our TODO lists, a look back may help us plan how best to move forward. Last week, Head Geek Lawrence Garvin  posted a retrospective on the highs and lows of IT in 2014. I'm re-posting it here.


But first, I am updating Lawrence's list with a few of my own observations:


Credit card breaches? Thank you Sir, may I have another?
Chick-fil-A just announced they have lost credit card details for approximately 9,000 customers. According to at least one banking source, that's more than the Target breach!


Microsoft announced a new CEO. As usual, Apple topped them.
As usual, Apple proved that it could be more current, accessible, stylish, and cutting edge. Kudos to Tim Cook for being yet another role model to the GBLT community and proving coming out may be a difficult choice, but it will never be more embarrassing than dancing like a monkey on stage at your company's annual convention.


Celebrity photo leak
Having something personal taken from you is painful on many levels. One of the targets of this theft was Kim Kardashian, who made things slightly confusing (as well as basically breaking the internet) in November by choosing to pose nude for an interview. That said, the operative word here is "choosing."


How do you spell "Sony"? How about I-R-O-N-Y?
In all the hubbub about the hacks, threats, and eventual release of "The Interview", one item which was overlooked by many is that one of the songs ("Pay Day", by Yoon Mi Rae) was itself "pirated." There were discussions between Sony and the singer's label, but they were dropped early on and the movie was released without paying or crediting the artist.


And now, on to Lawrence's list. Note: This post originally appeared in Information Week: Strategic CIO.


Take a look back at some of the most memorable IT incidents -- for good and bad -- over the past year.
The things we'll most likely remember from 2014 are all the things in IT that went wrong, and those won't go un-re-noticed here. A couple of those things were just flat-out attributable to human error, and I'll also make a point of calling those out where I think they occurred.

But the year was not all catastrophic. There were a few really cool things that happened in IT, and in technology generally. Those are just as important to remember as the lessons learned from the fiascos.

18 months of credit card breaches
No doubt the biggest story of the year, or at least the longest-running story, was the spate of credit card breaches suffered by some of the country's most notable retailers. We've all read about these, to some extent or another, but since part of the point of this article is to call out the good, bad, and ugly, let's start there.

First, kudos to P.F. Chang's for its rapid response in simply pulling the plug on its electronic credit card processing systems. No kudos for Neiman Marcus, which only reported its breach in June, although it occurred prior to the Target breach in late 2013. So, in fact, it was Neiman Marcus in July 2013 that is due the credit of starting the recent wave of breaches. The ugly goes to Home Depot. I'm still trying to wrap my head around how that stuff got past compliance auditing.

Microsoft names new CEO
I said I'd include some good news. While a good portion of the world was somewhat skeptical back in February, I have to say that for the most part I think Satya Nadella's ascendency to the software throne of the world has been a positive thing for Microsoft. Certainly, the culture of listening to customers has become more open, and it's hard not to be encouraged by the looks of Windows 10.

Unfortunately, the lack of quality in the trenches, particularly with respect to the bad batch of patches released over the past six months, is damaging the memory of what could otherwise have been a great year for Microsoft.

XXII Olympic Winter Games, Sochi, Russia
Despite all the cynical attitudes about the Winter Olympic Games being in Russia, all in all I thought the Sochi event was as good as any other Olympic Games in recent years, and certainly better than a few.

Heartbleed
So, in the midst of all the credit card chaos, we learned something really important about open source software: Apparently open source developers read their peers' source code about as often (and as diligently) as IT professionals read product documentation before implementing software in production.

The good news from Heartbleed, though, is that the damage could have been exponentially worse than it actually was. Kudos to a responsive IT community that plugged the critical holes pretty quickly, and as far as I know, there's still only one actual breach attributed to Heartbleed.

FIFA World Cup, Brazil
Like the Winter Olympic Games, the naysayers had a lot of negativity floating around the airwaves about Brazil hosting the FIFA World Cup. But aside from a couple of minor disruptions early in the tournament, some really bad officiating, and unbelievably unsportsmanlike incidents, it was every bit the success that the Sochi Winter Olympic Games were. It's sad, however, to realize that most of the high points of the year in an article about IT were sporting events.

Celebrity "NSFW" photographs
In September, we learned exactly how important personal passwords are. We also learned (well I think some celebrities learned) that one ought not to store controversial content on somebody else's computer systems. But if you do, encrypt it. And encrypt it with your own keys!

Shellshock
If only the responsiveness to Shellshock had been as strong as it was for Heartbleed. Unfortunately, it was not, and today there are a myriad of active exploits affecting all sorts of Unix- and Linux-based systems that use the Bash shell as their default. Ostensibly, this fix was even easier than Heartbleed: Just turn off the Bash shell! Of course, some systems have only the Bash shell, so this is not practical in all cases. But the fact that exploits are still commandeering entire storage systems because patches that exist have not been applied is just, well, shocking.

Humanity landed on a comet!
It's been a really long time since anybody in the world did anything truly notable in the realm of space exploration. Yeah, SpaceX built a rocket to resupply the International Space Station, but humanity has been building suborbital rockets for 50 years. But this year, the European Space Agency landed on a comet! Well, to be honest, ESA bounced the lander off the comet and then it landed in shade, rendering it functionally useless. But do you have any idea what sort of navigational expertise it takes to hit a comet after 10 years of unmanned spaceflight? I definitely think this is the story of the year.

And, not to be outdone by any of the above, once again Sony gave us something to think about. I might have a modicum of sympathy for Sony, given the size of the intrusion and the ongoing impact of what was stolen, except we're now learning that (as with Home Depot) much of the damage was due to the company failing to maintain its own computer security. To add insult to injury, we're also finding out that the code that infiltrated Sony was so bug-ridden that it may be a miracle that it worked at all. Then the hackers made a threat against movie theatres that planned to show Sony's movie The Interview, and Sony pulled the movie from distribution. (Well, really, I'm more inclined to think Sony pulled the movie so it wouldn't have to explain a $10 million opening weekend from the few theatres that actually showed it.)

So, that was 2014. From malware hacking poorly protected credit card systems abetted by dysfunctional corporate security procedures, to malware hacking poorly protected entertainment companies abetted by dysfunctional corporate security procedures, it just seems that nothing ever changes. Shakespearean theatre would refer to 2014 as a "comedy," inasmuch as the year started pretty much like it ended. Let's all learn a lesson or two, or ten, from these rough experiences and make 2015 a little better.


The Top Ten IT Pro-Dictions

The SolarWinds High Council of Head Geeks has gathered their sage and eternally savvy wisdom. You would be wise to heed their words.

~ Knowstradamus

SolarWinds 2015 Head Geek Predictions

As 2014 comes to a close and 2015 begins, SolarWinds tapped its band of experts – the Head Geeks – to take a look inside their crystal balls and provide a glimpse into IT trends to watch for this year.

To complement the IT predictions from the Head Geeks, we’d like you to share your view of each (50 thwack points will be awarded for replies!). And don’t be afraid to go out on a limb and suggest a “paranoid perspective” for 2015: Will networks become so complex that they are nearly impossible to manage? Will security vulnerabilities be so vast that all companies will experience at least one? Will the hybrid cloud transition create more headaches than benefits? Will business end users stage a revolt? Be sure to @mention each geek to continue the conversation with them about their predictions. Here are their thwack handles:

Kong Yang - @kong.yang

Lawrence Garvin - @LGarvin

Patrick Hubbard - @patrick.hubbard

Leon Adato - @adatole

Thomas LaRock - @sqlrockstar

Throughout the year, we plan to revisit these predictions and see how many are becoming a reality, or instead, how many are turning out to be a completely paranoid fantasy.

Reporting for IT Duty

Posted by kong.yang Jan 2, 2015

IT reporting is rooted in three simple goals: (1) show compliance, (2) provide holistic status updates, and (3) deliver data that serves as evidence for actionable decisions. Furthermore, it is always best practice to start with the end goal in mind. IT reporting is no different, because each of the aforementioned goals may have a different audience, each with its own specific need.


Once the audience is defined, the relevant data that the audience wants to consume will also become clearer. For instance, a CIO may need a 1-pager showing overall IT compliance for auditors, an IT manager may need a high-level update for each infrastructure layer to show uptime efficiency to IT executives, and an IT admin may need to summarize technical data to justify necessary IT actions to their direct managers and fellow IT colleagues.


With the objective in mind, relevant data can be captured and processed into something that the audience can consume. The IT report shouldn’t be just a data dump. It should aspire to be easily understood and to quickly provide compelling evidence to justify an IT business operations decision. In this sense, one can think of IT reporting as the bridge between IT operations and business operations.


Successful IT reporting consists of:

  1. Defining the audience
  2. Understanding their objective
  3. Gathering and processing the relevant data
  4. Presenting the data in a clear, concise manner
  5. Enabling actionable decisions
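
For readers who like to see a flow in code, here is a minimal Python sketch of the five steps above. Everything in it is an assumption made for illustration: the uptime samples, the `it_manager` audience, the 99.0 percent target, and the function names `build_report` and `render` are hypothetical and do not come from any SolarWinds product or API.

```python
from statistics import mean

# Hypothetical uptime samples (percent) per infrastructure layer.
# Illustrative numbers only, not real measurements.
SAMPLES = {
    "network": [99.9, 100.0, 99.8],
    "storage": [99.5, 99.7, 99.6],
    "application": [98.0, 99.0, 97.5],
}

# Hypothetical audience profiles: who reads the report and what they need.
AUDIENCES = {
    "it_manager": {"objective": "uptime per infrastructure layer", "target": 99.0},
}


def build_report(audience, samples, audiences=AUDIENCES):
    """Turn raw samples into a report tailored to one audience."""
    # Steps 1 and 2: define the audience and look up its objective.
    profile = audiences[audience]

    # Step 3: gather and process only the data relevant to that objective.
    rows = []
    for layer, values in sorted(samples.items()):
        avg = round(mean(values), 2)
        rows.append({
            "layer": layer,
            "avg_uptime": avg,
            "meets_target": avg >= profile["target"],
        })

    # Step 5: surface the items that call for a decision.
    action_items = [row["layer"] for row in rows if not row["meets_target"]]

    return {
        "audience": audience,
        "objective": profile["objective"],
        "rows": rows,
        "action_items": action_items,
    }


def render(report):
    """Step 4: present the data in a clear, concise manner."""
    lines = [f"Report for {report['audience']}: {report['objective']}"]
    for row in report["rows"]:
        status = "OK" if row["meets_target"] else "BELOW TARGET"
        lines.append(f"  {row['layer']}: {row['avg_uptime']}% ({status})")
    if report["action_items"]:
        lines.append("Needs attention: " + ", ".join(report["action_items"]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(build_report("it_manager", SAMPLES)))
```

The point of the sketch is the ordering: the audience and objective are fixed first, so the processing step only produces what that audience needs, and the output ends with the short list of items that require a decision rather than a data dump.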


In the comments below, let me know what you think of the IT reporting flow and construct. This concludes the 3-part series on monitoring, troubleshooting, and reporting. The three blog posts serve to level set definitions as well as provide context for future posts that will cover specific, practical use cases for each of the three IT skills.
