
Geek Speak

1,543 posts

"I know real estate applications," the Business Analyst said confidently to the dozen of us cloistered in a conference room.


"And real estate applications don't work well when they're virtualized," she insisted, face lowered, eyes peering directly at me over the rims of her Warby Parkers.


For a good 5-8 seconds, all you could hear was the whirring of the projector's fan as the Infrastructure team soaked in the magnitude of the statement.


I had come prepared for a lot of things in this meeting. I was asking for a couple hundred large, and I had spreadsheets, timelines, budgets, and a project plan. Hell, I even had an Excel document showing which switch port each new compute node would plug into, and whether that port would be trunked, access, routed, a member of a port-channel, and whether it got a plus-sized MTU value. Yeah! I even had my jumbo frames all planned & mapped out, that's how I roll into meetings where the ask is a 1.x multiple of my salary!


But I had nothing for this...this...whatever it was...challenge to my professional credibility? An admission of ignorance? Earnest doubt & fear? How to proceed?


Are we still fighting over whether something should be virtualized?


It was, after all, 2014 when this happened, and the last time I had seen someone in an IT Department resist virtualization was back when the glow of Obama was starting to wear off on me...probably 2011. In any case, that guy no longer worked in IT (not Obama, the vResistor!), yet here I was facing the same resistance long long long after the debate over virtualization had been settled (in my opinion anyway).


Before I could get in a chirpy, smart-ass "That sounds like a wager" or even a sincere "What's so special about your IIS/SQL application that it alone resolutely stands as the last physical box in my datacenter?" my boss leapt to my defense and, well, words were exchanged between BAs, devs, and Infrastructure team members. My Russian dev friend and I glanced at each other as order broke down...he had a huge Cheshire cat grin, and I bet the ******* had put her up to it. I'd have to remember to dial the performance on his QA VMs back to dev levels if I ever got to build the new stack.


The CIO called for a timeout, order was restored, and both sides were dressed down as appropriate.


It was decided then to regroup one week hence. The direction from my boss & the CIO was that my presentation, while thorough, was at 11 on the Propellerhead scale and needed to answer some basic questions like, "What is virtualization? What is the cloud?"


You know, the basics.


Go Powerpoint or Go Home


Somewhat wounded, I realized my failure was even more elemental than that. I had forgotten something a mentor taught me about IT, something he told me to keep in mind before showing my hand in group meetings: "The way to win in IT is to understand which Microsoft Office application each of your teammates would have been born as if they had been conceived by the Office team. For example, you're definitely a Visio & Excel guy, and that's great, but only if you're in a meeting with other engineers."


Some people, he told me, are amazing Outlookers. "They email like it's going out of style; they want checklists, bullet points, workflows and read receipts for everything. Create lots of forms & checklists for them as part of your pitch."


"Others need to read in-depth prose, to see & click on footnotes, and jot notes in the paper's margin;  make a nice .docx the focus for them."


And still others -perhaps the majority- would have been born as a Powerpoint, for such was their way of viewing the world. Powerpoint contains elements of all other Office apps, but mostly, .pptx staff wanted pictures drawn for them.


So I went home that evening and I got up into my Powerpoint like never before. I built an 8 page slide deck using blank white pages. I drew shapes, copied some .pngs from the internet, and made bullet points. I wanted to introduce a concept to that skeptical Business Analyst who nearly snuffed out my project, a concept I think is very important in small to medium enterprises considering virtualization.


I wanted her to reconsider The Stack (In Light of Some Really Bad Visualizations).


So I made these. And I warn you they are very bad, amateur drawings, created by a desperate virtualization engineer who sucks at powerpoint, who had lost his stencils & shapes, and who was born a cell within a certain column on a certain row and thought that that was the way the world worked.


The Stack as a Transportation Metaphor


Slide 2: "What is the Core Infrastructure Stack? It's a Pyramid, with people like me at the bottom, people like my Russian dev friend in the middle, and people like you, Ms. Business Analyst, closer to the top. And we all play a part in building a transportation system, which exists in the meatspace (that particular word was not in the slide, and was added by me, tonight). I build the roads, tunnels & bridges, the dev builds the car based on the requirements you give him, and the business? They drive the car the devs built to travel on the road I built."


Also, the pyramid signifies nothing meaningful. A square, cylinder, or trapezoid would work here too. I picked a pyramid or triangle because my boy would say "guuhhhl" and point at triangles when he saw them.



I gotta say, this slide really impressed my .pptx colleagues and later became something of an underground hit. Truth be told, inasmuch as anything created in Powerpoint can go viral, this did. Why?


I'd argue this model works, at least in smaller enterprises. No one can dispute that we serve the business, or driver. I build roads when & where the business tells me to build them. Devs follow, building cars that travel down my roads.


But if one of us isn't very good at his/her job, it reflects poorly on all of IT, for the driver can't really discern the difference between a bad car & a bad road, can they?


What's our current stack?


"Our current stack is not a single stack at all, but a series of vulnerable, highly-disorganized disjointed stacks that don't share resources and are prone to failure," I told the same group the next week, using transitions to introduce each isolated, vulnerable stack by words that BAs would comprehend:



My smart ass side wanted to say "1997 called, they want their server room back," but I wisely held back.

"This isn't an efficient way to do things anymore," I said, confidence building. No one fought me on this point, no one argued.


What's so great about virtualizing the stack?


None of my slides were all that original, but I take some credit for getting a bit creative with this one. How do you explain redundancy & HA to people woefully unprepared for it? Build upon your previous slides, and draw more boxes. The Redundant Stack within a Stack:


The dark grey highlighted section is -notwithstanding the non-HA SQL DB oversight- a redundant Application Stack, spread across an HA Platform, itself built across two or more VMs, which live on separate physical hosts connecting through redundant core switching to Active/Active or A/P storage controllers & spindles.

I don't like to brag (much), but with this slide, I had them at "redundant." Slack-jawed they were as I closed up the presentation, all but certain I'd get to build my new stack and win  #InfrastructureGlory once more.

And that Cloud Thing?


Fuzzy white things wouldn't do it for this .pptx crowd. I struggled, but to keep things consistent I built a 3D cube that was fairly technical, yet in line with the previous slides. I also got preachy, using this soapbox to remind my colleagues why coding against anything other than the Fully Qualified Domain Name was a mortal sin in an age when our AppStack absolutely required being addressed at all times by a proper FQDN in order to be redundant across datacenters, countries, even continents.



There are glaring inaccuracies in this Hybrid Cloud Stack, some of which make my use of Word Art acceptable, but as a visualization, it worked. Two sides to the App Stack, Private Cloud (the end-state of my particular refresh project), and the Public Cloud. Each has its strengths & weaknesses, each can be used by savvy Technology teams to build better application stacks, to build better roads & cars for drivers in the business.

About six weeks (and multiple shares of this .pptx) later, my new stack complete with 80 cores (with two empty sockets per node for future-proofing!), about 2TB of RAM, 40TB of shared storage, and a pair of Nexus switches with Layer 3 licensing arrived.


And yes, a few weeks after that, a certain stubborn real estate application was successfully made virtual. Sweet.

I’ve worked with a few different network management systems in my career, some commercial and some open source. Based on my experience with each one of them, I’ve settled on certain qualities that I look for when deciding which product to recommend or use.


One common theme that always seems to come up when comparing experiences with my peers is how easy it is to implement and operate product $A versus product $B.


In my opinion, implementation and operation of a product are critical when assessing any product, not just an NMS. If it’s not easy to implement, how are you ever going to get it off the ground? Will training be necessary just to install it? If you ever do get it off the ground and running, will it take a small army to keep it going? Will you have to dedicate time and resources each day just to cultivate that product? How can you trust a product with the management and monitoring of your network environment if it crashes all the time, or if you have to be a Jedi to unlock all of its mystical bells and whistles?


With that being said, what do you look for in a network management system? Easy to install? Intuitive interface? All the features you could wish for? Is cost the ultimate factor? I’d love to hear what you all think.





A Lap around AppStack

Posted by joeld Jan 30, 2015

What is AppStack?


AppStack is a new technology that brings multiple SolarWinds products together in a new innovative way. AppStack provides visibility to the different infrastructure elements that are supporting an application deployed in production, but most importantly how they are related to each other.


For instance, a typical multi-tier application will be deployed on multiple virtual machines located on one or more hypervisors and accessing one or more datastores. Each of those components uses some level of storage that can either be direct or network attached.



AppStack allows an application owner or IT operator to quickly identify the key infrastructure elements that might be impacting application delivery.


The Orion Platform


Let’s take a quick look at how SolarWinds products are architected in the first place. SolarWinds has a large portfolio of products that can be separated into two main categories. The first category contains all the products primarily focused on monitoring various parts of the IT infrastructure and run on top of the Orion Platform. The second category is composed of all the other tools that are not running on top of Orion. AppStack is related to the products belonging to the first category, such as Server & Application Monitor (SAM), Virtualization Manager (VMan), and Storage Resource Monitor (SRM).


The Orion platform provides a core set of services that are used by the various products that run on top of it. Some of the most important services exposed by the platform are alerting, reporting, security, UI framework, data acquisition, and data storage, as shown in the following diagram:



The products that run on top of Orion are plug-ins that take advantage of those services and are sold individually.


How does AppStack work?


AppStack is not a new product by itself, but rather, it extends the capabilities of the Orion platform in various ways, allowing all the products running on top of Orion to take advantage of those new features.


One of the key enablers of AppStack is the information model that is maintained by the Orion platform. In the background, Orion maintains a very rich representation of how the entities monitored by the various products relate to each other. This information model is maintained by the Information Service, which exposes a set of APIs (SOAP and RESTful) to allow easy access to this information. The information model has been a key foundation of each individual product for several years, and AppStack now takes it to another level.


As the following diagram shows, the metadata model is extended and populated by the different products installed on top of Orion. By default, Orion ships with a very minimal model. Its sole purpose is to serve as the foundation of the product models. One key aspect of the model is that it does not require different products to know about each other, but still allows them to extend each other as they are purchased and installed by our users over time. For instance, Server & Application Monitor does not need to know about our Virtualization Product in order for the Application entity to be related to the Virtual Machine entity.





Having a good set of entities is a good start, but the key to success is to ensure that the relationships between those entities are also captured as part of the model. AppStack exploits the presence of those relationships in the information model to tie together the different pieces of infrastructure that support an application.

In the AppStack dashboard, every time the user clicks on one of the entities displayed, behind the scenes AppStack retrieves the list of entities that are directly or indirectly related to that selected entity and highlights them in the user-interface. This provides a very intuitive way of troubleshooting application delivery issues.


An important point to mention is that all of this happens without any prior configuration by the user.
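The "directly or indirectly related" lookup can be pictured as a walk over a graph of entities. What follows is only a minimal sketch, not Orion's actual implementation: the entity names, the `RELATIONSHIPS` list, and the `build_graph` and `related_entities` functions are all hypothetical, and the sketch assumes every relationship can be followed in both directions.

```python
from collections import defaultdict, deque

# Hypothetical relationships between monitored entities (illustration only).
RELATIONSHIPS = [
    ("app:expenses", "vm:web01"),
    ("app:expenses", "vm:sql01"),
    ("vm:web01", "host:esx01"),
    ("vm:sql01", "host:esx02"),
    ("host:esx01", "datastore:ds1"),
    ("host:esx02", "datastore:ds1"),
    ("app:payroll", "vm:pay01"),
    ("vm:pay01", "host:esx03"),
]


def build_graph(relationships):
    """Index each relationship in both directions so it can be walked either way."""
    graph = defaultdict(set)
    for left, right in relationships:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def related_entities(graph, selected):
    """Breadth-first search: every entity reachable from `selected`."""
    seen = {selected}
    queue = deque([selected])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(selected)  # the selected entity itself is not "related"
    return seen


graph = build_graph(RELATIONSHIPS)
highlighted = related_entities(graph, "app:expenses")
```

Selecting `app:expenses` here picks up its VMs, their hosts, and the shared datastore, while the unrelated payroll stack stays unhighlighted. A real product would likely be more selective about which relationships it follows.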


This blog post gave you a quick overview of what AppStack is and how it works behind the scenes. If you have any questions, don’t hesitate to reach out to me on Twitter at @jdolisy.

Dare to Thrive is the theme at VMware Partner Exchange (PEX) 2015. And that theme is reflected by the BIG online event, the new beginning, and the PEX experience.



First off, there is VMware's Online Launch Event on February 2nd. Tech buzzwords like software defined data centers (SDDC), software defined networking (SDN), software defined storage (SDS), containers such as Docker, Kubernetes, CoreOS Rocket, and App Volumes aka cloud-native apps, and hybrid cloud will almost surely be used. VMware's been talking about and demoing some of these technologies for years. It'll be interesting to see how they extend their portfolio now that they have an update to their big fundamental, VMware vSphere.

New Beginning

Back for our 3rd season with a new performance-centric focus is So Say SMEs in Virtualization and Cloud with Todd Muirhead, Sr. Staff Engineer in Performance R&D at VMware and me. We're just two virtualizing, cloud sizing, and performance optimizing subject matter experts (SMEs), who happen to enjoy sports.



VMware PEX is more signal and less noise than VMworld and aims to provide VMware partners with the means to succeed. I'm expecting better engagements and more personalized experiences from the following:

  • New Hands-on-Labs - An absolute must if attending any VMware event.
  • NDA sessions and product briefings including invites to the vCloud and the Software-defined Storage events.
  • Solutions Exchange conversations with attendees and fellow VMware partners.
  • Community events like #vBacon and #vExpert tweet-up.


In closing

I look forward to catching up with friends and making new connections. Follow my updates on Twitter with hash tags: #VMwarePEX #SolarWinds. If you want to meet up with me at VMware PEX to discuss Virtualization Manager or any other SolarWinds products, just send me a message on Twitter - @kongyang.

OR: "Don’t just sit there, DO something!"


If you have used a monitoring tool for any length of time, you are certainly comfortable setting up new devices like servers, routers, switches and the like. Adding sub-elements (disks, interfaces, and the like) is probably a snap for you. There’s a good chance you’ve set up your fair share of reports and data exports. Alerts? Pssshhhh! It’s a walk in the park, right?


But what do you DO with those alerts?


If you are like most IT Professionals who use a monitoring tool, you probably set up an email, or a text message to a cellphone, or, if you are especially ambitious, an automated ticket in whatever incident system your company uses. And once that’s set up, you call it a day.


Your monitoring will detect an error, a notification will be sent out, a human will be launched into action of some kind, and the problem will (eventually) be resolved.


But why? Why disturb a living, breathing, working (or worse – sleeping) person if a computer could do something about the situation?


The fact is that many alerts have a simple response which can often be automated, and doing so (taking that automatic action) can save hours of human time.


Here are some examples of direct action you can take:


A monitor triggers when             | Have automation do this
A service is down                   | Attempt to restart the service
A disk is over xx% full             | Clear the standard TEMP folders
An IP address conflict is detected  | Shut down the port of the newer device


If the action is not successful, most monitoring systems will trigger a secondary action (that email, text message, or ticket I mentioned earlier) after a second wait time. (Pro Tip: If your monitoring solution doesn’t support this, it may be time to re-think your monitoring solution).
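The act-first, escalate-later flow can be sketched in a few lines. This is a sketch under stated assumptions rather than any particular monitoring product's API: `handle_alert` and its `check`, `remediate`, and `notify` callables are hypothetical names, and the five-minute default wait is an arbitrary choice.

```python
import time


def handle_alert(check, remediate, notify, wait_seconds=300, sleep=time.sleep):
    """Try an automated fix first; involve a human only if it does not work.

    check     -- callable returning True while the problem is still present
    remediate -- callable that attempts the automatic action
    notify    -- callable that sends the email, text message, or ticket
    """
    if not check():
        return "ok"  # nothing to do
    try:
        remediate()
    except Exception as exc:
        # The automatic action itself failed, so escalate straight away.
        notify(f"automatic action failed: {exc}")
        return "escalated"
    sleep(wait_seconds)  # the second wait time before re-checking
    if check():
        notify("problem persists after automatic action")
        return "escalated"
    return "auto-resolved"
```

In practice `remediate` might restart a service or clear a temp folder, and `notify` would be whatever email, text, or ticket action you already have in place.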


At worst, your alert will be delayed by a few minutes. BUT it will be delayed by having done (instantly) what the human technician was going to do once they logged in, so in a sense the situation is more than a few minutes ahead of where it would be if you had let the human process proceed as normal.


But that’s not all. Another action you can take is to gather information. Many monitoring tools will allow you to collect additional information at the time of the alert, and then “inject” it into the alert. For example:


A monitor triggers when                              | Have automation do this
CPU utilization is over xx%                          | Get the top 10 processes, sorted by CPU usage
RAM utilization is over xx%                          | Get the top 10 processes, sorted by RAM usage
A VM is using more than xx% of the host resources    | Include the VM name in the message
Disk is over xx% full (after clearing temp folders)  | Scan disk for top 10 files, sorted by size, that have been added or updated in the last 24 hours
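As an illustration of the last row, here is a rough sketch of the disk scan using only the Python standard library. The function name `largest_recent_files` and its parameters are made up for this example, and it assumes a file's modification time is an acceptable stand-in for "added or updated".

```python
import os
import time


def largest_recent_files(root, limit=10, max_age_seconds=24 * 60 * 60, now=None):
    """Return up to `limit` (size_in_bytes, path) pairs under `root`, biggest
    first, keeping only files modified within the last `max_age_seconds`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime >= cutoff:
                found.append((info.st_size, path))
    found.sort(reverse=True)  # largest size first
    return found[:limit]
```

The resulting list is the sort of detail you would paste into the alert message, so whoever picks up the ticket can see right away what filled the disk.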


Sounds lovely, but is this really going to impact the bottom line?


For a previous client, I implemented nothing more sophisticated than the disk actions (clearing the Temp drive, and alerting after another 15 minutes if the disks were still full) and adding the top 10 processes to the high CPU alert.


The results were anywhere from 30% to 70% fewer alerts compared to the same month in the previous year. In real numbers, this translated to anywhere from 43 to 175 fewer alerts per month. In addition, the support staff saw the results and responded FASTER to the remaining alerts because they knew the pre-actions had already been done.


The CPU alerts obviously didn’t reduce, but once again we saw the support staff response improve, since the ticket now included information about what specifically was going wrong. In one case, the client was able to go back to a vendor and request a patch because they were able to finally prove a long-standing issue with the software.


As virtualization and falling costs (coupled, thankfully, with expanding budgets) push the growth of IT environments, the need to leverage monitoring to ensure the stability of computing environments becomes ever more obvious. Less obvious, but just as critical (and valuable), is the need to ensure that the human cost of that monitoring remains low by leveraging automation.


NOTE: This is a continuation of my Cost of Monitoring series. The first installment can be found here: The Cost of (not) Monitoring.

Part two is posted here: The Cost of Monitoring - with the wrong tool (part 1 of 2) and here: The Cost of Monitoring - with the wrong tool (part 2 of 2)

You may have noticed a trend these past few weeks with my recent posts, such as:

Logging; Without a complete picture, what’s the point?

Troubleshooting vs Compliance Security; Logging without borders?

Are you Practicing Security Theater in IT


Alright, sure, you’ve noticed a trend: I’ve been talking a lot about logging, the importance of an audit trail, and the overwhelming importance of security, while at the same time noting how security is more a masquerade ball of promises and less a set of guarantees. But what does this have to do with Taylor Swift?


(Note: This is the real Taylor Swift and not the Infosec Focused Taylor Swift @SwiftOnSecurity)


Every organization has customers; that’s how we do business, and the internet is no different. Say you’re a Twitter or an Instagram (which, considering Facebook owns Instagram, we may as well call Facebook). In the case of Twitter, out of your 284 million users, one of them (Taylor Swift) is your fourth-largest account. If that customer’s account were compromised, information leaked, relationships tarnished, would that look good for your business? Hardly.


But what can we do about things like this? Twitter recently ‘bolstered’ their security by quietly introducing a two-factor authentication model, which didn’t even make a blip on the horizon. But you might be saying, I don’t use Twitter, or I don’t care about Twitter, or my business doesn’t rely upon it. (We won’t go into the number of faux pas in the very recent past, where rogue or accidental tweets horribly tarnished brands.) Instead, how about something a little closer to home …


Hey, did you realize that JPMC (you know, Chase bank) suffered a compromise of 83 million accounts back in July of 2014? I’m a huge Chase user, and strangely I didn’t get any kind of ‘notification’ from them on this… (although I now receive daily fake Chase bank spam messages...)


Now what I have to ask you is this: if organizations with multi-million dollar security budgets like Chase, with tens of millions of customers/users, and ‘designed for online’ organizations like Twitter, with hundreds of millions of customers, cannot protect our data and our information, or protect us from compromise, what can we do?


I’ll tell you a few things I think we can do:


- We can start getting serious with our security: the security of our systems, of our customers’ data, and of our own data.

- We can get active systems like IPSs and intelligent firewalls for East-West and North-South traffic, rather than perimeter security which is port or protocol based.

- We can ensure that if we do have extra security measures, be it two-factor or certificate-based authentication, they are leveraged by our customers, employees, and assets.

- We can stop practicing security theater, treating it as an afterthought and living by the check-box.



This is definitely an iterative list, one which will grow as we discover more about our environments and ourselves. 

What are some things you're seeing people do to tackle and protect against threats like these?

I’d love your feedback and contributions so we can all grow by finding better ways to handle threats like these.

We’ve all had that situation where a member of the Web Apps team complains that an intranet-based site is slow, or an important manager says that they can’t use the internal expenses system hosted on a web server. But what does everyone tend to do in these situations? I’d say that most of us would open our monitoring tool of choice and look at some website perfmon stats, or some response time stats from a tool that can fetch them, but sometimes these tools do not show us the user experience.


If you have a widely dispersed company and have employees sitting in other parts of the world accessing an internal site hosted in the UK, how do we tell one person’s performance from another’s and conceptually measure the differences?
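One rough way to put numbers on that difference is to run the same timing probe from each office and compare the results. The sketch below is only a starting point and rests on assumptions: `fetch_page` and `sample_response_times` are names invented for this example, the intranet URL is a placeholder, and timing a plain download says nothing about how long the page takes to render in a browser.

```python
import statistics
import time
import urllib.request


def fetch_page(url, timeout=10):
    """Download the page body, roughly what a browser's first request does."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def sample_response_times(url, samples=5, fetch=fetch_page, clock=time.perf_counter):
    """Time several fetches of `url` and summarise them in seconds."""
    timings = []
    for _ in range(samples):
        start = clock()
        fetch(url)
        timings.append(clock() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


# Example (placeholder URL): run this from each site and compare the medians.
# print(sample_response_times("http://intranet.example/expenses"))
```

Comparing the median from, say, a remote branch office against the median from the UK gives a first, crude view of how differently two groups of users experience the same site.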


I’d like you to think about these scenarios and let me know how you approach the reactive situations. I would also like to know if any of you are proactively measuring website performance and user experience, and how you are doing this.


Indeed, some of you may be using SolarWinds products to do this, but nevertheless I’d like to know what software you use and whether it has proven successful at your companies.


If you don’t bother to monitor performance, speeds and feeds of your intranet or internet website, then I’d like to know why not.


Don’t be shy and get those comments in.

Quite a few pundits including myself have mentioned that 2015 will be the year of hybrid cloud. But what exactly is hybrid cloud? Ask one hundred IT Pros and you’ll probably get one hundred differing answers. Check out some of the comments on the hybrid cloud post by our thwack Ambassador, amitace, aka @amitpanchal76 on Twitter.


One way to think of hybrid cloud is that some IT services will be fulfilled and supported by the in-house IT department and some IT services will be fulfilled and supported by someone outside of the in-house IT department. These IT services can be infrastructure, platforms, and software. And they can be requested and consumed at any given time from any given place. The preferable means of consumption is via a self-service portal with an associated marketplace, which aggregates all the IT services. Companies that have already adopted hybrid cloud strategies include Coca-Cola and Anthem.


Hybrid cloud matters to IT admins, even if your company currently has no plans for IT services fulfilled and supported by someone outside of your IT department, mainly because it will be great for your IT career. Think of it as an investment in yourself. The IT learning curve remains unchanged: (1) understand the technologies, (2) develop and hone the requisite skills, and (3) maximize your utility to the business.


It’s never a bad idea to associate yourself with a technology trend that has research and development, plus capital investment legs. The last time I saw a technology trend similar to hybrid cloud was when virtualization arrived. So ask yourself: do you want a new IT career path? If so, hybrid cloud will provide you plenty of opportunities in 2015. And that’s why it matters.

Microsoft is preparing to showcase Windows 10: The Next Chapter this week. VMware is running an online launch event during the week of February 2nd, which happens to coincide with VMware Partner Exchange 2015. What does this mean for IT pros? In the near term, probably not much beyond understanding and testing out new features in controlled environments. Longer term, it means that IT pros will have to decide on if and when they upgrade their current environments to the latest releases.




What are your thoughts and expectations of the Microsoft Windows 10 announcement and VMware vSphere’s Big Launch announcement? How will these product launches affect your IT administration mindset, especially around monitoring, troubleshooting, and reporting? Please share your point-of-view in the comments below.


The Microsoft Windows 10 event can be viewed on January 21st at 11AM CST here: Windows 10.

Register for the VMware Big Online Launch event on February 2nd at 3-4PM CST here: VMware Online Launch Event.

This year and last, we have heard from various cloud providers that 2015 could be the year to take on a Hybrid Cloud approach in your organisation. With all that being said, we know that Cloud models exist in many guises, and the various models exist to offer organisations a variety of choice in their deployment methodology. Stretching workloads out to the Cloud brings varying security considerations, as well as the need to understand the SLAs, in order to provide your business with the reassurance that production workloads are available.


I’m thinking that most shops are using Hybrid for Test / Dev initially but I’d love to hear how you are using it in your organisation or what investigations you have kicked off to see if this model can benefit your company.


What would you need to do in order to convince the executives in your company that Hybrid Cloud could be a good fit for your organisation? I can see that the developer community places a high value on it, and bursting workloads into a cloud could be attractive when you have limited resources available, but are you faced with red tape when it comes to deployment?


If your business is an international business, do you find that you face more challenges when you start to seek hybrid cloud opportunities?


Finally, for those of you that are actively using a hybrid cloud setup, let me ask you this. Do you feel that your management has enough understanding of the topology and the implications of disasters to a single or multiple servers?

A while ago, I bumped into one of my ex-colleagues who now works as a traveling network consultant for a network services company. Among the various topics we discussed over a cup of coffee, photography and networking took the center-stage. My ex-colleague, being always on the move and meeting various clients, also takes great pictures with his favorite smartphone. What he really likes is the geo-tagging feature that documents which pictures he took at a specific place or list of places. He also compares this nifty feature with his network mapping tool that has been his best friend in this line of work—discovering, documenting, analyzing, and troubleshooting networks!


Ping, tracert, and ipconfig are still the most used commands when understanding interconnectivity. But, in this day and age when everything is delivered in an instant, using a manual method of mapping a network, be it simple or complex, looks ‘stone-age’, and can be ruthlessly time consuming.


The usual suspects – still a headache!


He then went on to explain that studying and documenting network infrastructure for his clients, especially in a time crunch, is often extremely difficult. The most common reasons for this were that the clients:

  • Didn’t know the importance of network documentation
  • Didn’t have their network mapped and documented
  • Used a network diagram they had on Visio® that didn’t match the current setup, or was out-of-date
  • Had their network mapped in multiple formats—making it extremely difficult to collate, validate, and standardize into one master document
  • Had huge gaps in network documentation—probably, an administrator left, and the person that filled-in didn’t continue with the documentation


If the above is true for a small, static network, it isn’t much of a pain. However, it becomes a major headache when a company has to merge into another network or is planning for network expansion.


For example, one of the clients that my ex-colleague worked with had a 300 node network, but the Visio® map that was provided by the client had documented only 120 of them, and was 7 months old. Since my ex-colleague had access to the network mapping software on his laptop, all he had to do was to enter the IP address range, scan and discover the network, customize the maps, and then export them to Visio® and PNG, as the client wanted both formats.


So, what will give you a head start if you want to accomplish network documentation and mapping, the easy way?


Automate discovery – that’s exactly where you must start!


Network mapping is a three-step process in itself:

  1. Network discovery – manual or automated; knowing what’s where
  2. Mapping – knowing how devices are connected
  3. Customization, presentation, and reporting – the way you like it


Network consultants, like my ex-colleague, would appreciate network discovery and diagramming software that automatically discovers all devices in the network and maps them using a variety of discovery methods such as SNMP, CDP, ICMP, and WMI. For documentation, labeling, and presentation, the maps can be exported to a variety of formats, including Visio, PNG, and PDF. Exporting the maps to Visio can come in handy, especially if the network engineer wants to play with the map and adjust it to his or her liking.
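For the curious, the discover, map, and present steps can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the function names are made up, and the probe and neighbor lookups are stand-ins for what a real tool would do over SNMP, CDP, ICMP, or WMI. Here they are fed fake data so the sketch runs without touching a network.

```python
import ipaddress
from typing import Callable, Dict, List


def discover(cidr: str, is_alive: Callable[[str], bool]) -> List[str]:
    """Step 1: return the host addresses in `cidr` that answer the probe."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(host) for host in network.hosts() if is_alive(str(host))]


def build_map(nodes: List[str],
              neighbors_of: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Step 2: record which discovered nodes each node is connected to."""
    known = set(nodes)
    return {n: sorted(x for x in neighbors_of(n) if x in known) for n in nodes}


def export_text(topology: Dict[str, List[str]]) -> str:
    """Step 3: render the map as plain text, one line per node."""
    lines = []
    for node, links in topology.items():
        lines.append(f"{node} -> {', '.join(links) or '(no links)'}")
    return "\n".join(lines)


# Fake probe results and neighbor tables (assumptions, for illustration only).
fake_alive = {"192.168.1.1", "192.168.1.2"}
fake_links = {"192.168.1.1": ["192.168.1.2"], "192.168.1.2": ["192.168.1.1"]}

nodes = discover("192.168.1.0/30", lambda ip: ip in fake_alive)
topology = build_map(nodes, lambda ip: fake_links.get(ip, []))
print(export_text(topology))
```

Keeping the probe and neighbor lookups as plain callables means the same three steps work whether the data comes from a live scan or from a saved inventory.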


The last question I had for my ex-colleague was “Has mapping with whiteboards and paper become a resource drain? Should we still be doing things the same way we did a decade ago and save costs here and there? Or is it time for more companies to automate network mapping and diagramming?” Of course he believed that network consultants and engineers cannot afford to be laid back, and automating network mapping and documentation is the way to go.


How about you? If you think whiteboard/paper or any other manual mapping still cuts it, give us a holler.

Or how SolarWinds saved a company $1 million a year, monitored more than we ever expected, and made me look like a miracle worker.

(Continued from Part 1)


The company wanted to expand monitoring with their current solution to every single device in the enterprise (roughly 10,000 systems), a task that would have required an additional $2 million in licenses and would have increased yearly maintenance to $1 million.


I saw it as my first order of business to stop using this expensive-to-own, expensive-to-maintain tool like peanut butter, spreading it across every surface of the enterprise. For those tasks that needed to be everywhere (ping, simple hardware monitoring, etc.), we needed a less expensive tool that could scale without breaking the bank.


We evaluated 6 alternatives from both open-source and traditional vendors. We found that just one came in with a solution under $800,000: SolarWinds. Not only that, but in most cases, two SolarWinds modules (Network Performance Monitor and Server & Application Monitor) covered more features than the competition. But the most interesting part was the price tag: year-one purchase price would be less than 10% of the cost to implement a solution with the incumbent vendor. Maintenance would be about 6% of the predicted cost if we kept the current solution.
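To make those percentages concrete, here is a back-of-the-envelope calculation in Python. It assumes the 10% applies to the $2 million in additional licenses and the 6% applies to the projected $1 million in yearly maintenance quoted above; the actual quotes aren't given, so treat the results as rough ceilings rather than real figures.

```python
# Figures quoted earlier in the story.
INCUMBENT_EXTRA_LICENSES = 2_000_000   # cost to expand the incumbent tool
INCUMBENT_MAINTENANCE = 1_000_000      # projected yearly maintenance


def percent_of(amount: int, percent: int) -> int:
    """Integer percentage, to avoid floating-point noise in dollar amounts."""
    return amount * percent // 100


# Assumption: the percentages are relative to the two figures above.
year_one_ceiling = percent_of(INCUMBENT_EXTRA_LICENSES, 10)
new_maintenance = percent_of(INCUMBENT_MAINTENANCE, 6)
maintenance_saved = INCUMBENT_MAINTENANCE - new_maintenance

print(f"Year-one purchase: under ${year_one_ceiling:,}")
print(f"Yearly maintenance: about ${new_maintenance:,}")
print(f"Maintenance saved per year: about ${maintenance_saved:,}")
```

Under those assumptions the purchase comes in under $200,000, maintenance lands around $60,000 a year, and the yearly maintenance saving is roughly $940,000.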


These numbers caught the attention of the CIO and the purchasing team. The monitoring group was asked repeatedly if we had somehow evaluated the wrong tools, if we were ordering the wrong number of licenses (we were ordering "unlimited" so that was not the case), if we misunderstood the vendor's quote. They were unable to comprehend how the SolarWinds quote was so low. We finally had to level with them: "Maybe this is normal pricing, and you've just gotten used to being over-charged all this time?"


Once we were able to assure everyone that this was indeed the real deal, the CIO did what companies stuck in a dollar-auction mentality never can: she announced that it was time to walk away. Because the licensing terms of the old tool were so restrictive, we would transition to SolarWinds for all "foundational" monitoring (availability, hardware, and application) and look for other tools to use in niche areas, be it real-time user experience monitoring, closed-architecture applications, or other special purposes.


The installation and migration were performed not by a team of contractors over 18 months (which is what the previous tool had required), but by existing staff, in parallel with their daily duties on the old system, in just 6 months.


On the day we turned monitoring on, it was doing more than the old system: availability and hardware for 10,000 systems, plus WAN link monitoring and more. In the first 3 months, our team was able to develop monitors that application and server owners had been requesting for years.


Monitoring has become an extremely specialized space. You can’t monitor your virtualization infrastructure the same way you monitor a router. Load balancers require a different set of metrics than a storage array. Trying to force-fit a tool which either wasn’t designed for the task or which hasn’t kept up with the times is not just a matter of “making do with what you have” and calling it “being budget conscious.” In fact, doing so can (and for this customer DID) cost the business an order of magnitude more, while also failing to provide the functionality needed.

We’ve all seen Security Theater in action before. We walk into a building where they interrogate our identification and ask for proof of blood samples or DNA, yet the loading dock is unmonitored. Or we’re asked for numerous forms of identification over the phone, and are still allowed to slide even without that information. Or, the best example: we get the complete shakedown at the airport security checkpoints while so many other entry points go unchecked and unmonitored.


But we’re not here to discuss physical security, no. We’re here to discuss the security theater that we see in IT organizations every day. Security theater hides behind many masks, such as those from our previous discussions, “TROUBLESHOOTING VS COMPLIANCE SECURITY; LOGGING WITHOUT BORDERS?” and “LOGGING; WITHOUT A COMPLETE PICTURE, WHAT’S THE POINT?” For some organizations these are merely checkboxes, but you have to ask yourself: are you safe if you don’t know who’s going through which doors at which times, and if that information is not logged? That was the point of the discussion about knowing what is logged, how deeply you interrogate those logs, and how long you retain that information.


A lot of us focus on network availability and network resiliency, and so feel the need to restrict what information we collect. Yet just like any building, call center, or airport, if all of the points of entry are not secured or at least monitored, it’s all a ruse built on hope and trust instead of truly securing the facility.


Protecting the security of your organization isn’t limited to knowing what is being logged, where it is being logged, and how long it is being retained; it equally means knowing what information isn’t being collected or logged. It covers numerous areas, a few of which are below (this is not an exhaustive list):


- Monitoring of vendor controlled and embedded devices

   Did you know that embedded devices that sit on your network but that you technically have no ‘control’ over (think GE, heart rate monitors and MRI machines in hospitals, for example) still need to be patched and monitored, even though you don’t really have that ‘insight’ into them?

- East West and North South traffic

   With threats constantly on the rise, we’ve done an okay job of watching traffic as it enters the edge and allowing or disallowing it as it traverses north to south. But once the traffic is already in the network, most organizations have neither visibility into nor any real awareness of what is going on. This has become the greatest threat vector to get a grip on.

- Actionable Threat Intelligence

   In the event of a breach, how quickly can you identify and mitigate it? Put a different way, if you have an Intrusion Detection System (IDS) and a breach does occur, what can you do about it? Knowing a criminal is inside your house does no good if you have no way to deal with the intruder once they’re in.
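As a small illustration of the East West versus North South distinction in the list above, here is a hedged Python sketch that labels flow records by direction. It uses the `is_private` flag from the standard `ipaddress` module as a rough proxy for "inside the network", which is an assumption; a real environment would check against its own list of internal prefixes. The sample flows are invented.

```python
import ipaddress
from collections import Counter


def classify_flow(src: str, dst: str) -> str:
    """Label a flow by direction based on which endpoints look internal."""
    # Assumption: private address space stands in for "our network".
    inside = [ipaddress.ip_address(addr).is_private for addr in (src, dst)]
    if all(inside):
        return "east-west"    # both endpoints internal
    if any(inside):
        return "north-south"  # crosses the edge
    return "external"         # neither endpoint is ours


# Invented sample flows (source, destination).
flows = [
    ("10.0.0.5", "10.0.1.9"),
    ("10.0.0.5", "8.8.8.8"),
    ("192.168.1.20", "192.168.1.30"),
]

counts = Counter(classify_flow(src, dst) for src, dst in flows)
print(dict(counts))
```

Even a tally like this shows how much of the traffic never crosses the edge, which is exactly the traffic an edge-only view never sees.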



It’s a shame there is no one panacea to resolve these kinds of threats, risks and dangers which plague our environments every single day.

What are some other areas you see IT Departments practicing security theater?   Do these examples resonate with you, and what do you think we can do about it?

What a wonderful week of virtualization goodness! It started out with a webinar covering three skills (monitoring, troubleshooting, and reporting) that every virtualization admin needs to hone and combine with one tool to become the master of their virtualized universe. The recorded webinar, including a short presentation, demo, and Q&A, is below, and it starts at the demo portion.



Next, I was humbled and honored to be formally announced as the Virtualization Head Geek including my own presser. The icing on top of the cake was hosting the Tech Field Day crew and the delegates for Virtualization Field Day 4 at the SolarWinds campus. Absolutely loved the conversations. Below is a side-by-side P2V of the Virtualization Field Day 4 folks.




[UPDATED VFD4 delegate's posts]:

VFD4 Delegate | Blog Post
Amit Panchal | How SolarWinds aims to offer a simple perspective - VFD4


Virtually yours,


The New Head Geek of Virtualization

or “It’s not just the cost of buying the puppy, it’s the cost of feeding the puppy that adds up”


This is a story I've told out loud many times, on conference calls, and around water coolers. But I've never written it down fully until now. This is the story of how using the wrong tool for the job can cost a company so much money it boggles the mind. It's a story I've witnessed more than once in my career, and heard anecdotally from colleagues over a dozen times.


Before I go into the details, I want to offer my thoughts on how companies get into this situation.


The discipline of monitoring has existed since that first server came online and someone wanted to know if it was "still up." And sophisticated tools to perform monitoring have been around for over two decades, yet at most companies they are often implemented as if "for the first time." Some of it has to do with inexperience. For example, either the monitoring team is young/new and hasn't experienced monitoring at other companies, or the company itself is new and has just grown to the point where it needs it. Or there's been sufficient turnover that the people on the job now are so removed from those who implemented the previous system that, for all intents and purposes, the solution or situation at hand is effectively "new."


In those cases, organizations end up buying the wrong tool because they simply don't have the experience to know what the right one is. Or more to the point…the right ONES. Because monitoring in all but the smallest organizations is a heterogeneous affair. There is no one-stop shop, no one-size-fits-all solution.


But that's only part of it. In many cases, the cost of monitoring has bloated beyond all reason due to the effect known as "a dollar auction". Simply put, the barrier to using better tools is the unwillingness to walk away from all the money sunk into purchasing, deploying, developing, and maintaining the first.


And that leads me back to my story. A company hired me to improve their monitoring. Five years earlier, they had invested in a monitoring solution from one of the "big three" solution providers. Implementing that solution took 18 months and 5 contractors (at a cost of $1 million in contractor costs, plus $1.5 million for the actual software and hardware). After that, a team of 9 employees supported the solution: setting up new monitors and alerts, installing patches, and just keeping the toolset up and running. Aside from the staff cost, the company paid about $800,000 a year in maintenance.
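Adding those figures up gives a sense of what the puppy had cost to feed. The Python sketch below assumes maintenance stayed flat at $800,000 for each of the five years, which is a simplification, and it leaves out the salaries of the 9 support staff because the story doesn't give them.

```python
# Figures from the story.
CONTRACTOR_COST = 1_000_000
SOFTWARE_HARDWARE_COST = 1_500_000
ANNUAL_MAINTENANCE = 800_000


def cost_to_date(years: int) -> int:
    """Implementation cost plus maintenance, assuming a flat yearly fee."""
    return CONTRACTOR_COST + SOFTWARE_HARDWARE_COST + ANNUAL_MAINTENANCE * years


five_year_total = cost_to_date(5)
print(f"Five-year spend, excluding staff: ${five_year_total:,}")
# prints: Five-year spend, excluding staff: $6,500,000
```

So under that assumption, the purchase price was well under half of what the tool had cost by year five.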


With this solution they were able to monitor most of the 6,000 servers in the environment (a blend of Windows, Unix, Linux, and AS400 systems), and they could perform up/down (ping) monitoring for the 4,000 network devices. But they encountered serious limitations monitoring network hardware, non-routable interfaces, and other elements.


Meanwhile, the server and application monitoring inventory (the actual monitors, reports, triggers, and scripts) showed signs of extreme "bloat." They had over 7,000 individual monitoring situations, and around 3,000 alert triggers.


This was the first company where the monitoring and network teams weren't practically best friends, and even the server monitoring was showing signs of strain. Some applications weren't well-monitored, either because the team was unfamiliar with them or because the tool couldn't get the data needed.


Part of the problem, as I mentioned earlier, was that the company had invested a lot in the tool, and wanted to "get their money's worth." So they attempted to implement it everywhere, even in situations where it was less than optimal. Because it was shoehorned into awkward situations, the monitoring team spent inordinate amounts of time not only making it fit, but keeping it from breaking.


KEY IDEA: You don't get your money's worth out of an expensive tool by putting it into as many places as you can, thereby making it more expensive. You get your money's worth by using each tool in a way that maximizes the things it does well and avoids the things it does not do well.


NOTE: This is a continuation of my Cost of Monitoring series. The first installment can be found here: The Cost of (not) Monitoring

Stay tuned for part 2, which I will post on January 20th, to see how we resolved this situation.


edit LJA20150116: forgot to include the link to explain a dollar auction.
