If you were a Star Trek fan like myself growing up, you might remember a funny device called the Heisenberg Compensator. It was used when transporting people to account for the effects of a theory called the uncertainty principle, which basically states that the more precisely you try to determine one property of a particle (say, its position), the less precisely you can know a complementary one (say, its momentum). OK - the geeky mechanics lesson is over for now.
In the field of Information Technology, this is commonly extrapolated as the Observer Effect. In essence, when we try to monitor and observe servers, networks, and applications, we are changing the results. This makes sense, as it does take some network bandwidth, CPU cycles, and other resources to effectively monitor an environment. However, the impact of this is highly dependent on where the monitoring takes place. Which brings me to an agent vs agentless question.
What are your thoughts on using agents for monitoring your environment? The typical argument goes something like this: Agents provide richer data streams while adding load to the application, while agentless can only pull limited data (based on the API and security access permissions) but can isolate the load to an external resource and do not impact the application as severely. Has this been your mindset / experience, or have you encountered the opposite ... or something in between?
All excellent points raised above.
From a slightly different angle: we are having to go with agents for all of our server monitoring, while the networking kit will be using SNMP v3.
We have chosen agent-based polling as a whole because of a recent audit, which found that the account SolarWinds was using for polling was an administrator on all machines. This is indeed needed to ensure it can collect the data that we need for thorough monitoring. The problem is that the credentials are cached locally on the servers after authentication, and these can be exploited by would-be attackers.
So, no to WMI.
SNMP - this is becoming an issue too. Microsoft are deprecating the SNMP service from 2012, and there is no native support for v3, which we would need for security purposes.
The agent is therefore the best option for us. It utilises fewer resources on the monitored node by comparison to WMI and is so much more secure than the other methods. My concern is, as you have all described, that it is the ideal candidate for finger pointing. But in another company where I have used this and used SNMP polling, it was still blamed for bandwidth use. It was never the problem, but it was definitely first to be blamed.
I'm a fan of hybrid. WMI is incredibly leaky (I currently deploy 8 WMI-related hotfixes to my Windows estate), so agentless indirectly leaves a big memory footprint, in my opinion. If you also want the network QoE stuff, you need the agent. WinRM/PSRemoting has security issues in terms of granular delegation, which causes trouble with security-minded folks.
That said, agents require reboots sometimes and that's no good!
With agent monitoring you can also run into the issue of multi-platform compatibility. Some environments run several versions of hardware/software. I've seen environments ranging from Win2k, Win2k3, Win2k8, WinNT (yes, one WinNT server), multiple versions of Ubuntu, RHEL, Debian, Solaris, et al. Getting agents to work properly across all these platforms can be a pain. Personally I am in the agentless camp - not just because of compatibility issues, but also the aforementioned resource requirements. Unfortunately some tools require the use of agents. I'm not completely anti-agent, especially when the agent plays nice, but given the choice, I like to utilize SNMP, WMI (sometimes), etc.
When polling with SNMP or WMI, UDP is commonly used. With the remote agents, a long-lived TCP connection is used. My WAN accelerator doesn't show the UDP traffic in connection views, for the obvious reasons. What this means is that while the agents take less bandwidth than WMI, they will show up in the connection views as having the largest amount of bandwidth. This makes the agents ideal candidates for finger pointing.
I would love for the agents to have an option I could set which would allow the connections to end and reconnect on a schedule that I set. In my case, every 30 minutes. This would remove those single Agent connections with 7Gb of traffic over many days from ever showing up in the connection views.
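To put rough numbers on that wish, here is a minimal sketch of how much traffic a single connection would show if the agent reconnected on a schedule. It assumes a steady traffic rate, and the 7-day span is purely a hypothetical stand-in for "many days"; `volume_per_connection` is an illustrative helper, not part of any agent or product.

```python
def volume_per_connection(total_volume: float,
                          total_seconds: float,
                          reconnect_seconds: float) -> float:
    """Traffic attributed to one connection when the agent reconnects on a
    fixed schedule, assuming the traffic rate is steady over the period.
    The result is in the same unit as total_volume."""
    if total_seconds <= 0 or reconnect_seconds <= 0:
        raise ValueError("durations must be positive")
    rate = total_volume / total_seconds  # volume per second
    # A connection can never be credited with more than the whole total.
    return min(total_volume, rate * reconnect_seconds)


# Hypothetical: the 7 units from the post spread evenly over 7 days,
# with the agent reconnecting every 30 minutes.
seven_days = 7 * 24 * 60 * 60
per_connection = volume_per_connection(7.0, seven_days, 30 * 60)
print(f"{per_connection:.4f} units per connection")
```

Under those assumptions each connection would carry about 1/336 of the total, small enough to stay out of a "top talkers" style view.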
There are times that appearance is more important than function.
As a service provider we have found that our customers generally don't want us installing agents on their systems which is one of the main reasons we choose Orion as our monitoring solution. In working with other products we have experienced just about all of the issues related to agents that have already been mentioned here by others. We have also found that some vendors are slow to update their agents to work on newer operating systems.
Ultimately we have found an agentless solution more flexible and more favorable with our customers.
We actually moved from Tivoli to Orion recently. Tivoli was agent based. From a cost perspective, IBM charges maintenance and a fee for each agent, therefore there was a decision to be made with each purchase. That left us with only monitoring 1/3 of our servers.
A big positive of the agent was no need to "login" every two minutes to look at a log, or check processes etc. The agent never lost connection, and the agent was very easy to setup for logs and text file monitoring in real time.
Orion, or agentless technology, has been great due to the freedom it gives. I am not limited by what the agent can do. I do not have to wait to upgrade, or test new versions. We have over 700 servers, and testing an agent against all those applications before using it would be impossible.
Cost, innovation and freedom are why we moved to Orion.... and we have few regrets (Unix log monitoring 😞 )
I think the only reason to have an agent would be to accomplish some type of self healing without relying on the remote server to do it, or to make it easier to monitor through a firewall. I think the hybrid approach is the best course for a software company because you don't limit anyone and you can just stand back and say, yeah, we can do it the way you want, but we'll leave you to argue about which way is better.
I'm a fan of the hybrid offering model. There are use cases for both agent and agentless. I know that typically an agent may be required if there is a security concern (such as a policy that prevents opening ports) or WAN bandwidth (where the collector is at the main site and wants to monitor a remote site).
I hope that my input will be found valuable even though I must admit I am not a Star Trek fan.
The use of agents for monitoring should always be avoided at any price. Adding an agent complicates things in that you need to understand what the agent does, how it runs, what are its requirements, how to fix it if it crashes and how to update it. These are just 5 basic things out of a possible <enter number here> variables. Add that agent on 5 systems and you have 5 agents you need to run and 5x5 "variables" that you need to be prepared for, which add up to a total of 30 possible factors that may and eventually WILL need addressing at some point and likely very often if the agent hasn't been coded to be self healing or the listener server doesn't care if the agent hasn't reported back with a status and won't report on it.
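The arithmetic above can be sketched in a few lines. This is only an illustration of the count as described in the post (one factor per agent plus five basic concerns per agent); `agent_factors` is a hypothetical helper, and the 700-server figure is borrowed from another comment in this thread.

```python
def agent_factors(num_systems: int, concerns_per_agent: int = 5) -> dict:
    """Count the things to keep track of when one agent runs on each system:
    the agents themselves plus the per-agent concerns (what it does, how it
    runs, its requirements, how to fix it, how to update it)."""
    agents = num_systems
    variables = num_systems * concerns_per_agent
    return {"agents": agents, "variables": variables, "total": agents + variables}


print(agent_factors(5))    # the 5-system case from the post
print(agent_factors(700))  # the same count at the scale of a 700-server estate
```

The count grows linearly with the number of systems, so the 30 factors for 5 systems become 4,200 for 700.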
Apart from the above-mentioned implications that you need to be prepared for before even deploying an agent, there's the resource consumption bit that you need to pay attention to as well. You can't expect to deploy an agent and think that its resource footprint is so tiny that it wouldn't matter to sacrifice extra CPU cycles, extra memory pool and whatever else gets affected by running a process with several handles. Even if it only required a handle to run once every hour, that would be one handle too many, as the operating system needs to stop what it's doing, listen for the annoying little call from that agent, address requirements and dependencies <a whole set of other things>, then make room in line for that little agent to run whatever needs to be run. All this has a greater effect on the system, since said agent is making calls and requirements to other system resources and handles that make it possible for the agent to function.
Lastly, agent.exe would have increased running system resource consumption every hour for that one task it has to run. If it crashes, you'd have to stop what you're doing to fix it. In my eyes that is not an acceptable tradeoff, no matter how much more data I can get by using an agent. That has always been a problem, since resources (system memory and CPU) were things you had to keep a close eye on and avoid "spillage". This has been highlighted by the trends nowadays, what with cost savings and running everything virtual to save on expenses, mainly the running ones that won't go away: administration and maintenance. Man is a resource too, but one that should be put to use more efficiently.
Death to agents!
I agree with most of what Deltona has said but I will add one point. Agents get blamed for everything. As long as they are there they are a suspect for every potential issue that occurs on a box regardless of how big or small.
Agents do tend to get the brunt of the blame. One of the interesting trends we've seen over time is the proliferation of virtualization vs. the cost of hardware. For a while, we went along with single server = single piece of hardware and hardware started to get super cheap! So awesome! Ahh, those were the days. There was no point in operating your triple channel backplane with 3GB RAM, your minimum was at least 6GB, if not 24GB - big jump for low cost. The OS ran smooth, performance was a non-issue (usually, though it did encourage running multi-purpose systems to make use of hardware).
Enter.......... virtualization. Now I can maximize that hardware by running a NUMBER of servers on the same hardware, I just have to give it more disk (and disk is relatively cheap, especially compared to 5 years ago). Resource pooling & sharing combine with a tendency to experiment - it's really hard to measure EXACTLY how much CPU or RAM any application/system needs, so it's a guess that depends a lot on your environment and observation; now you can see how Virtualization Manager was born. Anyway, in this situation, every piece of critical vs. non-critical software running on a system comes back into the light, and agents can become an issue where they weren't before.
Purists always hated agents, but now there's a performance reason, too. In my experience, systems that are running right on the edge (in other words, effectively maximizing cost vs. performance) can have their balance tipped by an agent. Thankfully, resources are still relatively plentiful in a lot of environments, but who wants to be the guy (or gal) that has to go justify more resources because of an agent? Now your whole value proposition is called into question.
There's also a bit of "six of one, half-dozen of the other" (maybe more dozen of one, baker's dozen of another) - in the agentless world, you're counting on a single well-provisioned system to monitor X number of systems, vs. distributing the load X times to individual systems. The data stream might not be as rich and technically you are still paying for those resources, they are just centralized. Chances are the X resources ONCE is a lower cost than the resources X times, though - whether that justifies the X number of resources is dependent on a lot of things.
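A minimal sketch of that comparison, with every number made up purely for illustration: `agent_mb`, `poller_base_mb` and `poller_per_node_mb` are hypothetical parameters, not measurements of any real agent or poller.

```python
def monitoring_footprint(num_systems: int,
                         agent_mb: float,
                         poller_base_mb: float,
                         poller_per_node_mb: float) -> tuple[float, float]:
    """Return (distributed, centralized) memory footprints in MB.

    distributed: every monitored system pays for its own agent.
    centralized: one poller pays a fixed base cost plus a small
                 increment for each system it polls.
    """
    distributed = num_systems * agent_mb
    centralized = poller_base_mb + num_systems * poller_per_node_mb
    return distributed, centralized


# Hypothetical figures: 700 systems, a 50 MB agent, and a poller that
# needs 2048 MB up front plus 2 MB per monitored node.
distributed, centralized = monitoring_footprint(700, 50, 2048, 2)
print(f"agents: {distributed} MB total, central poller: {centralized} MB")
```

With these invented inputs the centralized total is far lower, but the balance flips for small estates: at 10 systems the same poller costs more than 10 agents, which is the "dependent on a lot of things" part.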
There absolutely does need to be value in the agent in order to justify it. If vendors don't have a solid justification for using an agent, and/or aren't moving toward hybrid options, we just aren't doing our jobs.
One of my favorite Transporter quotes is from TNG (slightly blasphemous to cross references from TOS with TNG, but...).
Picard: "If this hadn't worked, it would have been necessary to beam your energy into empty space…"
Pulaski: "… and spread my atoms across the galaxy!"
Picard: "Yes, I'm sorry, it…"
Pulaski: "No, no, don't be sorry. Every time I get into the damn thing I'm convinced that's what's going to happen."
Beam me up