I hope that my input will be found valuable even though I must admit I am not a Star Trek fan.
The use of agents for monitoring should always be avoided at any price. Adding an agent complicates things: you need to understand what the agent does, how it runs, what its requirements are, how to fix it if it crashes, and how to update it. These are just 5 basic things out of a possible <enter number here> variables. Put that agent on 5 systems and you have 5 agents to run plus 5x5 "variables" to be prepared for, which adds up to a total of 30 possible factors that may, and eventually WILL, need addressing at some point. It will likely happen very often if the agent hasn't been coded to be self-healing, or if the listener server doesn't notice and report when an agent has stopped sending back a status.
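To make the arithmetic above explicit, here is a minimal sketch. The function name and parameters are made up for illustration; it simply counts one item per agent you have to run plus one item per concern per agent.

```python
def factors_to_manage(systems: int, concerns_per_agent: int) -> int:
    """Count the agents themselves plus every per-agent concern.

    Hypothetical helper: 'concerns' are things like what the agent does,
    how it runs, its requirements, crash recovery, and updates.
    """
    agents = systems                          # one agent per system
    concerns = systems * concerns_per_agent   # each agent carries its own set
    return agents + concerns


# The example from the post: 5 systems, 5 basic concerns each.
total = factors_to_manage(5, 5)
```

With 5 systems and 5 concerns each this gives 5 + 25 = 30, and the count grows linearly with every system you add.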
Apart from the above-mentioned implications that you need to be prepared for before even deploying an agent, there's the resource consumption bit that you need to pay attention to as well. You can't deploy an agent and assume that its resource footprint is so tiny that it doesn't matter if you sacrifice extra CPU cycles, extra memory, and whatever else gets affected by running a process with several handles. Even if it only required a handle to run once every hour, that would be one handle too many, as the operating system needs to stop what it's doing, listen for the annoying little call from that agent, address requirements and dependencies <a whole set of other things>, then make room in line for that little agent to run whatever needs to be run. All this has a greater effect on the system, since said agent is making calls to other system resources and handles that make it possible for the agent to function.
Lastly, agent.exe will have increased the running system's resource consumption every hour for that one task it has to run. If it crashes, you have to stop what you're doing to fix it. In my eyes that is not an acceptable tradeoff, no matter how much more data I can get by using an agent. That has always been a problem, since resources (system memory and CPU) were things you had to keep a close eye on to avoid "spillage". It has been highlighted by the trends nowadays, what with cost savings and running everything virtual to save on expenses, mainly the running ones that won't go away: administration and maintenance. Man is a resource too, but one that should be put to use more efficiently.
Death to agents!
Agents do tend to get the brunt of the blame. One of the interesting trends we've seen over time is the proliferation of virtualization vs. the cost of hardware. For a while, we went along with single server = single piece of hardware and hardware started to get super cheap! So awesome! Ahh, those were the days. There was no point in operating your triple channel backplane with 3GB RAM, your minimum was at least 6GB, if not 24GB - big jump for low cost. The OS ran smooth, performance was a non-issue (usually, though it did encourage running multi-purpose systems to make use of hardware).
Enter... virtualization. Now I can maximize that hardware by running a NUMBER of servers on the same hardware; I just have to give it more disk (and disk is relatively cheap, especially compared to 5 years ago). Resource pooling & sharing combine with a tendency to experiment - it's really hard to measure EXACTLY how much CPU or RAM any application/system needs, so it's a guess that depends a lot on your environment and observation; now you can see how Virtualization Manager was born. Anyway, in this situation, every piece of critical vs. non-critical software running on a system comes back under scrutiny, and agents can become an issue where they weren't before.
Purists always hated agents, but now there's a performance reason, too. In my experience, systems that are running right on the edge (in other words, effectively maximizing cost vs. performance) can have their balance tipped by an agent. Thankfully, resources are still relatively plentiful in a lot of environments, but who wants to be the guy (or gal) that has to go justify more resources because of an agent? Now your whole value proposition is called into question.
There's also a bit of "six of one, half-dozen of the other" (maybe more dozen of one, baker's dozen of another) - in the agentless world, you're counting on a single well-provisioned system to monitor X number of systems, vs. distributing the load X times to individual systems. The data stream might not be as rich, and technically you are still paying for those resources; they are just centralized. Chances are that paying for the resources ONCE to cover X systems costs less than paying for them X times, though - whether that justifies the centralized resources depends on a lot of things.
There absolutely does need to be value in the agent in order to justify it. If vendors don't have a solid justification for using an agent, and/or aren't moving toward hybrid options, we just aren't doing our jobs.
One of my favorite Transporter quotes is from TNG (slightly blasphemous to cross references from TOS with TNG, but...).
Picard: "If this hadn't worked, it would have been necessary to beam your energy into empty space…"
Pulaski: "… and spread my atoms across the galaxy!"
Picard: "Yes, I'm sorry, it…"
Pulaski: "No, no, don't be sorry. Every time I get into the damn thing I'm convinced that's what's going to happen."
Beam me up
I think the only reason to have an agent would be to accomplish some type of self-healing without relying on the remote server to do it, or to make it easier to monitor through a firewall. I think the hybrid approach is the best course for a software company because you don't limit anyone and you can just stand back and say, yeah, we can do it the way you want, but we'll leave you to argue about which way is better.
I'm a fan of the hybrid offering model. There are use cases for both agent and agentless. I know that typically an agent may be required if there is a security concern (such as a policy that prevents opening ports) or WAN bandwidth (where the collector is at the main site and wants to monitor a remote site).
As a service provider we have found that our customers generally don't want us installing agents on their systems, which is one of the main reasons we chose Orion as our monitoring solution. In working with other products we have experienced just about all of the issues related to agents that have already been mentioned here by others. We have also found that some vendors are slow to update their agents to work on newer operating systems.
Ultimately we have found an agentless solution more flexible and more favorable with our customers.
We actually moved from Tivoli to Orion recently. Tivoli was agent-based. From a cost perspective, IBM charges maintenance and a fee for each agent, so there was a decision to be made with each purchase. That left us monitoring only 1/3 of our servers.
A big positive of the agent was that there was no need to "login" every two minutes to look at a log, check processes, etc. The agent never lost connection, and the agent was very easy to set up for log and text file monitoring in real time.
Orion, or agentless technology in general, has been great due to the freedom it gives. I am not limited by what the agent can do. I do not have to wait to upgrade or test new versions. We have over 700 servers, and testing a new agent version against all those applications before using it would be impossible.
Cost, innovation and freedom are why we moved to Orion.... and we have few regrets (Unix log monitoring :-( )
When polling with SNMP or WMI, UDP is commonly used. With the remote agents, a long-lived TCP connection is used. While the agent uses less bandwidth than WMI, my WAN accelerator doesn't show the UDP traffic in its connection views, for the obvious reason that UDP is connectionless. What this means is that even though the agents take less bandwidth than WMI, they show up in the connection views as using the largest amount of bandwidth. That makes the agents ideal candidates for finger-pointing.
I would love for the agents to have an option I could set which would allow the connections to end and reconnect on a schedule that I set. In my case, every 30 minutes. This would remove those single Agent connections with 7Gb of traffic over many days from ever showing up in the connection views.
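For what it's worth, the behavior I'm asking for can be sketched in a few lines. This is only an illustration of the idea, not anything the actual agent exposes: the class name, the `max_age_s` setting and the injectable `clock` are all hypothetical.

```python
import time
from typing import Any, Callable


class RecyclingConnection:
    """Hand out a connection, reopening it once it is older than max_age_s.

    Hypothetical sketch: 'connect' is any zero-argument callable that
    returns a new connection object (ideally one with a close() method).
    """

    def __init__(self, connect: Callable[[], Any],
                 max_age_s: float = 30 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect
        self._max_age_s = max_age_s
        self._clock = clock
        self._conn: Any = None
        self._opened_at = 0.0
        self.reconnects = 0  # how many times the connection was recycled

    def get(self) -> Any:
        now = self._clock()
        # Close the current connection if it has reached the age limit.
        if self._conn is not None and now - self._opened_at >= self._max_age_s:
            close = getattr(self._conn, "close", None)
            if callable(close):
                close()
            self._conn = None
            self.reconnects += 1
        # Open a fresh connection on first use or right after recycling.
        if self._conn is None:
            self._conn = self._connect()
            self._opened_at = now
        return self._conn
```

With a real socket you would pass something like `lambda: socket.create_connection((host, port))` as `connect` (with `host` and `port` being whatever your collector uses). Each connection would then carry at most 30 minutes of traffic, so no single entry in the connection view grows for days.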
There are times that appearance is more important than function.