We're starting to leverage agents as part of our testing for cloud-based server monitoring (fewer holes in the firewalls = better security, right?), and when you go through the agent add process it says that agent-initiated is the preferred communication method. Any ideas why? We are leaning towards server-initiated, as it makes the security types less worried when the connections to external servers originate from our network rather than our network listening for inbound comms.
Check out the /Orion/AgentManagement/Admin/DownloadAgent.aspx page to see what I mean.
Tagging aLTeReGo as the guru of all things agent.
I'm a bit late to the party, jbiggley (I'm actually searching for another agent-related issue), but agents are also the best option in a NAT'd environment.
Some segments of our network have overlapping IPs with 1:1 NATs in place, and the agents are still able to perform there.
After some chatting with the UX team and here on Thwack, I've come to the conclusion that agents are pretty great, in the right circumstances. We are going to stick with our SNMP and/or WMI stance for now, but when needed we'll be using agents. We're also defaulting to server-initiated and have told the SolarWinds devs that assuming agent-initiated comms is not a security best practice: with agent-initiated comms, you have to listen for those inbound connections in some sort of edge zone. Defaulting to server-initiated gives us an extra layer of security.
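To make the traffic-direction argument concrete, here is a minimal Python sketch of which side has to listen, and therefore needs an inbound allow rule, under each mode. It is a simplified model, not SolarWinds code: `Mode`, `InboundRule`, `required_inbound_rule`, and the host names are hypothetical, and ports are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AGENT_INITIATED = "agent-initiated"    # agent opens the connection to the poller
    SERVER_INITIATED = "server-initiated"  # poller opens the connection to the agent ("passive" mode)


@dataclass(frozen=True)
class InboundRule:
    source: str    # host that opens the TCP connection
    listener: str  # host that must accept it, i.e. where the inbound allow rule lives


def required_inbound_rule(mode: Mode, poller: str, agent: str) -> InboundRule:
    """Return the single inbound allow rule each mode needs in this simplified (hypothetical) model."""
    if mode is Mode.AGENT_INITIATED:
        # The poller listens, so the poller's side (e.g. an edge zone) must accept inbound
        # traffic; the agent's network only needs outbound access.
        return InboundRule(source=agent, listener=poller)
    # Server-initiated: the agent listens, so only the agent's side accepts inbound traffic,
    # and the poller's network only needs outbound access.
    return InboundRule(source=poller, listener=agent)


if __name__ == "__main__":
    for mode in Mode:
        rule = required_inbound_rule(mode, poller="orion-poller", agent="cloud-vm")
        print(f"{mode.value}: allow {rule.source} -> {rule.listener}")
```

In this model, server-initiated mode moves the inbound rule from the network hosting the poller to the agent's side (e.g. the cloud VM), which is the trade-off being weighed here against agent self-registration.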
I know some features and process flows were designed around agents self-registering. Hopefully that won't be a sticking point for SAM going forward.
I wish I could say we would miss you when you're gone, but we won't.
I've used lots of solutions over the years, and if I've learned anything, it is that the person implementing a product is as responsible for the outcomes as the product itself. No product, not even the SolarWinds Orion platform, is perfect. The more I push and stretch and hammer away at the products, the more things I discover that make me go "hmm" as well as "oh, heck yah, that's awesome!" Somehow I manage to make things work, and not just in isolation but at a very grand scale. We interface with other platforms, we deliver a critical service for thousands of co-workers who depend on us to help them help our customers, and we do it with relatively little pain given all that we are asked to do.
I've read your vitriol time and time again. If you don't like the product(s), then get involved with the beta tests and the UI/UX reviews, and offer suggestions on how to fix the problems. I've been a SolarWinds customer for nearly a decade now. I've built and implemented systems of all sizes and shapes, and I support a platform that is on the scale of ridiculous for a single instance. I can promise you that I don't hold back when I talk with the folks at SolarWinds. Whether it is product concerns, process concerns, or just concerns about the messaging in general, they get to hear it all. And the best part? THEY LISTEN! They are collaborative and eager to support us. They want to work with us because they understand that their customers help them make a better product. Since there is no perfect customer, there is no perfect product, but I can tell you that it is a pretty powerful combination when folks come together to try and solve a problem.
Sorry that you aren't able to find value in any of the SolarWinds products. I expect that means you will be promptly uninstalling the code and surrendering your licenses. It sounds like you have another solution that is already doing the job for you so why bother to run two platforms? Please let us know when you've removed all of your SolarWinds products. I am sure we can help you find a non-profit or charity that would appreciate a donation of those licenses. Heck, I'd be glad to take them off your hands for my lab environment.
That said, I expect the reality is that monitoring is an essential service to you and your employer/clients, and shutting off a monitoring environment, regardless of your personal opinions on its completeness, would be a résumé-generating event. Hopefully you can start to contribute solutions to help the Community of Practice for Enterprise Monitoring. This 'rant on repeat' is getting old.
Yes, in some environments you only have so many ways to deal with monitoring. For example, as those familiar with the Purdue model know, you cannot skip a level. And within those levels you are often confronted with multiple domains, domains that aren't part of a forest, standalone servers, etc.
So if you have the four levels and need to move alerting to your DMZ, you face the challenge: how do I get alerting from level 1? Level 2? I can use Windows Event Forwarding to send events to a server within the same level, then have that server forward them up to the next level, and so on?
Alternatively, encrypted one-way communication would be fine, but then I need an agent.
Unless others on this board have other ideas on how to move events without violating the restrictions of the Purdue model?
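The level-by-level relay described in that question can be modeled in a few lines of Python. This is only a sketch of the topology under stated assumptions: four levels plus a DMZ, one collector per level, and forwarding only to the adjacent level. It does not use Windows Event Forwarding or any SolarWinds API; `LEVELS`, `relay`, and `skips_a_level` are hypothetical names, and transport and encryption are out of scope.

```python
# Hypothetical level names, lowest first; "DMZ" is where alerting is assumed to live.
LEVELS = ["L1", "L2", "L3", "L4", "DMZ"]


def relay(event: str, source_level: str, collectors: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Forward an event hop by hop from its own level up to the DMZ.

    The event first lands on the collector inside its own level, then each
    collector forwards it to the collector exactly one level up.
    Returns the (from, to) hops taken.
    """
    path = LEVELS[LEVELS.index(source_level):]
    collectors[path[0]].append(event)  # first hop stays inside the source level
    hops = []
    for lower, upper in zip(path, path[1:]):
        collectors[upper].append(event)  # cross exactly one boundary per hop
        hops.append((lower, upper))
    return hops


def skips_a_level(hops: list[tuple[str, str]]) -> bool:
    """True if any hop crosses more than one boundary (not allowed in this model)."""
    return any(LEVELS.index(dst) - LEVELS.index(src) != 1 for src, dst in hops)


if __name__ == "__main__":
    collectors = {level: [] for level in LEVELS}
    hops = relay("failed logon on HMI-01", "L1", collectors)
    print(hops)                 # [('L1', 'L2'), ('L2', 'L3'), ('L3', 'L4'), ('L4', 'DMZ')]
    print(skips_a_level(hops))  # False
    print(collectors["DMZ"])    # ['failed logon on HMI-01']
```

The point of the model is that every boundary is crossed by exactly one collector-to-collector hop, so each level only ever needs a rule toward its immediate neighbor.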
Event forwarding is definitely the way to go when you have distinct domains such as those suggested by the PERA model (Purdue Enterprise Reference Architecture, Wikipedia). I'm not intimately familiar with the specifics of this model, but I do remember having to support a manufacturing environment, which PERA is aptly designed for, and the challenges that go along with air-gapped systems, multiple networks, etc.
The agent is a great way to meet those restrictions for the reasons I outlined. It definitely feels like 1998 all over again, though. Agents are in vogue again!
I have to say that, in my years of experience, I am not a fan of agent-based monitoring. I use SolarWinds today for NPM/SAM/etc., and I use SAP's Solution Manager for my SAP servers/applications. SolMan is agent-based, and the administration and overhead for my SolMan agents is exponentially higher. Granted, I get tons more data back, more than I ever wanted or could do anything with, but the technical debt I pay for agent-based monitoring is much higher.
I've been running the SW agent in my env for 3 months now across 100 servers so far. I haven't had a single agent issue of any kind. Everybody is against agents until they fully understand the benefit.
We use server-initiated polling for our cloud-based monitoring due to the same concerns you had regarding the direction of the initiated traffic. However, I have noticed that the server-initiated agents tend to have polling problems (such as application status going to "unknown"). Sometimes the only fix is restarting the Orion services on the central poller, or restarting the central poller altogether.
Not yet. I'm actually leaning towards the way the servers are configured being what makes the issue worse. It's also pretty sporadic, so by the time I log a ticket and actually get a support person to talk to, the issue might be gone. I should open a case the next time it happens, though.
chadsikorra, what version of SAM/NPM are you running? Also, please do open a case with support for this issue. You don't need to have someone on the phone with you while it's happening, but when it is occurring, please grab a diagnostic from the Orion server (or the Additional Poller the agent is associated with) and the log files from the agent, e.g. zip up the entire "C:\ProgramData\SolarWinds" directory on the agent machine if it's lost connection to Orion. When the agent is connected, you can simply 'edit' the agent from the Agent Management grid and download the remote agent log files through the Orion web interface. These logs are what we'll need to better understand what's happening.
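For the "zip up the entire directory" step, here is a minimal Python sketch using only the standard library (`shutil.make_archive`). `zip_agent_logs` is a hypothetical helper, not a SolarWinds tool; the default path is the one given in the post, and the archive is written relative to the current directory unless you pass another `dest_base`. It covers only the agent-side files; the Orion diagnostics still come from the Orion server.

```python
import shutil
from pathlib import Path


def zip_agent_logs(source_dir: str = r"C:\ProgramData\SolarWinds",
                   dest_base: str = "agent-logs") -> str:
    """Zip an entire directory tree (by default the agent's SolarWinds data folder).

    dest_base is the archive path without the ".zip" extension; keep it outside
    source_dir so the archive does not try to include itself.
    Returns the path of the created .zip file.
    """
    src = Path(source_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} does not exist or is not a directory")
    # root_dir/base_dir keep the top-level folder name (e.g. "SolarWinds") inside the zip.
    return shutil.make_archive(dest_base, "zip", root_dir=src.parent, base_dir=src.name)
```

On the agent machine, calling `zip_agent_logs()` as a user who can read that folder should produce `agent-logs.zip`. If a file is locked by a running process, the copy may fail with a `PermissionError`.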
Apologies, I missed the question at the beginning of this. We are using SAM 6.2.3 and NPM 11.5.3. I was waiting to do a major upgrade of all the modules once the new SAM 6.3 is released. Anyway, the issue happened again on one of our hosted VMs, so I opened case #1050599 with some details about what I'm seeing in the logs. It's currently in a state where I can't even start the SolarWinds agent anymore, and unfortunately I can't reboot this server at the moment.
Without knowing more about the issue it's hard to diagnose, but if you need immediate resolution, I would recommend uninstalling and reinstalling the agent to see if that resolves the issue.
We were forced to reboot the server this morning (for unrelated reasons). After the reboot, the agent started working again. Is there additional debugging I can enable on the agent side so that we get more useful information the next time the problem happens? I can't find any KBs on how to adjust the logging level for the agent.
Yeah, please do. Just grab the diagnostics from your primary poller and, if you have additional polling engines, from the poller that the offending nodes are assigned to, even before you get a support agent. They will ask for them anyway, so you might as well have them on hand. If you grab them quickly, you just need the last 24 hours or so.
Very curious if this is a systemic issue or a server/network issue. I suspect the latter...but I've been wrong once today already!
Josh, there is no single 'right' answer here. Agent-initiated is the easiest for end users because it allows for automatic agent registration and node creation. That is why it's the default and the recommended mode. We added 'Passive' mode because we knew that not all environments or situations would allow for agent-initiated communication.
Perfect. That was the answer I've been giving folks (or some form of it), and I just wanted to confirm. Some of our other admins questioned my choice not to use the "recommended" method.