I'm using NPM version 12.4. On a daily basis I'm getting bulk alerts for packet loss and URL down, but the users confirm that they observe no problem at their end. I have checked the alert conditions and they are all set correctly. Could you please help me understand what is happening and how to resolve it?
Thanks & regards
In my environment we had too many false positives as well, and this stopped after I adjusted the trigger conditions on my alerts and checked the box "Conditions must exist for XX minutes". In most of my cases the checks run every 15 minutes, so for us 30 minutes means 2 attempts (I assume).
Just adjust this to your environment!
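To show why that setting cuts down on false positives, here is a minimal Python sketch of a "condition must persist" rule. It is only a simplified model for illustration, not how the SolarWinds alert engine is actually implemented; the function name `first_alert_time` and the poll data are made up for the example.

```python
from datetime import datetime, timedelta


def first_alert_time(polls, sustain):
    """Return the time the alert would fire in this simplified model, or None.

    polls:   list of (timestamp, is_down) tuples in chronological order
    sustain: timedelta the "down" condition must hold without interruption
    """
    down_since = None
    for ts, is_down in polls:
        if not is_down:
            down_since = None  # condition cleared, so the timer resets
            continue
        if down_since is None:
            down_since = ts  # first failed poll of this streak
        if ts - down_since >= sustain:
            return ts
    return None


# Hypothetical data: one poll every 15 minutes.
start = datetime(2024, 1, 1)
step = timedelta(minutes=15)
blip = [(start + i * step, down) for i, down in enumerate([False, True, False, False])]
outage = [(start + i * step, down) for i, down in enumerate([False, True, True, True])]

print(first_alert_time(blip, timedelta(minutes=30)))
print(first_alert_time(outage, timedelta(minutes=30)))
```

In this model a single failed poll followed by a good one never fires the alert, while a failure that persists across consecutive polls does. How many polls the real product needs depends on how its alert evaluation lines up with the polling interval, so treat the numbers as illustrative only.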
It's hard to know what's happening in your specific environment, but if it were mine I would typically work through a process roughly like the one below:
How do you confirm that these are false positives? At the time the alerts are firing, do you log into the polling engine and test whether the URL can be reached? Small to medium amounts of packet loss are not especially obvious to web users beyond the page being a little slow, since browsers are usually smart enough to retry things if they don't get the responses they expected. End user complaints are among the least reliable ways to validate what's really happening on a web server. If a website is bad, people rarely raise tickets. They just assume that's normal and suffer through it, or if they don't need what's on that site very badly they just leave and go somewhere else.
You are getting packet loss alerts, meaning the SolarWinds poller is sending out ICMP echo requests to "somewhere" and not consistently getting a reply.
Is there a single node that is especially problematic or does the problem manifest on several nodes?
If there are several, do those nodes have anything in common in terms of network paths, subnets, types of hardware?
If it's just one, what makes that device special? Can I use any of my other nodes as a sort of reference point to compare it against?
Can I replicate the issue? If I run a continuous ping against the IP from my workstation for a day and log the output to a file, does it ever drop packets? What about a continuous ping from the polling engine?
Does my test match or differ from what SolarWinds is telling me? If I see intermittent ping losses from my workstation as well, that's a good indicator that the alert is valid and I need to figure out what is causing that node to lose pings. If my workstation shows no losses, that likely points toward a problem with my polling engine. Is it overloaded? Are there errors in my application logs? Is it having issues writing to the database? If I have multiple polling engines, does the problem go away if I move the node to another poller, or does it persist across all of them?
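The replication test described above (a continuous ping logged to a file, then compared against what SolarWinds reports) could be sketched roughly like this in Python. It assumes the operating system's own `ping` command is on the PATH and uses its usual single-echo flags for Windows and Linux. The function names, the CSV layout, and the wording returned by `interpret` are all made up for illustration.

```python
import csv
import platform
import subprocess
import time
from datetime import datetime


def build_ping_command(host, system=None):
    """Build a one-echo ping command for the local OS."""
    system = system or platform.system()
    if system == "Windows":
        return ["ping", "-n", "1", "-w", "1000", host]  # -w: timeout in ms
    return ["ping", "-c", "1", "-W", "1", host]  # Linux -W: timeout in seconds


def ping_once(host):
    """True if ping exited with 0. Assumption: exit code 0 means a reply came
    back; check this on your OS, since exit codes can vary by platform."""
    result = subprocess.run(
        build_ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def loss_percent(results):
    """Percentage of failed probes in a list of booleans."""
    if not results:
        return 0.0
    return 100.0 * results.count(False) / len(results)


def monitor(host, log_path, count, interval=1.0):
    """Ping `host` `count` times, appending one timestamped CSV row per probe."""
    results = []
    with open(log_path, "a", newline="") as fh:
        writer = csv.writer(fh)
        for _ in range(count):
            ok = ping_once(host)
            stamp = datetime.now().isoformat(timespec="seconds")
            writer.writerow([stamp, host, "reply" if ok else "timeout"])
            fh.flush()  # keep the log usable even if the run is interrupted
            results.append(ok)
            time.sleep(interval)
    return loss_percent(results)


def interpret(independent_loss_pct, tolerance_pct=0.0):
    """Rough reading of an independent test taken while alerts were firing."""
    if independent_loss_pct > tolerance_pct:
        return "alert likely valid: investigate the node or the network path"
    return "independent test is clean: investigate the polling engine"
```

For example, `monitor("192.0.2.10", "ping_log.csv", count=86400)` would probe a placeholder address about once a second for roughly a day, and the returned loss figure can be passed to `interpret`. Running the same script on both the workstation and the polling engine gives two logs whose timestamps can be lined up against the alert history.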