We have an advanced alert configured to email an on-call phone when a critical device reboots. We had a core switch reboot and Orion detected this - I see it in the event log - but we did not receive an email. Support is telling me that the email was never sent because the SMTP server couldn't be reached at the time. Obviously it couldn't be reached - the switch that Orion is connected to was down!

I asked for a bug to be filed and instead I was told that a feature request will be submitted. Seriously - a feature request to ensure that my alerts come through? I thought I had already paid for a system that emails me when my configured alerts are triggered. How is anyone supposed to depend on this product? Is it not intended for production use? Is anyone else OK with this? Do people use a different monitoring product when dependable alert delivery is critical?
"After reviewing your logs it looks like the culprit was a momentary connection issue with the database as the e-mail was supposed to be sent. The Alert log records the e-mail sent message however there is no success or failure which always follows to assist in troubleshooting these kinds of problems. When looking in to the Alerting service itself there is a DB timeout issue between 11:22 and 11:24. Right at 11:24 we can see the e-mail was not truly sent"
"At this same time the data processor also cannot connect to the database so the loss in connection across the board caused the problem."
"Your polling data gets cached in the MSMQ until it is successfully sent to the database however e-mail alerts are not stored until they are successfully sent to the SMTP server. The software tried to send it, could not reach the SMTP server and moved on to the next task. Additionally it tried to write a failure to the database in the AlertLog, could not reach it and then moved on to the next task. That is how the alerting service works.
I will submit this as requesting that e-mail alerts become cached much in the same way that the polling data does for redundancy."
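To make the difference concrete, the caching behaviour requested in the support reply can be sketched roughly as follows. This is an illustration only, not how Orion or MSMQ is implemented: `AlertSpool`, `new_temp_spool`, and the JSON file layout are hypothetical names chosen for the example. The idea is that an alert is written to disk first and removed only after the send succeeds, so an unreachable SMTP server delays the email instead of losing it.

```python
import json
import os
import tempfile
import uuid


class AlertSpool:
    """Hypothetical on-disk spool: an alert is deleted only after send() succeeds."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def enqueue(self, recipient, subject, body):
        # Persist the alert before any send attempt so a failure cannot lose it.
        path = os.path.join(self.directory, f"{uuid.uuid4().hex}.json")
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"to": recipient, "subject": subject, "body": body}, fh)
        return path

    def pending(self):
        # File names of alerts that have not been delivered yet.
        return sorted(f for f in os.listdir(self.directory) if f.endswith(".json"))

    def flush(self, send):
        """Try each spooled alert once; keep the ones whose send fails."""
        delivered = 0
        for name in self.pending():
            path = os.path.join(self.directory, name)
            with open(path, encoding="utf-8") as fh:
                alert = json.load(fh)
            try:
                send(alert)  # caller-supplied function that raises OSError on failure
            except OSError:
                continue  # server unreachable: leave the file for the next flush
            os.remove(path)
            delivered += 1
        return delivered


def new_temp_spool():
    # Convenience for experiments: a spool in a fresh temporary directory.
    return AlertSpool(tempfile.mkdtemp(prefix="alert-spool-"))
```

Calling `flush` on a timer would retry anything left over from an outage. By contrast, the behaviour described in the reply amounts to calling `send` once and moving on whether or not it succeeded.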
Frustrated and disappointed.
FWIW - currently running NPM 10.4.2
One possible workaround is to install the SMTP Server feature on your Windows server. Configure SolarWinds to send email to the server itself (127.0.0.1) and configure the SMTP feature to act as a relay that forwards email to your actual email server. The number of retries, connection timeouts, etc. can be configured on the SMTP relay. Depending on your primary email server configuration, you may need to configure it to accept emails sent from the relay.
In your scenario above, SolarWinds would have handed the email to the mail relay. The relay would not have been able to deliver it immediately, but it would have kept retrying periodically until the email was sent or deleted (again based on the configured timeouts and resend attempts).
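Before pointing SolarWinds at the relay, it may be worth confirming that the local SMTP service actually accepts mail on 127.0.0.1. A minimal test script along these lines could do that; it assumes the relay listens on port 25 without authentication, and the function names and addresses are placeholders, not anything SolarWinds provides.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, subject, body):
    # Assemble a plain-text test message.
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_local_relay(msg, host="127.0.0.1", port=25, timeout=10):
    # Hand the message to the local relay; the relay, not this script,
    # is then responsible for retrying delivery to the real mail server.
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)
```

For example, `send_via_local_relay(build_alert("orion@example.com", "oncall@example.com", "Relay test", "Test message"))` should return without raising if the relay accepted the message. Acceptance by the relay only means the message is queued there; final delivery still depends on the relay's retry settings and on the upstream server allowing it.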