Level 11

Alerting is unreliable, for us, in latest Solarwinds

Alerting has been flaky since the original update to the latest version and is still unreliable: it lasts just a few days before it stops. It was rock solid until recently.

I originally had this in case # 00118398 after upgrading to the latest version. I still have the problem despite having the latest hotfix.

Most other issues are resolved or getting there, but alerting silently dies after a while.

New case # 00134147 raised for this. The AdministrativeService logs show errors. These persist, and I'm not sure if they are related. It seems the service cannot communicate with the other pollers / web console. I will restart those to try to clear the errors.

To resolve it we restart Information Service v3 and Alerting Service v2, and that fixes it. I'm not sure if you need to do both, but you definitely need to restart Information Service v3.
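For anyone wanting to script that workaround, here is a rough sketch rather than anything official. It assumes the two services appear under the display names below (check services.msc on your own server, they may differ), and that it is run from an elevated prompt on the Orion server. The function names are made up for illustration. It only wraps the standard Windows `net stop` / `net start` commands, restarting Information Service v3 first.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# ASSUMPTION: display names as they might appear in services.msc.
# Verify them on your own installation before using this.
INFORMATION_SERVICE = "SolarWinds Information Service V3"
ALERTING_SERVICE = "SolarWinds Alerting Service V2"


def build_restart_commands(services: Sequence[str]) -> List[List[str]]:
    """Return a 'net stop' then 'net start' command for each service, in order."""
    commands: List[List[str]] = []
    for name in services:
        commands.append(["net", "stop", name])
        commands.append(["net", "start", name])
    return commands


def restart_services(
    services: Sequence[str],
    run: Callable = subprocess.run,
) -> List[Tuple[List[str], int]]:
    """Run each command and collect (command, return code) pairs.

    check=False so that a service which is already stopped does not
    abort the remaining commands.
    """
    results = []
    for cmd in build_restart_commands(services):
        completed = run(cmd, check=False)
        results.append((cmd, completed.returncode))
    return results
```

Calling `restart_services([INFORMATION_SERVICE, ALERTING_SERVICE])` would restart both in that order. The `run` parameter is only there so the logic can be dry-run without touching real services.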

I get alerting from all sorts of other systems so I didn't notice for a day or so.

In view of the service still being unreliable, for us, I'm about to instigate daily restarts of all the web services.   Looking at the dependencies it doesn't seem they have any, so you should be able to restart them all in any order.
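A possible way to set up the daily restart is a Windows scheduled task. The sketch below only builds the `schtasks /Create` command line; the task name, the script path and the 03:00 start time are placeholders I have made up, not anything SolarWinds recommends.

```python
from datetime import datetime
from typing import List


def build_daily_task_command(
    task_name: str,
    script_path: str,
    start_time: str = "03:00",
) -> List[str]:
    """Build a schtasks command that runs script_path once a day.

    start_time must be 24-hour HH:MM; it is validated and normalised here
    because schtasks expects the HH:mm form.
    """
    parsed = datetime.strptime(start_time, "%H:%M")  # raises ValueError if invalid
    return [
        "schtasks", "/Create",
        "/TN", task_name,            # task name (hypothetical)
        "/TR", script_path,          # program or script to run (hypothetical)
        "/SC", "DAILY",              # run once per day
        "/ST", parsed.strftime("%H:%M"),
        "/RU", "SYSTEM",             # run under the local SYSTEM account
        "/F",                        # overwrite an existing task of the same name
    ]
```

For example, `build_daily_task_command("Restart Orion services", r"C:\Scripts\restart_orion.cmd")` gives a command list that could be passed to `subprocess.run` from an elevated session.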

There is a possibility that nobody else, or very few people, have this problem. We have some alerts which previous versions of Solarwinds didn't upgrade correctly - you can see them as having "Complex Condition" turned on, and those need to be rewritten. But whether those are what is causing the service to enter a faulted state, I'm not sure.

Is anybody else having a problem with alerting going silent?

0 Kudos
10 Replies
Level 8

We've experienced the same issue for the last 3-4 weeks since we upgraded NPM and SAM. Alert actions would just die after 1-2 days without any warning. We were resorting to restarting all the services to resolve it, but then it died again a day later.

After the latest Orion hotfix HF3 was applied, plus the port exhaustion fix, we managed 10 days of uptime for alert actions. Unfortunately it died again yesterday.

Now waiting for the next "fix"

Level 9

Management's sense of humor is starting to falter. 12.3 is a P.O.S. when it comes to stability. With no indication at all, it just stops working: key services running but no alert processing. Since I first heard of the flakiness I let it brew, figuring SW would fix stuff. The original post is now "Locked", which is marketing's way of saying, "Please don't embarrass us with facts."

Look, no, I haven't opened a ticket, because when it breaks I am told to reboot the whole system and get it working again. SW knows what the issues are - FIX IT. I'm getting a whole lot of "Can we just change back to Intermapper? It never broke." I'm starting to agree.

0 Kudos

I'm thinking compensation is now due, as this has been going on for far too long. It is totally unacceptable to use us as beta-testers on production systems when we are the ones who get the flak.

A free copy of the Engineer's Toolset might be acceptable for all the hard graft, effort and sleepless nights we've endured.

Solarwinds is a good product and I like it a lot, but equally this last upgrade was botched. Not the end of the world - which manufacturer hasn't had a botched upgrade? The good thing is we are getting nearer to the stability we once enjoyed, which is what is expected and demanded.

Ever since we got bitten with the NPM 11.5 upgrade issues, we always wait at least a few weeks prior to upgrading to the latest release, while reviewing the Thwack forums for possible horror stories from the early adopters.

We are experiencing similar issues with our environment since the last update.

0 Kudos
Level 11

Spoke too soon, our system is down again.

pastedImage_0.png

0 Kudos

Did you lose your SQL Browser service, or need to restart SQL Server when you got that error? That named pipe error:40 could also be related to communication issues or ports not being open. I have seen it happen both ways (firewall/comms issue, and SQL Browser being stopped). Did you migrate to a new SQL server? If the Info Service / Alerting is just stopping, what sort of errors are you seeing in your logs?
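To rule out the firewall/comms side quickly, a basic TCP reachability test from the Orion server to the SQL server can help. This is a minimal sketch with assumptions: 1433 is only the default port for a default SQL Server instance (named instances often use other ports and rely on SQL Browser over UDP 1434, which this does not test), and the hostname in the usage line is a placeholder.

```python
import socket


def can_connect(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Something like `can_connect("my-sql-server.example.local")` returning False would point towards a network, firewall or stopped-service problem rather than an Orion one. A True result only shows the port is open, not that SQL Server is healthy.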

0 Kudos

In the end our SQL really did become unresponsive - which is the first time ever that server has had this issue.

I went to SQL 2016 SP2 because it was required for NTA migration.   About 7 days ago Microsoft released SQL SP2 CU1 - which includes a fix for parallelism, and I've applied that.   Touch wood and all that.   But I have been adjusting all sorts of things to get this beast to stay upright.

I've not noticed an alerting problem on mine, but have you seen the fix in the thread "Solarwinds is now horribly unstable"? It might be worth doing, as they are going to be applying this fix in a future release.

0 Kudos

Yes, I was one of the first to turn back from streamed to buffered, and that did seem to help, but it didn't solve all the problems.

I currently have 9 outstanding tickets open with Solarwinds for all sorts of problems. I have opened more tickets in the last 4-5 weeks than in the last 3 years. This has been a nightmare upgrade.

0 Kudos