Comments
-
We are running 2023.2.1 and would "not" advise upgrading to this version. Time will tell with 2023.3.
-
Dang, we are seeing these errors as well with 2023.2.1. Our Case Number is: 01415926. For us the biggest result is that the Alert Service stalls, i.e., stops processing alerts. We have made several modifications, some of which helped a lot, some not so much (that we can see). Waiting on the results from a meeting between our…
-
Go ahead you trendsetter you.
-
Keep an eye on the JobEngine and the Alerting Service/Engine on your main Poller, just a suggestion.
-
Initially it was around 4:00am every morning. We found an unmanage task that I built in 2016 did "not" get properly moved to the new poller 1 server when we upgraded. It was slamming us. We recreated the job and thought that this event WAS the root cause; for almost a week we were solid, then this last weekend, here it…
-
HA environment, 10 active, 10 DR. All production pollers (active), same network segment, 10Gb links, database on same network. New primary (physical) Dell PowerEdge MX750c, Dual Proc, 32 cores, 64 logical processors averaging 13 to 18% processor consumption, 128GB of memory, all SSDs, 10Gb network connection.…
-
I wish I could give you 42 more ^ (likes), rofl.....
-
bharris1, if you are stable, and you stated you are, suggestion > HOLD. We are having an issue with the Alerting Engine where it just stops processing (lights out). I wonder, if the JobEngine is the source, does it feed the alerting table? idk, yet ~~ We were solid at 2020.2.6 HF5, upgraded to 2023.2.1, and here we…
-
We just set this up in our environment (yesterday). We are letting it cook over the weekend. I will see if we encounter the same behavior; if we do, I will update you, and once we do find a solution I will pass it along. So far, it looks good.
-
So do I.
-
I have the same question.
-
Hey John, for us it was a development-provided hotfix that solved the problem. (I am not sure if the fix was included in the GA 12.4x release.) Fortunately for us there was one other customer that saw the problem before us and had already engaged development, who had already built and were validating a fix. The…
-
We put in CORE-12365 today, aLTeReGo; it worked for RabbitMQ. We still have an issue with MSMQ, one single queue (but we are killing it, meaning running a lot of messages through it), but we will continue to work with A. and get that one going our way as well. >…
-
We put in a UDT buddy drop today (provided by dev for us and what we are seeing with UDT), aLTeReGo; it did not fix it, still queuing on both MSMQ and RabbitMQ. I will tell A. tomorrow about CORE-12365 and see if that issue tracks with what we are seeing here. Thanks for the response. niccat (We just upgraded to…
-
The issue was with the 3rd-party set of DLLs SAM uses to SSH to Linux nodes, WOD > https://www.weonlydo.com/SSH/ssh-activex-component.asp > WeOnlyDo.Client.SSH.dll > WeOnlyDo.Client.SSH.FIPS.dll The issue was identified and fixed in SAM 2020.2, JIRA 00431006. SAM 2020.2 Release…
-
Resolved - Our issue was self-inflicted; it was a configuration issue. If you are having issues with transactions going unknown, look closely at your players and how they are configured :-)
-
Oh, and we have you beat nickcat: we saw the RabbitMQ/CortexEvents queue get to 19 million, and still syslog, traps, events, alerts, polling completion rate, database syncs were all solid, no apparent issues. It's crazy. Go Dell, go physical, a little FusionIO on the backend helps a lot too :-)
-
Envision having to run through the configuration wizard for EACH Poller for EACH product. FYI, if you have multiple pollers, do NOT run the web optimization portion of the configuration wizard until the LAST poller. Rant over.....
-
OK, you can reference SW15358 if need be, but there is a "Buddy Drop" for this problem: Core-2016.1-BD-CUST-18255. Apparently, if you change values in a trigger action, the core writes a NULL value to the database, and this causes the "Object reference not set to an instance of an object" error. What it should do is blank out the…
-
We implemented a solution today, let it cook for a while.
-
What we are seeing is that, with no real pattern, transactions will go "unknown". When this happens, the alerts we have set up on the transaction monitors fire, so we get false positives. We can't go into production in this state. smoked_angus, I did add the SEUM users to the local admin group on the player server, testing that change…
-
That's assuming they have fixed this link to include the latest version of the installer, see 00288415, April 8th, 2019. We were upgrading from 12.3 HF6 to 12.4 with core 2018.4 HF3. The PM has been made aware. As far as I can tell, however, it is not fixed yet. I do agree it is very important to run the "correct" installer.…
-
I am seeing the same thing, but my setup is a little different: I am testing connections from a ServiceNow MID Server to target devices with the SolarWinds SNMPWalk tool. So with the little data I see here, I have to believe it is corruption, but how to find it is another challenge indeed. I am guessing it is something to…
-
That's not to say queuing with RabbitMQ is not a problem at all. We are seeing that too, but to this point thought it was a symptom of a larger problem, idk.
-
I did what Janene16 suggested (no reboot) and it worked.
-
I see no issue with polling completion, dbsyncs, or collection. I will follow up on the errors we see once I have them. I have turned on debug on different component monitors and am waiting to catch one that goes unknown.
-
Misprint above, 19 straight hours it took to upgrade 8 pollers/1 web/1 database Orion set of servers.
-
Sakshi, what is the case number?
-
You guys let me know once you get all the bugs worked out, smiling, I will wait here. Normally I am one of the first ones to take the plunge, not this time. Nothing but love, you go you trendsetters you.... For those folks that have implemented HA, be advised, if you select "Updates & Evaluations" and you have not hit this…
-
That explains a lot, thank you, now my lies won't be lies. > we prioritize polling and alerting first and foremost, these services typically recover in roughly two minutes. Faster in some instances depending upon hardware. The web interface is the last thing to return to service when a recovery condition occurs. This…