rcbarr — THWACK

Comments

Not sure this will help, but if you are blocked (this is a little dated, but) * What about Carbon Black (if you run it on your servers) * What about CrowdStrike (if you run it on your servers) I get you were able to download other files, but this kinda sounds suspect b/c I am unaware of anyone else seeing this issue.…

in Error trying to upgrade to 2023.3 Comment by rcbarr August 2023
I wish you the best of luck. I did just confirm in our bi-weekly meeting with SolarWinds that this bug DOES exist in 2023.3. For this issue (Alerting engine STALLS and stops processing Alerts): in the Alerting.Service.V2.log on your main poller, look for (1) WARN SolarWinds.Orion.Core.Alerting.Service.AlertConfigurationLock -…

in Alerting Service Processing Issue (2023.2.1) Comment by rcbarr August 2023
YES, it is also in 2023.3, found out today! We are running 2023.2.1 across (4) environments: Test, QA (a small HA test environment), our Watcher Environment (monitors our production Orion environment exclusively) and production, an HA environment with 9 APEs. Have no experience with 2023.3 yet. I was waiting on you to tell…

in Alerting Service Processing Issue (2023.2.1) Comment by rcbarr August 2023
Development was able to recreate this issue today, confirmed bug in 2023.2.1. Description: The Alerting Service/Engine stops processing alerts for an extended period of time. Initially for us, hours. We then made several changes to lessen the runtime load on the engine, which in turn lessened the time for the stall to minutes…

in Alerting Service Processing Issue (2023.2.1) Comment by rcbarr August 2023
We are running 2023.2.1, would "not" advise upgrading to this version, time will tell with 2023.3.

in Anyone else having job engine issues affecting polling. Comment by rcbarr August 2023
Dang, we are seeing these errors as well with 2023.2.1. Our Case Number is: 01415926. For us the biggest result is the Alert Service stalls, i.e., stops processing alerts. We have made several modifications, some of which helped a lot, some not so much (that we can see). Waiting on the results from a meeting between our…

in Alerting Broken After 2023.2 Upgrade Comment by rcbarr August 2023
Go ahead you trendsetter you.

in Anyone else having job engine issues affecting polling. Comment by rcbarr August 2023
Keep an eye on the JobEngine and the Alerting Service/Engine on your main Poller, just a suggestion.

in Anyone else having job engine issues affecting polling. Comment by rcbarr August 2023
Initially it was around 4:00am every morning. We found an unmanage task that I built in 2016 did "not" get properly moved to the new poller 1 server when we upgraded. It was slamming us, so we recreated the job and thought that this event WAS the root cause. For almost a week we were solid, then this last weekend, here it…

in Alerting Service Processing Issue (2023.2.1) Comment by rcbarr August 2023
HA environment, 10 active, 10 DR. All production pollers (active), same network segment, 10Gb links, database on same network. New primary (physical) Dell PowerEdge MX750c, dual proc, 32 cores, 64 logical processors running on average at 13 to 18% processor consumption, 128GB of memory, all SSDs, 10Gb network connection.…

in Alerting Service Processing Issue (2023.2.1) Comment by rcbarr August 2023
I wish I could give you 42 more ^ (likes), rofl.....

in Anyone else having job engine issues affecting polling. Comment by rcbarr August 2023
bharris1, if you are stable and you stated you are, suggestion > HOLD We are having an issue with the Alerting Engine where it just stops processing, (lights out). I wonder, if the JobEngine is the source, does it feed the alerting table, idk, yet ~~ We were solid at 2020.2.6 HF5, upgraded to 2023.2.1, and here we…

in Anyone else having job engine issues affecting polling. Comment by rcbarr August 2023
We just set this up in our environment (yesterday). We are letting it cook over the weekend. I will see if we encounter the same behavior; if we do, I will update you, and once we find a solution I will pass it along. So far, it looks good.

in IBM Websphere status returns unknown for 80% of time Comment by rcbarr May 2023
So do I.

in Installing NPM, SAM, VNQM and Netflow and multiple APE's in AWS Cloud Comment by rcbarr October 2022
I have the same question.

in SolarWinds Scalability - AWS Direct Connect Comment by rcbarr September 2022
Hey John, for us it was a development provided hotfix that solved the problem. (I am not sure if the fix was included in the GA 12.4x release). Fortunately for us there was one other customer that saw the problem before us and had already engaged development where development had already built and was validating a fix. The…

in RabbitMQ - Multi-Subnet HA - CortexEvents Queue Comment by rcbarr July 2019
We put in CORE-12365 today aLTeReGo, it worked for RabbitMQ. We still have an issue with MSMQ, one single queue (but we are killing it, meaning running a lot of messages through it), but we will continue to work with A. and get that one going our way as well. >…

in RabbitMQ - Queues without Consumers Comment by rcbarr April 2019
We put in a UDT buddy drop today (provided by dev for us and what we are seeing with UDT) aLTeReGo; did not fix it, still queuing on both the MSMQ and RabbitMQ. I will tell A., tomorrow about the CORE-12365 and see if that issue tracks with what we are seeing here. Thanks for the response. niccat (We just upgraded to…

in RabbitMQ - Queues without Consumers Comment by rcbarr April 2019
The issue was with the 3rd party set of DLLs SAM uses to SSH to Linux nodes WOD > https://www.weonlydo.com/SSH/ssh-activex-component.asp > WeOnlyDo.Client.SSH.dll > WeOnlyDo.Client.SSH.FIPS.dll The issue was identified and fixed in SAM 2020.2 JIRA 00431006 SAM 2020.2 Release…

in SAM 2019.4 Monitor Unknown Issue Comment by rcbarr July 2020
Resolved - Our issue was self-inflicted; it was a configuration issue. If you are having issues with transactions going unknown, look closely at your players and how they are configured :-)

in WPM is alerting false positives Comment by rcbarr February 2018
Oh and we have you beat nickcat, we saw the RabbitMQ/CortexEvents queue get to 19 million, still syslog, traps, events, alerts, polling completion rate, database syncs, all solid, no apparent issues, it's crazy. Go Dell, go physical, a little FusionIO on the backend helps a lot too :-)

in RabbitMQ - Queues without Consumers Comment by rcbarr April 2019
Envision having to run through the configuration wizard for EACH Poller for EACH product. FYI, do NOT run the web optimization portion of the configuration wizard if you have multiple pollers until the LAST poller, rant over.....

in Upgrading questions Comment by rcbarr March 2017
OK you can reference SW15358 if need be, but there is a "Buddy Drop" for this problem: Core-2016.1-BD-CUST-18255 Apparently if you change values in a trigger action the core writes a NULL value to the database, this causes the "Object reference not set to an instance of an object" error. What it should do is blank out the…

in Solarwinds & ServiceNow Integration In NPM 12 Comment by rcbarr August 2016
We implemented a solution today; letting it cook for a while.

in RabbitMQ - Multi-Subnet HA - CortexEvents Queue Comment by rcbarr April 2019
What we are seeing is that, with no real pattern, transactions will go "unknown". We have alerts set up to fire on the transaction monitors, so when this happens we fire false positives. We can't go into production in this state. smoked_angus I did add the SEUM users to the local admin group on the player server, testing that change…

in WPM is alerting false positives Comment by rcbarr July 2017
That's assuming they have fixed this link to include the latest version of the installer, see 00288415, April 8th, 2019. We were upgrading from 12.3 HF6 to 12.4 with core 2018.4 HF3. The PM has been made aware. As far as I can tell however it is not fixed yet. I do agree it is very important to run the "correct" installer.…

in How to rebuild an APE Comment by rcbarr April 2019
I am seeing the same thing. But my setup is a little different, I am testing connections from a ServiceNow mid-server to target devices with the Solarwinds SNMPWalk tool. So with the little data I see here, I have to believe it is corruption but how to find it is another challenge indeed. I am guessing it is something to…

in Receiving "Hardware polling failed: Error 31224" when using SNMPv3 polling in NPM Comment by rcbarr May 2016
That's not to say queuing with RabbitMQ is not a problem, at all. We are seeing that too, but to this point thought it was a symptom of a larger problem, idk.

in RabbitMQ - Queues without Consumers Comment by rcbarr April 2019
I did what Janene16 suggested (no reboot) and it worked.

in Unexpected Website Error The settings property 'DPA.DPASummaryViewID' was not found. Comment by rcbarr September 2016
I see no issue with polling completion, dbsync's, or collection. I will follow up on the errors we see once I have them. Have turned on debug on different component monitors waiting to catch one that goes unknown.

in SAM 2019.4 Monitor Unknown Issue Comment by rcbarr December 2019