Anyone else having Job Engine issues affecting polling?

lcsw2013

We are on version 2023.2.0

Since upgrading to this version, interfaces suddenly start to drop out of polling. Then devices randomly bounce in and out of polling. Then agents start to disconnect, even though the monitored servers were not touched. As a result, we now have unreliable polling.

When I check the Job Engine logs I see this:

2023-05-25 09:38:45,014 [29] WARN SolarWinds.JobEngine.AgentSupport.Routing.JobRouterAgentProxy - (null) Error during 'Clear scm' call. SolarWinds.JobEngine.JobEngineCommunicationException: Unable to communicate with agent JobEngine on endpoint 6fb9d5fd-4531-4c6a-96a4-55e919fc12cc. Response to message 'Clear' was not received in '00:02:00'.

This fills the logs from top to bottom, with a few other random warnings in between. This is not typical. Usually when I look at the Job Engine logs it's all normal operations, so seeing several logs all saying the exact same thing on every poller suggests the Job Engine is the problem.

I've done a repair on the engine I've been working on. It didn't work.

I cleared the SDF files and had them regenerate. It didn't work.

I removed a whole bunch of agents no longer in use, cleared up credential issues, etc. I made a good effort at housekeeping. This didn't help.

I'm at the point where I'm considering a complete clean uninstall and reinstall on that engine, but I'm skeptical it will help if the above did not. The Agent Messaging service was also having issues, and its log has a bunch of warnings and errors too.

Same with the collector. It almost seems as if the core polling components are all not working properly. Has anyone had similar issues?

adam.beedell

Which log file specifically did you find this in?

lcsw2013

This was in the Job Engine v2 log

adam.beedell

Had a look in my C:\ProgramData\SolarWinds\JobEngine.v2\Logs\SolarWinds.JobEngineService_v2023_2.log and found no matching record. If that's the one you were looking at, it might be an environmental thing.

lcsw2013

Hmm. That actually doesn't surprise me. I'm going to try some repairs later tonight. Hopefully it'll fix the issue. Thanks for the help.

NightEdition

I am experiencing similar polling frustrations on 2023.2.

Since upgrading I now often experience issues where polling randomly stops for all nodes on a specific poller.

When this occurs, nothing in the SolarWinds poller health stats indicates a problem. We therefore have to manually monitor the LastSystemUpTimePollUtc values for our nodes in the SQL DB to spot this issue.
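
The LastSystemUpTimePollUtc check described above can be sketched in Python. This is a minimal sketch, not a SolarWinds tool: it assumes you have already pulled node names and their LastSystemUpTimePollUtc values (as timezone-aware UTC datetimes) out of the Orion database yourself; the query, and which table or view holds the column, are left as assumptions. The 30-minute threshold and the node names are hypothetical; pick a threshold comfortably longer than your normal polling interval.

```python
from datetime import datetime, timedelta, timezone


def find_stale_nodes(last_polls, now, max_age=timedelta(minutes=30)):
    """Return node names whose last system-uptime poll is older than max_age.

    last_polls maps node name -> LastSystemUpTimePollUtc (aware UTC datetime).
    max_age is a hypothetical threshold, not a SolarWinds default.
    """
    stale = []
    for node, last_poll in last_polls.items():
        # None means the node has no recorded uptime poll at all.
        if last_poll is None or now - last_poll > max_age:
            stale.append(node)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = {  # hypothetical node names
        "core-sw1": now - timedelta(minutes=5),
        "edge-rtr2": now - timedelta(hours=2),
    }
    print(find_stale_nodes(sample, now))  # ['edge-rtr2']
```

If most nodes assigned to one poller show up as stale at the same time, that points at the poller rather than at the individual devices.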

Restarting the services on the affected poller resumes polling for nodes on that poller.

I have a case open with Support and they have advised it is a known issue in version 2023.2 and their developers are currently working on a fix for this. I assume it's also still applicable on 2023.2.1.

For anyone looking to upgrade to 2023.2 (or 2023.2.1) I would hold off!!

lcsw2013

It's good to know we aren't alone. Please spread the word, because it's a bug. We have 2 cases with support, and we are even involved with their upper support management team, yet they are refusing to send this up to their developers.

I'm done butting heads with support.

lcsw2013

Just realized you mentioned that they told you it's a known issue. They never told me that on the 2 tickets I have with them. What's your ticket number, if you don't mind me asking? I'd like to mention it on my ticket so they can correlate the two.

lcsw2013

@aLTeReGo Would you have any further information on this by chance? Are there any known issues with the current NPM? Apparently I'm not the only one with unreliable polling. I was just curious whether anything is known or currently with development. Someone on my thread mentioned that they were told development is working on a fix.

I just wanted to ask as this is extremely impactful and currently gives us no trust in the data polled. Polling just keeps stopping on its own. And a long list of things have all been tried and every last attempt to fix has failed.

donrobert5

Have you tried gathering the logs and then submitting Orion Insights? Just to check if there would be potentially environmental issues that you can eliminate to improve the Polling?

lcsw2013

No, I have not, because we have manually checked everything. We have multiple tickets with support, many escalated all the way to the top, and support doesn't believe it to be environmental. That's a view we share, since the issue is showing up more widely than just our environment. This is in the mechanism of the software itself: no log explains why it happens, but it does.

The biggest thing I've found is that this wasn't an issue at all on the previous version. Had I known this new version has polling issues, I would not have upgraded, or if I had caught this sooner, I would have reverted everything to the previous version. But I only figured it out after my server team deleted the backups we had made before upgrading.

Digging around THWACK, I'm finding I'm not the only one: others have very similar, if not the same, problems to what we are seeing. All of this points further toward a bug in this version. If it were environmental, the issue would have just been us.

That's why I'm going everywhere I can and trying all the KB articles etc to try to fix this. Or get the attention of developers so they can look into this.

NightEdition

01367990

lcsw2013

Thank you!

lcsw2013

Further context for anyone who might be reading this. Polling isn't happening at all. It literally has been stopping 100% in all pollers and staying stuck till I manually restart the job engine.

Seems like it's a known issue with a broken job engine.

HerrDoktor

In the same boat with at least one of my clients.

Case # - 01360057

HerrDoktor

Jeremy has not been with SolarWinds for almost 2 years now, so he might not answer.

dsimpkins

Same here. We have an alert that looks for SNMP polling not responding; it checks the last SystemUpTimePoll and also the last discovery time.

We are seeing the last discovery time freeze and have to restart the services or the server to get everything to come back.

Glad to see I'm not alone, and hopefully a fix comes out soon.

tobyw_loop1

We've got a customer willing to hold off until the Q3 2023.3 release as they've recently upgraded and have a large estate.

dilbert1234

Can anyone confirm whether this issue is still with the latest Hotfix 2023.2.1?

NightEdition

I suspect issue is still applicable.
I'm on 2023.2 and experiencing the issue.
2023.2.1 was already released when I spoke to support and they said dev were working on a fix.
If it was already fixed in 2023.2.1, they'd have advised me to upgrade.

schrumma

I thought this was just my issue, since we recently migrated to Google Cloud VMware Engine (GCVE). I know there were firewall issues with this migration and thought I had everything working, but then I noticed the reports I created had no data. WPMs were going into the Unknown status. There may be other issues I haven't uncovered yet. Anyway, I was very happy with the latest upgrade process, just not so happy with the after-effects.

lcsw2013

Darn, I didn't know that. He was my go-to when I needed answers. Do you know anyone else in a similar position at SolarWinds we could tag in this post? I don't know the names of anyone else, at least here on THWACK.

lcsw2013

Open a ticket with support. I believe the more tickets we can open with support the better the chances they realize they need to get this fixed quicker.

I've tried everything already. To the point of complete uninstall and reinstall of the entire environment and it didn't help. Modified many things and it didn't help. Repaired specific components and it didn't help. I mean we ran the entire list of things we could think of and polling continued to fail.

That's when I came to THWACK and found out I wasn't alone. Apparently many environments out there have been affected since upgrading. But I think SolarWinds won't move on this quickly enough unless they have enough tickets. Once they can see that this is a larger issue, they'll move on it a bit quicker.

tony.johnson

I can confirm this is a known issue seen in some larger environments. It is tracked internally under OO-19959 and will be fixed in the 2023.3 release.

dsimpkins

Thanks Tony, are you aware of a workaround or patch becoming available?

tobyw_loop1

The .3 release will be around July, hopefully.

planglois

What can be considered a "Large environment" for this to happen?

decust

Been encountering the same issue.

Luckily I have dashboard screens showing a map of my polling engines, with the default SAM polling engine applications assigned to the VIPs. So when I see a poller go haywire, I check the Jobs Lost counter, and when it's starting to lose jobs I initiate a failover. It ain't pretty: some days I go without issue, and some days I have to fail over multiple pollers multiple times. However, it never seems to happen on my main polling engine.

dsimpkins

Jobs lost counter, where did you find this?

decust

It's part of the built in polling engine templates within SAM.

We've added the polling engine VIPs as WMI nodes, so we always poll the application on the active server.

Within Performance Monitor you can add the counter by going to "SolarWinds: Job Engine V2" and select the Jobs Lost counter.

Right now we have alerts set up so that whenever Jobs Lost goes over the critical threshold, it triggers a failover on that HA pool.
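
The alert-driven failover described above can be sketched as a small decision function. This is a hypothetical sketch, not the logic of the SAM template or of SolarWinds HA: the warning and critical thresholds are made-up values, requiring several consecutive samples over the critical value (so a single spike does not trigger a failover) is an added assumption, and how "Jobs Lost" samples are collected is left out.

```python
def evaluate_jobs_lost(samples, warning=10, critical=50, sustain=3):
    """Classify a poller from recent 'Jobs Lost' counter samples (oldest first).

    Thresholds and `sustain` are hypothetical. Returns "failover" only when
    the last `sustain` samples all exceed `critical`; "warning" if the newest
    sample exceeds `warning`; otherwise "ok".
    """
    recent = samples[-sustain:]
    if len(recent) == sustain and all(s > critical for s in recent):
        return "failover"
    if samples and samples[-1] > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(evaluate_jobs_lost([0, 1, 2]))     # ok
    print(evaluate_jobs_lost([60, 70, 80]))  # failover
    print(evaluate_jobs_lost([0, 0, 80]))    # warning
```

Requiring a sustained breach trades a slightly slower reaction for fewer unnecessary failovers of the HA pool.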

lcsw2013

Thanks Tony. I have a ticket with support about this that is currently being investigated by application engineers and your development teams. Per support, since this was only recently uncovered, they don't have answers yet.

That's mostly because, while the high-level problem is polling, different environments are seeing different issues, according to support.

In our case specifically, our logs aren't catching and logging an issue at all, even with logging set to show everything; they only show normal operation. Then again, some log files were not even being written to. The issue is very strange: for example, a device may poll device information like CPU and memory but not the interfaces, then later the interfaces poll but the device information doesn't, and then some pollers just stop polling altogether.

It's like there are random poll cycles that partially work for some things but then fail. We did notice a lot of events in the Event Viewer in our environment that seem to indicate the Job Engine worker process is crashing. But again, no logs back this up.

It's an intermittent, partially working error with no apparent reason behind it, and the Event Viewer and system logs give no leads. So support has had to set up manual troubleshooting steps to see if we can catch the error as it happens and record it.

Support has found several active bug reports internally all related to job engine problems and they have passed them all to developers to see if we can find what the heck is happening.

Support stated that when something fails, it should be all or nothing: not some things failing while the rest works, and then later the parts that weren't working start to work while the parts that were working stop.

It's a head scratcher because we don't have any leads. So we are having to find creative ways to try and figure this out.

lcsw2013

The update I've gotten from support is that this has become a front-and-center issue for them, and they are trying to find a way to troubleshoot it and reach a resolution.

As for us, we have been manually polling devices and restarting services just to keep the system going. Otherwise it stops polling and never comes back. Or sometimes it partially works but is still highly unreliable.

It's a super weird issue. But hopefully SolarWinds figures out an interim solution while we wait for their next version. We can't have gaps in data until the next version is out.

rkr69

I am also seeing a similar issue. We upgraded recently. Our APE stops responding to ping while SNMP polling still works, or sometimes both fail. Yet everything looks green and appears to be working fine. Because we monitor the SolarWinds server through another tool, we found out that it does not respond to ping. When we log in to the server and try to ping itself or the gateway IP, it does not respond: the command prompt just hangs without reporting even a single lost packet. This is very unusual behavior. We have to restart the service. When we showed this to SolarWinds support, they simply denied it and said it is an environmental or network issue. But it definitely is not. How can the network be involved if we can't even ping ourselves or the gateway, and why does it start working after restarting the service? Not sure if anybody has found a solution.

lcsw2013

There isn't a fix yet. I've been working with support nearly every day for several weeks now. They are narrowing down a solution but aren't there yet. They said that at a high level polling is affected, but in their research they see different environments failing in different ways, which is why they haven't publicly said there is an issue yet.

I'd say, next time you talk to them, mention this thread and the ticket numbers above so they can correlate your ticket with all the others experiencing this same issue. Ask for an escalation; they might be able to get your issue looked at.

But there isn't a fix right now. I'm getting ready to send them a bunch of metrics and logs tomorrow at their request. Hopefully it helps them home in on the problem so they can get it fixed. They found some things not working, but as it was explained to me, the issue shows up differently in different environments, which has made it hard for them to figure out the actual cause.

Send them this thread and they should be able to involve you with the rest of the tickets similar to this.

Samuel52

The "fix" they provided us doesn't seem to do anything. Our pollers still stop randomly, and we either need to restart services or reboot the server. I personally don't have days to spend on the phone with support on this any longer. This has been going on for so long that I have no faith in the data being recorded by the tool, which makes management reporting very difficult. If this doesn't get resolved in the next software update, we will be moving away from SolarWinds.

lcsw2013

According to the folks I've been talking to at support, there isn't a fix yet. What I do is, as soon as I notice the next poll times for a poller starting to fall behind, I manually restart the Job Engine on that poller. I'm trying to figure out how to automate this properly. I've attempted to script it a few times, but it doesn't work properly.

However, once I have it figured out, that's what we will be doing: using a total of 2 to 3 scripts to basically keep kicking SolarWinds every time those next poll times fall behind. Hopefully this will get us through until SolarWinds releases a fix.

They say that at a high level the problem is related to polling, but across their different cases the polling issues have been caused by different things, so it's been hard for them to pinpoint what's going on. Still, they seem to keep closing in on a fix, so I don't think it'll be too much longer before something is released.

Seems to be affecting many people out there.

rjrothwell

We are having similar problems. I got a support case open. So far we made it past the basic questions and now waiting on a response. Reading the case notes for SolarWinds Platform 2023.2.1, there is mention of a fix called "Alerting issues that occurred after the upgrade to 2023.2 were addressed." This is vague since if SNMP is not polling, some alerts are not working.

All the SolarWinds services are running, and when I restart the services it normally comes back up. This is a pain, since we had a few major events happen on campus and SolarWinds did not pick them up as designed, because SNMP polling stops randomly.

lcsw2013

Mention this thread to support. I started to try and consolidate similar instances of everyone having polling issues so they could investigate on their end.

They have identified that there is a polling issue, as everyone appears to be reporting the same thing: once the service is restarted it works for a bit, then it stops. But they haven't found the root cause yet, because in different environments different things are causing the same or similar errors, which makes it hard to pin down. They do appear to be closing in on a possible solution. At least, that's what I understand from what they have told me.

But feel free to mention this thread to them so they can start putting all of our cases together under these polling issues. They have been manually grabbing performance counter logs on our services to try to catch the error, since in our case the logs aren't catching much of anything, yet it's clear as day that polling keeps randomly stopping whether or not the logs report anything.

Hopefully they can catch it with these manual logs and understand what's going on. I've been manually keeping an eye on the next poll times on the pollers; if they all start to fall behind, that's usually when I restart the Job Engine on that particular poller, and it gets things going again. I haven't figured out how to automate this in a way that works correctly yet, so I've been doing it manually.
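
The by-eye check described above (next poll times on a poller all falling into the past) could be sketched like this. Assumptions: you can export each node's next-poll time grouped by poller (where that data comes from is not specified in this thread), and the 10-minute grace period and 50% ratio are arbitrary placeholders. Poller names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def pollers_falling_behind(next_polls, now, grace=timedelta(minutes=10), ratio=0.5):
    """Return pollers where more than `ratio` of nodes have a next-poll time
    more than `grace` in the past.

    next_polls maps poller name -> list of next-poll datetimes (aware UTC)
    for that poller's nodes. grace and ratio are hypothetical defaults.
    """
    behind = []
    for poller, times in next_polls.items():
        if not times:
            continue  # nothing assigned to this poller, nothing to judge
        overdue = sum(1 for t in times if now - t > grace)
        if overdue / len(times) > ratio:
            behind.append(poller)
    return sorted(behind)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = {  # hypothetical poller names
        "APE1": [now - timedelta(hours=1)] * 3 + [now + timedelta(minutes=5)],
        "APE2": [now + timedelta(minutes=1), now - timedelta(minutes=2)],
    }
    print(pollers_falling_behind(sample, now))  # ['APE1']
```

Looking at the share of overdue nodes per poller, rather than at single nodes, matches the symptom described in the thread, where a whole poller stops rather than individual devices.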

I just hope SolarWinds figures this out quickly, as it's becoming a rather heavy effort to keep things stable.

rjrothwell

Done! I let support know.

fitzy141

Would this bug affect polling via agents? I have multiple agents across different polling engines that seem flaky: not alerting when thresholds were hit, or not letting me list resources. I opened a ticket, and the email back from support was this (so easy to do on production servers):

Uninstall the current agent.
Delete the node.
Reboot the server.
Install the latest agent, downloaded manually.
Add the node.

lcsw2013

Those steps seem to me like they would cause you to lose historical data. We are seeing agent polling issues too, but from what I understand from support, in our case it's not the agents acting up but rather the poller itself.

What I've been doing is I restart the job engine and many times it's enough to go on a little longer before it fails out again. And less frequently I need the agents rebooted when they get into a stuck state. But yes the issue generally speaking does affect agents as well.

I've been telling people if they want, to go ahead and mention this thread to support so they could hopefully correlate all the different tickets and hone in on a fix.

But on my ticket I have not been given a fix yet. SolarWinds has been going back and forth with me but so far has been unable to find the root cause and provide a resolution.

If they do find a fix hopefully they can notify everyone affected by this.

fitzy141

Appreciate the response. I have been pulling my hair out trying to figure out what's going on. Yeah, I wouldn't get approval to start removing agents and messing around on production servers anytime soon.

decust

While it may be a workaround. This is not really something I would consider within my environment where we have over 2000 agents. x)

I also see the odd behaviour among agents. But mine mostly fail because all of them are deployed as Server-Initiated. So when the agents stop responding, we know it's time to fail over the pool with the malfunctioning poller.

lcsw2013

No worries! I was just happy to see that it wasn't just us having issues. Hopefully SolarWinds finds a fix for all of us.

msites

Thanks for this! This just started for us on Sunday and it's getting bad. What's super frustrating is that I spent half my day proving to Tier 1 that it wasn't a SQL database issue. Then I finally get to Tier 2 and they're like "yeah, it's a known issue" within 5 minutes of the call. It's also crazy that when they have these 100% known issues, they refuse to add them to the Known Issues section of the release notes. Someone with a large environment may not want to take 2023.2, since it would probably cause them an outage like ours. At least SolarWinds got a new logo!

bharris1

We have an upgrade to 2023.2.1 scheduled for tomorrow and I just found this thread. Is it a given that this will happen in an environment? We have 9000+ nodes so I'd consider ours a large environment.

lcsw2013

As mentioned in my other response if I was in your shoes I'd be cautious and wait. But if you must upgrade at least save the backups and hang on to them because if the problems do happen you'd want to revert back.

bharris1

Has there been a more recent update than about 2 weeks ago mentioning 2023.3 will be coming out in July? I would imagine they would want to put out a buddydrop or hotfix to fix this before a version upgrade. This seems like a major issue that has me leaning towards waiting for 2023.3 before upgrading.

lcsw2013

Unfortunately, under our escalated tickets they have not mentioned anything. I'd assume the date mentioned is speculation. As for us, we have not been told of any buddy drops or hotfixes.

The only thing we have going on is testing where their development asks for something we do it and the back and forth just keeps going endlessly.

But in the meantime, I've taken measures to manually intervene and restart services when the issues happen. Usually I shut down the Job Engine and restart the Collector, and as soon as I start the Job Engine again it works for around a week, maybe a little less, and then I have to do it again.
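
The manual restart order described above (stop the Job Engine, restart the Collector, then start the Job Engine again) can be written down as a dry-run plan of Windows `sc` commands. The service names below are placeholders, not the real SolarWinds service names; look those up on your own poller (for example in services.msc) before adapting this. The sketch only prints the commands; it does not run them.

```python
# Hypothetical placeholder service names: replace with the real ones from your poller.
JOB_ENGINE = "JobEngineServicePlaceholder"
COLLECTOR = "CollectorServicePlaceholder"


def restart_plan(job_engine=JOB_ENGINE, collector=COLLECTOR):
    """Return the `sc` commands, in order, for the manual workaround:
    stop the Job Engine, restart the Collector, then start the Job Engine last."""
    return [
        ["sc", "stop", job_engine],
        ["sc", "stop", collector],
        ["sc", "start", collector],
        ["sc", "start", job_engine],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in restart_plan():
        print(" ".join(cmd))
```

Note that `sc stop` returns as soon as the stop request is sent, not when the service has actually stopped, so a real script would need to wait (for example by checking `sc query <name>` until the state is STOPPED) before moving on to the next step.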

I just haven't found a proper way to script and automate the process, but it's what we've had to do here until we hear back from SolarWinds with a fix.

I've asked them about a buddy drop or hotfix, but they always divert the conversation back to troubleshooting and refuse to answer questions about a fix. So I quit trying to get answers. I just do what needs to be done to keep things going.

Looks like some people in the thread have had some success with their own fixes or some suggestions from support. But for us I've attempted many things said and nothing works. Only service restart does the trick.

bharris1

I have decided to postpone our upgrade from 2020.4.2 => 2023.2.1 until this fix goes out. We are very stable now and would hate to upgrade and have this bug bite us.

msites

yes, I wouldn't touch 2023.x for a long time. Once they started to merge things with 2022.3+ it's been 17 bugs after 17 bugs with each release.

saw10

Case #: 01371560 - Additional Polling Engine High CPU utilization
We closed the case as no help was forthcoming.
Our solution is to restart the server. This happens randomly across all of our polling engines.
And of course we get the classic response “This issue will be “resolved” in the next release”.
This is the type of issue that needs to be a Hot Fix. We lose polling of device on the APE that is affected.

NOTE - We are asked to upgrade anytime a CVE is addressed in the release notes, by our Security Team. This is due to the whole issue at the end of last year.
A good argument for releasing CVE fixes as a separate "patch".

decust

Unfortunately I'm in the same boat. We had already mitigated the CVEs through how we designed the environment and login access. The security team rammed it through anyway because CVEs need to be patched ASAP. Now the environment is unstable, whereas I was surprised how stable 2023.1 was running for me.

msites

We have the same policy of being forced to take upgrades when they have CVEs listed and which means we are always early adopters of the super buggy upgrades with little help. If we could go back in time, we would take that 2020.4 release and stick with it as there have been zero real new features released in 1.5 years, just nothing but new bugs.

tobyw_loop1

RC for 2023.3 has been announced today documentation.solarwinds.com/.../solarwinds_platform_2023-3_release_notes.htm

Samuel52

Judging from the fix notes, the 2023.3 RC doesn't appear to address this issue. Extremely disappointing.

lcsw2013

In our environment we usually hold off until General Availability. We let SolarWinds work out the kinks in RC versions before we consider an upgrade. And our reps at SolarWinds haven't confirmed this would fix our specific problem.

But hopefully it does. Looks like SolarWinds is moving relatively quickly to try to fix this.

adam.beedell

I read this line as a "possible fix":

1315625, 1336784, 1337676, 1358723, 1359224, 1362623, 1372298	The issue where JobEngine was unable to submit a job which resulted in polling issues was addressed.

lcsw2013

We are awaiting word directly from the reps working with us. We normally do not consider RC versions; we wait until General Availability to make sure all the kinks are worked out.

But in this case we aren't upgrading till we have direct confirmation the upgrade will fix the issue.

adam.beedell

Heard from solarwinds staffers they're trying to not use the hotfix nomenclature after a recent decision, but it's just the same thing in smaller releases, so if you ask for a hotfix and get a year.number.number answer it's that anyway.

lcsw2013

Interesting.

Samuel52

Hmm maybe I looked in the wrong place. Ty

tobyw_loop1

Fingers crossed

lcsw2013

Normally we wait out RC versions. But in this case we are awaiting direct confirmation from the reps we are working with before we consider making any changes to our environment. No one here wants to make a change unless SolarWinds can guarantee it works.

saw10

Found in the SolarWinds Platform 2023.3 Release Notes: documentation.solarwinds.com/.../solarwinds_platform_2023-3_release_notes.htm

bharris1

Any updates from your rep?

lcsw2013

No Update. We actually just had a meeting with a Support director who admitted that they understand the issue but do not know the root cause. And they will be assigning our ticket to their highest engineer with the most experience in the software to work with development and others to try and sniff out the root cause and find a resolution.

It was a rather tense meeting because my management here were involved so they spoke their mind and I believe SolarWinds now understands how difficult their software has made it for us to actually provide monitoring that isn't like swiss cheese and full of holes and gaps.

We have readjusted ourselves to use other tools in the mean time to supplement solarwinds where it's not reliably polling.

Kind of scary when a company releases a piece of software. They developed the thing, they quality-tested it, and in the end they have a program whose problems they cannot root-cause? Wow!

But we are just going to keep moving with our current arrangement and hope the root cause is found sooner rather than later, so we can finally get this resolved and have a system that's reliable and trustworthy when it comes to data collection and monitoring.

lcsw2013

Just an update for anyone following this. SolarWinds has confirmed a known bug in which unmanaged interfaces and volumes cause polling to get stuck and stop. The node polls perfectly when polled manually, but scheduled polling is broken: the next poll time keeps going into the past. This known issue was confirmed after being discovered in our environment.

Again, the temporary fix is to go into List Resources and uncheck the unmanaged volumes or interfaces for now. Placing an interface or volume in an unmanaged state will break polling for interface and volume monitoring on that device.
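
To apply the workaround above, you first need to know which nodes have unmanaged interfaces or volumes. A minimal sketch, assuming you have already exported a list of resources as (node, resource name, kind, unmanaged flag) tuples; that export format and the sample names are hypothetical, not something SolarWinds provides in this shape.

```python
from collections import defaultdict


def unmanaged_by_node(resources):
    """Group unmanaged interfaces and volumes by node.

    resources: iterable of (node, name, kind, unmanaged) tuples, where kind is
    "interface" or "volume" and unmanaged is a bool (hypothetical format).
    Returns {node: sorted list of "kind:name" strings}.
    """
    grouped = defaultdict(list)
    for node, name, kind, unmanaged in resources:
        if unmanaged and kind in ("interface", "volume"):
            grouped[node].append(f"{kind}:{name}")
    return {node: sorted(items) for node, items in grouped.items()}


if __name__ == "__main__":
    sample = [  # hypothetical nodes and resources
        ("sw1", "Gi0/1", "interface", True),
        ("sw1", "Gi0/2", "interface", False),
        ("srv1", "C:", "volume", True),
    ]
    print(unmanaged_by_node(sample))
```

The output is a per-node checklist of what to uncheck in List Resources.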

Agent monitoring issues are still ongoing, and we hope SolarWinds can resolve that next or at least find a temporary fix. But I figured I'd share this in case anyone is experiencing the same.

RaviK

Thanks for sharing. Any hints on the tentative timelines for the 2023.3 release (Hope it is a stable release).

@abdhijasharma @sagar.b @Shravya29 @99kushal @mtr

lcsw2013

It's in RC, so I'd assume it will launch soon. As far as fixes go, I think some are included but not all of them. Some things are still being worked out.

bharris1

I have decided to upgrade from 2020.2.4 to 2020.2.6 so we stay in support until next May instead of this November. How do I approach upgrading to the .6 Service Release without it upgrading to 2023.2? Do I just grab the 2020.2.6 installer and drop it on all pollers the old-fashioned way? I really don't want to accidentally go to the 2023.2 release.

jhorng

2023.3.0 is out as of today. Who is going to be first to give it a try?

msites

There hasn't been a stable release in over a year... I'll let you go first.

jhorng

Interestingly 2023.1.0 has been relatively stable. There are some bugs that I've manually dealt with such as the job engine not automatically starting after a reboot of the VM.

But the CVEs are building up so 2023.3.0 is tempting.

Samuel52

We are probably going to upgrade today. Our environment is so unreliable right now that it really doesn’t matter much. All our metrics reporting to management is a disaster and has made it almost impossible to get an accurate picture of node stability. In fact, if this release doesn’t make the environment more stable, I will likely begin the process of finding a replacement. Our renewal is coming up next year and I’ve already requested the budget to test new tools in Q1.

After being a customer for like 15 years, at several companies, I’m about done with it.

jhorng

Ugh, sounds rough. Hope it goes well. Let us know how it goes!

mwire

Keep us posted on how the upgrade goes. In our experience, this issue only seems to resurface about every 7-10 days, and it clears up every time we use one of the various workarounds support provided us.

LeBeauUK

Did the upgrade to 2023.3.0 yesterday (the CVEs being published sort of forced our hand). It went OK: a couple of UAC issues on the primary servers, but nowhere near as bad as the last upgrade I did...

rcbarr

I wish I could give you 42 more ^ (likes), rofl.....

Samuel52

I did upgrade our production system and it has been running fine since. We haven’t noticed any issues or anomalies as of today. Until it’s been running a month it will be hard to tell if the issue pops back up. It’s been running about a week and a half so far…

HerrDoktor

I'd go with tony.johnson on tagging.

bharris1

Did the polling issues get fixed in 2023.3? It looks like the latest post in here was from about 3 months ago.

tobyw_loop1

No further problems witnessed our side, we generally recommend to our customers to maintain upgrade paths. N-1 is what we generally recommend. 2023.3.1 has been out and stable for some time.

bharris1

Thanks for your info. We are planning on upgrading in January and hoping we can get on 2023.4 to get SDWAN polling. Are there any gotchas to be aware of going from 2020.2.6 to 2023.x?

umutdedeoglu

We just upgraded directly from 2020.2.6 to 2023.3.1 by migrating to new servers, without a problem. The old servers were on an outdated OS (Server 2012), so please check the OS and SQL requirements before the update.

adam.beedell

It'll be a total reinstall at the front end, so potentially stuff to consider there around customizations.

In SAM there's a "Legacy" setting that'll break some PowerShell stuff if you have it.

The script approval thing comes in

If you have a bad certificate on website launch everything slows to a crawl

"Platform" stuff comes in, there'll be more services running and possibly a new DB or two to create

I think that's the lot, 2023.3.1's pretty good.

shanehocking

We had the issues below with 2023.3. Fixed in 2023.4. (01372311)

1. If the agent software on the endpoint is uninstalled, we can't redeploy the agent from the APE. The message is "Agent deployment in progress" and the wheel keeps spinning.
2. When restarting the agent, it spins on the message "send restart command to select agents", and then the console shows "Agent Status" as "Agent restart attempt failed. Make sure the agent is running and try restart again"... -> if restarted from the service on the endpoint: "Agent is running" -> if the agent is moved to the main poller, "Agent Status" reverts to "Agent is running".
3. If the message "Job Engine Agent Plugin : Plug-in Load Failed Plug-in failed to start after 1 attempts. » Reinstall plug-in" appears and you click "Reinstall Plugin", the agent plugins get stuck in "Installation Pending" -> restarting the agent does not initiate -> List Resources shows "Installing Agent Software" -> switch to the main poller -> it initially shows "Installation Pending", then after a couple of minutes reverts to "Plug-in Load Failed Plug-in failed to start after 1 attempts. » Reinstall plug-in" -> without clicking Reinstall plug-in, software is being installed on the target system -> it installs 4.8, "Agent Status" = "Reboot Pending", ".Net Framework Version" reverts to 4.7.2 -> it continues to show "Plug-in Load Failed Plug-in failed to start after 1 attempts. » Reinstall plug-in".
4. Clicking Edit Settings, then Troubleshooting, shows "Loading" and the wheel spins until it says "Agent is not responding. Remote troubleshooting is not available" -> after moving the agent to the main poller, the issue does not exist.