Product Manager

NPM 12.3 Orion 2018.2 Upgrade Feedback

What has your upgrade to NPM 12.3 on Orion Platform 2018.2 looked like? We on the product manager team would like to hear about all of it: the good, the bad, and the ugly! As a starting point, here is a quick getting-started blog post on upgrading to Orion Platform 2018.2: Preparing for the Upgrade to 2018.2

310 Replies

Thanks serena, we're planning to apply HF4 next week.

MVP

Hi Serena,

An upgrade to 2018.2 failed (12.0.1 to 12.3). The error encountered was a Database configuration failure.

DB-config-failed.png

Due to paucity of downtime we had to roll back the upgrade and fall back on 12.0.1. We did open a Support ticket #00156141.

Does it have something to do with the Orion DB size? Ours is over 50 GB. Are there any areas to look out for before we reattempt the upgrade?

Product Manager

From your support case, it looks like the support rep has a good handle on things. I'll keep an eye on the case and follow it for updates.

Were you able to grab diagnostics from the machine, or the ConfigWizard log file? Either would have given more information on the specific table that was getting stuck.

- David Smith

Hi David,

Unfortunately, we were not able to grab the diagnostics from the system or even the log files.

regards

RaviK:

The config wizard log files should be stored for the last few runs at least:

C:\ProgramData\SolarWinds\Logs\Orion

configwiz.png
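For anyone hunting for those logs after a failed run, here is a minimal Python sketch that lists the most recently modified files in that folder. The directory is the one quoted above; the `newest_logs` helper and the `*Config*` file name pattern are assumptions for illustration, not documented SolarWinds names, so widen the pattern if nothing matches.

```python
from pathlib import Path

# Log directory quoted in the post above; adjust if your install differs.
LOG_DIR = Path(r"C:\ProgramData\SolarWinds\Logs\Orion")


def newest_logs(log_dir, pattern="*.log", limit=5):
    """Return up to `limit` files matching `pattern`, newest first."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return []
    files = [p for p in log_dir.glob(pattern) if p.is_file()]
    # Sort by last-modified time so the latest Config Wizard run comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    # "*Config*" is a guess at the naming; use "*.log" if it finds nothing.
    for path in newest_logs(LOG_DIR, pattern="*Config*"):
        print(path)
```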

We had the same issue. It's not a size issue but an issue of "your table contains a number that is too big." We had to truncate some of the tables for APM to get rid of this data. I'll be making our 3rd upgrade attempt tonight (the 2nd failure wasn't SolarWinds' fault). I'm told by Customer Service that this has successfully resolved the issue for other customers.

Hi jemertz

Maybe I sound like a novice, but can you share the steps you took to truncate those tables? The bigger challenge is knowing which tables to truncate.

Also, I'm not sure I understood the part about Customer Service saying that this has successfully resolved the issue for others. Do you mean truncating the tables?

Thanks for the assistance

Ravi

Hi RaviK - What jemertz is referring to is a known issue with a table's identity value growing too large. The limit comes from the column's data type (int), which has a maximum value of 2,147,483,647, and if you exceed that you can run into problems with the data not being stored properly. A common way SolarWinds recommends resolving this is to TRUNCATE the tables in question, which deletes all of their rows and resets the identity value. Without seeing your log files it's impossible to know which table(s) were causing the issue. The most likely culprits would be Traps and SysLog. You can check the current identity value of an existing table with:

DBCC CHECKIDENT ([Traps], NORESEED)

* Replace Traps with TrapVarBind and then SysLog, and check the result for each.

A healthy system might return a result such as:

Msg 7998, Level 0, State 3, Line 1

Checking identity information: current identity value '5040', current column value '5040'.
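To judge how close a table is to the limit, the message above can be compared against the int maximum. The following is a rough Python sketch that parses a message in the format shown and reports the remaining headroom; the `identity_headroom` helper is hypothetical and only does arithmetic on text you paste in, it does not query the database.

```python
import re

INT_MAX = 2_147_483_647  # largest value a SQL Server int column can hold

# Matches the "Checking identity information" line shown above.
_PATTERN = re.compile(
    r"current identity value '(\d+)', current column value '(\d+)'"
)


def identity_headroom(message):
    """Return (identity, remaining, fraction_used) for a CHECKIDENT message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("not a DBCC CHECKIDENT identity message")
    identity = int(match.group(1))
    return identity, INT_MAX - identity, identity / INT_MAX


if __name__ == "__main__":
    msg = ("Checking identity information: current identity value '5040', "
           "current column value '5040'.")
    identity, remaining, used = identity_headroom(msg)
    print(f"identity={identity} remaining={remaining} used={used:.6%}")
```

A table whose fraction used is close to 1 would be a candidate to raise with support before reattempting the upgrade.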

- David Smith
Level 12

Upgraded last Tuesday, been having struggles since Thursday. The upgrade process itself went smoothly, but the stability of the environment can now be described as "a hot mess". Basically need to reboot primary and additional pollers every two days to stop them from locking up hard and needing reboots via ILO. Bunch of errors out of JobEngine v2.

I wouldn't put it in the 11.5 upgrade horror show category but close behind.

Fingers crossed that hotfix 4 will resolve our woes today.

If anyone hasn't upgraded yet, I would recommend holding off for now. Not fit for Production.

When are you planning to deploy HF4?

Inquiring minds on the sidelines are eager for your results..

Just finished deploying HF4 - still seeing JobEngine v2 failures:

Faulting application name: SWJobEngineWorker2.exe, version: 2.13.0.1337, time stamp: 0x5ae9c63e

Faulting module name: ntdll.dll, version: 6.3.9600.18895, time stamp: 0x5a4b127e

Exception code: 0xc0000374

Fault offset: 0x000e6214

Faulting process id: 0xe94

Faulting application start time: 0x01d42e5a68515173

Faulting application path: C:\Program Files (x86)\Common Files\SolarWinds\JobEngine.v2\SWJobEngineWorker2.exe

Faulting module path: C:\Windows\SYSTEM32\ntdll.dll

Report Id: af31ce2a-9a4d-11e8-80f2-0017a4777d32

Faulting package full name:

Faulting package-relative application ID:

.......

Faulting application name: SWJobEngineWorker2.exe, version: 2.13.0.1337, time stamp: 0x5ae9c63e

Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000

Exception code: 0xc0000005

Fault offset: 0x1543d00b

Faulting process id: 0x2f60

Faulting application start time: 0x01d42e5a4bed3188

Faulting application path: C:\Program Files (x86)\Common Files\SolarWinds\JobEngine.v2\SWJobEngineWorker2.exe

Faulting module path: unknown

Report Id: a518f290-9a4d-11e8-80f2-0017a4777d32

Faulting package full name:

Faulting package-relative application ID:

Time to dial into the support phone queue.......
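For readers comparing their own crash entries with the two above, here is a small Python sketch that splits such an entry into fields and names the exception code. The two NTSTATUS names are the standard Windows meanings of those codes (0xc0000374 is heap corruption, 0xc0000005 is an access violation). Decoding the start time assumes the value is a hexadecimal FILETIME; the function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Standard NTSTATUS names for the two codes seen above; others map to "unknown".
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def parse_fault(text):
    """Split the 'Key: value' lines of a crash entry into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # lines with an empty value (no ': ') are skipped
            fields[key.strip()] = value.strip()
    return fields


def exception_name(fields):
    """Translate the hex exception code into its NTSTATUS name."""
    code = int(fields["Exception code"], 16)
    return NTSTATUS_NAMES.get(code, "unknown")


def start_time(fields):
    """Decode the start time, assuming a hex FILETIME (100 ns ticks since 1601 UTC)."""
    ticks = int(fields["Faulting application start time"], 16)
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    entry = """Faulting application name: SWJobEngineWorker2.exe, version: 2.13.0.1337, time stamp: 0x5ae9c63e
Exception code: 0xc0000374
Faulting application start time: 0x01d42e5a68515173"""
    fields = parse_fault(entry)
    print(exception_name(fields), start_time(fields).isoformat())
```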

What version of .NET Framework are you running on your Orion server? If you're not already running .NET 4.7.2, I would recommend upgrading, as this has resolved the issue for other customers exhibiting similar symptoms. You can find which version of the .NET Framework you are running by going to [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full] in the registry as shown below.

pastedImage_3.png
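If you would rather script that check, the key above holds a `Release` DWORD that Microsoft documents as mapping to a framework version. Here is a minimal Python sketch, assuming Python is available on the server: `framework_version` and `read_release` are illustrative names, the table lists the documented minimum `Release` values, and `read_release` only works on Windows.

```python
# Minimum "Release" values documented by Microsoft for .NET Framework 4.5+.
RELEASE_TO_VERSION = [
    (528040, "4.8"),
    (461808, "4.7.2"),
    (461308, "4.7.1"),
    (460798, "4.7"),
    (394802, "4.6.2"),
    (394254, "4.6.1"),
    (393295, "4.6"),
    (379893, "4.5.2"),
    (378675, "4.5.1"),
    (378389, "4.5"),
]


def framework_version(release):
    """Map a Release DWORD to the highest version whose minimum it meets."""
    for minimum, version in RELEASE_TO_VERSION:
        if release >= minimum:
            return version
    return None  # older than 4.5, or not a valid Release value


def read_release():
    """Read the Release value from the registry key above (Windows only)."""
    import winreg  # imported here so the module still loads on other systems

    key_path = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "Release")
    return value


if __name__ == "__main__":
    release = read_release()
    print(release, framework_version(release))
```

Any `Release` value of 461808 or higher means 4.7.2 or later is installed.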

Updated .NET to 4.7.2, still seeing errors for SWJobEngineWorker2s:

4.7.2.png

post472-1.png

post472-2.png

Have you tried following the steps outlined in the KB article below?

Large number of .JET files in C:\Windows\Temp - SolarWinds Worldwide, LLC. Help and Support
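As a quick way to see whether that article applies, the folder named in its title can be counted by script. This is only a sketch: it assumes the files in question carry a `.jet` extension, as the title suggests, and `count_jet_files` is a hypothetical helper. What counts as a "large number" is for the KB article or support to say.

```python
from pathlib import Path


def count_jet_files(temp_dir=r"C:\Windows\Temp"):
    """Count files with a .jet extension directly inside `temp_dir`."""
    temp = Path(temp_dir)
    if not temp.is_dir():
        return 0
    # Compare the suffix case-insensitively so .JET and .jet both match.
    return sum(
        1 for p in temp.iterdir()
        if p.is_file() and p.suffix.lower() == ".jet"
    )


if __name__ == "__main__":
    print(count_jet_files())
```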

Hi alterego, the JET file check looks good. I have a ticket open with a bunch of diags uploaded, Case 00153573.

pastedImage_0.png

I do have this one with warning status, but haven't heard back from the case owner yet:
pastedImage_0.png

Just to close the loop: working with support, we reinstalled CollectorInstaller and JobEngine v2, and also fixed some permissions that the permission checker flagged on our primary and one of our additional pollers. Maybe there is some kind of bug where the Config Wizard isn't running the Orion Permission Checker/Fixer correctly during the post-upgrade Config Wizard runs for 12.3 and 2018.2 HF4?

In any case we seem to be back to stability now.

Seeing unexpected growth in our NetPerfMon DB since we applied Hotfix 4 on the 7th - anybody else experiencing this?

netperfmon-growth-1.png

aLTeReGo, since I applied HF4 this morning and rebooted my Orion app server, I get this for my JET files: pastedImage_0.png. I just ran a poll in Orion to see if that will fix it.

If you open up Orion Diagnostics and click on "Try SolarWinds Active Diagnostics" at the bottom, it will do the JET file check for you.

Thank you!!!