What has your upgrade to NPM 12.3 on Orion Platform 2018.2 looked like? We on the product manager team would like to hear about it all: the good, the bad, and the ugly! As a starting point, here is a quick getting-started blog post on upgrading to Orion Platform 2018.2: Preparing for the Upgrade to 2018.2
Thanks, Serena. We're planning to move to HF4 next week.
Hi Serena,
An upgrade to 2018.2 failed (12.0.1 to 12.3). The error encountered was a Database configuration failure.
Due to paucity of downtime we had to roll back the upgrade and fall back on 12.0.1. We did open a Support ticket #00156141.
Does it have something to do with the Orion DB size? We have over 50 GB. Are there any areas to look out for before we reattempt the upgrade?
From your support case, it looks like the support rep has a good handle on things. I'll keep an eye on the support case and follow it for updates.
Were you able to grab diagnostics from the machine, or the ConfigWizard log file? Either would have given more information on the specific table that was getting stuck.
Hi David,
Unfortunately, we were not able to grab the diagnostics from the system or even the log files.
regards
RaviK:
The config wizard log files should be stored for the last few runs at least:
C:\ProgramData\SolarWinds\Logs\Orion
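If it helps, here is a small sketch for finding the most recently written files in that folder before attaching them to a case. The helper name `newest_files` is made up for illustration, and the script makes no assumption about how the Configuration Wizard names its logs; it simply sorts whatever is in the directory by modification time.

```python
from pathlib import Path

# Log folder quoted in the post above; adjust if your install differs.
ORION_LOG_DIR = Path(r"C:\ProgramData\SolarWinds\Logs\Orion")


def newest_files(directory, limit=5):
    """Return up to `limit` regular files in `directory`, newest first."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    if ORION_LOG_DIR.is_dir():
        for path in newest_files(ORION_LOG_DIR):
            print(path.name)
    else:
        print(f"{ORION_LOG_DIR} not found on this machine")
```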
We had the same issue. It's not a size issue but an issue of "your table contains a number that is too big." We had to truncate some of the tables for APM to get rid of this data. I'll be making our 3rd upgrade attempt tonight (2nd failure wasn't Solarwinds' fault.) I'm told by Customer Service that this has successfully resolved the issue for other customers.
Hi jemertz
Maybe I sound like a novice, but can you share the steps you undertook to truncate some of the tables ? The bigger challenge is knowing which tables to truncate.
Also, not sure if I understood this part "Customer Service stating that this has successfully resolved the issue for others" - do you mean truncating the tables?
Thanks for the assistance
Ravi
Hi RaviK - What jemertz is referring to is a known issue where a table's identity column runs out of values. The column is of type int, which has a maximum value of 2,147,483,647; once the identity value reaches that limit, new rows can no longer be stored properly. A common fix that SolarWinds recommends is to TRUNCATE the tables in question. Without seeing your log files it's impossible to know which table(s) caused the issue, but the most likely culprits are Traps and SysLog. You can check the current identity value of an existing table with:
DBCC CHECKIDENT ([Traps], NORESEED)
* Run the same command with TrapVarBind and SysLog in place of Traps, and check whether any of the identity values is close to the limit.
A healthy system might return a result such as:
Msg 7998, Level 0, State 3, Line 1
Checking identity information: current identity value '5040', current column value '5040'.
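To judge how close a table is to the int limit, you can compare the reported identity value against 2,147,483,647. The snippet below is only a rough sketch: it parses a message in the format shown above, and the function name `identity_headroom` is my own invention, not anything from SolarWinds or SQL Server.

```python
import re

INT_MAX = 2_147_483_647  # largest value a SQL Server int column can hold

_IDENT_RE = re.compile(r"current identity value '(\d+)'")


def identity_headroom(message):
    """Return (current, remaining, percent_used) parsed from a CHECKIDENT message."""
    match = _IDENT_RE.search(message)
    if match is None:
        # Covers messages with no numeric identity value (for example 'NULL').
        raise ValueError("no numeric identity value found in message")
    current = int(match.group(1))
    remaining = INT_MAX - current
    percent_used = 100.0 * current / INT_MAX
    return current, remaining, percent_used


sample = (
    "Checking identity information: current identity value '5040', "
    "current column value '5040'."
)
current, remaining, pct = identity_headroom(sample)
print(f"{current} used, {remaining} remaining ({pct:.4f}% of the int range)")
```

A table whose percentage is approaching 100 is a candidate for the truncation discussed above; a value like the healthy example leaves plenty of room.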
Upgraded last Tuesday, been having struggles since Thursday. The upgrade process itself went smoothly, but the stability of the environment can now be described as "a hot mess". Basically need to reboot primary and additional pollers every two days to stop them from locking up hard and needing reboots via ILO. Bunch of errors out of JobEngine v2.
I wouldn't put it in the 11.5 upgrade horror show category but close behind.
Fingers crossed that hotfix 4 will resolve our woes today.
If anyone hasn't upgraded yet, I would recommend holding off for now. Not fit for Production.
When are you planning to deploy HF4?
Inquiring minds on the sidelines are eager for your results..
Just finished deploying HF4 - still seeing JobEngine v2 failures:
Faulting application name: SWJobEngineWorker2.exe, version: 2.13.0.1337, time stamp: 0x5ae9c63e
Faulting module name: ntdll.dll, version: 6.3.9600.18895, time stamp: 0x5a4b127e
Exception code: 0xc0000374
Fault offset: 0x000e6214
Faulting process id: 0xe94
Faulting application start time: 0x01d42e5a68515173
Faulting application path: C:\Program Files (x86)\Common Files\SolarWinds\JobEngine.v2\SWJobEngineWorker2.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: af31ce2a-9a4d-11e8-80f2-0017a4777d32
Faulting package full name:
Faulting package-relative application ID:
.......
Faulting application name: SWJobEngineWorker2.exe, version: 2.13.0.1337, time stamp: 0x5ae9c63e
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x1543d00b
Faulting process id: 0x2f60
Faulting application start time: 0x01d42e5a4bed3188
Faulting application path: C:\Program Files (x86)\Common Files\SolarWinds\JobEngine.v2\SWJobEngineWorker2.exe
Faulting module path: unknown
Report Id: a518f290-9a4d-11e8-80f2-0017a4777d32
Faulting package full name:
Faulting package-relative application ID:
Time to dial into the support phone queue.......
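For anyone comparing their own crash records with these, the two exception codes are standard NTSTATUS values: 0xc0000374 is STATUS_HEAP_CORRUPTION and 0xc0000005 is STATUS_ACCESS_VIOLATION. Below is a rough sketch that pulls the key fields out of a record like the ones above. It assumes the "start time" field is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC); the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Standard NTSTATUS names for the two codes seen in the records above.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a FILETIME tick count to a UTC datetime (microsecond precision)."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def summarize_crash(record):
    """Extract module, exception name, and start time from one crash record."""
    code = int(re.search(r"Exception code: (0x[0-9a-fA-F]+)", record).group(1), 16)
    start = int(re.search(r"start time: (0x[0-9a-fA-F]+)", record).group(1), 16)
    module = re.search(r"Faulting module name: ([^,]+)", record).group(1).strip()
    return {
        "module": module,
        "exception": NTSTATUS_NAMES.get(code, hex(code)),
        "started_utc": filetime_to_datetime(start),
    }


record = """Faulting module name: ntdll.dll, version: 6.3.9600.18895, time stamp: 0x5a4b127e
Exception code: 0xc0000374
Faulting application start time: 0x01d42e5a68515173"""

print(summarize_crash(record))
```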
What version of .NET Framework are you running on your Orion server? If you're not already running .NET 4.7.2, I would recommend upgrading, as this has resolved the issue for other customers with similar symptoms. You can find which version of the .NET Framework you are running by going to [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full] in the registry.
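Under that key, the `Release` DWORD is what identifies the installed 4.x version. Here is a minimal sketch of the lookup, using the minimum release numbers from Microsoft's published table (461808 and above means 4.7.2 or later). The function names are my own, and the registry read only works on Windows.

```python
import sys

# Minimum "Release" values per .NET Framework version, highest first.
_RELEASE_THRESHOLDS = [
    (528040, "4.8 or later"),
    (461808, "4.7.2"),
    (461308, "4.7.1"),
    (460798, "4.7"),
    (394802, "4.6.2"),
    (394254, "4.6.1"),
    (393295, "4.6"),
    (379893, "4.5.2"),
    (378675, "4.5.1"),
    (378389, "4.5"),
]


def dotnet_version_from_release(release):
    """Map a Release DWORD to a .NET Framework version label."""
    for minimum, version in _RELEASE_THRESHOLDS:
        if release >= minimum:
            return version
    return "older than 4.5"


def read_release_from_registry():
    """Read the Release value from the key named in the post (Windows only)."""
    import winreg  # imported here because the module exists only on Windows

    key_path = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "Release")
    return value


if __name__ == "__main__" and sys.platform == "win32":
    release = read_release_from_registry()
    print(f"Release {release}: .NET Framework {dotnet_version_from_release(release)}")
```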
Updated .NET to 4.7.2, still seeing errors for SWJobEngineWorker2s:
Have you tried following the steps outlined in the KB article below?
Large number of .JET files in C:\Windows\Temp - SolarWinds Worldwide, LLC. Help and Support
Hi alterego, the JET file check looks good. I have a ticket open with a bunch of diags uploaded: Case 00153573.
I do have this one with warning status, but haven't heard back from the case owner yet:
Just to close the loop: with support, we reinstalled CollectorInstaller and JobEngine v2, and also fixed some permissions problems that the permission checker found on our primary and one of our additional pollers. Maybe there is some kind of bug where the Config Wizard isn't running the Orion Permission Checker/Fixer correctly during the post-upgrade Config Wizard runs for 12.3 and 2018.2 HF4?
In any case we seem to be back to stability now.
Seeing unexpected growth in our NetPerfMon DB since we applied Hotfix 4 on the 7th - anybody else experiencing this?
aLTeReGo Since I applied HF 4 this morning and rebooted my Orion app server I get this for my JET files - I just ran a poll Orion to see if that will fix it.
If you open up Orion Diagnostics and then at the bottom click on "Try SolarWinds Active Diagnostics" that will do the jet file check for you.
Thank you!!!