The other thread is closed, so I figured I would start a new one. I usually get more help here than from actually contacting support.
So, same issues as before, but instead of the server going unresponsive after 36 hours or so, it took maybe a week. It is the SAME issues, though.
1. Server stopped sending alerts out sometime around 11AM on the 4th.
2. Logged onto the server and opened Orion Service Manager; both the module engine and the administration service were going back and forth between running and stopping.
3. Orion could not connect to SQL.
4. I have some alerts that are going out, but I am not sure whether they are legit.
5. After the reboot I noticed that a good chunk of my nodes' interfaces are 'unknown'. This looks like it fixes itself, but again, something else is going on.
I have applied the 'hotfix' that you all pushed out to try to fix this.
I have done the change from streaming to buffered.
I have done the registry change for the ports.
The only thing I have not done is revert the snapshots back to June 14th, prior to the update, so SolarWinds is stable again.
At this point I am going to schedule a task in VMware to reboot the server every night. That is pretty much the only way I will know SolarWinds will actually work.
dodo123 is correct. The 'Most Commonly Used' group by can be a very expensive query depending upon the number of items that could be returned. If the query never returns, please open a case with support so we can determine what exactly is going on. Simply removing this group by option from your selection though should make the query return much faster.
So Support just said it was because I am not running things as the local admin? Oh, I do have a case open with support. The current one is 00150177, but I have had issues since updating on June 18th, so I think we are coming up on the two-month anniversary of an unstable SolarWinds suite. Well, actually DPA is the only thing that has been rock solid so far, but that doesn't really touch Orion... Thoughts, aLTeReGo?
It's possible, however unlikely, that you may encounter permissions issues when installing or upgrading as a domain admin account. This is because some environments apply user-level policies to domain admin accounts which aren't applicable to local admin user accounts. If the support engineer is seeing permission failures in the install or configuration wizard logs, that's a good indication that some degree of policy/security tightening is going on. I would start with running the Permissions Analyzer and fixing any permissions issues identified there.
For those playing at home, to get to the permissions checker: (Drive that has SolarWinds installed) \Program Files (x86)\SolarWinds\orion\OrionPermissionChecker.exe
Ran it, and the only two errors were that the Network Service could not write to C:\Windows\Temp, so I ran the fix and now it has access.
Still having all the same problems. What would you like me to try next, aLTeReGo?
Are you using DPA to monitor your SolarWinds database? Ever since the 12.3 update we've had:
We failed over both our DB and MP to run out of a co-location site and still no improvement on the random blocking. Support has the latest DPA results with the blocking statements, but so far, no answer.
hendersonwa - I actually had a ticket open because one of my DBAs found some indexing issues in the SolarWinds database and we reached out to SolarWinds for support. This got passed on to development and I have not heard a peep since. Part of me is thinking this DB issue is related.
Are you running NTA in your environment? NPM 12.3 and NTA 4.4 were released at the same time, and we updated both to the latest version after waiting a few weeks. With NTA 4.4 they switched over to SQL Server for flow data. Our DBAs have speculated internally that this might be causing overall issues due to the size of our NTA environment. I am with you, martian monster, in thinking that it is DB-related as well, but there is no official answer from support yet.
hendersonwa - Yes, we are running NTA, and I did the move to SQL when I updated to NTA 4.4. I agree with you that the NTA 4.4 move to SQL may be causing problems, and I have been thinking of that as the core of the issue this whole time. Things were rock solid prior to moving NTA to SQL. Now that NTA is on the SQL server that is running both the SolarWinds and DPA DBs, maybe adding this third database is choking SolarWinds somewhere. I wonder if I could spin up a new SQL VM and move the NTA DB to a new server to see if that would fix the issue. All the errors I am seeing are between the APP server and the SQL server. Thoughts on the NTA SQL upgrade causing these issues, aLTeReGo?
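Since the errors described above all sit between the APP server and the SQL server, a simple connectivity probe run from the APP server might help show when the connection actually drops. This is only a rough sketch, not a SolarWinds tool: the function name `probe_tcp` and the host name in the usage note are made up for illustration, and port 1433 is just the SQL Server default, so a named instance or custom configuration may listen elsewhere.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 3.0):
    """Try one TCP connection and return (ok, seconds_taken).

    This only proves that the port accepts connections; it does not
    log in to SQL Server or run a query.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False, time.monotonic() - start
```

Calling something like `probe_tcp("sql-server.example.local", 1433)` once a minute and logging the result with a timestamp would give a timeline to compare against the times the Orion services start flapping.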
Thank you, Martin. I can feel the heat now. I am in the middle of the upgrade process, and while doing this I found so many bugs. I have been working with the support team since the system went down. Now it looks like it is working. I will not be surprised if I find more bugs later on, like you said.
I had issues while upgrading from 12.2 to 12.3. It is completed now, but I had to reboot my polling engine server multiple times in order to complete the configuration wizard.
I have never seen these issues in any of the previous upgrades, but this time it took me 4 extra hours because of rerunning the configuration wizard.
Also, the concept of downloading the scalability engine from the primary poller is very hectic: for a few servers the internet connection is slow and latency is a little high between the Orion main server and that particular APE, so it has been downloading the package since yesterday. I wish I could get the latest APE setup offline.
I am getting "error while rendering the Netflow collector services", and custom property descriptions are not updating when you edit custom property definitions. I am expecting many other issues.
Thanks. So, how many previous in-place upgrades were performed on this host?
It seems that building out a new host is a good countermeasure, but not everyone has the time and resources to do that.
Perhaps the PowerShell script support uses to wipe Orion away may need to go public.
We are planning to upgrade our Orion Platform from 2017.3.5 to 2018.2. Please tell me whether this is stable or not. We can halt our upgrade process if there is any bug in the new version.
I did my upgrade back on June 18th. Everything ran fine for a few days until I joined the group that had issues with port exhaustion and the streaming versus buffered problem. That went on for roughly 3 weeks until SolarWinds figured out what was going on and released HF3. This pretty much fixed the issue and things were stable for a few weeks, then I started having more odd issues. I opened a new ticket on 7/27 for the new issue, which seemed to be like the old one, but SolarWinds never really crashed; it would spike the CPU for 30 minutes or so and do other odd things, then kind of return to normal. Last Thursday I applied HF4 and things seem to be stable for now. If SolarWinds stays up for a month with no issues I will call HF4 a success.
So you might be good as long as you apply all the hotfixes; only time will tell.
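For anyone who wants to keep an eye on the kind of port exhaustion mentioned above, here is a rough sketch that tallies TCP connections by state from `netstat -ano` style text. It is not from SolarWinds: the function name `count_tcp_states` is hypothetical and the sample rows are made-up data laid out like Windows netstat output.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state from `netstat -ano` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Windows TCP rows look like: TCP  local  remote  STATE  PID
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


# Made-up sample for illustration only.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.5:49152         10.0.0.9:1433          ESTABLISHED     1234
  TCP    10.0.0.5:49153         10.0.0.9:1433          TIME_WAIT       0
  TCP    10.0.0.5:49154         10.0.0.9:1433          TIME_WAIT       0
  UDP    0.0.0.0:161            *:*                                    888
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(state, count)
```

A steadily growing count of connections, especially in TIME_WAIT, toward the SQL server would be one hint that ephemeral ports are running out again.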
Hope this helped. - Dave
I debated opening a separate thread, but figured that I would reply here to maintain visibility for other users. We applied HF4 to our environment yesterday and are still experiencing the same issues that we have had since we first opened our case on July 9th. At one point, support suggested simply building new VMs, but we would prefer not to pursue that option. We have our SW environment monitored by DPA, and we continue to see significant blocking generated at random times from the main polling engine.
aLTeReGo, does HF4 address the need to update the TransferMode or should that still be changed?