
Solarwinds is still not stable


The other thread is closed, so I figured I would start a new one. I usually get more help here than from actually contacting support.

Same issues as before, but instead of the server becoming unresponsive after 36 hours or so, it took maybe a week. It is the SAME set of issues.

1. Server stopped sending alerts out sometime around 11AM on the 4th.

2. Logged on to the server and opened Orion Service Manager; both the Module Engine and the Administration Service were going back and forth between running and stopped.

3. Orion could not connect to SQL

4. I have some alerts that are going out, but I am not sure whether they are legit.

5. After the reboot I noticed that a good chunk of my nodes' interfaces are 'unknown'. This looks like it fixes itself, but again, something else is going on.

I have applied the 'hotfix' that you all pushed out to try to fix this.

I have done the change from streaming to buffered

I have done the registry change for the ports

The only thing I have not done is revert to the snapshots from June 14th, prior to the update, so that Solarwinds is stable again.

At this point I am going to schedule a task in VMware to reboot the server every night. That is pretty much the only way I will know Solarwinds will actually work.

Thoughts? serena aLTeReGo

1 Solution
Product Manager

Orion Platform Hotfix 3 was released yesterday to address the ephemeral port exhaustion issue which is likely the cause of the issue you are experiencing.
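For anyone who wants to check whether ephemeral ports are piling up before the next lockup, here is a minimal sketch (not an official SolarWinds tool). It assumes Windows-style `netstat -an` output, where TCP rows have the state in the fourth column; the function name and the sample addresses are made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states from Windows-style `netstat -an` text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: Proto, Local Address, Foreign Address, State.
        # UDP rows have no state column and are skipped.
        if len(fields) >= 4 and fields[0].upper().startswith("TCP"):
            states[fields[3]] += 1
    return states


# Hypothetical sample; in practice, paste in the real `netstat -an` output.
sample = """\
  TCP    10.0.0.5:49152    10.0.0.9:1433    ESTABLISHED
  TCP    10.0.0.5:49153    10.0.0.9:1433    TIME_WAIT
  TCP    10.0.0.5:49154    10.0.0.9:1433    TIME_WAIT
  UDP    0.0.0.0:161       *:*
"""

for state, count in count_tcp_states(sample).most_common():
    print(f"{state}: {count}")
```

A TIME_WAIT or ESTABLISHED count that keeps climbing toward the size of the dynamic port range over several days would be consistent with port exhaustion, though it does not prove it.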


129 Replies

All updates on all your issues are watched with interest!  Please continue to let us know how things are proceeding.

Level 8

There needs to be better visibility into ongoing issues.

FYI - Looks like HF 5 is due out on Aug 20th

https://support.solarwinds.com/Success_Center/Orion_Platform/Orion_Documentation/Orion_Platform_2018...


dilbert1234  wrote:

There needs to be better visibility into ongoing issues.

FYI - Looks like HF 5 is due out on Aug 20th

https://support.solarwinds.com/Success_Center/Orion_Platform/Orion_Documentation/Orion_Platform_2018...

Out of curiosity, how did you manage to stumble upon this link?


FYI: When I went to this link just now, I landed on a sign-in page (the link has been taken down, as expected). BUT earlier this week I found a similar page relating to SAM 6.7 while looking at the SAM 6.61 docs/release notes; it was incorrectly linked from there (I wanted to see a hotfix for 6.61 and got linked to the 6.7 release notes). When I go to that link now I get the same login page.

So maybe someone's doing some updates to the support pages, or something on the web side is incorrectly resolving links to the wrong places?


Wow. For something made available for us all to read on at least August 16, and yet specifically noted to NOT publish until at least August 20, someone dropped the ball.

How does someone miss the bold, capitalized, RED note, and publish it too early anyway?

pastedImage_0.png

MVP

Solarwinds is still not stable. We have faced plenty of issues since the upgrade and continue to do so despite applying HF4:

  • Database configuration errors - Arithmetic Overflow error, conversion to int - indicating some tables have large values that could not be converted
  • Solarwinds Agents not responding
  • SWIS service stability issue (have had to restart the services at least once a day)
  • Additional Polling engine upgrade fails (stuck at NetFlow upgrade)
  • CPU utilization issues
  • Performance Issues

RaviK  wrote:

Solarwinds is still not stable - we have faced plenty of issues post upgrade and continue to do so despite applying the HF4:

  • Database configuration errors - Arithmetic Overflow error, conversion to int - indicating some tables have large values that could not be converted
  • Solarwinds Agents not responding
  • SWIS service stability issue (have had to restart the services at least once a day)
  • Additional Polling engine upgrade fails (stuck at NetFlow upgrade)
  • CPU utilization issues
  • Performance Issues

Do you currently have a support case open for these issues?

Yes aLTeReGo

We do have cases opened for the issues faced

1. 00145840 - Solarwinds server down

2. 00156662 - Polling Server down

3. 00143315 - SWIS services not responding (hang state/down)

4. 00156141 - Upgrade fails with a DB Configuration error

gangadhar.k

c_sameer

abdhijasharma

RaviK  wrote:

Yes aLTeReGo

We do have cases opened for the issues faced

1. 00145840 - Solarwinds server down

2. 00156662 - Polling Server down

3. 00143315 - SWIS services not responding (hang state/down)

4. 00156141 - Upgrade fails with a DB Configuration error

gangadhar.k

c_sameer

abdhijasharma

RaviK , it appears all of those cases referenced above are currently closed. If you're still experiencing issues, please open a new case with support and post your case number here so I can look into it.

Level 12

Anyone seeing unexplained growth in their NetPerfMon DB size since Hotfix 4 was applied?

netperfmon-growth-1.png


Have you checked the logs to see if DB Maintenance is running? It's possible that it's just not cleaning up the old data because maintenance hasn't completed.


I was trying to add a widget to a few different pages in Orion, but it could not connect to the server and I got a server error message, so it looks like things are still not running 100%. Here is the error:

{
  "data": {
    "Message": "There was an error processing the request.",
    "StackTrace": "",
    "ExceptionType": ""
  },
  "status": 500,
  "config": {
    "method": "POST",
    "transformRequest": [null],
    "transformResponse": [null],
    "jsonpCallbackParam": "callback",
    "headers": {
      "Accept": "application/json, text/plain, */*",
      "Content-Type": "application/json;charset=utf-8"
    },
    "cache": false,
    "url": "I REMOVED THIS LINE BECAUSE IT REFERENCED OUR SERVER",
    "data": {
      "request": {
        "GroupName": "Type",
        "ViewType": "Summary"
      }
    },
    "params": {
      "viewId": 532,
      "swAlertOnError": true,
      "swLogOnError": [401, 403, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510],
      "swToastOnError": [401, 403, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510]
    }
  },
  "statusText": "Internal Server Error",
  "xhrStatus": "complete"
}


Same here. Better to use the old method of adding widgets: edit the page (view), search for the widget, and then submit.

I am now running HF4. It seems to be stable for the most part. Is anyone having an issue where the initial login takes 3-4 minutes for the site to load? Once inside the webpage everything seems good speed-wise. If I log out and back in, it loads quickly. Event Viewer looks clean on the server. Resources are good too.

It takes WAY more than that for me. More like 20-30 minutes to cache all that stuff.


Are you having any odd issues with your Solarwinds Agents? On Saturday around 5 CST all of my agents went offline; some of them came back online automatically but others did not. I still have a few where I need to check why they did not start. I kind of forgot about the agent issues until this morning. I guess I could always adopt the mantra "Reboot until it works" lol


We have a similar issue with our agents as well. I haven't been able to tell if it's the actual agent acting up or an issue with the polling engine communication. I'm inclined to lean toward the polling engine, as we will usually drop an entire poller's worth of agents at once.
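One rough way to test that hunch is to group the offline agents by the polling engine they report to. This is only a sketch: the record layout, the poller names, and the function name are all hypothetical, and the list would have to come from however you export your agent inventory.

```python
from collections import defaultdict


def offline_share_by_poller(agents):
    """Return the fraction of offline agents for each polling engine."""
    totals = defaultdict(int)
    offline = defaultdict(int)
    for agent in agents:
        totals[agent["poller"]] += 1
        if not agent["online"]:
            offline[agent["poller"]] += 1
    return {poller: offline[poller] / totals[poller] for poller in totals}


# Hypothetical inventory snapshot taken during an outage.
agents = [
    {"name": "srv-a", "poller": "APE-1", "online": False},
    {"name": "srv-b", "poller": "APE-1", "online": False},
    {"name": "srv-c", "poller": "APE-2", "online": True},
    {"name": "srv-d", "poller": "APE-2", "online": False},
]

for poller, share in sorted(offline_share_by_poller(agents).items()):
    print(f"{poller}: {share:.0%} of agents offline")
```

If one poller sits at or near 100% while the others look normal, that points at the polling engine rather than the individual agents; an even spread would suggest the opposite.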


So this morning I was trying to add some nodes to a monthly SLA report we run for uptime on our infrastructure equipment. I went to add the custom property and now get this. It has been sitting on loading for a good 10 minutes now. Hot Fix 4 = NO GO. It actually made things worse. Does anyone know what is actually going on here? I am having a team meeting this morning and was going to go over the report, BUT due to Solarwinds instability I will be shelving this for now. Thoughts, aLTeReGo?

pastedImage_0.png


The group-by 'most commonly used' causes this for me and has done for a while.

Try removing that.

How does one go about removing that group?


Sorry, just click and select one of the others. From memory, I think there's an option like 'none' or something.