SWIS v3 spikes when searching alert variables

I wanted to see if anyone in the community has had an issue like this.

1. Open alert manager.

2. Create an alert and set up its parameters.

3. On the trigger condition email, click Insert Variable and type a variable name in the search box. The loading spinner either spins forever or returns something other than what you searched for.

After some time passes, you notice the site getting slow, and eventually it hangs and becomes non-responsive. You RDP to your primary server and find it either not responding or responding extremely slowly. You open Task Manager and find that SWIS v3 has pinned your CPU at 100% and is using over 98% of the server's RAM, practically grinding your server to a halt. Our server has 24 cores at 2.8 GHz, over 30 GB of RAM, and one 10-gig fiber connection. Just doing a variable search managed to peg out even a server as strong as this.

When we dig into the system logs for SWIS v3, we find a massive, expensive query triggered every time you click search on the variable field. This query loops until it kills SWIS v3.
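For anyone who wants to check their own logs for the same looping pattern, here is a rough Python sketch that counts how often the same query shape shows up. It is only an illustration: the `Query:` marker and the line layout are assumptions, not the actual SWIS v3 log format, and `normalize` and `repeated_queries` are made-up helper names, so adjust the parsing to whatever your log lines really look like.

```python
import re
from collections import Counter


def normalize(query: str) -> str:
    """Collapse literals and whitespace so repeated runs of one query match."""
    query = re.sub(r"'[^']*'", "?", query)   # replace quoted string literals
    query = re.sub(r"\b\d+\b", "?", query)   # replace standalone numbers
    return re.sub(r"\s+", " ", query).strip().lower()


def repeated_queries(lines, min_count=3):
    """Return (normalized query, count) pairs seen at least min_count times."""
    counts = Counter()
    for line in lines:
        # Assumption: the query text follows a literal "Query:" marker.
        _, marker, text = line.partition("Query:")
        if marker:
            counts[normalize(text)] += 1
    return [(q, n) for q, n in counts.most_common() if n >= min_count]
```

Feeding it the lines of a log file (for example `repeated_queries(open(path))`, where `path` is your own log file) lists the query shapes that repeat most often, which should make a query that fires over and over stand out.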

We spoke with four SW techs and none of them could find the cause. We then went through two application engineers and got nothing. Keep in mind that the server isn't writing system logs properly and has many other issues, but this one is the most critical because it has rendered our alert manager and web report writer useless, and those are core components of NPM.

Would love to get insight from anyone, including SW devs, since this case is with them currently.

Thanks!

  • I find that the search is very hit or miss. For some clients it executes with no problem and gives me the values instantly; for others it just spins forever. I know how to find most of the variables manually, so I just don't use the search anymore, because I hate having it stall out while a client is watching me. It doesn't seem to be obviously related to the capabilities of the server or the responsiveness of the website: sometimes the search is fast for people with slow consoles, and other times it is slow for people with lightning-fast consoles.

  • Our environment is a one-off, special-case problem child. We were close to wiping these servers clean and starting over from scratch, but we decided to chase the root cause instead, so the problem never happens again.

    The only reason I posted this to the forum is that I was curious to hear whether anyone shared similar pains and what they might have done to improve things. This made it to DEV when management jumped on the last call with SW and demanded that a developer check our environment before they would consider a different solution, so SW passed it to them. They have already found and fixed one problem relating to browser issues, but more is needed.

    If anything, I'll post an update here on what is found, so that anyone who ever has a similar issue will know what the fix is.

    I have confidence we'll get this corrected. It has just taken forever to get to a solution.

  • The funny and, at the same time, frustrating part is that not even the SolarWinds dev department knows what's going on. I find it incredible that so many minds are involved and everyone is scratching their heads.

  • Did you ever find a resolution to this? I am experiencing this in the latest HF release.

  • I'm also seeing this. Hardware health variables take 20-30 minutes to show up in the "Insert Variable" window, and the website slows way down. It looks like either the query causes Information Services to struggle, or struggling Information Services make the queries fail. Either way, Core Services are now full of long-running queries and long-running Invokes. I tried ensuring everything was fully set per the "repairing orion core services" whitepaper and ran the repair and the config wiz, and I still see the same behavior. I'm creating a case for it in the morning. I'm on the latest Hotfix 3 as well.

  • My ticket is currently being investigated by developers. Once I know more, I will post an update.

  • Hello,

    Have you found the root cause of this behavior? It has been a long-standing issue, and it is still present in the latest release.