Level 12

Terrible Customer Support Experiences Lately!!!!

Is anyone else having issues with SW Tech support?

I have three open cases with an average duration of over 2 weeks and it seems as if no one at all in Tech Support cares or wants to render assistance!

My Orion NPM was having major issues, so I moved it to a new server.  Then the problems started and have continued incessantly!

No web page at first.  I finally upgraded to 10.4 on my own, which fixed this issue.  No attempt at communication from Tech Support.

Next, after fixing the above issue, my default Node Detail page seems corrupted and will not work with any node!  Sent Diagnostics.  Still no contact from Tech Support!

So here I am with a SW Orion NPM that really isn't usable for over two weeks and I am not getting anywhere with Tech Support.

I am curious if anyone is having issues.

I have been a loyal SW customer for approximately 5 years.  I currently own and manage the below modules:

NPM

SAM

LEM

NCM

and Network Engineer's Toolset and LANsurveyor Express....Oh and I am SW certified!

I am looking into VNQM, but now I am thinking about scrapping the whole shebang because of this terrible Tech Support!!!!

Still I am not getting support worth a darn recently!!!!!

111 Replies

Huh. I have an issue where every 1-2 months our DNS resolution for NTA just tends to die, but simply restarting the NetFlow service fixes everything. The tech actually kinda sucked, because he suggested moving things to on-demand instead of persistent DNS, which isn't a fix for the issue at hand. I don't have the time to chase down a minor bug that has a relatively minor band-aid, but I wish I could get it resolved permanently someday.

No it did not.  Still working with Support on the issue. Any information would help. 

0 Kudos

Sure.  Here is my email address.

pklipa@planethomelending.com

I would be happy to collaborate with you on your issues.  I pretty much just did the exact same thing you did, and resolved a majority of the Web Console errors, and got the system back up and running.  Still periodically run into a few SQL messages in the web console but they usually go away once I stop and restart the services on the Orion server.  I am going to open a ticket with support to get them permanently addressed next week.  Let me know how I can help.

Regards,

/Pete

0 Kudos
Level 7

Did the technical issues with the new server you moved to get resolved?  I probably know what is causing your problems, and I would be happy to give you a few pointers if you are still in a pickle with the new server.  Let me know.  It will require access to look inside the SQL tables after looking at the web console error messages.

/Pete

0 Kudos

I think I will be getting on the phone today.  My IPAM is having serious issues with "hanging/stuck" scans and my client is getting impatient.

0 Kudos

Let us know how it goes Eric... I haven't had to open a ticket in a little while but I've been having some issues with a UnDP that won't correctly graph the data... it seems like the graphs show part of a day and that's it... like it's not showing all the historical data for some reason.

0 Kudos
Level 9

I usually just call in to get the fastest resolution.

They have very professional employees that are very knowledgeable.

I have not had any problems myself.

Last week I had a very positive experience with SW tech support, where a process was out of control and preventing my NTA from displaying anything--and as a side effect it was also causing NPM to bog down and eventually stop displaying.  I built a case online with all the diagnostics, then called into Support.  I was on hold for a fair amount of time, but when someone picked up they were able to help me immediately.  The person was VERY competent with the advanced level troubleshooting, and he recognized I knew what I was talking about--he didn't ask me time-wasting newbie questions, but got right down into the guts of it all.  He made a few tests, explained what I was seeing in the WebEx session, applied a simple change to correct the issue, and restarted a service or three.  Just like that, all was working perfectly again--NPM was fast, and NTA was displaying all the data I needed for forensics.  He ran the diagnostics service and I uploaded them all to SW later in the day.  He got back to me right away and had one step of cleanup for me to complete.

100% satisfaction.  One call and I was back in business.  I was happy to have that kind of outcome on my first call to SW Tech Support in over a year.

rschroeder​,  Are you able to post a link pointing to the information about your issue?  

CourtesyIT​, here's some info after the issue was resolved:

Initial indications:  NPM  became very slow to respond.  NTA displays were empty.

I contacted my Server Administrator who found the NPM server was attempting to connect to a Solarwinds database server (not SQL) and was being refused.  He found a process on that database server was running 100% of the CPU, and saw it had been increasing in regular steps over the last several weeks, until it was out of CPU resources.

CPU load graph on the SW database server:

pastedImage_0.png

Process consuming the CPU:

pastedImage_1.png

This created an error on the server, which he shared with me:

===================

ErrorCode: NetPath_50014

Message: Flow integration error: RunQuery failed, check fault information.

Error while executing FastBit query: 'SELECT SourceIP AS C3, DestinationIP
AS C4, CASE WHEN ((Flags&1)<>0) THEN InterfaceIDRx ELSE 0 END AS C5,
CASE WHEN ((Flags&2)<>0) THEN InterfaceIDTx ELSE 0 END AS C6,
(SUM(CASE WHEN ((Flags&1)<>0) THEN Bytes ELSE 0 END)/540) AS C1,
(SUM(CASE WHEN ((Flags&2)<>0) THEN Bytes ELSE 0 END)/540) AS C2

FROM Flows

WHERE (((((((Flags&1)<>0) AND (InterfaceIDRx IN
(41587,41530,41509,41443,41508,41495,41494,41605))) OR
(((Flags&2)<>0) AND (InterfaceIDTx IN
(41587,41530,41509,41443,41508,41495,41494,41605)))) AND
(((TimeStamp>1495027066) AND (TimeStamp<=1495027666)) AND
((Flags&16)=0))) AND ((Flags&3)<>0)))'. Ended with error: Could
not connect to net.tcp://e-6atd-swndb01:17777/orion/nta/FlowStorageService. The
connection attempt lasted for a time span of 00:00:01.0313075. TCP error code
10061: No connection could be made because the target machine actively refused
it 10.30.26.54:17777.

Exception:
System.ServiceModel.FaultException`1[SolarWinds.InformationService.Contract2.InfoServiceFaultContract]:
RunQuery failed, check fault information.

Error while executing FastBit query: 'SELECT SourceIP AS C3, DestinationIP
AS C4, CASE WHEN ((Flags&1)<>0) THEN InterfaceIDRx ELSE 0 END AS C5,
CASE WHEN ((Flags&2)<>0) THEN InterfaceIDTx ELSE 0 END AS C6,
(SUM(CASE WHEN ((Flags&1)<>0) THEN Bytes ELSE 0 END)/540) AS C1,
(SUM(CASE WHEN ((Flags&2)<>0) THEN Bytes ELSE 0 END)/540) AS C2

FROM Flows

WHERE (((((((Flags&1)<>0) AND (InterfaceIDRx IN
(41587,41530,41509,41443,41508,41495,41494,41605))) OR (((Flags&2)<>0)
AND (InterfaceIDTx IN (41587,41530,41509,41443,41508,41495,41494,41605)))) AND
(((TimeStamp>1495027066) AND (TimeStamp<=1495027666)) AND
((Flags&16)=0))) AND ((Flags&3)<>0)))'. Ended with error: Could
not connect to net.tcp://e-6atd-swndb01:17777/orion/nta/FlowStorageService. The
connection attempt lasted for a time span of 00:00:01.0313075. TCP error code
10061: No connection could be made because the target machine actively refused
it 10.30.26.54:17777. (Fault Detail is equal to InfoServiceFaultContract [
System.Exception: Error while executing FastBit query: 'SELECT SourceIP AS C3,
DestinationIP AS C4, CASE WHEN ((Flags&1)<>0) THEN InterfaceIDRx ELSE
0 END AS C5, CASE WHEN ((Flags&2)<>0) THEN InterfaceIDTx ELSE 0 END
AS C6, (SUM(CASE WHEN ((Flags&1)<>0) THEN Bytes ELSE 0 END)/540) AS
C1, (SUM(CASE WHEN ((Flags&2)<>0) THEN Bytes ELSE 0 END)/540) AS C2

FROM Flows

WHERE (((((((Flags&1)<>0) AND (InterfaceIDRx IN
(41587,41530,41509,41443,41508,41495,41494,41605))) OR
(((Flags&2)<>0) AND (InterfaceIDTx IN
(41587,41530,41509,41443,41508,41495,41494,41605)))) AND
(((TimeStamp>1495027066) AND (TimeStamp<=1495027666)) AND
((Flags&16)=0))) AND ((Flags&3)<>0)))'. Ended with error: Could
not connect to net.tcp://e-6atd-swndb01:17777/orion/nta/FlowStorageService. The
connection attempt lasted for a time span of 00:00:01.0313075. TCP error code
10061: No connection could be made because the target machine actively refused
it 10.30.26.54:17777.

at SolarWinds.Data.Providers.Orion.Netflow....).

===========================
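For anyone hitting the same "TCP error code 10061" (connection refused) message, a quick reachability test from the NPM server can confirm whether anything is listening on the Flow Storage port before you call Support. This is only a minimal sketch, not a SolarWinds tool: the function name `can_connect` is made up for illustration, and the host and port to test are whatever appears in your own error text.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection (Windows error 10061), a timeout, or a name
    resolution failure all surface as OSError and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the values from the error above you would call `can_connect("10.30.26.54", 17777)`. A False result only says the port is not accepting connections from where you ran the test; it does not say why (service stopped, server overloaded, firewall, and so on).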

As an NTA / NPM user/admin, I'd see this error when trying to connect:

pastedImage_2.png

I contacted SW Support and shared the info via the highest listed urgency in the online solution.  Once it was all loaded up and I had an incident number, I called SW support and provided that number.  I was on hold for a little longer than expected, but the person who picked up the phone was extremely competent, and had a very positive attitude.  True or not, I felt he'd get the issue resolved quickly, just based on his attitude.  Kudos go to Daniel Polaske for his skills and style, and for fixing this efficiently and quickly.  Notes from the case and from him follow.

Update for Case #1169364 - "NPM continually hangs due to NTA issues associated with a database. We've lost ability to use NTA. The server is at 100% CPU."


We needed to reboot the NTA Flow Storage Database server and after running the NTA Flow Storage Configurator to repair services, it seems all is well now.

I've included the instructions to apply AV exclusions to that server as well as the diagnostics collection and upload instructions so that I can do a postmortem to see if we can determine why the CPU was stepping up to 100% over the course of a few weeks.

https://support.solarwinds.com/Success_Center/Network_Performance_Monitor_(NPM)/Files_and_directorie...

How to gather and send Diagnostics:

In order to further investigate your issue can you please provide me with following?

1. Screenshots of the issue that you are having and any errors that occur

2. A set of diagnostics.

Our diagnostics program gathers a detailed set of application logs from your SolarWinds installation. These logs will assist us in rapidly diagnosing your issue.

How to Send Diagnostics:

1. Open
- All Programs
- Solarwinds Orion
- Documentation and Support
- Orion Diagnostics
- Please do not press the Active Diagnostics button as this will not provide the necessary details
- Press Start

How to Gather Flow Storage Diagnostics [from server where Flow Storage DB resides]

Start -> SolarWinds Orion -> Netflow Traffic Analyzer -> NTA Flow Storage Diagnostics
Press Start

2. This will create a zip file of the results - it will contain logs, events, and some information regarding the database etc.
3. If the file is smaller than 5MB please attach it in the reply to this email.
4. If it is larger than 5MB please do the following
- Open your browser and navigate to solarwinds.leapfile.com
- Click on the link labeled Secure Upload.
- In the Recipient Email box put support@solarwinds.net
- Enter your name and email address.
- In the subject line field put "Attn <agent name> case # <case number>."  A subject is required to move to the next page.
- Click the link for Select files to send (Regular Upload).  The Enhanced upload can take a long time so please do not use this option.
- Add any instructions to clarify for the SolarWinds tech how to use your files. 
- At the bottom, click the button for "Select files to send (Regular Upload)".
- On the next page, attach the files you want to send.
- Please include any screenshots that you consider relevant to the case.
- At the bottom click the "Upload & Send" button.
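The size rule in steps 3 and 4 above can be sketched as a small check to run on the diagnostics zip before deciding how to send it. This is an illustration only: `delivery_method` is a hypothetical helper, reading "5MB" as 5 x 1024 x 1024 bytes is an assumption, and so is sending a file of exactly that size through the upload route (the instructions do not say which way that case goes).

```python
import os

# Assumption: "5MB" in the instructions is read as 5 MiB.
LIMIT_BYTES = 5 * 1024 * 1024


def delivery_method(zip_path: str, limit_bytes: int = LIMIT_BYTES) -> str:
    """Return "email" for files under the limit, otherwise "leapfile"."""
    size = os.path.getsize(zip_path)
    # Files exactly at the limit go to the upload route (an assumption).
    return "email" if size < limit_bytes else "leapfile"
```

Either way, the note below still applies: if the result is "leapfile", email your support representative after uploading.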

Important note:

The Leapfile system does not notify your SolarWinds agent that they have files available.

Please send your support representative an email to let them know that your files are waiting.

------------------------------------------------------------------------------

SW Tech Support found some issues with the NTA Flow Storage Database update files, which are used to temporarily cache data when applying IP group updates, etc.  They arranged a WebEx with me and cleared those files out, then restarted services to test.  All NTA services/displays immediately began working properly, where they'd been hung before.

-------------------------------------------------------------------------------

Follow up/Clean up after running the above steps:

Update for Case #1169364 - "NPM continually hangs due to NTA issues associated with a database. We've lost ability to use NTA. The server is at 100% CPU."


Stop Netflow Services on the Flow Storage Database using services.msc - please stop the SolarWinds Netflow Storage Service and the SolarWinds Netflow Storage Server Watcher.

Start the NTA Flow Storage Configurator in the SolarWinds Orion > NetFlow Traffic Analysis program folder.

Take note of the path to the current NTA Flow Storage destination folder.

Click Cancel to close the Configurator.

Navigate to that folder and move the following files to the desktop:

update.jrn

update.tmp

Please then run the NTA Flow Storage Configurator.
I suspect that this should resolve the issue permanently.
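The file-moving step above could be scripted roughly as follows. Treat it as a sketch under assumptions, not as Support's procedure: `set_aside_update_files` is a hypothetical helper, both folder paths are supplied by you (the storage path is the one noted from the Configurator), and the two NetFlow storage services are assumed to be already stopped by hand in services.msc.

```python
import shutil
from pathlib import Path

# File names taken from the support instructions above.
UPDATE_FILES = ("update.jrn", "update.tmp")


def set_aside_update_files(storage_dir, holding_dir):
    """Move update.jrn / update.tmp out of storage_dir into holding_dir.

    Returns the list of destination paths that were actually moved.
    Files that do not exist are skipped; an existing destination file
    is never overwritten.
    """
    storage = Path(storage_dir)
    holding = Path(holding_dir)
    holding.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in UPDATE_FILES:
        src = storage / name
        dest = holding / name
        if not src.is_file():
            continue  # nothing to move for this name
        if dest.exists():
            raise FileExistsError(f"{dest} already exists; not overwriting")
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Moving rather than deleting keeps the files available in case Support asks for them during the postmortem. Running the NTA Flow Storage Configurator afterwards is still a manual step.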

Level 12

Another upgrade, another failure, and now I am stuck with unresponsive support. This time the failure is due to wrong procedures handed to us by the previous support engineer. And the current support engineer gets offended and starts shouting when I express my frustration. This is a new low!!!

I saw some big threads about what Solarwinds is doing to improve support channels, I would really like to see some tangible results. For heaven's sake, please set aside a proper escalation channel! Now we are at the mercy of whoever picks up our ticket.

0 Kudos

Update: I was able to get in touch with team lead who lined up an APE to work with us. He even offered to arrange an APE to review our upgrade preparation going forward.

The chasm between the skill level of an APE and first level support is huge and needs bridging. For something as simple as cleaning up leftover programs after an unclean uninstallation, the support engineer spent around an hour fishing in the registry trying to find out which entries to delete. When the APE jumped in, he straightaway downloaded the Fix it tool from Microsoft and ran it. That issue was fixed in a matter of minutes and we were able to move on to the next step.

I would expect Solarwinds to have a standard procedure for a basic recurring support task such as this, for internal reference. I know that such reference procedure doesn't exist because I have seen another APE clean up bad installation by deleting registry keys, except that he was much quicker in identifying the relevant registry entries. Same level of familiarity cannot be expected from a junior support and it is up to documentation and procedures to bridge this gap. It leaves a very inconsistent experience for customer when every troubleshooting procedure is different and left to the whims of the engineer.

If anyone wants help in the UK setting up Solarwinds correctly I'm free from the end of June and can make your environment play nicely. Just send me a message and I'll do my best to help. I've just done a great little project implementing solarwinds for an NHS body in the UK.

0 Kudos
Level 14

In all fairness, I believe the product is such a huge, complex overbearing beast from a user standpoint that we personally are seeing overall quality dropping with 12.0.1. There are so many moving pieces and parts now, especially for people like myself with multiple products in one environment, on one box.

The amount of bugs and problems we are seeing with 12.0.1 is making our management extremely nervous since we rely on SolarWinds to monitor over 10 very large companies. Our entire SolarWinds deployment has been half broken since 12.0.1. Services crashing every 10 minutes, features broken, Agents that we are monitoring for Azure and AWS dropping all day - availability reports shot to hell because of this and PMs wondering WTF is going on (remember when SolarWinds tried to charge for Windows agents, then realized how crappy they are and now give them away?), slowness and overall bugginess makes me wonder what kind of mess we are in with the 12.0.1 nightmare.

I think also that the piling on of several different components on the same box wreaks havoc with the frequent hot fixes and upgrades. I need to look into distributing multiple products across multiple servers, any advice on that?

Sounds like you need an infrastructure monitoring specialist to sort it all out for you. I find that users tend to blame Solarwinds for problems in the environment itself.

Many environments are set up by engineers who frankly don't have a clue and don't stick to guidelines. I see application developers deploying programs that require constant application pool resets.

I have seen users totally ignore alerts and reports and then moan when a system crashes or breaks down. If an agent drops there is usually a very obvious reason why, and usually it's because of the way that the node is configured.

I was once asked to copy an availability report for a customer that was just a big green dot with 100% on it. I asked them what it represented and they had no clue.

I gave them a proper availability report with other information, then I sat down with them and explained to them the value of monitoring and reporting correctly along with what this information represented.

The customer was much happier.

There is a lot going on with Solarwinds, and this does cause problems with requests from all over the business exposing the infrastructure monitoring engineer to many different technologies. With good project management, a structured approach along with a solid environment to monitor, you can maintain excellent standards and provide a great level of service.

It isn't easy but it's worth the investment.

Our environment was set up and deployed by one of the high-profile SW vendors. The guy was very competent until he sort of disappeared on us near the end of the project, when the PM realized SAM for SQL only has 3 or so alerts out of the box per the Admin PDF. That caused an uproar, since we lost a TON of unique alerts that the other tool gives you out of the box, and we would have to build our own to get the same functionality, which would take a few months to reverse engineer and port to SAM.

I cannot blame any of our issues on a poor deployment. Another unique environment I manage for a large Human Resources company has the EXACT same issues, so it has nothing to do with who sets it up. There are glaring problems in 12.0.1, which is why I see that many people start fresh when upgrading and do not do in-place upgrades. For us, doing that would be very difficult as the networking setup is very complex and not easily duplicated by just setting up another box and restoring. Our main polling engine is physical. We can't just redo a massively complex setup just because the software we spent a TON of cash on isn't reliable for in-place upgrades of several different components.

There is nothing that we did wrong in upgrading when we follow the upgrade steps outlined by SW. Executing an MSI package and clicking next, next whether done by my grandmother or an MVP, makes no difference.

I just got SolarWinds a huge order for another client, so I'm not the angry hater some guy here portrays me to be. I am a proponent of the tools; however, I have seen that it's turning into an extremely fragile, easily broken, gigantic, fragmented mish-mash of code. 3 different environments that were set up by competent people have the same Application crashes in the logs, the same bugs, the same slowness. So I can't blame anyone and say that these environments were simply set up by college interns who had no clue what they were doing. Just the opposite.

I appreciate the complexity that we're all dealing with in managing this fragile beast of a product. But the stability and consistency seems to definitely be lacking when I see it affecting 3 totally unique environments.

It seems getting NPM and the other applications under its umbrella to a point of a perfect model of stability is a long way away and an epic challenge. But I am rooting for whoever is in charge to do it somehow. Soon. Please.

Hi,

I just want to share something that I faced.

One customer was facing issues on version 11.5.2, so they upgraded to 12.0, and then they had a lot of issues. I was there for about 2 weeks, troubleshooting and fixing the issues with and without TAC Support. But when we built another machine and installed 12.0.1 from scratch, the customer went quiet, as if the problems had never existed, and I haven't heard from them since. Two or three days ago I met a senior employee there and we discussed how the new installation is doing. I found out it has been perfect since the first day of installation.

Just to be clear, you are talking about a fresh install of NPM 12 but not a fresh DB, right?

Fresh install of NPM; obviously you will take a backup of your DB and sync your SQL database with NPM during configuration.

Perfect, that is as I thought. Thank you muhammad.imran