Orion's amazing disappearing and reappearing map trick..

Hi all.

I am trying to determine whether this issue is something we are experiencing because of a database problem or whether we are facing multiple separate issues. Right now, I believe we are seeing multiple symptoms that all lead back to the same issue.

We have two instances of Orion, and these are kept separate from each other: they don't share a database, they aren't on the same servers, etc. Our first instance of Orion has NPM and NTA, and our second instance only has NPM. Both instances are running Orion Platform 2018.2 HF#6 and NPM 12.3, and our first instance also has NTA 4.2.3.

We have been battling database performance issues for a while, as both instances are fairly large and monitor several thousand elements. In October, I started seeing a message in AD saying our active maintenance failed to delete a file because it was being used by another process, and to check our DB health. I've checked, and while I can't clear that error, the file is gone from Windows temp and maintenance appears to be completing nightly (as it should). I've also run it manually since then and it completes within a few minutes, so it appears normal. However, more errors have slowly begun appearing, such as TLS 1.0 not being enabled or installed on the SQL server. That one shows up infrequently, along with the "Check Engines and OrionServers integrity (Orion Database health)" error. I've checked the tables noted in the knowledge base article and all of our pollers are as they should be, with nothing extra. SolarWinds did find a "namedpipes" instance that was causing some connectivity issues to the SQL database, which explains why we randomly have issues loading the maps, or even the website. SolarWinds recommends we disable namedpipes; our DBA recommends we keep it on.

Several agents have also thought we were dealing with port exhaustion, but the more experienced agents have ruled this out since our connections were within reason. There are numerous other things, but I think all of these point to an issue with our database. Our newest issue, however, started last week. It first happened on our second instance of Orion, the one that only has NPM installed. We have a map that displays the health of each site we have devices in. Last week, two of the state maps were deleted. There are only two people with account permissions to do that: myself, and my co-worker who alerted me to the issue. So this wasn't human error. Network Atlas shows that the state map is gone, but the city maps for that state are all still there. My co-worker redid the state map for one of the two missing states. I asked him to look at it yesterday and he said another state has since disappeared in the same manner: the city maps remain in Network Atlas, but the state map is gone.
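
One way to pin down exactly which maps vanish, and when, would be to keep periodic snapshots of the map names and diff them. The following is only a minimal sketch, assuming the list of map names can be exported somehow (that step is not shown). The function name `diff_map_snapshots` and the map names are made up for illustration and are not part of Orion or Network Atlas.

```python
def diff_map_snapshots(before, after):
    """Compare two snapshots of map names and report what changed.

    `before` and `after` are plain lists of map names. How those lists
    are obtained is an assumption left outside this sketch.
    """
    before_set, after_set = set(before), set(after)
    return {
        # Present in the earlier snapshot but not in the later one.
        "missing": sorted(before_set - after_set),
        # Present in the later snapshot but not in the earlier one.
        "new": sorted(after_set - before_set),
    }


# Hypothetical map names, for illustration only.
last_week = ["Ohio", "Ohio-Columbus", "Ohio-Cleveland", "Texas", "Texas-Austin"]
today = ["Ohio-Columbus", "Ohio-Cleveland", "Texas", "Texas-Austin"]

changes = diff_map_snapshots(last_week, today)
print(changes["missing"])  # the state map is gone, the city maps remain
```

Running this on a schedule against each instance would at least give timestamps to line up against the maintenance runs and the database errors.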

Today on our first instance of Orion, the one containing NPM and NTA, we had this happen with multiple states. On this instance, we have been having issues where the maps will not render when loading the site, the site becomes very laggy/unresponsive, or all other resources come up except the map. Normally, if I bounce services, or at the "extreme" restart the main poller and the additional web engine poller, our issues disappear. SolarWinds investigated this, and this was when they found the issue with "namedpipe" being enabled. They said our problems stem from database performance issues and that if we clean it up it'll be fine. That was our last communication, so I have been trying to go through the errors that AD shows and resolve them. Then today our new map issue happened: several states disappeared. When I say "disappear" I mean that you can still see that state on the map, but the node light indicating its health is gone. Because our second instance had done this, I automatically assumed that the maps had been deleted and went to the main poller to start running diagnostics and to open a case with SolarWinds. However, within five minutes, everything was back to normal and the maps were restored as if nothing had ever happened.

So now I am pretty confused. The maps never came back on our second instance. Normally when SolarWinds starts "acting up", I'll check the website, a new upgrade is out, and all is fixed. But since they changed the requirements for NPM 12.4, our versions of SQL/Windows are no longer supported, so I don't have that option. I don't know if this is something other people are experiencing, or if it falls in line with the weirdness we are seeing in our database. Any suggestions/advice/help anyone can provide is greatly appreciated.
