Hi everyone,
I was wondering if anyone else has had the joy of working with technical support on auditing errors in their Information Service log and been advised to simply ignore failed queries related to cloud monitoring?
A few hundred of these per hour, no big deal, just ignore them.
"2019-07-01 22:44:41,814 [STP SmartThreadPool Thread #1892] ERROR SolarWinds.Data.Providers.Orion.Containers.LimitationSnapshotService.LimitationSnapshotService - (null) (null) Orion.Cloud.Aws.Instances!=Orion.Cloud.Instances"
While I do trust support when they say I can ignore these errors, I am also dealing with a very unstable database at the moment.
Maybe it's just me, but in my world, if you have a 12-hour outage and are writing a post-mortem report for management, it's very difficult to type out an analysis of your log activity post-repair and tell management to ignore the 285 errors listed above because "Support said so".
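For a post-mortem, one way to put numbers on entries like the one quoted above is to tally the ERROR lines by hour, logger and message. Here is a minimal sketch in Python, assuming every line follows the same layout as the sample (timestamp, bracketed thread name, level, logger, " - ", message). The names `LINE_RE` and `summarize_errors` are made up for illustration and are not part of any SolarWinds tooling.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the sample line quoted in this post:
#   <date> <time>,<ms> [<thread>] <LEVEL> <logger> - <message>
LINE_RE = re.compile(
    r'^"?(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} '
    r'\[(?P<thread>[^\]]*)\] '
    r'(?P<level>[A-Z]+)\s+'
    r'(?P<logger>\S+) - '
    r'(?P<msg>.*?)"?$'
)


def summarize_errors(lines):
    """Count ERROR entries per (hour, logger, message).

    Lines that do not match the assumed layout (stack traces, blank
    lines, other levels) are skipped rather than guessed at.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("level") != "ERROR":
            continue
        hour = match.group("ts")[:13]  # e.g. "2019-07-01 22"
        counts[(hour, match.group("logger"), match.group("msg"))] += 1
    return counts
```

Passing an open log file to `summarize_errors` and iterating over `.most_common()` gives a per-hour breakdown that can go into the report as-is, which is at least easier to defend than a raw error count.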
I know I can adjust logging to where it looks pretty, but come on, really? No way to disable cloud services in the core application? This goes for all the recent enhancements in regard to cloud monitoring.
And since I am on my step stool barking like an a-hole, it also grinds my gears that any time we have a problem, the majority of the time we are "strongly recommended" to upgrade, and the upgrade typically introduces new bugs.
We deployed this product 4 years ago globally. My whole team was certified by SW before implementation. We have spent upwards of 2-3 million bucks. Well, we have outgrown the system, and the system has outgrown our budget. Every upgrade since 12.1 has been a nightmare.
We own NPM, NTA, SAM, VIM, SRM, UDT. Essentially the core products work great. But the schema and DB design is crap. The non-support for the SDK is inconvenient and, quite honestly, very odd. You have an API but you don't officially support it? Any question to support regarding the SDK gets a blanket message to visit Thwack? I love this forum and its great KB and kick-ass collaborative community, but that is not enough.
How about the IIS latency and performance in general? Performing an upgrade or patch? Prepare yourself for the joy of canned objects coming into scope by default. I'd also like to golf clap for the awesome permissions granularity; it makes so much sense to have basically 3 levels of permission sets. That makes life easy on the end-user support admins.
Shall I go on with the lack of data warehousing options, or the fact that the EOC console is worthless? Reporting and report distribution is another major thorn. Try to have 5 users run a report on a 6 double-stacked polling system that is collecting 45,000 nodes and 115,000 interfaces. I understand why the report is slow to generate, the load it puts on the database, etc., etc. But if you had a flipping data warehousing option or a statistics feed like "R socket", you could then have a dedicated reporting server and archive server. This was a lesson learned after spending 200k on building out an HA deployment and implementing CDC tables with memory optimization on a RAID 5 SSD SQL Server w/ 32 CPUs and 10TB of RAM. It works great for the first 10 hours post DB maintenance, but the tables get so fragmented after that you might as well tattoo "timeout" on your forehead. We have tried every which way to reduce this problem and are now giving up on CDC and looking at an external ETL system that can replicate, parse and throttle based on resources available.
So as I sit here waiting on support to call me back after they reviewed my DIAG collection upload, I wonder what rock I will kick over next. I'm sure as hell not starting up Hubble and watching the hundreds of overhead license queries execute...
Label me a complainer, but 12.3, 12.4 & 12.5 have gone so well for my company that we might never upgrade to 12.6.
BTW - Mr. Tanner - this is not directed at you or your kick-ass SDK. That part of the application has been creative and fun, with the sole exception of no API for IP-SLA and CBQoS activities.
-Rant over
Sincerely,
Disgruntled Tired Will