SQL log for SolarwindsOrion db expands to fill disk space every night during maintenance. This is new since the upgrade to 2020.2.1 HF2.
Anybody else having the same issue?
The issue is still on-going and Solarwinds hasn't been able to get a work-around so far. So I have 2x 100GB SQL log drives and they are usually only a few % in use, and our main database is about 239GB with 63GB free. It's kind of weird because I seem to remember this being about 90GB with around 20GB of free space, but looking back 90 days it's been around this level. I installed HF2 on the morning that it became available, 15th December. Before then the main SQL log drive was hardly used, now during maintenance it just runs out of space.
A little more info. It took some time before my ticket finally got assigned. The response to my ticket was disappointing - support simply reported what I knew I already had in terms of pollers, suggested I re-balance them, that was it. Never even mentioned SQL or the Solarwinds maintenance job. This problem occurred the moment I went to HF2 (from 2020.2.1 no HF).
We have VMAN installed, which might be why others didn't see the issue, or see it yet. I checked the Solarwinds database using AppInsight for SQL and saw that the top table was VIM_VirtualMachineDisksStatistics_CS, with a row count of 181 million.
I checked the retention parameters, and the VMAN thresholds are not with the other retention thresholds, but separate.
We keep data for 90 days as that keeps the database reasonably responsive (it's on physical h/w with a ton of cpus/memory/fast ssd), but the VMAN settings were for 180 days, and when I took the top 10 records from the table and I could see data was timestamped 6 months ago. So I changed it to 90 days and ran Database Maintenance manually. It filled up the disk. I added another log file, another 100GB, and it filled up that.
The maintenance job seems to be failing now because we have 181m rows and it can't handle it, or it handles it in some inefficient or storage-intensive way. Funny because no other table has ever caused the log files to inflate massively. I knew it was this because I sat there watching the disk space and as it suddenly started disappearing rapidly I checked C:\ProgramData\SolarWinds\Logs\Orion\swdebugMaintenance.log and could see it was running maintenance for that table
2021-01-04 12:26:49,581  INFO SolarWinds.Data.DatabaseMaintenance.TimeSeriesTableHandler+TimeSeriesMaintenanceExecutorContext - Running maintenance for VIM_VirtualMachineDisksStatistics_CS
That job took down our Solarwinds installation, web interface, etc. I restarted the database but it's now in SQL Recovery mode - so we've had our first Solarwinds outage this year, about an hour.
Now I will just sit here and wait for SQL to recover the database and get back to the real work I'd supposed to be doing.
Btw we Have SQL 2016 and I’m planning to update to SQL 2019 very soon.
We had the exact same problem
We truncated the table which the allowed DB maintenance to complete but we are now seeing errors about mismatching column in other tables.
Following errors were detected: - 2021-01-13 09:10:02 : Error in thread: System.Data.SqlClient.SqlException (0x80131904): dbm_ExecuteMaintenance: dbm_AggregateData_part: dbm_AggregateTimeSerie_part: ALTER TABLE SWITCH statement failed because column 'AvgIOPSTotal' at ordinal 3 in table 'SolarWindsOrion.dbo.VIM_VirtualMachineDisksStatistics_CS_Hourly_stg' has a different name than the column 'MinIOPSTotal' at the same ordinal in table 'SolarWindsOrion.dbo.VIM_VirtualMachineDisksStatistics_CS_Hourly_hist'. ALTER TABLE VIM_VirtualMachineDisksStatistics_CS_Hourly_stg... - 2021-01-13 10:33:02 : Defragmentation for Table: APM_ComponentStatus_Detail Index: IX_APM_ComponentStatus_Detail_ComponentID_TimeStamp Timeout: 2648 Fragmentation: 25.5071022369966 ForceRebuild: False failed: System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out...
Having looked i can see we have another two more tables with old data in it
We have a ticket with support and the developers are investigating
Anyone else seen the same ? Or better still anyone got a fix ?
We have had to expand the SQL log volume for our DBAs a couple of times as our environment has aged and grown. It may be that you were close to the threshold already before applying HF2.
Of course, it's also possible that HF2 is writing more data to the DB (which would get written to SQL logs, and then committed to the DB during maintenance or backups). Do you have a history of SQL log utilization for your Orion DB? Out of curiosity, I'll also check with my team and see if we have any changes in this area.
SolarWinds solutions are rooted in our deep connection to our user base in the THWACK® online community. More than 150,000 members are here to solve problems, share technology and best practices, and directly contribute to our product development process.