This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

DB Maintenance - How to Handle It When Some Tables Have Tens of Millions of Rows?

As an accidental DBA, I'm having trouble determining how to handle maintenance on our Orion DB. When using the built in DB maintenance plan, the maintenance jobs inevitably end up running longer and longer due to HUGE amounts of data being dumped in a few tables daily. Eventually, the jobs run for days on end. The issue is several tables are growing to tens of millions of rows with one growing to hundreds of millions before the configured 7 day retention pruning occurs. Without shortening the retention period to clean out the daily tables quicker or turning down the volume on collected node info, I'm weighing my options for running maintenance to re-org/rebuild indexes and update statistics without having locks on these tables for days.

Here's my thoughts - open for comment:

1. Disable the built-in DB maintenance (how...rename the .exe?)

2. Create stored procedures for each of the following (based on Ola Hallengren's SQL Maintenance Solution at https://ola.hallengren.com/​​):

     a. Database backups nightly

     b. Data integrity checks nightly

     c. Nightly index re-org/rebuilds and update statistics (actions determined by % fragmentation)

It's my understanding that stored procedures run more efficiently than SQL Maintenance Plans or maintenance jobs built by SQL queries (which is how I believe the .exe is doing it). So, how do those of you with large environments and tons of data being stored in your DB's handle this?

Thanks for any help!

Parents
  • So I am fighting a similar battle myself and I find some of your points interesting.

    Here is my scenario:

    When we turned on SNMP traps, we flooded the database, Traps and TrapVarbinds tables...this caused us to run out of space on the databas...this seemed to have had many significant down stream effects, other tables were not getting maintained because the DB was simply in an unhealthy state then.  Once that happens, everything gets fragmented horribly, although I have since turned on the Defragmentation bit and everything seems to get fragmented daily anyway

    We proceeded to Truncate a bunch of the tables (Traps, TrapsVarbinds, InterfacedAvailability_Detail, NPMMultiast, maybe a couple more, and after each truncate I would run the DB Maintenance job manually to see if it made it past the "bad" table...when it did I noticed it get stuck on a couple more we would truncate those and then move on until it made it through the entire job...I can't tell if its the database or the table spaces causing the issue, but we ran fine for 7 days after this...on the 7th day, something happened during the DB Maintenance where a Select statement was getting blocked by a Delete, and the DBAs killed it...from then on, during this last week I haven't been able to run DB maintenance and we are again going to truncate the InterfaceAvailability_Detail table...this table has the most rows in our database, and is where the Maintenance job is currently getting stuck.

    My first question is, there is a second job that runs every Sunday called the Interface Baseline, I wonder if these 2 jobs conflicted at that time, one doing the Delete and the other doing a Select on the same table, or if some other outside force was hitting the DB at that time, user generated query maybe.

    I have found the DB Maintenance to work very well on a healthy database, and from the sounds of yours, if the maintenance is taking longer than an hour, you may need to Truncate tables to fix things up.

    You can tell which tables to truncate by watching the swdebugMaintenance.log file on the C:\ProgramData\Solarwinds\Logs\Orion folder on your main poller...the bigger tables do take a few minutes longer to get through, but we are talking minutes not hours, when its hours, there is an issue.

    Support came back with kind of a canned response about re-installing the InformationServicev3 on the poller, but I don't see an issue there (anyway, but thats a different story)

    We are truncating tonight to fix the DB up again and seeing where we get in 7 days, I am concerned that this will happen again, and thats where I find your solution interesting using Stored Procedures instead of the Nightly Maintenance.

    Have you gotten anywhere with this?

    Jason

  • jxchappell

    Did you truncate the whole InterfaceAvailability_Details table?  If so, did you run into any issues in the web interface with any errors relating to truncating the table?

Reply Children
No Data