So I am fighting a similar battle myself and I find some of your points interesting.
Here is my scenario:
When we turned on SNMP traps, we flooded the database, specifically the Traps and TrapVarbinds tables. That caused us to run out of space on the database, which seemed to have many significant downstream effects: other tables were not getting maintained because the DB was simply in an unhealthy state. Once that happens, everything gets fragmented horribly, although I have since turned on the Defragmentation bit and everything still seems to get fragmented daily anyway.
We proceeded to truncate a bunch of the tables (Traps, TrapVarbinds, InterfaceAvailability_Detail, NPMMultiast, and maybe a couple more). After each truncate I would run the DB Maintenance job manually to see if it made it past the "bad" table; when it did, I would notice it get stuck on a couple more, so we would truncate those and move on until it made it through the entire job. I can't tell if it's the database or the table spaces causing the issue, but we ran fine for 7 days after this. On the 7th day, something happened during DB Maintenance where a SELECT statement was getting blocked by a DELETE, and the DBAs killed it. From then on, during this last week, I haven't been able to run DB Maintenance, and we are again going to truncate the InterfaceAvailability_Detail table; it has the most rows in our database and is where the Maintenance job is currently getting stuck.
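For anyone following along, the truncate side of that loop is just plain T-SQL against the Orion database (table names are from our environment; yours may differ, and TRUNCATE removes every row in the table, so take a backup first):

```sql
-- Run against the Orion database. TRUNCATE TABLE removes ALL rows,
-- so make sure you have a good backup and that you really don't
-- need the historical data before running these.
TRUNCATE TABLE dbo.Traps;
TRUNCATE TABLE dbo.TrapVarbinds;
TRUNCATE TABLE dbo.InterfaceAvailability_Detail;
-- ...then kick off the Orion DB Maintenance job manually and watch
-- swdebugMaintenance.log to see whether it gets past the next table.
```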
My first question: there is a second job that runs every Sunday called the Interface Baseline. I wonder if those two jobs conflicted at that time, one doing the DELETE and the other doing a SELECT on the same table, or if some other outside force was hitting the DB at that moment, maybe a user-generated query.
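If it happens again, a quick way to see who is blocking whom in the moment is to query the SQL Server DMVs directly (this is generic SQL Server, nothing Orion-specific):

```sql
-- Show every request that is currently blocked, which session is
-- blocking it, what it is waiting on, and the SQL text it is running.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

Capturing that output before the DBAs kill anything would tell you whether it really was the Interface Baseline job on the other side of the block.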
I have found the DB Maintenance to work very well on a healthy database, and from the sound of yours, if the maintenance is taking longer than an hour, you may need to truncate tables to fix things up.
You can tell which tables to truncate by watching the swdebugMaintenance.log file in the C:\ProgramData\Solarwinds\Logs\Orion folder on your main poller. The bigger tables do take a few minutes longer to get through, but we are talking minutes, not hours; when it's hours, there is an issue.
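Alongside the log, you can rank tables by row count straight from the database to spot the likely offenders (again, generic SQL Server metadata, not Orion-specific):

```sql
-- Approximate row counts per table, largest first. Row counts in
-- sys.partitions are approximate but plenty accurate for triage.
SELECT t.name AS table_name,
       SUM(p.rows) AS row_count
FROM sys.tables AS t
JOIN sys.partitions AS p
  ON p.object_id = t.object_id
 AND p.index_id IN (0, 1)   -- heap or clustered index only
GROUP BY t.name
ORDER BY row_count DESC;
```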
Support came back with kind of a canned response about re-installing InformationServicev3 on the poller, but I don't see an issue there (that's a different story anyway).
We are truncating tonight to fix the DB up again, and we'll see where we get in 7 days. I am concerned that this will happen again, and that's where I find your solution interesting: using stored procedures instead of the nightly maintenance.
Have you gotten anywhere with this?
We ended up truncating the TrapVarbinds table several more times while the app owner was tweaking what data we were collecting. Turns out the massive data collection was coming from traffic monitoring on our ASA - apparently finding info around SNMP detail from Cisco is worse than finding needles in a haystack. In the end, we were able to dial in our monitoring and data collection to keep the table sizes in check. Afterwards, the maintenance plan would "generally" finish with no issues. I was checking swdebugMaintenance.log daily and cleaning up any indexes not related to daily tables on a regular basis.
Since then, I have done the following and things are much nicer now (although we still have a few small offenders and a storage hardware challenge):
1. Shrank the database to a size based on the tweaked data collection (monitored the size for two weeks and shrank based on that baseline)
2. Moved the data and log files to volumes formatted with a 64 KB allocation unit size (SQL Server stores data in 8 KB pages grouped into 64 KB extents, so 64 KB clusters align with its I/O pattern)
3. Added three data files to the PRIMARY file group to increase read/write performance on data stored in that filegroup
4. Implemented nightly reindexing and twice-weekly DBCC integrity checks via SQL stored procedures
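For anyone wanting to go the same route on item 4, the nightly reindex procedure is essentially a fragmentation-driven loop over sys.dm_db_index_physical_stats. This is a simplified sketch, not our exact procedure; the 5%/30% thresholds are the common rules of thumb (reorganize above 5%, rebuild above 30%), and the twice-weekly integrity check is just DBCC CHECKDB scheduled from a SQL Agent job:

```sql
CREATE PROCEDURE dbo.usp_NightlyReindex
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(max);

    -- Build one ALTER INDEX statement per fragmented index:
    -- REBUILD at >= 30% fragmentation, REORGANIZE at 5-30%.
    DECLARE cur CURSOR FAST_FORWARD FOR
        SELECT CASE
                 WHEN s.avg_fragmentation_in_percent >= 30
                   THEN N'ALTER INDEX ' + QUOTENAME(i.name)
                      + N' ON ' + QUOTENAME(sch.name) + N'.' + QUOTENAME(o.name)
                      + N' REBUILD;'
                 ELSE N'ALTER INDEX ' + QUOTENAME(i.name)
                      + N' ON ' + QUOTENAME(sch.name) + N'.' + QUOTENAME(o.name)
                      + N' REORGANIZE;'
               END
        FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS s
        JOIN sys.indexes AS i
          ON i.object_id = s.object_id AND i.index_id = s.index_id
        JOIN sys.objects AS o  ON o.object_id = s.object_id
        JOIN sys.schemas AS sch ON sch.schema_id = o.schema_id
        WHERE s.avg_fragmentation_in_percent >= 5
          AND i.name IS NOT NULL;   -- skip heaps

    OPEN cur;
    FETCH NEXT FROM cur INTO @sql;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC sys.sp_executesql @sql;
        FETCH NEXT FROM cur INTO @sql;
    END
    CLOSE cur;
    DEALLOCATE cur;
END
```

Scheduling this from SQL Agent overnight keeps the indexes in shape without depending on the Orion maintenance job getting all the way through.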
The key to fixing the Orion Maintenance reindexing failures was tweaking the data we were collecting: close those floodgates and you can actually swim!
Hope this helps! Sorry it took so long to reply.