We upgraded our database server this past weekend. It wasn't a shabby box prior to the upgrade: an HP DL360 with dual Intel Xeon E5-2660 CPUs @ 2.2GHz (40 logical cores) and 256GB RAM, plus a 4x800GB SSD array. All of that supports an environment with 17 additional polling engines, roughly 13,000 nodes (nearly 98,000 elements), and about 87,000 application component monitors. Our DB is right around 550GB.
Seems like a lot of power, right? We decided to add another 512GB of RAM (for a whopping 768GB total) and a mirrored set of SSDs dedicated to our SQL log files. Our expectation was that if we could run the entire DB in memory (SQL Server is allocated 736GB) and move the SQL logs to a dedicated array, we would absolutely crush any hardware constraints on DB performance. Yesterday was our first full business day running with the new server in place. Here is what we saw in DPA. (You have DPA, don't you??)
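For anyone curious how that memory cap gets applied, here is a minimal sketch of setting it with sp_configure. The 736GB figure is our allocation; the OS headroom you leave is your call, so treat the number as an assumption, not a recommendation.

```sql
-- Hedged sketch: cap SQL Server's memory so the OS keeps some headroom.
-- 736GB expressed in MB (736 * 1024 = 753664); adjust for your own box.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 753664;
RECONFIGURE;
```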

Yep, that's right. We threw all the hardware we could possibly throw at this database and our total waits INCREASED. In fact, they jumped from 114 hours of total wait on Monday, August 22nd to almost 156 hours of total wait on Monday, August 29th. What the heck is going on?
We dug a little deeper. The two large blocks in the last stacked bar graph are the custom poller inserts and rollups; those two queries accounted for 24% of the overall waits. The *vast* majority of that wait comes during our DB maintenance window (12-5AM), as shown below for the insert query (the purple block above). The bottom blue block is all CPU waits and is spread fairly evenly across the day.
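If you don't have DPA, you can get a rough approximation of the top offenders from SQL Server's own DMVs. This is only a sketch: it reports cumulative elapsed time since each plan was cached, not DPA's time-series wait accounting, so the numbers won't line up exactly.

```sql
-- Hedged sketch: top cached statements by total elapsed time since caching.
-- A quick approximation, not a substitute for DPA's wait-time breakdown.
SELECT TOP (10)
    qs.total_elapsed_time / 1000000.0 AS total_elapsed_sec,
    qs.total_worker_time  / 1000000.0 AS total_cpu_sec,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset END
          - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```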

There is a 3rd offender that contributes about 5% of the total wait time and is also an insert statement related to custom poller statistics.

We do have a lot of custom pollers (113 unique pollers) with nearly 70,000 poller assignments. That makes for a lot of data: 752,944,412 rows in our CustomPollerStatistics_Detail table. We only collect those statistics every 5 minutes, but that is still a lot of data.
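If you want to sanity-check your own numbers, a rough sketch using standard catalog views is below. The table name is how it appears in our Orion database; verify it against your own version.

```sql
-- Hedged sketch: approximate row count and base-table size without scanning.
-- Filtering to index_id 0/1 counts the heap or clustered index only, so
-- reserved_mb excludes nonclustered indexes.
SELECT
    t.name                                   AS table_name,
    SUM(ps.row_count)                        AS approx_rows,
    SUM(ps.reserved_page_count) * 8 / 1024.0 AS reserved_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables AS t ON t.object_id = ps.object_id
WHERE t.name = 'CustomPollerStatistics_Detail'
  AND ps.index_id IN (0, 1)
GROUP BY t.name;
```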
No, we don't have CPU utilization issues and CPU queue length is good.
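If you want to run the same check on your own box, one rough sketch at the scheduler level is below; consistently non-zero runnable task counts across schedulers would be the red flag, and we aren't seeing that.

```sql
-- Hedged sketch: runnable_tasks_count staying above zero on many schedulers
-- over time suggests CPU pressure; ours sits at or near zero.
SELECT
    scheduler_id,
    current_tasks_count,
    runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
```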

Even with SSDs out the wazoo, we still get spikes in both read and write latency during our maintenance window. (Maintenance actually starts at 9PM for us.)
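You can get a comparable per-file latency view from SQL Server itself; keep in mind this sketch reports cumulative averages since the last restart, so short spikes during the maintenance window get smoothed out.

```sql
-- Hedged sketch: average read/write latency per database file since the
-- last SQL Server restart (cumulative, so brief spikes are averaged away).
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
ORDER BY avg_write_ms DESC;
```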


So, here is my ask for the Thwack community. Do you see similar wait profiles in your environment? If not, and you have the same use profile for custom pollers, did you do anything to help address the issues?
Here are my ideas:
1) Convert custom pollers (UnDPs) to SAM components. (This is problematic as we use some of those UnDPs in transforms to calculate new data points, but it could work for many of them.)
2) Create new memory and CPU pollers via the Orion UI and assign them to the custom poller nodes. (This still allows us to do transforms, but I'm not sure all of our UnDP transforms could be converted over.)
I'm open to anything that helps reduce the size of that CustomPollerStatistics_Detail table and improve our associated wait times.
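For anyone who wants to run the same check before suggesting fixes, here is a rough sketch of how we'd rank pollers by row volume to see which UnDPs are worth converting first. The join and column names are assumptions based on our Orion schema (CustomPollerAssignment and CustomPollers tables), so verify them against your own version before trusting the output.

```sql
-- Hedged sketch: rank custom pollers by how many detail rows they generate.
-- Table and column names are assumptions from our Orion DB; they may differ
-- between Orion versions.
SELECT TOP (20)
    cp.UniqueName,
    COUNT_BIG(*) AS detail_rows
FROM dbo.CustomPollerStatistics_Detail AS d
JOIN dbo.CustomPollerAssignment AS a
  ON a.CustomPollerAssignmentID = d.CustomPollerAssignmentID
JOIN dbo.CustomPollers AS cp
  ON cp.CustomPollerID = a.CustomPollerID
GROUP BY cp.UniqueName
ORDER BY detail_rows DESC;
```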