We have been testing a NetApp SSD to compare performance against our Hitachi SAN. One of our "problem" databases is SolarWinds_NPM, which has an enormous amount of WRITELOG waits.
After migrating the server to the SSD, the WRITELOG waits went down significantly, but the Memory/CPU waits went UP by almost the same amount, nearly eliminating the performance gain from the SSD.
Why would this occur? See attached. Migration to the NetApp was on June 6.
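For anyone who wants to compare outside of DPA, the raw numbers are visible in sys.dm_os_wait_stats (cumulative since the last restart); a rough sketch:

-- Top waits server-wide; WRITELOG and CPU waits such as SOS_SCHEDULER_YIELD show up by name
SELECT TOP (15)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms   -- portion spent waiting for a CPU after the resource was ready
FROM   sys.dm_os_wait_stats
WHERE  wait_type NOT LIKE N'SLEEP%'   -- trim obvious idle waits
ORDER BY wait_time_ms DESC;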
Thanks for any advice
Couple of things to look at:
- Is this database part of an AlwaysOn Availability Group?
- What recovery model is SolarWinds_NPM set to (FULL, BULK_LOGGED, or SIMPLE)?
- What version of NPM are you running?
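If it helps, the first two are quick to check from SSMS; a sketch using the database name from your post:

-- Recovery model for the NPM database
SELECT name, recovery_model_desc
FROM   sys.databases
WHERE  name = N'SolarWinds_NPM';

-- Any Availability Groups defined on this instance? (no rows = not on AlwaysOn)
SELECT name
FROM   sys.availability_groups;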
OK, so your profile is looking better from a wait-category perspective (doing primarily logical I/O operations in cache now). However, there can still be inefficiencies that cause too much logical I/O. The next step is to check out the table tuning advice (SolarWinds Database Performance Analyzer Workload Optimization - YouTube) and perhaps look for inefficient SQL, especially custom SQL or SWQL that isn't tuned. Is your DPA version at least 12.0?
Thanks for the info!
Not on AlwaysOn. It was initially installed with the FULL recovery model; we switched to BULK_LOGGED but saw no improvement. I need to check with the app owner to see if we can go to SIMPLE (should be OK).
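For reference, the switch itself is a one-liner once we get sign-off (SIMPLE gives up point-in-time restore, so we'd confirm the backup strategy first):

-- Assumes point-in-time restore is not required for this database
ALTER DATABASE [SolarWinds_NPM] SET RECOVERY SIMPLE;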
DPA version is 12.0.3074. I need to check with the Network Team on the version of NPM.
No custom code on NPM. If there is inefficient SQL, it's coming from you guys!
Hmmm, so the SSD really was the performance improvement for WRITELOG. Do you know what kind of storage your transaction logs were on before the NetApp SSD? I guess what I'm asking is: are they on RAID 5, or a RAID level that does parity calculations on write operations?
A good next step would be to dig into the specific activity that's contributing to the wait time. Click on a specific day. Then, in the upper left of the chart, you should see a drop-down where you can change the view from 1 hour to 1 day; that puts all of your waits into one bar. Click on that bar, and you can drill into one of the waits, like WRITELOG, to see the highest-contributing SQL.
That may provide additional insight into where the pressure is coming from. If you feel comfortable sharing here, I'll keep an eye out for your post. If it looks like something coming from core or NPM, you may want to engage support to see if they have any ideas.
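If you'd rather cross-check that drill-down outside DPA, something like this rough sketch against sys.dm_exec_query_stats surfaces the statements doing the most writes (numbers reset when plans age out of cache or the instance restarts):

-- Statements with the most logical writes, a rough proxy for log-generating work
SELECT TOP (10)
       qs.total_logical_writes,
       qs.execution_count,
       SUBSTRING(st.text,
                 (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_writes DESC;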
Thanks for the help! It's hard to get the SAN details at the moment; our SAN is externally managed, and our Infra Team doesn't know the disk configuration specifics. Summer holidays, short staffing, everyone seems too busy right now.
Our Infra guy says it's certainly got parity enabled. We don't know the RAID level, or even if RAID is used; it's probably a custom configuration specific to the Hitachi SAN.
I'll try to get a definitive answer soon.
Attached are two screenshots of the NPM query details for the WRITELOG waits.
No, that's not a lot. 8)
I was thinking it might be a volume-related issue, but I'm not seeing it in that number.
I think you are on the right track with parity-calculation overhead at the storage tier where the transaction log lives. That being the case, there is another thing you can check:
Go to the Storage I/O super tab in DPA and drill into the same hour you showed above for the SQL that updates the interfaces table.
Select the Orion database transaction log file from the files drop-down (it should be a .ldf file). Also, make sure the Write tab is showing (instead of Read).
Select the link for Throughput vs Latency.
That chart will give you an idea of what kind of write latency you are experiencing for that specific file (as seen from SQL Server).
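If you want to sanity-check what DPA shows there, SQL Server keeps cumulative per-file stall counters; a quick sketch, again assuming the database name from your first post:

-- Average write stall per file for SolarWinds_NPM, as seen from SQL Server (cumulative since restart)
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,                  -- the .ldf row is the transaction log
       vfs.num_of_writes,
       vfs.io_stall_write_ms,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes
       END AS avg_write_stall_ms
FROM   sys.dm_io_virtual_file_stats(DB_ID(N'SolarWinds_NPM'), NULL) AS vfs
JOIN   sys.master_files AS mf
  ON   mf.database_id = vfs.database_id
 AND   mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall_write_ms DESC;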