
Decrease in WRITELOG waits leads to an increase in Memory/CPU waits

Hello,

We have been testing a NetApp SSD to compare performance against our Hitachi SAN. One of our "problem" databases is SolarWinds_NPM, which has an enormous amount of WRITELOG waits.

After migrating the server to the SSD, WRITELOG waits went down significantly, but Memory/CPU waits went up by almost the same amount, nearly eliminating the performance gain from the SSD.

Why would this occur? See attached. The migration to the NetApp was on June 6.

Thanks for any advice

attachments.zip
  • Couple of thoughts here:

    • Looks like overall waits went down significantly, which is good (half what they were!)
    • The less time spent waiting for a transaction to commit, the more time is available for actually processing the next insert, update, or delete
    • Healthy systems should be spending their time in Memory/CPU (anything else is considered a suspended activity)
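    As a rough way to confirm that wait profile outside of DPA, a query against `sys.dm_os_wait_stats` shows the top accumulated waits since the instance last restarted. This is just a sketch — the exclusion list of benign background waits is abbreviated here; a production version would filter many more:

    ```sql
    -- Top accumulated waits since SQL Server last restarted.
    -- The NOT IN list of benign/background waits is abbreviated.
    SELECT TOP (10)
           wait_type,
           wait_time_ms / 1000.0        AS wait_time_sec,
           signal_wait_time_ms / 1000.0 AS signal_wait_sec, -- time spent waiting for CPU after being signaled
           waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN (N'SLEEP_TASK', N'BROKER_TO_FLUSH',
                            N'SQLTRACE_BUFFER_FLUSH', N'CLR_AUTO_EVENT',
                            N'LAZYWRITER_SLEEP', N'WAITFOR')
    ORDER BY wait_time_ms DESC;
    ```

    If WRITELOG has dropped out of the top few rows and the remaining time is signal wait (CPU), that matches the shift you're describing.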

    Couple of things to look at:

    • Is your NPM repo a database that's part of an Availability Group (AG)? If yes, no worries, but that can also impact commit response time depending on the replication mode
    • Is your NPM repo database running in the SIMPLE recovery model? If not, SIMPLE is what SolarWinds recommends for performance; see the Success Center article
    • If it's part of an AG, you have to run in the FULL recovery model, so there's no choice unless you remove it from the AG...
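    If the database is not in an AG, checking and switching the recovery model looks roughly like this (database name assumed to be SolarWinds_NPM, per the original post — adjust to match your environment):

    ```sql
    -- Check the current recovery model
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = N'SolarWinds_NPM';

    -- Switch to SIMPLE -- only if the database is NOT in an AG and
    -- point-in-time restore of this database is not required
    ALTER DATABASE [SolarWinds_NPM] SET RECOVERY SIMPLE;
    ```

    Keep in mind that SIMPLE breaks the transaction log backup chain, so confirm with whoever owns backups before flipping it.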

    OK, so your profile is looking better from a wait-category standpoint (you're doing primarily logical I/O operations in cache now). However, there can still be inefficiencies that cause too much logical I/O. The next step is to check out the table tuning advice (SolarWinds Database Performance Analyzer Workload Optimization - YouTube) and perhaps look for inefficient SQL, especially custom SQL or SWQL that isn't tuned. Is your version of DPA at least 12.0?
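    One way to hunt for that excess logical I/O directly on the server is to pull the highest-read statements from the plan cache. This is a generic sketch against the standard DMVs (it only sees currently cached plans, so results vary):

    ```sql
    -- Statements with the highest average logical reads, from cached plans
    SELECT TOP (10)
           qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
           qs.execution_count,
           SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                     ((CASE qs.statement_end_offset
                         WHEN -1 THEN DATALENGTH(st.text)  -- -1 means "to end of batch"
                         ELSE qs.statement_end_offset
                       END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY avg_logical_reads DESC;
    ```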

  • Thanks for the info!

    Not on AlwaysOn. It was initially installed with the FULL recovery model; we switched to BULK_LOGGED but saw no improvement. I need to check with the app owner to see if we can go to SIMPLE (should be OK).

    DPA version is 12.0.3074. I need to check with Network Team on version of NPM.

    No custom code on NPM. If there is inefficient SQL, it's coming from you guys! ;-)

  • Going to simple recovery mode will help a lot!

  • Not seeing any improvement in SIMPLE mode....any other ideas to improve?

    2019-07-09_17-04-48.png

  • Hmmm, so the SSD really was the performance improvement for WRITELOG. Do you know what kind of storage your transaction logs are on when they're not on the NetApp SSD? I guess what I'm asking is: are they on RAID 5, or a RAID level that may be doing parity calculations during write operations?

    A good thing at this point would also be to start digging into specific activity that's contributing to the wait time. Click on a specific day. Then in the upper left of the chart you should see a drop down where you can change it from 1 hour to 1 day. That puts all of your waits into 1 bar. Click on that bar. Now you can drill into one of the waits like WRITELOG to see highest contributing SQL.

    That may provide additional insight into where the pressure is coming from. If you feel comfortable sharing here, I'll keep an eye out for your post. If it looks like something coming from core or NPM, you may want to engage support to see if they have any ideas.

  • Thanks for the help! It's hard to get the SAN details at the moment; our SAN is externally managed and our Infra Team doesn't know the disk config specifics. Summer holidays, short staffing, everyone seems too busy right now.

    Our Infra guy says it certainly has parity enabled. We don't know the RAID level, or even whether RAID is used; it's probably a custom config specific to the Hitachi SAN.

    I'll try to get a definitive answer soon.

    Attached are two screenshots of the NPM query details for the WRITELOG waits.

    NPM.png

    NPM2.png

  • How many records do you have in the interfaces table?

  • 3,232.  is that a lot?  i'm not even sure what an interface represents here. emoticons_wink.png

  • No, that's not a lot. 8 )

    I was thinking it might be a volume-related issue, but I'm not seeing it in that number.

    I think you are on the right track with the parity calc overhead at the storage tier where the transaction log lives. That being the case, there is another thing you can check.

    Go to the Storage I/O super tab in DPA, drill into the same hour that you show above for the update interfaces table SQL.

    Select the Orion database transaction log file from the files drop-down (it should be an .ldf file). Also, make sure the Write tab is showing (instead of Read).

    Select the link for Throughput vs Latency.

    That chart will give you an idea of what kind of write latency you are experiencing for that specific file (as seen from SQL Server).
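    The same per-file write latency DPA charts can be approximated straight from SQL Server's cumulative file statistics. Note these counters accumulate since the instance last started (not per hour like the DPA drill-down); this is a sketch assuming the database name SolarWinds_NPM:

    ```sql
    -- Cumulative write latency per file for the NPM database,
    -- as seen by SQL Server since the instance last started.
    SELECT DB_NAME(vfs.database_id)       AS database_name,
           mf.physical_name,               -- .ldf row = transaction log
           vfs.num_of_writes,
           vfs.io_stall_write_ms,
           CASE WHEN vfs.num_of_writes = 0 THEN 0
                ELSE vfs.io_stall_write_ms / vfs.num_of_writes
           END                             AS avg_write_latency_ms
    FROM sys.dm_io_virtual_file_stats(DB_ID(N'SolarWinds_NPM'), NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY avg_write_latency_ms DESC;
    ```

    If the .ldf row shows high average write latency while on the Hitachi SAN and low latency on the NetApp SSD, that would support the parity-overhead theory.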