Need help with Cortex consuming all disk space

Hi,

All of our Windows systems are on agent monitoring, and recently we've noticed an increase in alerts for disk space being fully consumed. When we checked, the Cortex part of every agent had consumed over 30 GB of disk space, and on many servers this was enough to use up all available disk space. In effect, SolarWinds was crashing our tools and servers because the Cortex service started misbehaving out of the blue.

More specifically, the files seem to be related to the cache for volume polling. We've already opened two tickets with support but aren't getting anywhere, so I wanted to ask the community.

The cache files appear to be DB files that balloon out of control. As far as we've been able to tell, the agent doesn't lose its connection, so we can't understand why it isn't flushing this cache and letting this problem happen. Support gave us a script to run along with a reboot of the agents, and we followed their recommendations step by step, to no avail. The issue continues.

I feel like we just scratched the surface instead of attacking the root of this problem. 

When we chart disk space, we see a slow, steady increase until one day all disk space is taken. As a temporary measure, we have been deleting the files manually. This seems to make the agents somewhat unstable, because as soon as we delete the files the agent reports that it cannot flush a file because it doesn't exist. That doesn't make sense if the file is only growing.
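
For anyone who wants to chart the cache growth separately from total disk usage, here is a minimal sketch in Python. It only sums the sizes of the *.db files directly inside a folder you pass in. The function name `db_cache_usage` is made up for illustration, and the folder is an assumption you need to confirm on your own agents.

```python
from pathlib import Path


def db_cache_usage(cache_dir):
    """Return (total_bytes, files) for *.db files directly under cache_dir.

    files is a list of (path, size_in_bytes) tuples, largest first.
    Subfolders are not searched.
    """
    files = []
    for path in Path(cache_dir).glob("*.db"):
        if path.is_file():
            files.append((path, path.stat().st_size))
    # Largest files first, so the worst offender is at the top.
    files.sort(key=lambda item: item[1], reverse=True)
    total = sum(size for _, size in files)
    return total, files
```

Logging the total on a schedule, for example with `db_cache_usage(r"C:\ProgramData\SolarWinds\Cortex_Agent")` (the folder named in the workaround reply further down; adjust it if yours differs), should show whether the growth is steady or comes in jumps.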

I can provide more details if needed. 

  • We had a similar issue about a year ago; I'm not sure about the exact build number. We were given the following workaround:

    For the workaround, perform the following:

    1) Clean Volumes from the Cortex DB with the following query:
    delete from Cortex_Documents where Data LIKE '%"ModelType": "Orion.Volume"%' OR Data LIKE '%"Type": "Orion.NodeToVolumes"%'
    2) Stop Agent/Cortex on Agent machine
    3) Delete all *.db files in c:\programdata\SolarWinds\Cortex_Agent\
    4) Start Agent/Cortex again

    Please note that it is safe to delete the Volume information from the Cortex_Documents table because, in this case, Cortex is used only for realtime polling. Realtime polling is able to create correct records as they are needed. The *.db files are just cached data. They probably contain incorrect records, and Cortex is polling Volumes even when it shouldn't.

    I would suggest upgrading to the current version if you are not on it, as this was fixed in a later release.
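
Steps 2 to 4 of the workaround above (stop the agent, delete the *.db files, start the agent) could be scripted roughly as follows. This is only a sketch under assumptions: `purge_db_files` and `restart_with_purge` are made-up names, the service name is a placeholder you must look up on your own machine because it is not given in this thread, and `net stop` / `net start` need an elevated prompt on Windows. The delete defaults to a dry run so you can see what would be removed first.

```python
import subprocess
from pathlib import Path


def purge_db_files(cache_dir, dry_run=True):
    """Delete *.db files directly under cache_dir and return their paths.

    With dry_run=True (the default) nothing is deleted; the paths that
    would be removed are returned so they can be reviewed first.
    """
    targets = sorted(p for p in Path(cache_dir).glob("*.db") if p.is_file())
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


def restart_with_purge(service_name, cache_dir):
    """Stop the service, delete the cache files, then start it again.

    service_name is a placeholder (assumption): use the real agent
    service name from your own system.
    """
    subprocess.run(["net", "stop", service_name], check=True)
    try:
        removed = purge_db_files(cache_dir, dry_run=False)
    finally:
        # Start the service again even if the delete step failed.
        subprocess.run(["net", "start", service_name], check=True)
    return removed
```

Calling `purge_db_files(r"C:\ProgramData\SolarWinds\Cortex_Agent")` on its own lists the files without touching them. Step 1, the query against Cortex_Documents, is not covered here and still has to be run against the database separately.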

  • Support had us do this, although it looks like your query is just a little different from the one we were provided. We did that and saw the disk space regained across all our servers, but the issue came right back. It wasn't resolved.
