Level 9

SolarWinds Database: Cortex_Documents Table

Hi,

Does anyone know what this table does? If we dropped this table and recreated it (i.e., deleted all 14,000,000 rows), would it cause any issues?

Thanks

13 Replies
Level 12

The reply from @bharris1 is definitely the correct answer here.

There is a bug in the 2018 and 2019 versions of the Orion platform where the Cortex_Documents table doesn't get cleaned up when it should. The Cortex services are very chatty, so over time the record count gets into the millions, causing general performance of the Orion platform to degrade and eventually grind to a complete halt. Stopping all Orion services on the servers and then clearing out the records fixes the problem on my side. I'm stuck running Orion platform 2018.02 for a while longer (the last version supported on Windows 2012) until we can upgrade the OS, so this is a problem my team just has to deal with from time to time. Clearing out the records resolves the performance issues.

You'll know if you have a problem simply by checking the record count in the Cortex_Documents table. Your mileage may vary, but performance issues really seem to start once the table holds more than 1 million records.

These are the queries I run directly against the Orion database to clear out the table.

 

DELETE FROM Cortex_Documents WHERE [Data] LIKE '%Orion.Node%'
DELETE FROM Cortex_Documents WHERE [Data] LIKE '%Orion.Volume%'
DELETE FROM Cortex_Documents WHERE [Data] LIKE '%Orion.SnmpCredential%'
DELETE FROM Cortex_Documents WHERE [Data] LIKE '%Orion.Cpu%'
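
At 14+ million rows, a single DELETE like the ones above can bloat the transaction log and hold locks for a long time. A batched variant might look like the sketch below; the 100,000 batch size is my assumption, so tune it for your environment, and run it only with the Orion services stopped, as described above.

```sql
-- Sketch: delete in batches so transaction log growth and lock duration
-- stay manageable. Batch size of 100,000 is an assumption; tune it.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (100000) FROM Cortex_Documents
    WHERE [Data] LIKE '%Orion.Node%';
    SET @rows = @@ROWCOUNT;
END
```

Repeat the loop for each of the other LIKE patterns.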

 

Since we use SAM, we have a SQL User Experience Monitor that runs a query against the Orion database to collect the count of records in the Cortex_Documents table. We set the alert threshold at >= 500,000 records. That seems to give us enough time to react before things get out of hand.

 

SELECT COUNT(*) FROM Cortex_Documents
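
For that SAM monitor, the count query can be dressed up slightly so the health metric is cheap to collect. This is only a sketch, not the exact monitor configuration above; the column alias and the NOLOCK hint are my assumptions (a dirty read is acceptable for a rough health metric).

```sql
-- Sketch: record-count health check for a SAM SQL User Experience Monitor.
-- Alert when the returned value reaches the threshold (>= 500,000).
SELECT COUNT(*) AS CortexDocumentCount   -- alias is an assumption
FROM Cortex_Documents WITH (NOLOCK);     -- dirty read is fine for a metric
```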

 

Edit: Regarding the original question about dropping and recreating the table (deleting all 14,000,000 rows), I say nope. I suppose some monitoring data loss is possible if some of the data hasn't been processed yet, but we have not had any issues clearing out this table when it gets large. My team has not noticed any data loss or problems, and we've been clearing out the Cortex_Documents table about once a month.

Level 14

kevin.tran​ tgeihlich​

I was able to get my performance back to normal. I used the query from tgeihlich​ to clear out only the virtualization entries, and utilization dropped from a constant 95-100% to 60-70% normally, with only intermittent spikes into the 90s. Thanks for your help!

Level 7

Not sure how relevant this is, but we had an issue where we had 15 million records in the table, which was causing huge memory usage by the ServiceHost process on the MPE & APEs (Orion 2018.2 HF6), as well as smashing the DB on that table.

This is what we were given by support:

Please refer to the following suggestion from the developer for Orion platform 2018.2:

  • The previous SQL script that drops duplicated data from Cortex_Documents has not been executed, probably because there is a vSAN element found in the database. The historical data below will be lost:
    • Cluster Details page - Storage tab (Resource Utilization)
    • ESX Host Details page - Virtualization Summary tab (bottom part of Resource Utilization)
  • After investigation, the developer found that the data mentioned above needs to be cleared in order to fix the issue (even after an upgrade).
  • If you are OK with losing the historical data from the vSAN element, run the following SQL script to drop the duplicated data (it will take a couple of minutes):
    • DELETE FROM Cortex_Documents WHERE Data LIKE '%"Orion.Virtualization%'
    • Restart the SolarWinds Cortex service on all pollers to release the duplicated data from memory.

It may be worth running SELECT COUNT(*) FROM Cortex_Documents WHERE Data LIKE '%"Orion.Virtualization%' to see if it's the same issue.

We didn't see any noticeable data loss after nuking that data, and it did resolve our memory / SQL performance issues.

Have you opened a support case for it?

tgeihlich​ bharrisinvolta​ I am in this predicament now. I have the same Orion platform version, 2018.2 HF6, and I performed the COUNT(*) query against my Cortex_Documents table to find that we had roughly 23 million hits for Data LIKE '%Orion.Virtualization%'. I ran a query to remove rows where Data LIKE '%Orion.Virtualization%' AND LastWriteTime LIKE '%2017%', and another where LastWriteTime LIKE '%2018%', to hopefully alleviate this issue. This removed roughly 4 million hits, but it appears that most of my hits are for 2019: I have 19 million hits for 2019 where Data is LIKE '%Orion.Virtualization%'.

I restarted the Cortex service on every one of my SolarWinds servers, but shortly afterward memory began to creep back up to 95% utilization, and most of the servers are constantly at 98-99% for memory. Some of the APEs are at around 12-14 GB of memory use. I have 8 APEs in total for my environment, and all have 32 GB. My main server has 256 GB, but my assumption is that the previous admin upgraded the memory after he found this Cortex issue and did not know the resolution.
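
One note on filtering LastWriteTime with LIKE '%2019%': comparing a datetime2 column with LIKE forces a string conversion on every row, which is slow at this scale. A range predicate avoids that; this is only a sketch, and the cutoff dates are assumptions to adjust to whatever you actually want to keep.

```sql
-- Sketch: use a date range instead of LIKE on the datetime2 column.
-- The cutoff dates below are assumptions; adjust to your retention needs.
DELETE FROM Cortex_Documents
WHERE [Data] LIKE '%Orion.Virtualization%'
  AND LastWriteTime >= '2019-01-01'
  AND LastWriteTime <  '2020-01-01';
```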

I was curious if either of you noticed a loss of any significant data. My fear initially was that this would remove pertinent VMAN information, but from my quick research on the web I realized that vSAN is a component of VMAN. This alleviated my fear, as my understanding is that this Orion.Virtualization data only pertains to vSAN and nothing else... please correct me if I am wrong.

For our environment, we only have one vSAN device per our vSAN Summary page. From my research I have found nothing that important pertaining to historical data for vSAN. It just appears that the most useful thing from vSAN would be the ability to use Perfmon Analyzer and occasionally current status report for vSAN information. Nothing pertaining to any historical data. Please let me know if you see anything different from your perspective.

One thing I was curious about, and I'm not sure if you all have the answer: wouldn't the data keep accruing and get right back to the point where we have to delete it again? Is this a true fix, or just a spot to keep an eye on and purge once it reaches a certain threshold? We have plans to upgrade to v2019 in the upcoming months and are unsure whether that version resolves this issue or whether this is something to always keep watching.

My plan to remove the Orion.Virtualization data from Cortex_Documents is to take all the APEs and the secondary server offline, perform the query via the main server, and then restart the Cortex service from there. My assumption is that once I bring the APEs and secondary server back online, the old Cortex_Documents data will be flushed. I'm not sure if I would still have to restart the service once the servers are back online, but I wanted to share my plan to see what you all thought.

FYI, here are some snippets of how much CPU and memory is being used.

[Screenshots: Main Server, one of my eight APEs, and another APE - images not available]


Thanks tgeihlich!

Looks like we have a similar issue. I did open a support case with SW support, and they said to drop the Cortex_Documents table as a whole, which has about 14.8M records. In your case you only dropped the Orion.Virtualization data, which makes more sense, since SW support troubleshot in detail to find the issue. This SW support agent asked us to delete all 14.8M records.

We did ask the DBA team to perform a backup, but I am afraid dropping = deleting, and then we won't have any historical data. We do need historical data.

In your case, did you lose any historical data after you deleted the Orion.Virtualization data?

Best,

Kev


Hey Kev,

We didn't lose any historical data for Nodes or Applications.

The devs mentioned that there would be data loss for vSAN components, but we only had two, which had already been decommissioned, so I'm not sure if we lost data there.

Out of curiosity, what sort of memory usage are you getting? Before the fix we were seeing 20-35 GB (on 32 GB APEs; not a fun time) on the ServiceHost process, and after we're seeing about 7 GB.

Cheers,

Tyler

Hi Tyler,

We had memory usage ranging from 12 GB to 18 GB on 24 GB APEs for the ServiceHost process. After the upgrade to 2018.4 HF3 the issue went away, but it took a long time for things to settle down. We did use My Deployment to upgrade the APEs. We had been dealing with a stuck-at-Loading issue under All Groups, but today when I logged on, there was no more stuck-at-Loading issue. We did NOT drop the Cortex_Documents table from the DB.

Best,
Kev


tgeihlich​ We seem to be experiencing similar issues: queries against Cortex_Documents are running up our CPU and starting to affect the Web Console. kevin.tran​ were you able to fix your issue? Did you clear out the whole Cortex table?


Hi bharris,

Yes! We followed SolarWinds support's instructions to clear the table (all records), after backing up the database of course. We also upgraded to the latest version at the time, which was 2018.4 HF3. The upgrade took a long time, since we have many pollers that didn't have the correct .NET and Windows updates (yup, all pollers must be on the same updates and .NET version).

You can also open a case with SolarWinds, because it might be something different in your environment. They will need to look at the diagnostics logs.

Good luck!

Best,

Kev


kevin.tran​ Do you happen to have the ticket number from support? I'd like to provide it to support so they can look into it in regard to my ticket.


CASE # 00339826

Yes, please work with Support first prior to deleting anything. Remember to BACK UP!!!

    Instructions:

Run the following query in SQL Server Management Studio to drop the current
Cortex tables and re-create them:

DROP TABLE [dbo].[Cortex_Documents]
DROP TABLE [dbo].[Cortex_DocumentTypes]
DROP TABLE [dbo].[Cortex_ExternalDocumentTypes]

CREATE TABLE [dbo].[Cortex_Documents](
    [ElementId] [bigint] NOT NULL,
    [DocTypeId] [int] NOT NULL,
    [OwnerPartitionId] [int] NOT NULL,
    [LastWriteTime] [datetime2](7) NOT NULL,
    [DeletedDate] [datetime2](7) NULL,
    [Data] [nvarchar](max) NOT NULL,
    CONSTRAINT [PK_Cortex_Documents] PRIMARY KEY CLUSTERED (
        [ElementId] ASC,
        [DocTypeId] ASC
    ) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[Cortex_DocumentTypes](
    [DocTypeId] [int] IDENTITY(1,1) NOT NULL,
    [DocType] [nvarchar](250) NOT NULL,
    CONSTRAINT [PK_Cortex_DocumentTypes] PRIMARY KEY CLUSTERED (
        [DocTypeId] ASC
    ) ON [PRIMARY],
    CONSTRAINT [IX_Cortex_DocumentTypes] UNIQUE NONCLUSTERED (
        [DocType] ASC
    ) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[Cortex_ExternalDocumentTypes](
    [DocTypeId] [int] IDENTITY(1,1) NOT NULL,
    [DocType] [nvarchar](250) NOT NULL,
    CONSTRAINT [PK_Cortex_ExternalDocumentTypes] PRIMARY KEY CLUSTERED (
        [DocTypeId] ASC
    ) ON [PRIMARY],
    CONSTRAINT [IX_Cortex_ExternalDocumentTypes] UNIQUE NONCLUSTERED (
        [DocType] ASC
    ) ON [PRIMARY]
) ON [PRIMARY]

Start the Cortex service back up and wait a few minutes to confirm it is running.
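
Once the Cortex service is back up, a quick sanity check (my own sketch, not part of the support instructions) is to watch the record count start at zero and grow slowly as Cortex repopulates the table:

```sql
-- Sketch: verify the recreated table is being repopulated at a sane rate.
SELECT COUNT(*) AS CortexDocumentCount
FROM Cortex_Documents;
-- Run this a few times over an hour: steady modest growth is normal;
-- a rapid climb back into the millions means the cleanup bug is still there.
```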


MVP

Hi kevin.tran

Cortex is a new service (SolarWinds Cortex Service) introduced in the 2017.3 core platform. I believe the objective of this service is to consolidate all polling and monitoring activities across the various SolarWinds modules.

Maybe these documents will help:

Success Center - Solarwinds Cortex Service will not start

Success Center - Solarwinds Cortex Port Requirements

Thanks Ravik. I did see those documents. They didn't say much about what deleting the table would do, or whether it would cause any harm.

0 Kudos