
Discovery + Remove-SwisObject Causing Orion Web Interface and Information Service to stop

I'm attempting to run a discovery profile that imports all "up" volumes and interfaces, then run a series of Get-SwisData and Remove-SwisObject calls to keep only the ones our organization cares about. About halfway through the Remove-SwisObject calls, I get the following error:

Remove-SwisObject : Access to Orion.NPM.Interfaces denied.

Then I attempt to refresh the Orion web console and get the following error:

There was an error retrieving data from SolarWinds Information Service
Error: A query to the SolarWinds Information Service failed.

It's not a ton of objects, since this is a smaller network (only 319 volumes and 1341 interfaces to remove). Here's a stripped-down version of the PowerShell that's being run:

$swis = Connect-Swis -Hostname localhost -Credential $cred

$query = @"
SELECT ProfileID, Name
FROM Orion.DiscoveryProfiles
WHERE Name = 'Discovery Automation'
"@

$discoveryProfile = Get-SwisData -SwisConnection $swis -Query $query

Invoke-SwisVerb $swis Orion.Discovery StartDiscoveryProfile $discoveryProfile.ProfileID

do {
    Start-Sleep -Seconds 60
    $Status = Get-SwisData $swis "SELECT Status FROM Orion.DiscoveryProfiles WHERE ProfileID = @profileId" @{profileId = $($discoveryProfile.ProfileID)}
    Write-Verbose -Message "TimeStamp: $(Get-Date -Format yyyy-MM-dd_HH:mm:ss) | Discovery not completed yet. Sleeping..."
} while ($Status -eq 1)

 Once that's finished, this is run:

$queryVolume = @"
SELECT CP.Z_AUTOMATION_BYPASS_Volumes, CP.UNS_Responsible_Team, N.Vendor, N.MachineType, N.NodeName, V.Type, V.FullName, V.Uri
FROM Orion.Volumes V
LEFT JOIN Orion.NodesCustomProperties CP ON CP.NodeID = V.NodeID
LEFT JOIN Orion.Nodes N ON N.NodeID = V.NodeID
WHERE Z_AUTOMATION_BYPASS_Volumes = 'False' AND Vendor = 'Windows'
    AND Type != 'Fixed Disk' AND Type != 'NetworkDisk' AND Type != 'Mount Point' AND Type != 'MountPoint'
    OR (Type = 'Unknown' AND (VolumeDescription LIKE '%Label:Recovery%' OR VolumeDescription = 'Physical Memory' OR VolumeDescription = 'Virtual Memory' OR VolumeDescription = 'Memory buffers' OR VolumeDescription = 'Cached memory' OR VolumeDescription = 'Shared memory'))
    OR (Type = 'Other' AND (VolumeDescription = 'Physical Memory' OR VolumeDescription = 'Virtual Memory'))
"@
$volumes = Get-SwisData -SwisConnection $swis -Query $queryVolume

Foreach ($volume in $volumes){
    Write-Verbose -Message "*****Removing object: TimeStamp: $(Get-Date -Format yyyy-MM-dd_HH:mm:ss) | $($volume | Out-String)"
    Remove-SwisObject -SwisConnection $swis -Uri $volume.Uri
    Start-Sleep -Milliseconds 500
}
$queryInterface = @"
SELECT CP.Z_AUTOMATION_BYPASS_Interfaces, CP.UNS_Responsible_Team, N.Vendor, N.MachineType, N.NodeName , INT.Alias, INT.MAC, INT.InterfaceTypeName, INT.Uri
FROM Orion.NPM.Interfaces INT
LEFT JOIN Orion.NodesCustomProperties CP ON CP.NodeID = INT.NodeID
LEFT JOIN Orion.Nodes N On N.NodeID = INT.NodeID
WHERE Z_AUTOMATION_BYPASS_Interfaces = 'False' AND Vendor = 'Windows' AND Alias NOT LIKE '%[[]MON]'
"@

$interfaces = Get-SwisData -SwisConnection $swis -Query $queryInterface
        
Foreach ($interface in $interfaces){
    Write-Verbose -Message "*****Removing object: TimeStamp: $(Get-Date -Format yyyy-MM-dd_HH:mm:ss) | $($interface | Out-String)"
    Remove-SwisObject -SwisConnection $swis -Uri $interface.Uri
    Start-Sleep -Milliseconds 500
}

About halfway through the interfaces it bombs out. Do I need a longer sleep? Is there a better way to do this? I like the flexibility of Get-SwisData, since it lets us use whatever logic we want to select the appropriate volumes and interfaces.

Anyway, I end up rebooting the polling engine and everything is once again magically delicious. In fact, once it comes back up, I'm able to re-run the second portion of the code above, which now has fewer interfaces and no volumes, and it works. On the second run there are 0 volumes and 377 interfaces (it made it through 964 interfaces on the first run). I thought it might be a message queue issue, but we poll the message queue every 2 minutes and it never went above 0. CPU and memory never spike, and there are no expensive queries or processes with waits on the SQL server.

A possibly notable error:

2020-05-19 18:24:38,727 [95] ERROR SolarWinds.InformationService.Core.InformationService - (null) (null) Exception caught in method SolarWinds.InformationService.Core.InformationService.RunQuery System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

Anyway, I appreciate any help!

  • I've run into issues with consuming all available database connection pool slots before. The default is 1000, and once the pool is exhausted there's a timer after which connections basically start to release, roughly two minutes. The way I worked around it at the time was updating my database connection settings to allow 10,000 connections in the pool (roughly as sketched below), which instantly allowed me to complete what I was working on, so I never really revisited the issue.
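
    For reference, the knob here is the standard ADO.NET "Max Pool Size" keyword on the SQL Server connection string. A minimal sketch of what the setting looks like (the data source and database names are placeholders, and where Orion actually stores its connection string depends on your install, so treat this as an illustration only and back the file up before editing):

    Data Source=SQLSERVER\INSTANCE;Initial Catalog=SolarWindsOrion;Integrated Security=SSPI;Max Pool Size=10000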
  • I'll see if I can push my way through change management and get this done today and get back to you on results. Thanks, Marc!
  • Hey Marc  

    This fixed it, thanks a ton, man. Might I ask a related question now that this is working? I can post it as a separate discussion if that's better.

    I was thinking there was a flaw in my logic. I run discovery to find anything that responds to SNMP and then bring in all "up" interfaces and volumes. I then run a "cleanup" to remove anything we don't want to keep, which works fantastically because we can use SWQL WHERE statements that are very flexible. Once it's all cleaned up, we set custom properties on objects (sketched further down), which are consumed by alerts. We have some people who can't make their devices adhere to a standard, so there's a boolean custom property that allows them to bypass any "cleanup" of their volumes and/or interfaces. For those, we manually select interfaces and volumes using List Resources.

    I had thought that the discovery would have added every "up" interface and volume back to those nodes, but it didn't for some reason. Is there some kind of flag that discovery looks at to know that resources were manually selected on a node, and therefore doesn't touch them even with auto-import settings in the discovery profile?

    Do you think this is a good way to approach discovery? The primary goal was something like this:

    "Hey teams, tell me the subnets your devices live in, the SNMP strings they respond to, and the standards by which you want monitoring on your volumes and interfaces by machine type. I'll run discovery every night and anytime you make changes, you don't have to worry about Orion, it'll just see the new stuff, add it in, and set custom properties for alerts as you specified.If you have anything that can't adhere to the standards, you're responsible to set 'x' custom property to bypass automation and manually select the monitored resources on the Orion web interface. Here's a document on how all of the custom properties correlate to alerts, notifications, priorities, etc."

     - As more people start heavily using the SDK, it may be worth bumping the default pool value up from 1000. We have two instances of Orion at our organization; I was running this automation in the smaller one and it failed on the default pool settings, so it would definitely have failed in our larger one, where I'd eventually like to run this. Also, it may be worth considering a -batch parameter for Remove-SwisObject, where we could pass an array of URIs so it doesn't consume one connection per object. As always, we're grateful for the work you and the team put in on the product!

  • Remove-SwisObject supports passing a collection of Uris via the pipeline. Like this:

    Get-SwisData $swis "SELECT Uri FROM Orion.NPM.Interfaces WHERE blah blah" | Remove-SwisObject $swis

    Or:

    $interfaces = Get-SwisData $swis "SELECT Uri, other-properties FROM Orion.NPM.Interfaces WHERE blah blah"
    $interfacesToDelete = $interfaces |? { <# do some local computation and filtering #> }
    $interfacesToDelete |% { $_.Uri } | Remove-SwisObject $swis
  • Hey Tim,

    When we pass objects on the pipeline, AFAIK it still runs the process block for each object, right? And each of those would technically consume one of the database pool connections? Since we can configure the maximum number of connections it's not a huge deal; I was just throwing out that the cmdlet might be able to handle a batch operation more efficiently when you're passing it hundreds or thousands of URIs. I could be totally misunderstanding how it works on the back end, though; seeing as you and the team wrote most of this, I'll defer to you on it.

  • The PowerShell cmdlet API makes it really easy to collect values provided by the pipeline and commit all at once at the end.

    https://github.com/solarwinds/OrionSDK/blob/master/Src/SwisPowerShell/RemoveSwisObject.cs
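
    In PowerShell terms, that collect-then-commit shape looks roughly like the sketch below. This is a generic illustration of the begin/process/end pattern, not the actual cmdlet source, and the single bulk call at the end is left as a placeholder:

    function Remove-UriBatch {
        [CmdletBinding()]
        param(
            [Parameter(Mandatory, ValueFromPipeline)]
            [string] $Uri
        )
        begin   { $uris = [System.Collections.Generic.List[string]]::new() }
        process { $uris.Add($Uri) }   # runs once per pipeline item, but only buffers
        end {
            # the single commit for the whole batch happens here, so one connection
            # is used rather than one per object
            Write-Verbose "Deleting $($uris.Count) objects in a single operation"
            # <bulk delete call would go here>
        }
    }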

  • Hi, I found there are Dispose and Close methods available on the SWIS connection object. It may help to use them:
    $Swis.Dispose()
    $Swis.Close()

    I don't know the difference between them. I will test this in a future script update, because we had an IIS connection problem as well after running a script in a loop that reconnects on every call.
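
    A minimal sketch of that cleanup, assuming the connection object behaves like a typical .NET IDisposable (in which case Dispose also closes the connection, so calling it in a finally block is usually enough):

    $swis = Connect-Swis -Hostname localhost -Credential $cred
    try {
        Get-SwisData $swis "SELECT TOP 1 NodeID FROM Orion.Nodes" | Out-Null
    }
    finally {
        # release the underlying connection even if the query above throws
        $swis.Dispose()
    }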