GetScheduledListResourcesStatus always returns "unknown" status

I have been trying to discover the resources for both new and existing nodes using the "GetScheduledListResourcesStatus" API verb. I keep getting a status of "Unknown" when using this feature on new or existing nodes. I've tested this with multiple versions of the code and on many different types of nodes, with no success. We are currently running 2020.2.6. I'm looking for some guidance on how to solve this issue, or on what I might be missing in the code.

I used the PowerShell example posted online as the base for our script. Reference page: https://support.solarwinds.com/SuccessCenter/s/article/Update-resources-on-an-agent-node-using-SWQL?language=en_US

The example code and debug output below are from the proprietary automation platform we use at our company. The platform is based on Python, so the code is very similar to the Python examples posted online:

********************** List Resources Python Script Example **********************


# Description: List all Resources for a specific NodeID
# Reference Page: support.solarwinds.com/.../Update-resources-on-an-agent-node-using-SWQL

# nodeid="31838" # Router
nodeid="33775" # UPS
# nodeid="9095" # switch

# Invoke ScheduleListResources for a NodeID and wait for the response, which is the resulting JobID
log.error("Kicking off Resource List Request for NodeID = {}".format(nodeid))
jobid = plugin.solarwinds.invoke(entity="Orion.Nodes", verb="ScheduleListResources", args=nodeid, cred='solarwinds_admin')

# Use the JobID from above to retrieve the status of the List Resources request; it may take time to complete
log.error("Waiting until job #{} status = 'ReadyForImport'...".format(jobid))
lr_status = plugin.solarwinds.invoke(entity="Orion.Nodes", verb="GetScheduledListResourcesStatus", args=[jobid, nodeid], cred='solarwinds_admin')
log.error("Wait 285sec for first status check, then check status every 30sec")
plugin.tools.wait(seconds=285)
while lr_status != 'ReadyForImport':
    # Keep checking until the status changes from 'Unknown' (or 'InProgress') to 'ReadyForImport'
    plugin.tools.wait(seconds=30)
    lr_status = plugin.solarwinds.invoke(entity="Orion.Nodes", verb="GetScheduledListResourcesStatus", args=[jobid, nodeid], cred='solarwinds_admin')
    log.error("Resource Request Status = {}".format(lr_status))

# The job status is now 'ReadyForImport', so import the results with ImportListResourcesResult
log.error("Importing Resource List Results...")
list_resources = plugin.solarwinds.invoke(entity="Orion.Nodes", verb="ImportListResourcesResult", args=[jobid, nodeid], cred='solarwinds_admin')


# Format the results (make them look pretty!)
sw_results = plugin.tools.prettify(data=list_resources, indent=1)

# display results
return sw_results

********************************** Debug Output from running the script **********************************
25 workers, 24 are idle, 1 are busy and 0 are suspended.
task running
Kicking off Resource List Request for NodeID = 33775
Waiting until job #6a99b335-387d-4483-8a02-aa9b0edda0e4 status = 'ReadyForImport'...
Wait 285sec for first first status check, then check status every 30sec
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown
Resource Request Status = Unknown

The "Unknown" status continues for several hours and never changes; eventually the script times out. I don't understand why the status never changes. What am I missing here?
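For comparison, here is a minimal Python sketch of the same schedule, poll, and import sequence with an explicit deadline. It keeps polling until the status is exactly `ReadyForImport` (so an `InProgress` result does not trigger an early import), and it raises an error instead of hanging when the deadline passes. It assumes `invoke` is any callable shaped like `invoke(entity, verb, *args)`, which matches `SwisClient.invoke` from the orionsdk package, and that `ScheduleListResources` returns the job ID. The function name and the timeout and interval defaults are illustrative, not taken from the original script.

```python
import time
from typing import Any, Callable

READY = "ReadyForImport"


def list_resources_for_node(
    invoke: Callable[..., Any],
    node_id: int,
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Schedule a List Resources job, poll until ReadyForImport, then import.

    Assumption: `invoke` behaves like orionsdk's SwisClient.invoke(entity, verb, *args).
    """
    # Step 1: schedule the job; assumed to return the job ID (a GUID string).
    job_id = invoke("Orion.Nodes", "ScheduleListResources", node_id)

    # Step 2: poll the job status until it is ready or the deadline passes.
    deadline = clock() + timeout_s
    status = invoke("Orion.Nodes", "GetScheduledListResourcesStatus", job_id, node_id)
    while status != READY:
        if clock() >= deadline:
            raise TimeoutError(
                f"Job {job_id} for node {node_id} not ready after {timeout_s}s "
                f"(last status: {status})"
            )
        sleep(interval_s)
        status = invoke("Orion.Nodes", "GetScheduledListResourcesStatus", job_id, node_id)

    # Step 3: import the discovered resources onto the node.
    return invoke("Orion.Nodes", "ImportListResourcesResult", job_id, node_id)
```

With orionsdk, `invoke` could be the bound method of a `SwisClient(host, username, password)` instance. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.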

  • I wanted to bring this issue up as well. In my experience this part of the discovery engine is extremely fragile, even when I'm not using a script and just invoke the verbs directly from SWQL Studio. The ScheduleListResources / GetScheduledListResourcesStatus sequence fails so often that automating our node lifecycle is nearly pointless.

    This is occurring in multiple environments for me: our production environment, our lab (dev) instance, and even fresh environments I stand up at work or at home using a trial license. Often it won't work at all, even after a brand-new install.

    Even when it did work previously, once it starts to fail, running the Configuration Wizard usually isn't enough to repair the problem; getting this feature working again requires reinstalling components.

    Are other people experiencing this?

  • I just ran it with the latest platform version (2022.2.0) via the Hybrid Cloud Observability installer, and got back an immediate response for the node ID (with a resulting job ID).

    Then, sending that information to the next verb returned a ReadyForImport status.

    Granted, this is a small lab (under 50 devices), but a larger environment should only increase the time it takes for the status to flip from Unknown to ReadyForImport.

    There may be "collisions" (I can't think of a better word) if the call is sent over and over again. Are you seeing this on every node when you ask it to list resources? Do you get the same thing if you do it via the web console? They should be calling the same verb behind the scenes, so if it doesn't work in the web console either, you definitely need to open a support case.

  • So - let me make sure I've got the process:

    1. Request via the API a list resources scan with Node ID.
    2. Get returned a job ID.
    3. Request via the API a get status of the list resources scan with Job ID and Node ID.
    4. Returns "unknown."
    5. Wait a few minutes.
    6. Request via the API a get status of the list resources scan with Job ID and Node ID.
    7. Returns "unknown."
    8. Go to the Orion Platform web page for the node.
    9. Click "List Resources."
    10. Page loads without incident.
    11. Request via the API a get status of the list resources scan with Job ID and Node ID.
    12. [Do you still get "unknown?"]

    Is that about right?

    What version of the SolarWinds Orion Platform are you running?

  • Yes, that is correct.

    I currently have two environments (the issue is happening in both):

    • production: Orion Platform 2020.2.6 HF4
    • development: Orion Platform 2020.2.6 HF5

    Generally speaking, I won't get an "Unknown" unless the process is broken, or after I complete the ImportListResourcesResult. It makes sense for it to occur after successfully importing the results, but when it occurs on the first call to GetScheduledListResourcesStatus, invoking that verb repeatedly over time never changes the status from Unknown to any of the other statuses. When the feature is working, I don't get an 'Unknown' when checking the status before invoking the import.
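The rule of thumb above can be turned into a simple check over the statuses a polling loop has collected. Note that the PowerShell run later in this thread reported `Unknown` several times before `InProgress`, so this sketch only flags a job as stuck when none of the polls has shown progress. It is a hypothetical helper: the name `looks_stuck`, the `min_checks` threshold, and the assumption that `InProgress` and `ReadyForImport` are the only progress states (they are the ones seen in this thread) are illustrative, not part of the Orion API.

```python
from typing import Sequence

# Progress states observed in this thread; other status values may exist.
PROGRESS_STATES = {"InProgress", "ReadyForImport"}


def looks_stuck(statuses: Sequence[str], min_checks: int = 20) -> bool:
    """Heuristic: a job looks stuck if, after at least min_checks polls,
    it has never reported InProgress or ReadyForImport."""
    if len(statuses) < min_checks:
        return False  # too early to tell; a few 'Unknown' results can be normal
    return not any(s in PROGRESS_STATES for s in statuses)
```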

  • How long are you waiting between checks?  (just for clarity)

  • It depends. When I'm doing it in SWQL Studio, I just feel it out: sometimes a few seconds, sometimes a minute or so (checking multiple times, of course, not just giving up immediately).

    When I did it in code, I had $TimeBetweenChecks set to between 2 and 15 seconds, and $Timeouts set to between 45 seconds and 5 minutes.

  • And this was in a while..do loop (or the like)?

    # Assumes you have authenticated and stored your connection in $SwisConnection
    $NodeID = 1009
    $Job = Invoke-SwisVerb -SwisConnection $SwisConnection -Entity 'Orion.Nodes' -Verb 'ScheduleListResources' -Arguments $NodeID
    $Timer = New-Object -TypeName 'System.Diagnostics.Stopwatch'
    # Set an overall timeout
    $Timeout = 600 # seconds
    if ( $Job ) {
        # We got back job information - extract the JobId
        $JobID = $Job.InnerText
        # Validate that the JobID is in a GUID format (anchored, hex digits only)
        if ( $JobID -match '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$' ) {
            # Start the stopwatch
            $Timer.Restart()
            do {
                # If we go over time, break out of the loop
                if ( $Timer.Elapsed.TotalSeconds -gt $Timeout ) {
                    break
                }
                # Get the job status
                $JobStatus = Invoke-SwisVerb -SwisConnection $SwisConnection -Entity 'Orion.Nodes' -Verb 'GetScheduledListResourcesStatus' -Arguments $JobID, $NodeID
                # Pull the status text
                $Status = $JobStatus.InnerText

                # If the status isn't ReadyForImport yet, sleep for 15 seconds
                if ( $Status -ne 'ReadyForImport' ) {
                    Write-Warning -Message "Current status is: $Status for $JobID / Waiting for 15 seconds and trying again"
                    Start-Sleep -Seconds 15
                }
            } while ( $Status -ne 'ReadyForImport' )
            # Stop the timer
            $Timer.Stop()
            # Compare with -eq ('=' would assign and always be true)
            if ( $Status -eq 'ReadyForImport' ) {
                # Import the results
                Write-Host "Ready to import!" -ForegroundColor Green
            }
            else {
                Write-Error -Message "Timed out waiting for a result on $NodeID / $JobID [Last Status: $Status]"
            }
        }
        else {
            Write-Error -Message "Unexpected job ID '$JobID' returned for node $NodeID"
        }
    }
    else {
        # Job request failed
        Write-Error -Message "Unable to create a List Resources job for node with ID: $NodeID"
    }
    
    

    Results:

    PS C:\> . "c:\ListResourcesOnNode.ps1"
    WARNING: Current status is: Unknown for c9146487-a418-42ad-bbd3-d4c7023f456a / Waiting for 15 seconds and trying again
    WARNING: Current status is: Unknown for c9146487-a418-42ad-bbd3-d4c7023f456a / Waiting for 15 seconds and trying again
    WARNING: Current status is: Unknown for c9146487-a418-42ad-bbd3-d4c7023f456a / Waiting for 15 seconds and trying again
    WARNING: Current status is: Unknown for c9146487-a418-42ad-bbd3-d4c7023f456a / Waiting for 15 seconds and trying again
    WARNING: Current status is: Unknown for c9146487-a418-42ad-bbd3-d4c7023f456a / Waiting for 15 seconds and trying again
    WARNING: Current status is: InProgress for c9146487-a418-42ad-bbd3-d4c7023f456a / Waiting for 15 seconds and trying again
    Ready to import!
    PS C:\> 
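The sample above validates the returned job ID with a regular expression. If you drive the API from Python instead, the standard library's `uuid` module can do the same check; this is a minimal sketch, and `is_job_id` is a hypothetical helper name.

```python
import uuid


def is_job_id(value: object) -> bool:
    """Return True if value is a string in the dashed 8-4-4-4-12 hex GUID form."""
    if not isinstance(value, str):
        return False
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, 'urn:uuid:' prefixes and undashed hex,
    # so compare against the canonical form to insist on the dashed layout.
    return str(parsed) == value.lower()
```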

  • Affirmative. Shamelessly copied from the samples on the Orion SDK GitHub.

  • Well then, I'd still open a support case.  It feels like something gumming up that job and possibly others.  I'm not in support, so I'm only going by the symptoms.  Our support team has some tools at their disposal that I do not.  I don't know where those jobs are logged for processing on your system, but you should get something other than "Unknown."

    One last test, if you have the capability: if you have any "down" nodes now, test against one of them and see whether it shows the same results even though it's inaccessible. I'm hoping it'll return a different message.

    • Be sure to have a diagnostic captured before you run into this issue.
    • Try on a few different devices (types, OS, locations, etc.). I know that some of my old switches had something like 400 interfaces, and I also have routers with only two interfaces.
    • And capture another diagnostic after you generate the issue.

    Then open the case, explain the situation and you can refer them back to this thread if that's helpful.  If nothing else, this should be tracked for both your and our benefit.
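To follow the "try a few different devices" suggestion systematically, a small Python sketch can run the same check against several node IDs and group them by final status, which makes a node-specific problem easier to tell apart from a system-wide one. `final_status_for` is a hypothetical callable (for example, a wrapper around a polling loop that returns the last status it saw); nothing here is part of the Orion SDK.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_final_status(
    node_ids: Iterable[int], final_status_for: Callable[[int], str]
) -> Dict[str, List[int]]:
    """Map each final List Resources status to the node IDs that ended in it."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for node_id in node_ids:
        groups[final_status_for(node_id)].append(node_id)
    return dict(groups)
```

If every device type ends in `Unknown`, the problem is unlikely to be specific to one device.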

  • Sorry for leaving this hanging. I opened a support case, and will report back when we figure it out.

    Case# 01109050 in case you're interested.

  • There's a dev case open for the issue, but there is a workaround: point the API connection at the polling engine that the node is assigned to for the ScheduleListResources call (I'm also doing that for GetScheduledListResourcesStatus and ImportListResourcesResult). It has been working great since we started doing it that way.
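The workaround requires knowing which polling engine a node is assigned to. Here is a minimal Python sketch of that lookup. It assumes the SWQL properties `Orion.Nodes.EngineID` and `Orion.Engines` (`EngineID`, `ServerName`, `IP`), and a `query` callable shaped like orionsdk's `SwisClient.query(query, **params)` that returns a dict with a `results` list. The function name is hypothetical, so confirm the property names against your own schema in SWQL Studio.

```python
from typing import Any, Callable, Dict

# Assumed SWQL schema: Orion.Nodes.EngineID joins to Orion.Engines.EngineID.
ENGINE_FOR_NODE = (
    "SELECT e.ServerName, e.IP "
    "FROM Orion.Nodes n "
    "JOIN Orion.Engines e ON n.EngineID = e.EngineID "
    "WHERE n.NodeID = @node_id"
)


def polling_engine_for_node(query: Callable[..., Dict[str, Any]], node_id: int) -> str:
    """Return the IP address of the polling engine the node is assigned to."""
    rows = query(ENGINE_FOR_NODE, node_id=node_id)["results"]
    if not rows:
        raise LookupError(f"Node {node_id} not found or has no polling engine")
    return rows[0]["IP"]
```

The returned address would then be used to open a second connection (for example `SwisClient(engine_ip, username, password)`), and all three verbs would be invoked over that connection, as the workaround describes.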
