reporting out of SWIS API

We have a reporting platform that we're trying to leverage the solarwinds API to query data from.  It uses a homegrown tool that can use an API to ingest data.  This works great for availability, cpu, disk, memory and response time statistics.

However, we're running into issues with interface related tables where the queries are timing out and the SWIS service itself starts to consume over 12GB of ram and then doesn't go back down until it is restarted, here is an example of one query:

select Nodes.Caption as DEVICE_NAME, Nodes.IP_Address as DEVICE_IP,  'InPercentUtil' as PARAMETER_NAME, InterfaceTraffic.DateTime as CHECK_TIME
,InterfaceTraffic.InPercentUtil as OUTPUT_DATA,  interfaces.TypeDescription as
OUTPUT_DATA2, Interfaces.Caption as OUTPUT_DATA3
from orion.Nodes, orion.npm.InterfaceTraffic, orion.npm.Interfaces
where InterfaceTraffic.nodeID=Nodes.NodeID AND interfaces.nodeid=nodes.nodeid and MINUTEDIFF(InterfaceTraffic.DateTime,getdate()) = 1

tdanner​, would you have any suggestions?

Id say we have a medium to large deployment and interfaces are definitely the biggest portion of the environment:

3 Polling Engines

Network Elements 32814

Nodes 2496

Interfaces 21062

Volumes 9256

Thanks!

  • First, you probably want "GETUTCDATE()" not "GETDATE()" in the WHERE clause - SWIS operates in UTC even when the underlying database tables use local time.

    My hunch is that account limitations are getting involved and making this query more complicated. Can you try this query with an account that has no limitations?

  • The account we’re using doesn’t have any limitations.

    And interestingly enough, I was performing some tests earlier and the same query above returns about 126,000 records in about 10 seconds according to swql studio (21,060 interfaces * 6) where 6 is the amount of data points per interface per hour.

    If I do that query with just a top 127,000 it completes everytime. If I take the top 127,000 out of there it constantly times out more often than not. Not sure what the deal is there

  • So in this case a top x is much more computationally efficient than a where by itself,  the top command just pulls data until it's done at that max number of entries,  the where logic is still evaluating all of the entire history of every interface to see if they match the condition (default settings would have data points going back a year on these).  I'm boarding an airplane soon so I don't have a good solution off the top of my head for how to improve the efficiency of your query but hopefully that helps some.

  • Thank you mesverrum​.  That being said, it sounds like implementing a top X wouldn't work to get results from all interfaces in the last hour because its only taking the top X from the table and then doing the hourdiff on those?  Basically my question is, does the TOP X happen before the hourdiff or after?

  • Another observation I made is a select * without a top x, will timeout in swql studio like I mentioned and I noticed the SWIS service will keep consuming RAM until there inst any left and will never go back down until it is restarted.  Not sure if this is a SWQL studio thing or not because I do get the 2 minute timeout warning from swql studio.

  • Yeah,  that's not specific to swql studio.  If you wrote a query that pulls too much data and stick it in the web consoles or a report you can cause the info service to balloon like that. I've learned this through painful experience in client environments. It seems like the info service doesn't actually give up on the query, just the ui gives up on waiting for the response.

  • Top x is after the where condition until it hits the cap,  then it says it is done. But this thread on the same situation happening in plain sql seems to point toward your including a top statement could impact the execution plan in a way that makes it run more efficiently.

    https://stackoverflow.com/questions/1393508/sql-massive-performance-difference-using-select-top-x-even-when-x-is-much-higher

  • So I was doing some troubleshooting in SQL Management studio and I can see the query come through as SolarWinds.InformationService.ServiceV3@domain-Orion.  It runs and actually executes rather quickly, doesn't even register as an "recent expensive query" however it does immediately go into an async_network_io wait state and then the SQL server suspends the query.  Based on what I read, that particular wait (async_network_io) occurs when SQL is sending the data back to the client faster than the client can ingest it.  Not knowing anything about how the SWIS service works, are there technical limitations to SWIS that would cause a large query like this to completely fail?  It seems as if our SQL server is not the issue, and is rather a problem with SWIS.  Do we need more ram on the poller where we're doing the API call on?  We only have 16GB, would 32GB be sufficient? will SWIS continue to use all available RAM and not scale back down?  Whats strange is I can do TOP 127,000's all day long and SWIS can handle that, the moment I take it out, it crashes.

  • So I've been experimenting with comparing dates in order to get data from the last hour with a different method instead of doing the hourdiff.  If I do a datetime > adddate('hour',-1,getdate()) the performance improvement is staggering.  But this doesn't appear to return data from the last hour like I think it should in my head.  I'm probably doing something wrong.  maybe tdanner​ you could offer a suggestion?

    This is my query thus far, which to be honest, I'm not sure really what its doing as it returns 30 rows per server:

    select
        N.Caption as DEVICE_NAME,
        N.IP_Address as DEVICE_IP,
        'InPercentUtil' as PARAMETER_NAME,
        IT.DateTime as CHECK_TIME,
        IT.InPercentUtil as OUTPUT_DATA,
        I.TypeDescription as OUTPUT_DATA2,
        I.Caption as OUTPUT_DATA3
    from Orion.Nodes AS N
    inner join orion.npm.InterfaceTraffic AS IT
    on N.nodeid = IT.nodeid
    inner join orion.npm.interfaces AS I
    on N.nodeid = I.nodeid
    where datetime > ADDDATE('hour',-1,GETDATE()) and nodeid = 1
  • Try changing GETDATE() to GETUTCDATE(). SWIS operates in UTC.