Level 13

Interface Errors

I am curious about how Orion is polling for interface errors.

We have a couple of devices that have a history of getting errors on interfaces. After we clear the errors on the devices, Orion still shows the errors. I had thought that after I cleared the errors, went to the Orion server, and did a rediscover/poll now, it would read the interfaces again, see there are no more errors, and clear them from the table we display.

Perhaps I am just missing something?

1 Solution

Don,

mdriskell is correct about the statistics being stored in the database for historical reference.  I think for most of the charts and graphs, the default time frame is set to Today.  If you haven't changed that setting, those charts would show all input/output errors, discards, etc. since midnight. 

Hope this helps!


10 Replies
Level 15

I've created 4 custom reports with Report Writer, which appear on the interface tab of my Node Details page.

  1. TX discards greater than 0 Today
  2. TX errors greater than 0 Today
  3. RX discards greater than 0 Today
  4. RX errors greater than 0 Today

pastedImage_5.png

This is a generic report that applies to all interfaces in the Orion DB, but I then apply the following SQL filter on the resource in the interface tab:

IP_Address = '${IP_Address}'

pastedImage_2.png

What these reports do is show me any interfaces on the device that have errors or discards.

Most of the tables are empty with "No activity to report." so they take up little real estate on the web page.

It is an easy way to get an overview of the health of all the interfaces on your device without having to search for a problem.

It is there - "in your face" when opening the interface tab.

Here is the Report Writer SQL query for TX Discards (it is actually limited to the top 10 interfaces with discards on a device):

SELECT TOP 10
  Nodes.NodeID AS NodeID,
  Interfaces.InterfaceID AS InterfaceID,
  Nodes.Caption AS NodeName,
  Interfaces.Caption AS Interface_Caption,
  Interfaces.OutDiscardsToday AS OutDiscardsToday,
  Interfaces.OutDiscardsThisHour AS OutDiscardsThisHour,
  Interfaces.Outbps AS Xmit_bps
FROM Nodes
  INNER JOIN Interfaces ON (Nodes.NodeID = Interfaces.NodeID)
WHERE (Interfaces.OutDiscardsToday <> 0)
ORDER BY 5 DESC

Level 13

I am going to bring this up again - as I am still not real clear on why my counters on Orion continue to increase, after manually clearing the counters on the devices, including a reboot.  How are these being processed by Orion?

If I reboot the device, the counts will be 0. So how many consecutive polls returning 0 will be needed before the charts correct themselves? It's very frustrating to have to explain to management that a chart in Orion is NOT correct.

For example - we have a resource on our home page that shows "High Errors and Discards for Today (interfaces with errors/discards greater than 10000 today)".   We clear these errors on a switch (either manually or by reboot) but the device chart never updates - it keeps that same number, even if there have been NO errors on the device for 48 hours.

Any thoughts/helpful advice? I know my network is not as bad as Orion seems to think it is. Here is an example:

From Orion:

pastedImage_2.png

From the Switch:

pastedImage_3.png


Keep in mind the source of the data for each case. For the Orion table the source is your Orion database. For the Cisco device the source is the Cisco agent. Clearing the Cisco agent does not clear the Orion database.

This type of data mismatch is common in network management. Folks will often compare NPM interface utilization with NTA interface data and wonder why they don't match. In this case it is usually because NetFlow graphs use very different data periods and filtering, like top X, to show flow trends. NPM SNMP just counts data and does not care what type it is. So the mismatch is because the data sources and the nature of the data are different, not because either is wrong.

If you are good at SQL queries you can update the interface error or discard fields in the database when you clear the counter. This also requires that you are pretty familiar with the tables and know what to touch and what to leave alone.

If you have Engineer's Toolset you can see the counters off of the device in near-realtime.


Andy


ok - well, I need LIVE data, not stale data from a database. 


Amend your query to use Errors and Discards This Hour (specifically errors), not Today. This will keep your counts current to the last hour.

Add this to your filter so you see errors rather than discards; raise the threshold if you have a lot of errors and are still seeing mainly discard info.

and (InErrorsThisHour > 0 or OutErrorsThisHour > 0)

Thanks. The query helped.

I don't exactly like that list because it keeps discards at the top. Even if no errors occur, you will get discards when you're over bandwidth or your firewall is doing its job.

I try to limit my top lists to errors only, even adding a filter that keeps the list to interfaces with more than 0 (or 1) errors: and (InErrorsThisHour > 0 or OutErrorsThisHour > 0)

SolarWinds will keep the history, and your web display will keep track of what is going on either for the whole day (since midnight) or for your time period (last 1 hour). That is a rotating hour from what I understand, so current discards are counted rather than being reset at the top of each hour.

Orion counts on its own; it DOES NOT look at the switch to pull the numbers from.
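To make the "Today" versus "This Hour" difference concrete, here is a minimal Python sketch. It is not Orion's code: the function name `window_totals` and the sample data are made up for illustration, and it assumes the hour window is a rolling 60 minutes, which the post above only states as an understanding.

```python
from datetime import datetime, timedelta


def window_totals(samples, now):
    """Sum per-poll error deltas over two windows.

    samples: list of (timestamp, delta) pairs as a poller might store them
             (hypothetical layout, not the Orion schema).
    Returns (since_midnight_total, rolling_last_hour_total).
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    hour_ago = now - timedelta(hours=1)
    today = sum(d for t, d in samples if midnight <= t <= now)
    last_hour = sum(d for t, d in samples if hour_ago < t <= now)
    return today, last_hour


now = datetime(2024, 1, 2, 15, 30)
samples = [
    (datetime(2024, 1, 1, 23, 50), 500),   # yesterday: outside both windows
    (datetime(2024, 1, 2, 9, 0), 12000),   # morning burst, later fixed
    (datetime(2024, 1, 2, 14, 45), 3),     # inside the last hour
    (datetime(2024, 1, 2, 15, 20), 0),     # clean poll
]
today, last_hour = window_totals(samples, now)
print(today, last_hour)  # 12003 3
```

A morning burst keeps the "Today" figure above a 10000 threshold until midnight, while the hour window already shows the interface as nearly clean.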

Level 15

Clearing the errors on the device is simply clearing the counters. SolarWinds is querying that interface and adding up all the errors it detected over a period. Once they are in the database, they remain regardless of whether the counters have been cleared.

I actually prefer this method because it enabled my NOC team to have this information at their fingertips, and they didn't have to clear counters or worry because someone cleared them in error. They can instantly pull a report to look over a given period to know if a problem was resolved.
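The accumulation behavior described above can be sketched in a few lines of Python. This is an illustration of the general delta-per-poll technique, not SolarWinds' actual implementation. The class name `ErrorAccumulator` and the sample counter values are hypothetical, and the sketch ignores counter wrap for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorAccumulator:
    """Hypothetical poller-side running total for one interface error counter."""
    last_raw: Optional[int] = None  # last raw counter value read from the device
    stored_total: int = 0           # total kept on the poller side (the "database")

    def poll(self, raw: int) -> int:
        """Record one poll and return the delta added to the stored total."""
        if self.last_raw is None:
            delta = 0                    # first poll only sets the baseline
        elif raw >= self.last_raw:
            delta = raw - self.last_raw  # normal case: counter grew or stayed flat
        else:
            # Counter went backwards: cleared or device rebooted.
            # Assumption: count the new value as errors since the reset.
            delta = raw
        self.last_raw = raw
        self.stored_total += delta
        return delta


acc = ErrorAccumulator()
for raw in [100, 150, 0, 0, 20]:  # device counter is cleared before the third poll
    acc.poll(raw)
print(acc.stored_total)  # 70
```

Clearing the device counter (the drop from 150 to 0) adds nothing, but it also removes nothing: the 50 errors already recorded stay in the stored total, which is why the Orion figure does not fall back to match the switch.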
