10 Replies Latest reply on Jun 15, 2016 11:31 AM by dhanson

    interface down alerts after a rediscovery

    ryan.davis26

      I recently discovered all of our interfaces and added them to SolarWinds because our management wants every interface "monitored."  Since then, I've had numerous problems with interfaces that have never been up triggering interface down events.

       

      Anyone else had this issue?

       

      I submitted a support ticket and they're suggesting modifying the "AutoClearAlertIntervalInMinutes" setting in the AlertingEngine.exe.config file to the max, which is 129,600 minutes, or essentially 90 days. (https://support.solarwinds.com/Success_Center/Network_Performance_Monitor_(NPM)/Interface_alerts_re-trigger_for_Interface_Down)  It's not clear to me what that setting is supposed to do.

        • Re: interface down alerts after a rediscovery
          rschroeder

          While I've not had the issue on the general alerting, I have seen it in custom reports and alerts I build to provide special notification of data center ports that change state.

           

          I suspect you'll find the alerts will show up every 89.999999 days, and you'll need to reset them.

           

          Perhaps the best thing you can do is submit a feature request and ask for what you want.  You can click "Create" on the upper right area of Thwack.com and then select Feature Request.  Describe what new feature you'd like SolarWinds to provide, and then vote for it.  Other Thwack members will see your Feature Request and can vote on it.

           

            • Re: interface down alerts after a rediscovery
              ryan.davis26

              rschroeder, thanks for the response.  Yes, that is the article that support guided me to.  But when I read it, I'm not sure what it's saying it's going to do.  Does it imply that SolarWinds will automatically re-trigger an alert that has already triggered if it's been more than "AutoClearAlertIntervalInMinutes"?  I don't understand the point of this setting; why would I want my alert to re-trigger itself?  What if I never wanted the alert in the first place?  In my case the interfaces were never UP; they had always been down.  But when I "rediscovered" them, SolarWinds fired interface down on them.

                • Re: interface down alerts after a rediscovery
                  rschroeder

                  I'd love to have the answers to your questions, Ryan, but I'm not an Orion developer, nor am I privy to their logic.

                   

                  However, one may hypothesize . . .

                   

                  • Perhaps SW developers have the mindset that users would only monitor active ports--to save $$ on polling engines and reporting and disk space?
                  • One could imagine a port-status-comparison report that could be built from this information--how many ports were active last quarter versus those active today.  Re-triggering the ports-down-alert might be useful for that.

                   

                  I'd not be surprised if some of the Thwack SWQL experts in this forum could create a query or report that would filter those alerts out based on criteria you provide.  Imagine wanting to know:

                  • How many ports are unused today versus last week or last year?
                  • Which ports have been down longer than 90 days?
                  • How much money is tied up in unused ports? (multiply ports down for the last 90 days X $200 per port, or whatever your per-port cost is)
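The last two questions above come down to simple arithmetic. A minimal sketch, assuming you can export each port's last-seen-up date from somewhere; the port names, dates, and the `idle_port_cost` helper are made up for illustration and are not an Orion or SWQL API:

```python
from datetime import date

COST_PER_PORT = 200        # dollars; the example figure above, substitute your own
DOWN_THRESHOLD_DAYS = 90   # "down longer than 90 days"


def idle_port_cost(last_up_by_port, today,
                   threshold_days=DOWN_THRESHOLD_DAYS,
                   cost_per_port=COST_PER_PORT):
    """Return (idle_ports, total_cost) for ports down longer than threshold_days.

    last_up_by_port maps a port name to the date it was last seen up,
    or None if it has never been up.
    """
    idle = []
    for port, last_up in last_up_by_port.items():
        # A port that was never up counts as idle regardless of the threshold.
        if last_up is None or (today - last_up).days > threshold_days:
            idle.append(port)
    return sorted(idle), len(idle) * cost_per_port


# Hypothetical sample data, not taken from any real device.
ports = {
    "Eth1/1": date(2016, 6, 14),  # up yesterday
    "Eth1/2": date(2016, 1, 4),   # down for months
    "Eth1/3": None,               # never up
}
idle, cost = idle_port_cost(ports, today=date(2016, 6, 15))
print(idle, cost)
```

With real data, the dictionary would be filled from whatever report or query gives you the last time each interface was up.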

                   

                  While waiting for a workaround from Support, I'd definitely create a Feature Request and ask folks to vote it up.  The more votes, the more attention it gets from Orion developers, and since they suggest the Feature Request be built, I bet someone there has an idea how to make it happen if enough votes are cast for it.

                    • Re: interface down alerts after a rediscovery
                      ryan.davis26

                      I appreciate the feedback, but I think my original intent has been lost in the noise here.  I was just simply wondering why that setting even exists and how it relates to the problem of interface down events on interfaces that have never even reported UP in the first place.

                      • Re: interface down alerts after a rediscovery
                        ryan.davis26

                        To clarify, here is an example of what I'm talking about.  On a Cisco Nexus 6000 switch:

                        This interface has never been up.  I don't know why the network team has it like this, but it is what it is.  The "AutoClearAlertIntervalInMinutes" in our SolarWinds is set to 14400 minutes, or 10 days.

                         

                        The interface downs for this interface:

                        1. 5/19 @ 1055

                        2. 5/31 @ 446

                        3. 6/2 @ 430

                        4. 6/10 @ 440

                        5. 6/14 @ 442

                         

                        The network team assures me this interface has always been down according to the logs on the device.  The SolarWinds Interface Downtime shows it's always been down.  Yet SolarWinds continues to fire interface down events.  And the interval at which it re-triggers the alert looks random and is nowhere close to 10 days.
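To put numbers on the gaps between the five events listed above, here is a rough sketch. It assumes the year is 2016 (the year of this thread) and that the times are 24-hour HHMM values, so "446" is read as 04:46; neither is stated explicitly in the post.

```python
from datetime import datetime

AUTO_CLEAR_MINUTES = 14400   # the configured AutoClearAlertIntervalInMinutes
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 86400

# Assumed: year 2016, 24-hour clock.
triggers = [
    datetime(2016, 5, 19, 10, 55),
    datetime(2016, 5, 31, 4, 46),
    datetime(2016, 6, 2, 4, 30),
    datetime(2016, 6, 10, 4, 40),
    datetime(2016, 6, 14, 4, 42),
]


def gaps_in_days(times):
    """Days elapsed between each pair of consecutive timestamps."""
    return [
        round((later - earlier).total_seconds() / SECONDS_PER_DAY, 2)
        for earlier, later in zip(times, times[1:])
    ]


interval_days = AUTO_CLEAR_MINUTES / MINUTES_PER_DAY
gaps = gaps_in_days(triggers)

print(f"configured interval: {interval_days:g} days")
for gap in gaps:
    print(f"gap between alerts: {gap} days")
```

Under those assumptions the gaps come out to roughly 11.7, 2.0, 8.0, and 4.0 days, against a configured interval of 10 days, so none of the re-triggers lines up with the setting.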

                          • Re: interface down alerts after a rediscovery
                            pratikmehta003

                            If those interfaces are not required for monitoring, then why don't you unmanage/delete them?

                              • Re: interface down alerts after a rediscovery
                                rschroeder

                                Further above, he reported Management requires all ports to be monitored or managed.

                                    • Re: interface down alerts after a rediscovery
                                      ryan.davis26

                                      pratikmehta003, no apologies needed!  This has got me and our network team scratching our heads, but I think we have several internal process issues going on, among other things.  And SolarWinds support wasn't exactly a big help - they suggested deleting the interface!

                                        • Re: interface down alerts after a rediscovery
                                          dhanson

                                          Why not add a custom property to all your interfaces...basically used/unused...initially, this requires quite a bit of overhead and management, but once it is in place (assuming it works), it may be helpful.

                                           

                                          Once the property is in place, you can change your alert logic to include the status of the custom property as a part of the trigger rules. For example, only trigger if the custom property indicates that the port is 'used'.

                                           

                                          I'm pretty sure that in 11.5+, an alert's trigger action can edit a custom property based on the conditions of the alert (see "Change a custom property")...so if that's the case, I think you could create an alert that goes something like this:

                                           

                                          Conditions:

                                          interface status is up

                                          <your custom interface property> is unused

                                           

                                          Trigger Action:

                                          change custom property <your custom interface property> to used
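The two-alert logic described above can be modeled in a few lines. This is only a simulation to show how the pieces fit together, not Orion alert configuration; the `Interface` class, the `usage` field (standing in for the custom interface property), and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    status: str             # "up" or "down"
    usage: str = "unused"   # hypothetical custom property: "used" or "unused"


def mark_used_if_up(iface):
    """Helper alert: when an 'unused' port is seen up, flag it as 'used'."""
    if iface.status == "up" and iface.usage == "unused":
        iface.usage = "used"
        return True
    return False


def should_alert_down(iface):
    """Interface-down alert: only fire for ports flagged 'used'."""
    return iface.status == "down" and iface.usage == "used"


# A port that has never been up stays 'unused' and never alerts.
never_up = Interface("Eth1/3", status="down")
mark_used_if_up(never_up)
print(never_up.usage, should_alert_down(never_up))

# A port that comes up gets flagged, so a later outage does alert.
active = Interface("Eth1/1", status="up")
mark_used_if_up(active)
active.status = "down"
print(active.usage, should_alert_down(active))
```

The point of the design is that a port has to prove it is in use once before it can ever raise an interface down alert, which covers the rediscovery case of ports that were never up.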

                                           

                                          HTH!

                                          Dan

                                           

                                          P.S.

                                           

                                          You can use other statuses for your custom property as well, assuming you want this property to pull some double duty. For instance, you can have a custom property that indicates what the current configuration of the port is (trunk, access, L3, unused) or something about what is connected to it. Keep in mind that to use an alert like the one described above to automatically set this property to the appropriate value, you'll have to work some magic (like SQL/SWQL queries against the polling results of the interface and/or UnDPs), but it could work. If you do this, you'd have to reverse the logic of the initial "interface down" alert - only trigger if the custom property indicates that the port IS NOT 'unused'.

                                           

                                          P.P.S.

                                           

                                          I noticed you mentioned that this occurred following a rediscovery...so I'd suggest finding some time statistic regarding status like an interface's 'last input' or 'last output' to be used as an additional caveat to one or both of the alerts. Something like that should mitigate the risk of having a port's status change inadvertently/incorrectly.