16 Replies Latest reply on Sep 22, 2011 1:48 AM by Ritika

    Setting up trigger condition for an alarm

    Ritika

      We are using SolarWinds Orion for network monitoring, and we have set it up to forward certain key alerts to another management system. We have the following requirement for a rule on interface errors and discards:
      1. Monitor the error/discard increment for an interface every 15 minutes. That is, Orion should keep track of the increase in errors/discards over each 15-minute interval, and we set a threshold for that increase.
      2. If the threshold is crossed in the first 15-minute interval, do not send an alert from Orion to our management system. Keep observing the increments every 15 minutes, and send the alarm to the management system only when the threshold has been crossed in three consecutive intervals.

      For example, suppose the 15-minute threshold is 1000. From 1.00 to 1.15 pm, Orion observes an increment of 1500 in interface errors. It does not send an alarm yet; it observes the next 15 minutes, in which the increment is 1200. In the third 15-minute interval the increment is 1300. Now Orion forwards the alarm to the management system.

      Can this be done? If yes, I would like to know the step-by-step procedure.

      Thanks,
      Ritika

        • Re: Setting up trigger condition for an alarm
          Andy McBride

          The ability to alert by number of occurrences per time period is an open feature enhancement request. What you can do is specify how often you wish to check a value. This Technical Reference and the Administrator Guide have the details on how alerts work and how to set them up.

            • Re: Setting up trigger condition for an alarm
              Ritika

              Hi Mcbridea,

              Thanks for your response.

              I have looked through the Technical Reference and the Administrator Guide for setting up alerts. However, we have a very specific requirement for errors/discards alerting (as given in my previous post). Please refer to that again.

              In the Orion database, I saw that we have the dbo.InterfaceErrors_Detail, dbo.InterfaceErrors_Hourly and dbo.InterfaceErrors_Daily tables. I believe that the data in the InterfaceErrors_Hourly table is derived from the data in the InterfaceErrors_Detail table. If so, could you tell me how the data is gathered from the former and put into the latter? I was also trying to find out at what intervals rows are written to the InterfaceErrors_Detail table, but could not find any specific pattern; there seems to be no fixed time interval between rows. Please tell me at what exact time intervals data is put into the Detail table.

              If you need further details about the issue, please let me know your email Id, I will send you the details.

              Thanks,

              ritika

                • Re: Setting up trigger condition for an alarm
                  Nandish

                  Log into the GUI and click 'Settings' in the top right corner of the home page. Then select the 'Polling Settings' option from the list.

                  You should see the Orion Polling Settings page, which has all the information you wanted to know.

                  The "Polling Statistics Intervals" define how often data collection should occur on a device/interface/volume. Data collection starts here, with rows being entered into the XXXXX.Details table.

                  The data in the XXXXX.Details table then gets aggregated into the XXXXX.Hourly table. Finally, data from the XXXXX.Hourly table gets aggregated into the XXXXX.Daily table.

                  For example, suppose a new interface is added to Orion. After successful discovery, data collection is initiated and starts filling the XXXXX.Details table according to the polling interval, say 10 minutes. As soon as there are 6 entries in the XXXXX.Details table, Orion should start aggregating this data and make its first entry in the XXXXX.Hourly table. Similarly, as soon as the XXXXX.Hourly table has 24 entries, it should start aggregating this data and make its first entry in the XXXXX.Daily table.
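                  To make the roll-up idea concrete, here is a rough Python sketch of how detail samples could be folded into hourly rows. It is only an illustration: the function name `roll_up_hourly` is made up, bucketing by clock hour is an assumption, and so is the choice of summing the values (this thread does not say whether Orion sums, averages, or does something else).

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def roll_up_hourly(detail_rows: Iterable[Tuple[datetime, int]]) -> Dict[datetime, int]:
    """Fold (timestamp, value) detail samples into one total per clock hour.

    Assumption: values are per-poll increments and the roll-up is a sum.
    """
    hourly = defaultdict(int)
    for ts, value in detail_rows:
        # Truncate the timestamp to the start of its hour to get the bucket key.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        hourly[bucket] += value
    return dict(hourly)


# Six 10-minute polls in the 13:00 hour, plus the first poll of the 14:00 hour.
detail = [(datetime(2011, 9, 22, 13, m), 10) for m in range(0, 60, 10)]
detail.append((datetime(2011, 9, 22, 14, 0), 5))
hourly = roll_up_hourly(detail)
```

                  The same function applied to hourly rows, with the bucket truncated to the day instead of the hour, would give the Hourly to Daily step.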

                  The data retention for each of the tables discussed above depends entirely on the "Database Settings", in other words on the retention time set for each statistic. Check the "Detailed/Hourly/Daily Statistics Retention" settings.

                    • Re: Setting up trigger condition for an alarm
                      Ritika

                      Hi Nandish,

                      Thanks for making it clear.

                      Just to give you an example, this is what we want to achieve:

                      Suppose we set up a threshold for errors/discards as 1000 ( for a 15-min interval).

                       

                      First observation:
                        1.00 – 1.15 pm: errors + discards = 800 (below threshold, no worry)
                        1.15 – 1.30 pm: errors + discards = 1200 (above threshold, but Orion does not send an alert to the management system as the threshold is crossed only once)
                        1.30 – 1.45 pm: errors + discards = 900 (below threshold, okay again; no alarm needed as the increase in the previous interval may be due to a one-time spike in traffic etc.)

                      Second observation:
                        1.45 – 2.00 pm: errors + discards = 1300 (above threshold, but Orion does not send an alert to the management system as the threshold is crossed only once)
                        2.00 – 2.15 pm: errors + discards = 1400 (above threshold, but Orion does not send an alert to the management system as the threshold is crossed only twice)
                        2.15 – 2.30 pm: errors + discards = 1300 (above threshold; now Orion sends an alert to the management system as the threshold is crossed for three consecutive intervals and the errors/discards need to be looked at now)

                       

                      So we want Orion to inform our management system about the condition in second example above.
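                      The rule in this example can be written down in a few lines of Python. This is only a sketch of the logic we want, not anything Orion provides; the function name `alert_points` and its parameters are made up for illustration, and "crossed" is taken to mean strictly greater than the threshold.

```python
from typing import Iterable, List


def alert_points(increments: Iterable[int], threshold: int = 1000,
                 required: int = 3) -> List[int]:
    """Return the indexes of the intervals at which an alert would be sent.

    An alert is sent once, at the moment `required` consecutive increments
    have each exceeded `threshold`.
    """
    fired = []
    streak = 0
    for i, increment in enumerate(increments):
        if increment > threshold:
            streak += 1
        else:
            streak = 0  # one normal interval resets the count
        if streak == required:
            fired.append(i)
    return fired


# Increments for the six intervals from 1.00 pm to 2.30 pm in the example.
samples = [800, 1200, 900, 1300, 1400, 1300]
print(alert_points(samples))  # [5]
```

                      Index 5 is the 2.15 – 2.30 pm interval, which is the only point where the alert should reach the management system.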

                       

                      Also, we have the default interface statistics poll interval set to 9 minutes.

                      So ideally the difference in DateTime between consecutive rows in the Detail table should be 9 minutes, but it is not. That was the confusion.

                      I believe that Orion is polling the interface for statistics collection every 9 min, but the DateTime value recorded in the table are when the response comes from the device, which can be random and not necessarily separated by 9 min. Is my understanding correct?

                      Also could you tell me what are the values under In_Discards, In_errors etc under the Detail table. Are those the cumulated errors/discards across different time, or are those just the increments in errors/discards over consecutive time intervals when the interface is polled, or are those some kind of rates of increase of errors/discards?

                       

                      thanks,

                      Ritika

                       

                       

                        • Re: Setting up trigger condition for an alarm
                          Nandish

                          Hi Ritika,

                          Here are my answers.

                          I believe that Orion is polling the interface for statistics collection every 9 min, but the DateTime value recorded in the table are when the response comes from the device, which can be random and not necessarily separated by 9 min. Is my understanding correct?

                          Ans: You are right. There can be several reasons to skip polls.

                          Also could you tell me what are the values under In_Discards, In_errors etc under the Detail table.

                          Ans: They are the values returned for the SNMP OIDs used for In_Discards, In_Errors etc. Check the tree 1.3.6.1.2.1.2.2.1 to see the OID for each of those mentioned above. They are all part of MIB-2.

                          Are those the cumulated errors/discards across different time, or are those just the increments in errors/discards over consecutive time intervals when the interface is polled, or are those some kind of rates of increase of errors/discards?

                          Ans: This is simply going to be an SNMP GET value coming from the interface. By setting the poll interval to (say) 15 minutes, you are asking Orion to send an SNMP request every 15 minutes.
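                          If the stored values really are raw SNMP counter readings (worth checking against your own rows, since nothing in this thread confirms what the Detail table holds), then the per-interval increment is the difference between two consecutive readings. The error and discard counters under 1.3.6.1.2.1.2.2.1 are 32-bit counters, so the difference has to allow for wrap-around. A minimal Python sketch, with the function name `counter_delta` made up for illustration:

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after 4294967295


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Increment between two readings of a wrapping counter.

    Assumes at most one wrap between the two polls. A device reboot also
    resets the counter and cannot be told apart from a wrap by this alone.
    """
    return (current - previous) % modulus


print(counter_delta(5000, 6200))      # 1200
print(counter_delta(4294967290, 5))   # 11, the counter wrapped between polls
```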

                  • Re: Setting up trigger condition for an alarm
                    Nandish

                    Getting back to your primary question on setting up alerts: I can give you directions, and it should be possible with the help of the custom SQL query option in Advanced Alert Manager.

                    First, you have to set the interface statistics poll interval to 15 minutes. (Please note that this will apply to all the statistics of all interfaces monitored by Orion.)

                    Before going to the Advanced Alerts tool, let's look at the database. The table "InterfaceErrors_Detail" is the one you need. Open this table and you should see data for In_Discards/In_Errors/Out_Discards/Out_Errors available for every 15 minutes.

                    Now go to the Advanced Alert Manager tool and configure a new alert. In the Trigger Condition tab, select "Custom SQL Alert" under "Type of Property to Monitor". This option is the key. You need a SQL query written against the table "InterfaceErrors_Detail". Maybe you can have a SQL expert write the query according to the criteria you mentioned above; it should be possible.

                    You can't use "Interface" as the type to monitor to meet your criteria, because the interface table has the errors and discards data only for the hour and the day.

                    Hope this helps. Let us know the results.

                      • Re: Setting up trigger condition for an alarm
                        Ritika

                        thanks Nandish for making it clearer.

                        Okay, so I am at Advanced Alerts. Under type of Property to Monitor, I have selected Custom SQL Alert.

                        I need to select something for "Set up your Trigger Query". I have selected "Interface", as it looks like a close option.

                        Now, I see a fixed SELECT, FROM Statement written there:

                        SELECT Interfaces.InterfaceID AS NetObjectID, Interfaces.FullName AS Name FROM Interfaces.

                        But I want to use the InterfaceErrors_Detail table and not the Interfaces table that is already written there.

                        Do I need to do USE InterfaceErrors_Detail and then write my SELECT, FROM query?

                        Thanks again for your help on this. Much appreciated!

                        -Ritika

                          • Re: Setting up trigger condition for an alarm
                            Nandish

                            Right, you need to use InterfaceErrors_Detail in your query. Once you complete the SQL, you can validate it by selecting "Validate SQL" at the bottom of the page.

                              • Re: Setting up trigger condition for an alarm
                                Ritika

                                Thanks Nandish.

                                I am able to use InterfaceErrors_detail table now.

                                Will let you know the results.

                                Thanks for your help :)

                                -Ritika

                                • Re: Setting up trigger condition for an alarm
                                  Ritika

                                  Hi Nandish,

                                  Thanks for your help so far.

                                  I was testing the alert.

                                  In Advanced Alerts, I have written the query and it validates fine. I also tried executing the query against the Orion database, and the results I am seeing are all okay.

                                  To give you more details: when I execute the query, the result contains 5 interfaces where the threshold has been crossed every time during the last three 15-minute windows.

                                  I have set the trigger action to send me an email, so I am expecting 5 emails for these 5 interfaces.

                                  But I am getting only 1 email, for one of the interfaces. That interface is among the 5, but why am I not getting emails for the other 4? I always get an email for that same interface only. If I modify my query to exclude that interface, then I don't get any emails at all (though I should be getting emails for the rest of the interfaces).

                                  Looking forward to the reply.

                                  Thanks,

                                  Ritika

                                    • Re: Setting up trigger condition for an alarm
                                      Nandish

                                      I can't say without looking at your SQL query. Could you paste both the Trigger Condition and the Trigger Action?

                                      We have folks on Thwack who are good at SQL, so once you paste them I believe more people will respond.

                                        • Re: Setting up trigger condition for an alarm
                                          Ritika

                                          SELECT Interfaces.InterfaceID AS NetObjectID, Interfaces.FullName AS Name FROM Interfaces ( Default Statement)

                                          My Query:

                                          WHERE InterfaceID IN (select InterfaceID from (select  * from dbo.InterfaceErrors_Detail where DateTime > DATEADD(mi,-35,getdate())) as a group by InterfaceID having min(In_Discards)>1000 )

                                           

                                          Trigger Action: email to me with subject: Alert: Errors/Discards on ${InterfaceID} ${FullName}

                                          Alert Evaluation Frequency (under General tab of the alert): every 50 s

                                          Do not trigger this alert until condition exists for more than 0 seconds
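                                          One thing the trigger query above does not check is how many samples fall inside the 35-minute window. With 15-minute polling the window can hold only two rows, and the query also looks at In_Discards alone rather than errors plus discards. Below is a rough sketch of a stricter condition, run against an in-memory SQLite table from Python so it can be tried without touching Orion. It is SQLite syntax, not the T-SQL that Orion needs (the DATEADD window would stay as in the original), the table is a cut-down stand-in with made-up rows, the helper name `breaching_interfaces` is hypothetical, and none of this explains the single-email behaviour.

```python
import sqlite3

# Interfaces with at least three samples in the window, all above threshold.
QUERY = """
SELECT InterfaceID
FROM InterfaceErrors_Detail
WHERE DateTime > :window_start
GROUP BY InterfaceID
HAVING COUNT(*) >= 3
   AND MIN(In_Discards + In_Errors) > :threshold
ORDER BY InterfaceID
"""


def breaching_interfaces(conn, window_start, threshold=1000):
    """Return the InterfaceIDs that satisfy the stricter trigger condition."""
    params = {"window_start": window_start, "threshold": threshold}
    return [row[0] for row in conn.execute(QUERY, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE InterfaceErrors_Detail "
    "(InterfaceID INTEGER, DateTime TEXT, In_Discards INTEGER, In_Errors INTEGER)"
)
conn.executemany(
    "INSERT INTO InterfaceErrors_Detail VALUES (?, ?, ?, ?)",
    [
        (1, "2011-09-22 02:00:00", 900, 400),   # three samples, all above 1000
        (1, "2011-09-22 02:15:00", 1000, 400),
        (1, "2011-09-22 02:30:00", 1100, 200),
        (2, "2011-09-22 02:15:00", 2000, 0),    # only two samples in the window
        (2, "2011-09-22 02:30:00", 2000, 0),
        (3, "2011-09-22 02:00:00", 1500, 0),
        (3, "2011-09-22 02:15:00", 500, 0),     # one quiet interval in between
        (3, "2011-09-22 02:30:00", 1500, 0),
    ],
)
print(breaching_interfaces(conn, "2011-09-22 01:55:00"))  # [1]
```

                                          Interface 2 would pass the original MIN() test with only two samples, which is why the COUNT(*) condition is added here.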

                                          • Re: Setting up trigger condition for an alarm
                                            Ritika

                                            Hi Nandish,

                                            Further details about the problem.

                                            I run the query in SQL Management Studio and get 6 rows as the result. So there are 6 interfaces where the errors/discards have crossed the threshold during the last three 15-minute intervals.

                                            Now, my alert is enabled and I should get 6 emails, right? But what I am seeing is very random. Sometimes I get an email for one of the interfaces almost instantaneously (sometimes I have to disable and re-enable the alert for this), and sometimes I don't get any email at all. Then, after a random time interval (sometimes even 3-4 hours), I get emails for the remaining 5 interfaces at almost the same instant. So it is very unpredictable when I am going to get emails.

                                            I don't think that it's a problem with my SMTP server as same server is working fine for other configured alerts.

                                            Any ideas?

                                            ~Ritika

                                            • Re: Setting up trigger condition for an alarm
                                              Ritika

                                              I tested my alert and these are the test logs:

                                              9/22/2011 2:28:54 AM:    Initiating test for alert MyTest.
                                              9/22/2011 2:28:54 AM:    Retrieved 1 actions for alert MyTest.
                                              9/22/2011 2:28:54 AM:    Preparing action type EMail to test alert MyTest.
                                              9/22/2011 2:28:56 AM:    Email 'Alert: Errors/Discards on ${InterfaceID} ${FullName}' from 'Alert: Errors/Discards on ${InterfaceID} ${FullName}' to '"Network Performance Monitor" <nobody@world.deshaw.com>' has been successfully handed to the local mail handler for alert MyTest.
                                              9/22/2011 2:28:56 AM:    Action type EMail for alert MyTest completed successfully.
                                              9/22/2011 2:28:56 AM:    Completed action type EMail for alert MyTest.
                                              9/22/2011 2:28:56 AM:    Completed test for alert MyTest.

                                               

                                              SQL Management studio is giving me 6 rows , one row for each interface. So I should have received 6 emails after testing the alert. But again I got only 1 email. What am I doing wrong?

                                              ~Ritika

                                              • Re: Setting up trigger condition for an alarm
                                                Ritika

                                                I have this case open already.

                                                Case #272216 - "Setting up trigger condition for an alarm"

                                                Thanks,

                                                Ritika