5 Replies Latest reply on Dec 2, 2015 1:20 PM by cgregors

    No reliability in new alerting with 11.5

    mharvey

      After experiencing a number of issues with the 11.5 alerting, I'm starting to question its reliability.  I've already reported a couple of issues to SolarWinds that, based on what I've found, are bugs.  I've also noticed that now that alerting history is logged to the AlertHistory table, there is no reliable mechanism to ensure triggered alerts are written there.  Prior to the upgrade we were tracking the number of alerts that triggered after hours and that went to our on-call system.  I'm noticing that none of these alerts are logged in the AlertHistory table, meaning we can no longer report on them.  I'm contemplating reporting this to support, but after hearing that no timeframe can be given for resolving the other issues I've reported, I'm hesitant to report anything.  I may roll alerting back to the previous method to see if that alleviates these issues, but I'm curious whether this will have a negative impact on the new alerts I've created that use condition types the older alerting engine didn't have.  Anyone else feeling frustration with the web-based alerting?

        • Re: No reliability in new alerting with 11.5
          mharvey

          So I rolled back one alert to the older advanced alerts, but now don't see a way to migrate it back to the new alerting engine, and received no real steps from support other than how to start to revert a single alert from the new web alerting to the older advanced alerts.  

            • Re: No reliability in new alerting with 11.5
              Jan Pelousek

              Hello mharvey. I can describe how the migration works.

              Some facts:

              • Migration is one-directional only: Old -> New. Reverting doesn't merge changes made in the new engine back into the old definition (it just deletes the new definition so that the alert shows up in the old managers again)
              • Migration doesn't delete the old definitions from the database, it just disables them. Using the AlertDefId, the old alert managers know not to show alerts that were migrated and are now operating in the new alerting engine
              • Migration skips already migrated alerts, per AlertDefId
              • The old definitions in the tables Alerts (basic alerts) and AlertDefinitions (advanced alerts) contain a column "Reverted", which tells the migrator "Don't migrate the reverted alerts". This bit is set to "true" when reverting.
              • The control page for reverting (individual/global) is located at Orion/Alerts/manage/newalertingdisable.aspx

              So if you are already operating in the reverted state, then to migrate again with merged changes you need to:

              1. If you have globally disabled new alerting (not just for individual alerts), enable it - Orion/Alerts/manage/newalertingdisable.aspx
              2. Delete the new definition of that specific alert from the DB
              3. Set the "Reverted" bit to false if the alert was reverted previously
              4. Run the migration. You have several options:
                • Run the Configuration Wizard
                • Use the Orion API to invoke the migration verbs. Those are located under Orion.Alerts (basic alerts, verbs MigrateBasicAlert(alertId) or MigrateAllBasicAlerts) and Orion.AlertConfigurations (advanced alerts, verbs MigrateAllAdvancedAlerts or MigrateAdvancedAlert(alertdefid)). Again, there are several options:
                  • Install the Orion SDK (see "Orion SDK Information"), find the verbs and invoke them from SWQL Studio, or write a migration script to suit your needs
                  • Navigate to Orion/Admin/swisinvoke.aspx, find the verbs and invoke them
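For the "write a migration script" option in step 4, here is a minimal Python sketch of what such a call might look like. It only builds the HTTP request and does not send it. The entity and verb names come from the list above; the port (17778) and the `/SolarWinds/InformationService/v3/Json/Invoke` path are assumptions based on the Orion SDK's REST conventions, and the host name and alert definition ID are hypothetical placeholders. Authentication is left out, so check the details against your own installation before relying on this.

```python
import json
import urllib.request

# Assumption: SWIS REST path layout as used by the Orion SDK; verify locally.
SWIS_INVOKE_PATH = "/SolarWinds/InformationService/v3/Json/Invoke"


def build_invoke_request(host, entity, verb, args=(), port=17778):
    """Build (but do not send) a POST request that would invoke a SWIS verb.

    Verb arguments are sent as a JSON array in the request body.
    Credentials are deliberately omitted from this sketch.
    """
    url = f"https://{host}:{port}{SWIS_INVOKE_PATH}/{entity}/{verb}"
    body = json.dumps(list(args)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and placeholder ID, for illustration only.
req = build_invoke_request(
    "orion.example.com",
    "Orion.AlertConfigurations",
    "MigrateAdvancedAlert",
    ["ALERT-DEF-ID-PLACEHOLDER"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and arguments first, which seems prudent for a verb that changes alert definitions.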

               

              I hope this helps you understand how it works and how you might start using the new alerting, at least partly for now.

               

              Regards,

              Honza

                • Re: No reliability in new alerting with 11.5
                  mharvey

                  Honza,

                   

                  These instructions are much appreciated. I had only reverted one alert, and have decided to leave the others running in the new alerting for the moment.  My biggest issue has been the lack of history on some of my node down alerts (one of which is now reverted).  There seems to be no indication of when a triggered alert is written to alert history and when it's not.  I have hours of gaps in the AlertHistory table where alerts would have triggered, but no entry was made.  This, along with the issues I've reported to support, has led me to question the new alerting.  I'm going to see if the reverted alert behaves better writing to the older tables, and if so, I'll look to see if I need to move any others.
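One way to put numbers on the gaps described above is to export the entry timestamps from AlertHistory and look for long silences between consecutive rows. This is a minimal Python sketch under that assumption. The sample timestamps are made up for illustration, the one-hour threshold is arbitrary, and how you export the timestamps is left open. A gap only shows that nothing was logged, not that an alert should have fired, so it still needs comparing against what the on-call system received.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(hours=1)):
    """Return (start, end) pairs where consecutive entries are more than
    `threshold` apart. Input order does not matter; it is sorted first."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]


# Made-up timestamps for illustration only; real values would come from
# the timestamp column of the AlertHistory table.
sample = [
    datetime(2015, 3, 1, 1, 0),
    datetime(2015, 3, 1, 1, 20),
    datetime(2015, 3, 1, 6, 45),
    datetime(2015, 3, 1, 7, 0),
]

for start, end in find_gaps(sample):
    print(f"no entries between {start} and {end} ({end - start})")
```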

                   

                  I'm also curious, if I created an alert in the new web alerting, I'm assuming there's really no way to "revert it" as there wouldn't be an older alert to reference.  Is this a correct thought?

                    • Re: No reliability in new alerting with 11.5
                      Jan Pelousek

                      That's quite a weird issue with the history. I don't have very deep knowledge of that area, but as far as I know the AlertHistory table should be roughly equivalent to the old AlertLog and its logging should work fine (of course, there may be bad situations where it doesn't).

                       

                      To the second question: backward migration New -> Old (which in fact isn't just a revert) is not possible. The migration itself is a pretty complicated mechanism, using mapping schemas between old and new structures and other algorithms. The new definitions are much more complex than the old ones and contain more features, constructions and alert sections, which cannot be migrated to the old alerting. Migration was designed for the "upgrade scenario", to make the old definitions ready for the new engine; for alerts that cannot be migrated (e.g. definitions using Custom Properties that no longer exist, or that cause other trouble), it leaves the old definitions as they are.

                       

                      One more thing I recall: it's also possible to use the web-based Alert Import function, both for new definitions (just import) and for old ones (automatic migration to the new alerting), so you can do the migration this way too. But remember that you are in fact creating a new alert, so you should disable the old one, and active alerts will be retriggered (the regular migration also migrates active alerts so that actions don't fire during the upgrade).

                       

                      H.

                • Re: No reliability in new alerting with 11.5
                  cgregors

                  Matt writes: Anyone else feeling frustration with the web-based alerting?

                   

                  Oh, yah.  I've got 3 instances of NPM running with an EOC linking them together (EOC sucks btw).  It wasn't until I had done the 11.5.2 upgrade on the 2nd instance that I noticed the table changes from AlertDefinitions to AlertConfigurations.

                   

                  I had spent a lot of time writing some pretty spiffy (IMO) PHP code that deconstructed the AlertDefinitions.TriggerQuery definition and then stripped out portions of the SQL to identify the potential targets that the alert could trip against.


                  This was very useful in troubleshooting alert design.  Sometimes we would build an alert that we "thought" should work on a set of elements but the set would be either too big or too small.  The only way to test the whole set was to wait for the customer to complain.


                  The closest possible way of doing it now is to write new code to extract the SWQL from the "Edit Alert / Trigger" and "Edit Alert / Reset" web pages like this

                  [Screenshots: ScreenClip [1].png, ScreenClip.png]

                  I would then strip out pieces that are variable and then evaluate it against the database.


                  Example of the above query in initial form

                  SELECT  E0.[Uri], E0.[DisplayName]
                  FROM  Orion.Nodes AS E0
                  WHERE
                    (  (
                    ( E0.[CustomProperties].[Node_Type] = 'NETWORK' ) AND
                    ( E0.[CPULoad] > '80' )
                    )
                    OR
                    (
                    ( E0.[CustomProperties].[Node_Type] = 'ALL' ) AND
                    ( E0.[CPULoad] > '80' )
                    )  )

                   

                  Example of the above query stripped.  Note the CPULoad portion missing.

                   

                  SELECT   E0.[Uri], E0.[DisplayName]
                  FROM   Orion.Nodes AS E0
                  WHERE
                    (  (
                    ( E0.[CustomProperties].[Node_Type] = 'NETWORK' )
                    )
                    OR
                    (
                    ( E0.[CustomProperties].[Node_Type] = 'ALL' )
                    )  )
                    )  )
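The stripping step shown above could be automated roughly as follows. This is a minimal Python sketch, not the PHP code mentioned earlier: the function name `strip_conditions` is hypothetical, and it only handles the simple shape seen in this example, namely a parenthesized comparison of one column against a quoted literal that is joined to its neighbour with AND. A condition that stands alone or is joined with OR is left untouched.

```python
import re


def strip_conditions(swql, columns):
    """Remove '( alias.[column] <op> 'value' )' comparisons for the given
    columns, together with the AND that joins each one to its neighbour."""
    for col in columns:
        cond = (
            r"\(\s*\w+\.\[" + re.escape(col) + r"\]\s*"
            r"(?:<>|!=|<=|>=|=|<|>)\s*'[^']*'\s*\)"
        )
        # Case 1: the condition follows an AND ("... ) AND ( cond )").
        swql = re.sub(r"\s*\bAND\b\s*" + cond, "", swql, flags=re.IGNORECASE)
        # Case 2: the condition precedes an AND ("( cond ) AND ( ...").
        swql = re.sub(cond + r"\s*\bAND\b\s*", "", swql, flags=re.IGNORECASE)
    return swql


QUERY = """SELECT E0.[Uri], E0.[DisplayName]
FROM Orion.Nodes AS E0
WHERE
  ( (
  ( E0.[CustomProperties].[Node_Type] = 'NETWORK' ) AND
  ( E0.[CPULoad] > '80' )
  )
  OR
  (
  ( E0.[CustomProperties].[Node_Type] = 'ALL' ) AND
  ( E0.[CPULoad] > '80' )
  ) )"""

stripped = strip_conditions(QUERY, ["CPULoad"])
print(stripped)
```

Passing the variable columns as a list keeps the decision about what counts as "variable" outside the function, so the same helper could be reused for other thresholds once the SWQL has been extracted from the web page.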

                   

                  NOTE: I initially considered writing my own XML -> SQL compiler to compile the XML in the AlertConfigurations table.  After I looked at the XML I realized this would be pointless.  Without knowing the exact mindset of the coder of the real XML -> SQL(SWQL) compiler the chances of getting it right are close to zero.  It's better to take the SWQL from the web page and work with it.


                  I'm working on the "extract SWQL from the web page" portion at this time...


                  Chris