SWO Alerting Tips & Tricks

Tips & Tricks: Alerting Aggregation Period and Methods

Hello everyone! Today, I’ll share some valuable tips and tricks related to SWO alerting, specifically focusing on the mysterious “During Last” part of the condition.

In SWO, we offer various aggregation methods and allow you to choose the length of the aggregation period. The six aggregation methods are: Minimal, Maximal, Average, Sum, Last, and Count.

Let’s clarify what the aggregation period means. Contrary to common misconceptions, it does not imply that we’ll wait for the specified amount of time before triggering the alert. Instead, when we set an aggregation period (e.g., 1 hour), we consider data received during the last one hour and apply the chosen aggregation method.

For all the examples below, I’ll use the following condition:
responseTime > 5000 ms

Let’s break down the behavior of each of the six aggregation methods:

  • Minimal:
    • The alert triggers only when the minimum metric value over the last 1 hour exceeds the threshold, that is, when every value in the window is above 5000 ms.
  • Maximal:
    • The alert triggers during peaks (marked with yellow and orange dots) where the metric exceeds the threshold.
    • It remains active for 1 hour.
  • Average:
    • The alert triggers somewhere around the orange dot peak and will reset in approximately one hour (if future numbers remain below the threshold).
    • It also stays active for 1 hour.
  • Sum:
    • The Sum aggregation method calculates the total sum of the given metric values within the specified aggregation window (e.g., 1 hour).
    • It’s particularly useful for metrics related to cumulative events, such as error counts, resets, or other similar occurrences.
    • However, it may not be suitable for metrics like responseTime, where the sum of values might not provide meaningful insights.
  • Last:
    • The Last aggregation method considers only the most recent value (ignoring the aggregation window).
    • It can be handy when you want to trigger an alert based on the latest data point.
    • Be cautious: If the metric value is consistently close to the threshold, the alert may frequently send trigger and reset notifications.
  • Count:
    • The Count aggregation method focuses solely on the number of metric collections (regardless of the actual value).
    • An interesting use case for Count is raising an alert when the metric stops reporting. For example, if the count equals 0, no values arrived during the aggregation period, which indicates that the metric has ceased reporting.
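
To make the list above concrete, here is a minimal Python sketch of how the six methods could be evaluated against the example condition (responseTime > 5000 ms) with a 1-hour aggregation period. The sample data, the function names, and the exact window boundaries are assumptions made for illustration; this is not how SWO implements alert evaluation internally.

```python
from datetime import datetime, timedelta

# Hypothetical samples in chronological order: (timestamp, responseTime in ms).
NOW = datetime(2024, 1, 1, 12, 0)
SAMPLES = [
    (NOW - timedelta(minutes=90), 7000),  # older than the 1-hour window
    (NOW - timedelta(minutes=50), 4000),
    (NOW - timedelta(minutes=30), 6500),
    (NOW - timedelta(minutes=10), 5200),
]

# Methods that reduce the values inside the window to a single number.
AGGREGATIONS = {
    "Minimal": min,
    "Maximal": max,
    "Average": lambda values: sum(values) / len(values),
    "Sum": sum,
}
METHODS = ["Minimal", "Maximal", "Average", "Sum", "Last", "Count"]


def aggregate(samples, method, now, period):
    """Return the aggregated value for `method`, or None when there is no data."""
    if method == "Last":
        # Last looks only at the most recent value and ignores the window.
        return samples[-1][1] if samples else None
    window = [value for ts, value in samples if now - period < ts <= now]
    if method == "Count":
        # Count is the number of collected values, whatever they are.
        return len(window)
    if not window:
        return None
    return AGGREGATIONS[method](window)


def is_triggered(samples, method, now, period, threshold=5000):
    """Evaluate the example condition: aggregated value > threshold."""
    value = aggregate(samples, method, now, period)
    return value is not None and value > threshold


for method in METHODS:
    value = aggregate(SAMPLES, method, NOW, timedelta(hours=1))
    print(method, value, is_triggered(SAMPLES, method, NOW, timedelta(hours=1)))
```

With this data, Minimal yields 4000 and does not trigger, while Maximal yields 6500 and does. Sum also crosses the threshold, which illustrates why it is a poor fit for a metric like responseTime.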

Regarding the aggregation period, a shorter period (e.g., 5 minutes) resets the alert sooner, so you may see alert flickering (many triggers and resets). Overall, we recommend longer periods, at least 1 hour, for the best experience. A longer period (e.g., a few days) triggers the alert once (e.g., at the first peak) and keeps it active throughout the displayed timeframe.
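
The flickering effect can be reproduced with a small simulation. The sketch below assumes a Maximal aggregation, hypothetical samples arriving every 5 minutes, and an evaluation at every sample; the function name and the data are made up for illustration and say nothing about SWO's real evaluation schedule.

```python
def count_triggers(values, window_size, threshold=5000):
    """Count how often a Maximal-aggregated alert goes from reset to triggered.

    `values` are equally spaced samples and `window_size` is the aggregation
    period expressed as a number of samples.
    """
    triggers = 0
    active = False
    for i in range(len(values)):
        # Values that fall inside the aggregation period ending at sample i.
        window = values[max(0, i - window_size + 1): i + 1]
        now_active = max(window) > threshold
        if now_active and not active:
            triggers += 1
        active = now_active
    return triggers


# Hypothetical responseTime samples (ms), one every 5 minutes, with three short peaks.
values = [3000, 6000, 3000, 3000, 6000, 3000, 3000, 3000, 6000, 3000, 3000, 3000]

print(count_triggers(values, window_size=1))   # 5-minute period: 3 triggers
print(count_triggers(values, window_size=12))  # 1-hour period: 1 trigger
```

The short period produces a trigger and a reset for every peak, while the 1-hour period triggers once at the first peak and stays active for the rest of the series.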

  • Tips & Tricks: Scope for an Alert

    When setting up alerts, precision matters. You want to focus on what truly matters while minimizing noise. Let’s explore how to fine-tune your alert scope effectively.

    Initial Targeting

    Initially, an alert typically targets all entities of a specific type. This broad approach can be useful for generic alerts, such as detecting when an entity is down.

    When you define an alert with a specific scope and later add more entities that match that scope, the alert automatically targets both the original entities and the newly added ones. In other words, your alert remains effective across all relevant entities, regardless of when they were added.

    Refining the Scope

    However, you can refine the scope to make your alerts more relevant:

    1. Selected Entities: Handpick the entities that interest you. Whether it’s one or all, this selection won’t expand as you add more monitored items. It’s a precise way to narrow down your focus.
    2. Contains Text and Does Not Contain Text: These options allow you to target entities based on specific text. For instance, if you have ten websites containing the term “Solarwinds,” any eleventh website with the same text will automatically fall within the scope.
    3. Search Query: The most powerful tool! It lets you search through existing tags and their values. But how do you navigate this efficiently?
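
The difference between a fixed selection and a text-based scope can be sketched in a few lines of Python. The helper names and website names below are hypothetical, and the case-insensitive matching is an assumption of this sketch rather than documented SWO behavior.

```python
def selected_scope(selected_names):
    """Fixed scope: only the handpicked names, no matter what is added later."""
    chosen = set(selected_names)  # snapshot taken when the alert is defined
    return lambda name: name in chosen


def contains_scope(text):
    """Dynamic scope: any entity whose name contains `text`."""
    needle = text.lower()  # assumption: matching ignores case
    return lambda name: needle in name.lower()


# Ten hypothetical websites that exist when the alert is created.
websites = [f"solarwinds-site-{i}" for i in range(1, 11)]
fixed = selected_scope(websites)
dynamic = contains_scope("Solarwinds")

# An eleventh website is added later.
websites.append("solarwinds-site-11")

print(sum(fixed(w) for w in websites))    # 10
print(sum(dynamic(w) for w in websites))  # 11
```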

     

    Here’s a clever trick:

    1. Explore: Start by navigating to the Explore section. Choose the desired entity type (e.g., Network Devices).
    2. Filtering Options: Use filters to create your custom search query. Combine tags with values and even include the device name. For example:
      location:[Bastogne,Texas,Brno] healthScore.categoryV2:moderate lab

    This query selects all devices in Bastogne, Texas, and Brno, with a moderate health score and the word “lab” in their name.

    3. Copy & Paste: Once you’ve crafted your query, simply copy and paste it into your alert configuration.

    And there you have it! A precise alert tailored to your needs.
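
If you maintain many similar alerts, a query of the same shape can also be assembled programmatically. The helper below is a hypothetical sketch: its name and parameters are invented here, and the syntax it produces is inferred only from the single example above, so copying the query from Explore remains the reliable way to get a valid one.

```python
def build_query(tags, name_text=None):
    """Assemble a search query string in the shape of the example above.

    `tags` maps a tag name to one value or to a list of values;
    `name_text` is optional free text to match against the entity name.
    """
    parts = []
    for key, value in tags.items():
        if isinstance(value, (list, tuple)):
            # Several accepted values for one tag: key:[a,b,c]
            parts.append(f"{key}:[{','.join(value)}]")
        else:
            parts.append(f"{key}:{value}")
    if name_text:
        parts.append(name_text)
    return " ".join(parts)


query = build_query(
    {"location": ["Bastogne", "Texas", "Brno"], "healthScore.categoryV2": "moderate"},
    name_text="lab",
)
print(query)
# location:[Bastogne,Texas,Brno] healthScore.categoryV2:moderate lab
```

The sketch does no escaping or quoting, so values containing spaces or special characters are outside what it covers.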

  • Hi everyone,

    I’ll be on a long PTO starting next week, and I won’t be able to prepare the next tip until Friday. While I have some ideas about what might be interesting to describe, I’d also like to get your input. Is there anything specific you’d like me to explore or uncover in the upcoming tips?

    Thanks,

    Adam from SWO