Is there a way to suppress service down alerts while a server down alert is firing for the same server, using Prometheus and Alertmanager? We have multiple servers, each running multiple services, and we currently monitor both the servers and the services with Prometheus and Alertmanager.
In our setup, server instances have the form "XYZ:9182" and service instances have the form "XYZ:8080", "XYZ:8079", and so on. The part before the colon ("XYZ") is the device name, so a server and its services share it and differ only in the port.
To suppress the service down alerts when a server down alert is present for the same server, I tried to use inhibition by configuring Alertmanager as follows:
    # Matchers for identifying device names and suppressing service down alerts.
    - source_match:
        alertname: 'Dynamic_Device_Down'
        instance: '([^:]+):.*' # Regex capture group for device name
      target_match:
        alertname: 'Dynamic_Service_Down'
        instance: '([^:]+):.*' # Regex capture group for device name
      equal:
        - instance
However, this inhibition rule does not seem to take effect. Could you provide some insight and guidance on how to address this?
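
For context, one possible direction is sketched below, under stated assumptions. As far as I understand, `source_match` and `target_match` compare label values as literal strings (regex matching needs `source_match_re` / `target_match_re`), and `equal` requires the listed labels to have identical values on both alerts. Capture groups are not carried over, so "XYZ:9182" and "XYZ:8080" never compare equal on `instance`. A shared label would therefore be needed. The label name `device` and the job name `services` below are hypothetical, and the sketch assumes the alerting rules keep the target labels on the fired alerts.

```yaml
# prometheus.yml (fragment): copy the host part of the scrape address into a
# hypothetical `device` label; the same relabeling would be applied to the
# server job so that both kinds of target carry the same `device` value.
scrape_configs:
  - job_name: 'services'          # hypothetical job name
    static_configs:
      - targets: ['XYZ:8080', 'XYZ:8079']
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):.*'       # capture everything before the port
        target_label: device
        replacement: '$1'
---
# alertmanager.yml (fragment): inhibit service alerts while a server alert
# with the same `device` value is firing.
inhibit_rules:
  - source_match:
      alertname: 'Dynamic_Device_Down'
    target_match:
      alertname: 'Dynamic_Service_Down'
    equal:
      - device
```

With this shape, the rule no longer tries to match on `instance` at all; the per-server pairing comes entirely from `equal` on the shared label.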