I am currently engaged in a project that requires me to find a solution for this if at all possible so I would really love some feedback here.
We do have the concept of a state variable. You would probably need:
- A rule that detects the event and sets the variable the first time it sees it to "ON" (the "action" of the rule would be to modify the variable)
- A rule that detects the event you want to be notified on that checks to see if the variable is "ON" and sends email if it matches (the action of this rule is to email)
- A rule that detects the closure event and sets the variable to "OFF" (the action of the rule would be to modify the variable)
You might be able to combine the first two into one rule, depending on whether there are 2 events or 3 and how you want to visualize it. You can create state variables from Build > Groups (they can contain string or number variables). The action you want is "Modify State Variable" under Actions.
I am pretty sure this will do what you need, but if I need to set up an example to make it clearer, let me know.
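To make the three-rule idea concrete, here is a minimal Python sketch of the logic only. It is not LEM code or a LEM API; the event names ("open", "notify", "close") and the `RuleEngine` class are hypothetical stand-ins for whatever alerts your rules match on.

```python
from dataclasses import dataclass, field


@dataclass
class RuleEngine:
    """Toy model of three rules sharing one state variable (not a LEM API)."""
    state: str = "OFF"                           # the state variable
    emails: list = field(default_factory=list)   # stands in for the email action

    def handle(self, event: str) -> None:
        if event == "open":
            # Rule A: detecting the first event sets the variable to "ON".
            self.state = "ON"
        elif event == "notify" and self.state == "ON":
            # Rule B: the event of interest only sends email while "ON".
            self.emails.append("notify event seen while state is ON")
        elif event == "close":
            # Rule C: the closure event sets the variable back to "OFF".
            self.state = "OFF"


engine = RuleEngine()
for name in ["notify", "open", "notify", "close", "notify"]:
    engine.handle(name)
```

Only the middle "notify" produces an email in this run, because it is the only one that arrives while the variable is "ON".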
That is awesome!
So to take it a step further is it possible to have some sort of action take place at a specified interval as long as the state variable is in a specified state?
For example: Email me every 30 minutes as long as state variable = down?
Hmm, rules have to be triggered by an event (as it stands), so you'd have to have some kind of event that comes in to fire it off of.
"Not exists" rules do let you do something like:
If you see the event get generated
and you don't see the other event that should follow it within 30 minutes, fire an email.
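As a rough illustration of that "not exists" logic (a sketch in Python, not how LEM implements it), the function below takes a list of `(timestamp_seconds, event_name)` pairs and returns the times at which the email would fire. The function name, the event names, and the 1800-second window are assumptions made for the example.

```python
def not_exists_alerts(events, first, followup, window_s):
    """Return the times at which a 'not exists' style alert would fire.

    events   -- list of (timestamp_seconds, event_name) tuples
    first    -- name of the event that starts the wait
    followup -- name of the event expected within the window
    window_s -- how long to wait for the follow-up, in seconds
    """
    alerts = []
    for ts, name in events:
        if name != first:
            continue
        deadline = ts + window_s
        # Did the follow-up event arrive after `first` and before the deadline?
        seen = any(n == followup and ts < t <= deadline for t, n in events)
        if not seen:
            alerts.append(deadline)
    return alerts


log = [(0, "down"), (600, "up"), (4000, "down")]
fired = not_exists_alerts(log, "down", "up", window_s=1800)
```

Note that this fires once per unmatched "down" event, which is why it does not cover the repeat-every-30-minutes case by itself.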
Thresholds do let you re-infer, but right now you can't re-infer on a threshold of one event, and it's not quite the same thing. If it were multiple events, threshold re-inferences would let you do:
If you see 2 of these in 30 seconds, fire the actions associated with the rule
keep firing that alert every 30 minutes if the condition persists
It's not exactly the same thing because it's the event condition that is persisting (i.e. being above threshold), rather than the state variable.
One way would be to pick an alert that is fairly common to trigger the rule, but you'd want to be careful that it isn't something SUPER high volume: every time that alert occurs it has to be parsed, and at scale that can bog things down.
Here are the details of my specific use case...
I receive a log that indicates a system has been dropped out of a VIP on a Fortinet. That then triggers a state variable to be "down". As long as the state variable is "down" I want an email every 30 minutes reminding me that it's down. Eventually I will receive a log that indicates the system has been re-added to the VIP at which point the state variable will be set to "up" and I will stop receiving emails.
This is what I would like, would it be possible with LEM... or anything else you are aware of?
P.S. We just recently purchased LEM, so I don't have much practical experience with it yet, and these may be simple things I don't yet know about the product. I also realize that this use case is a bit crazy; however, it's what I have been tasked with.
I called support and was informed that what I am trying to accomplish isn't possible, specifically the re-alerting every 30 minutes. This is because LEM is designed to notify only on changes or events that occur, not on a repetitive basis due to something being in a specific state.
Well, the state variable lets you store the state, but a rule can only fire when an alert happens. So it's the "every 30 minutes" part that makes this difficult without some kind of alert that happens every 30 minutes. We're looking at ways to support scheduled nDepth searches and alerts triggered from them, which would make this possible from another direction: based on a search rather than real-time rules. I think that will accomplish this goal, and you'll see it in other systems that can do scheduled searching/alerting with a fairly good query language. You can also store a time in a state variable, but you can't do math in the rule (only greater than/less than), so that doesn't help either.
I wonder if there's a creative way to solve this problem, though, with a mix of alerts that do regularly fire and the state variables mentioned above.
I'll chew on this and circulate it to the rules guys to see if they have any ideas.
PS: thanks for following up with the results of your support call.
Hey Byron, we think we've got a way to accomplish this, but would like to test it. In order to get as real-world as possible, can you provide copies of the alerts and/or syslog entries that should set and clear the condition?
I would be happy to provide these; however, to protect the guilty I don't want to post them here in a public place. Do you have a method that I could use to send them to you directly? It doesn't look like you have Direct Messaging turned on for your Thwack profile.
The word from the Thwack team is that we have to be friends in order to exchange DMs. The other way to do this is to create a "Private Discussion" between just us (Create > Discussion > Private Discussion). You can also email me at firstname dot lastname @solarwinds.com.
I have sent you an email with the logs, thanks for looking into this for me!
Did you have a chance to use the logs I sent you to test the idea you and your team had come up with?
Good news: we've tested our theory and confirmed it works. We're cleaning it up and getting copies of everything so you can see and implement them.
Sweet, I can't wait to see what you have for me!
We built this so that you can test it end-to-end without relying on your syslog events. That made it a lot easier for us to trigger, and it means you can get the flow down first without figuring out how to send syslog messages on demand, then create a version with your own events that should behave similarly.
- "Step 1 - Trigger BEGIN Event": This is a dummy rule used to trigger the watch state, only used for testing. If you want to test this without using real syslog data, this rule generates an alert that causes the chain to begin.
- "Step 2 - Rule 1 (Find out if device is down and BEGIN watching)": This is the rule that detects the down state and starts the timer. This is the one you'll want to use and put your own syslog data info in (whatever the name/type of alert and any other criteria is for the "we're down, start notifying me every 30 minutes" alert).
- "Step 3 - Rule 2 (Keep watching and send mail on proper interval)": This rule is just used to fire every 30 minutes as long as you're still in the "watch" state.
- "Step 4 - Rule 3 (Find out if device is UP and STOP watching)": This is the rule that detects the up state and stops the timer. This is the other one you'll want to use and put your own syslog data info in (whatever the name/type of alert and any other criteria is for the "we're back up, stop notifying me" alert). You might also want this one to notify you and let you know the condition has been cleared.
- "Step 5 - Trigger STOP Event": This is another dummy rule used to trigger the stop watching state, only used for testing. If you want to test this without using real syslog data, this rule generates an alert that causes the chain to stop.
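For anyone who wants to reason about the flow before importing anything, here is a small Python simulation of how I read the chain above. It is only a sketch under stated assumptions, not LEM's rule engine: it assumes Rule 2 effectively re-triggers itself after each interval for as long as the state variable says "watching", and it models that with a hypothetical internal "tick" event. The `simulate` function, the event names, and the generation counter (added so a stale tick from an earlier outage cannot restart a loop) are all inventions for this example.

```python
import heapq


def simulate(external_events, interval, horizon):
    """Return the times at which reminder emails would be sent.

    external_events -- list of (time, name) with name "down" or "up"
    interval        -- seconds between reminders while the device is down
    horizon         -- stop simulating after this time (avoids looping forever)
    """
    # Queue entries are (time, name, generation); external events use 0.
    queue = [(ts, name, 0) for ts, name in external_events]
    heapq.heapify(queue)
    watching = False      # plays the role of the state variable
    generation = 0        # identifies the current outage
    emails = []
    while queue:
        ts, name, gen = heapq.heappop(queue)
        if ts > horizon:
            break
        if name == "down" and not watching:
            # Rule 1: begin watching and schedule the first reminder.
            watching = True
            generation += 1
            heapq.heappush(queue, (ts + interval, "tick", generation))
        elif name == "tick" and watching and gen == generation:
            # Rule 2: still watching, so send mail and schedule the next one.
            emails.append(ts)
            heapq.heappush(queue, (ts + interval, "tick", generation))
        elif name == "up":
            # Rule 3: stop watching; any pending tick is then ignored.
            watching = False
    return emails


sent = simulate([(0, "down"), (100, "up")], interval=30, horizon=1000)
```

With a "down" at 0 and an "up" at 100, reminders go out at 30, 60, and 90, and the tick scheduled for 120 does nothing because the state has already been cleared.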
As a part of testing, our QA team found these filters helpful. They are built around the rules above, but help you track EVERY STEP of the way. They are pretty self-explanatory:
- My Device is Down (matches the example's "down" alert)
- Rule 1 Activity (shows activity from the rule marked 1 above)
- Rule 2 Activity (shows activity from the rule marked 2 above)
- Send Email (shows email activity)
- Rule 3 Activity (shows activity from the rule marked 3 above)
- My Device is Up (matches the example's "up" alert)
Testing with the Example
- Create a state variable - the example uses one called "Event" with a single Text variable called "NO". It is set to "NO" when the rule should not fire and to "YES" when the rule should fire. You can of course create your own names that might make more sense than setting a variable called NO to YES (which took me a few minutes to figure out), but you'll need to keep them straight through the example. (We would provide the group for you to import, but there's an issue preventing group import at the moment.)
- Import rules (Gear on the right side > Import - you can ctrl+select to import them all at once).
- Edit the imported rules:
  - Step 1: Just needs to be enabled and saved.
  - Step 2 - Rule 1: Select the state variable you created; even if it's called Event, you'll need to re-select it. Into the "NO" field (if you're using the example) drag a Text Constant and type the text "YES". Enable and save.
  - Step 3 - Rule 2: Select the state variable you created (even if it's called Event) and the field you created (even if it's called "NO"), and drag it over the one in the correlations box to replace the "identical" one. This has to be done because the one that's there is just a placeholder, which is why the rule has an "Error". Configure the Email Alert to send to whatever user you want it to send to, or, if you are content with the filters, remove the email message action. Enable and save. Click yes on the warning: it's warning us about the possibility of an infinite loop, which is kind of our intent here, and the state variable will cancel it out.
  - Step 4 - Rule 3: Select the state variable you created; even if it's called Event, you'll need to re-select it. Into the "NO" field (if you're using the example) drag a Text Constant and type the text "NO" (anything but YES, really). Enable and save.
  - Step 5: Just needs to be enabled and saved.
- Import the filters if you want to track, or create your own (import for filters is on the Filter Group/left side gear > Import).
- To trigger the "BEGIN" event, edit a filter as the "admin" user and click save.
- Enjoy (it will take 2 minutes to enter the loop, then fire every minute in the example)
- To trigger the "END" event, run an nDepth search as the "admin" user (doesn't matter what or how long, just go to Explore>nDepth and make sure a search runs).
Modifying for your Environment
You need a copy of the 3 rules (step 2, 3, 4, marked rule 1, 2, 3). Modify rule 1 to match the initial event you want to detect, modify rule 2 to be the interval you want to notify on using the response window and correlation time, and modify rule 3 to be the cancellation event you want to detect.
And, may the force be with you.
Stateful Log Alerts Files.zip 54.5 KB
Unfortunately, I have not had a chance to test this, as we came up with a very different solution; however, I have gone ahead and marked this as the correct answer because I trust that it would likely work.
Thanks for linking back to those threads. Very helpful for those who come across this information.