Are you referring to your post that I have linked below?
If so, I was wondering whether it would work for my situation: I want to see whether my device has STOPPED logging for a certain amount of time (a period I would like to be able to choose myself). It sounds like I would go with your first option (Scheduling), but I wasn't clear on what exactly that meant.
That's the one. Any of the monitoring options will work for you, but the one you use should depend on how short you want the threshold to be.
The first script is used in a rule that allows all messages from all devices you want to monitor. It logs a time stamp, threshold, and nag timeout for each device every time it receives a message from each device.
The second script handles the alerts. It checks the last check-in time for each device and compares it to the current time. If you use the all-in-one option, it will also check whether it has already alerted you for that device within a set timeframe, and will wait if it has.
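To make the split between the two scripts concrete, here is a minimal sketch in Python. It is not the script from the post; the names (`CheckIn`, `record_checkin`, `find_overdue`) and the in-memory dictionary are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckIn:
    last_seen: float  # time of the most recent message, in seconds
    threshold: float  # how long the device may stay silent, in seconds


def record_checkin(store, device, now, threshold):
    """Step 1 (hypothetical): run on every incoming message to stamp the device."""
    store[device] = CheckIn(last_seen=now, threshold=threshold)


def find_overdue(store, now):
    """Step 2 (hypothetical): list devices silent for longer than their threshold."""
    return sorted(d for d, c in store.items() if now - c.last_seen > c.threshold)


store = {}
record_checkin(store, "router1", now=1000.0, threshold=60.0)
record_checkin(store, "switch1", now=1050.0, threshold=3600.0)
print(find_overdue(store, now=1100.0))  # router1 has been silent 100s against a 60s threshold
```

Step 1 is cheap and runs per message; step 2 walks every monitored device, which is why how often you trigger it matters.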
If you want your threshold to be measured in hours or days, then use a schedule for the second script. If you want your threshold to be measured in minutes (like 60 or less) and you are not already using keep-alives for something else, then running the second script from a rule that fires on keep-alive messages is probably the best option.
The last option is what I use because I monitor some of my devices with 30-second thresholds. The all-in-one option requires you to uncomment the bottom of the script and throw away the second script; you won't need it. This option checks the thresholds for every device every time it receives a message from ANY device. Obviously you can't monitor a 30-second check-in if you only get a message every 2 minutes.
Let me know if you need any help getting it set up. There are comments in the script too.
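The limitation in that last sentence can be shown with a small Python sketch. It assumes, as described above, that the check only runs when some message arrives; the function name `first_detection` and the sample timings are made up for illustration.

```python
def first_detection(last_seen, threshold, message_times):
    """Return the first message arrival time at which a silent device
    would be flagged, or None if no arrival falls past the threshold."""
    for t in sorted(message_times):
        if t - last_seen > threshold:
            return t
    return None


# Device went silent at t=0 with a 30-second threshold.
sparse = first_detection(last_seen=0, threshold=30, message_times=[120, 240, 360])
busy = first_detection(last_seen=0, threshold=30, message_times=range(10, 100, 10))
print(sparse, busy)  # sparse traffic notices at 120s, busy traffic at 40s
```

With messages only every 2 minutes, the outage cannot be noticed before the next message arrives, no matter how short the threshold is.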
Ah I see what you’re saying, but wouldn’t placing it into a scheduler be a bit more difficult than just changing the ChkIn value within the second script to a much larger number of seconds, or even changing the unit from seconds to minutes/hours/etc.?
Additionally, I have tweaked your excellent code just a bit, in order to try and better illustrate my code, as attached.
I have successfully installed your script, and I am getting updates telling me when the minimum threshold that I set through your second option has been reached. However, for some reason I am getting multiple emails every minute once the threshold has been missed for the first time. Is there a way to stop this after the first email that notifies me about the missed threshold? As of now I have created a rule with an action that runs the script, in addition to a scheduler that also runs that script. Should I have only set one of the two?
I apologize for the lack of a response, I was out of town on vacation last week.
I've looked at the script you posted and you're missing the nag interval setting. Do a compare from your script to mine and you'll see that I am logging 2 date/time stamps in the initial message and both are being checked in the alerting step. The first is checking whether or not the device has failed to check in within the threshold and the second is to determine whether or not you have already been alerted for that specific device within the last "Interval" seconds.
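As a rough illustration of why the second time stamp stops the repeated emails, here is a Python sketch of that two-part check. It is not the original script; `should_alert` and its parameters are hypothetical names, and the numbers are arbitrary.

```python
def should_alert(now, last_seen, threshold, last_alert, nag_interval):
    """Alert only if the device is overdue AND we have not already
    alerted for it within the last nag_interval seconds."""
    if now - last_seen <= threshold:
        return False  # device checked in recently enough
    if last_alert is None:
        return True   # first alert for this outage
    return now - last_alert >= nag_interval


# Device last seen at t=0, 120s threshold, 300s nag interval, checked every 60s.
alerts = []
last_alert = None
for now in range(0, 601, 60):
    if should_alert(now, 0, 120, last_alert, 300):
        alerts.append(now)
        last_alert = now  # second time stamp: when we last nagged
print(alerts)  # [180, 480]
```

Without the `last_alert` comparison, every check from 180 seconds onward would send another email, which matches the behavior you are seeing.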
Also, while the all-in-one script is a simpler solution, the reason I offer the other options is that if you have a lot of devices to check and a lot of message traffic, you might not want that second part of your script to fire off 500 times/second when your threshold is set to 24 hours. If you schedule the second script to run every 24 hours, you'll get the same result without the need to go through and check every device you're monitoring every time you receive a message. The same concept applies to using a rule to check for keep-alives: you can set it to check the devices every 30 minutes or so instead of every time you receive a message.
I personally have devices that send messages every few seconds so the all-in-one option makes sense for me and isn't wasting any processing power that isn't needed. My lowest threshold is 30 seconds and my largest is 12 hours.
Either way, the amount of processing power is negligible and not worth fussing with unless you start experiencing performance problems. If you do, just keep in mind that this is something you can tweak to get a few CPU cycles back.
I hope I answered all of your questions.