Comments
-
Might get more traction in Report Lab link.
-
I find it easier to find some attributes via the trigger condition than the actions. If you pull the alert's SWQL values, you can pull out the attribute names, which makes them easier to find in the trigger actions menu. *Image courtesy of @"wluther"
-
Wondering if it was removed from the installer for new installs, and the updates just don't remove it (yet)?
-
I would set this logic in your ticketing tool if you have one. Interested to see what some others might come up with.
-
This is what I do. Warning from 90-95 and Critical from 95-100. If you have an operations team, loop in with them to establish a weekly/monthly review of flapping alerts. You can also make a dashboard that pulls the count of alerts over the last 7 and 30 days (I'll try to find the THWACK post I stole mine from and edit when I…
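The two bands above can be sketched as a small classifier. This is only a minimal sketch: it assumes the value is a percentage and that exactly 95 counts as Critical (the comment does not say which band owns the boundary), and `classify_usage` is a hypothetical name, not anything from SolarWinds.

```python
def classify_usage(percent: float) -> str:
    """Map a utilization percentage to a severity band.

    Assumption: 90 <= percent < 95 is Warning, percent >= 95 is Critical.
    """
    if percent >= 95:
        return "Critical"
    if percent >= 90:
        return "Warning"
    return "OK"
```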
-
I also have this question. I know we have a tool for CPU/processes. Hoping we can get one for disks as well. Getting top 10 processes with trigger condition SolarWinds.APM.RealTimeProcessPoller.exe -n=${NodeID} -count=10 -sort=CPU -timeout=120 -alert=${N=Alerting;M=AlertDefID} -alertId=${N=Alerting;M=AlertID}
-
Would the timer on the trigger actions help? Or set an escalation to make a second ticket for the off-hours crew? You might also be able to set up an action in service-now to do time-of-day routing.
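Time-of-day routing could look roughly like the sketch below. Everything specific here is an assumption for illustration: the 08:00-17:00 weekday window, the group names `day-shift` and `off-hours`, and the function name `route_ticket`. It is not service-now's API, just the decision logic.

```python
from datetime import datetime, time

# Assumed business hours; adjust to the real shift schedule.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(17, 0)


def route_ticket(fired_at: datetime) -> str:
    """Return a (hypothetical) assignment group for an alert fired at a local time."""
    is_weekday = fired_at.weekday() < 5  # Monday=0 ... Friday=4
    in_hours = BUSINESS_START <= fired_at.time() < BUSINESS_END
    return "day-shift" if is_weekday and in_hours else "off-hours"
```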
-
I integrated DPA with service-now. It's fairly roundabout, though. Use the DPA alert engine to send a trap to SolarWinds. Have SolarWinds make an alert on that trap that creates a service-now INC.
-
Was thinking the same thing, or they are loading the time ranges differently (it might not be a fixed 30 days, but a calendar function for the number of days in the current month; for example, Feb only has 28 days).
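To show how the two interpretations drift apart, here is a small sketch comparing a fixed 30-day window with a window sized by the current calendar month. The function names are hypothetical, and this only illustrates the arithmetic, not how the product actually computes its ranges.

```python
import calendar
from datetime import date, timedelta


def window_start_fixed(today: date, days: int = 30) -> date:
    """Start of a fixed-length lookback window."""
    return today - timedelta(days=days)


def window_start_calendar_month(today: date) -> date:
    """Start of a lookback window as long as the current calendar month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return today - timedelta(days=days_in_month)
```

On the last day of a 28-day February the two windows start two days apart, which would be enough to change a count.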
-
Have you confirmed they got added via SNMP and not as ICMP/Ping only? I had a few UPSes that did something similar years ago, and I think it was because they were using white-label motherboards (a few vendors used the same components, so we had some issues with discovery/vendor labels).
-
I monitor my monitoring tools from 3rd party tools. Specifically something that can do synthetics or end user monitoring (like WPM, to simulate a real user accessing, logging in, and getting expected data). I know not all budgets will allow for that. Alternatively your load balancer or application firewall might be able to…
-
Groups might be a viable alternative, unless someone smarter comes along and knows if AlertHistory table is in the alert engine.
-
Depends on what trigger condition you're using. What are you doing with SNMP? Where are you sending the traps? I also have a global node down alert that covers 90% of my nodes. If you're using a ticketing system, I would configure the alert notification there based on CI owner or assignment group.
-
I don't see Orion.AlertHistory as accessible in the drop down or in the drop down for Custom SWQL. If this was a SAM Template you might be able to go at it from that direction. What use case / condition are you going after? Any extra intel on what your alert object is?
-
For anyone lazy I made a quick update to the OP to include this update. Seems like finish time stopped working but not a big deal to me if we have start and duration. SELECT Distinct TOLOCAL(ST.EventTime) AS [Start Time] --,TOLOCAL(ET.EventTime) AS [Finish Time] ,CONCAT(--Downtime formatted (CASE WHEN…
-
Sent you a PM.
-
Interesting link. Wonder if you could do a partial restore of the database on the alert configuration table.
-
I am able to use "Order by" on my custom query (and custom table) dashboards. I always build these out in SWQL Studio to make sure they are working and to fine-tune them, then copy them over into the dashboard. Also highly recommend setting up the search (Search SWQL Query:) on custom queries. Really makes them more usable, I…
-
Not familiar with "DATETRUNC" in SWIS. I personally use ADDDATE without issue. ex: ADDDATE('Minute', -30, GETUTCDATE())
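If you want to sanity-check the cutoff outside of SWQL, the same "30 minutes before now, in UTC" value can be computed in Python. This is a minimal sketch; `minutes_ago_utc` is a hypothetical helper, and the optional `now` parameter exists only so the result can be checked against a fixed time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def minutes_ago_utc(minutes: int, now: Optional[datetime] = None) -> datetime:
    """Return the UTC timestamp `minutes` before `now` (defaults to the current time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(minutes=minutes)
```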
-
Thanks for opening the case and double thanks for posting an update!
-
There is a defect with time-based reset conditions. Does this happen around the time you reboot your database or fail over its cluster? That is when I experience most of my issues. Restarting the alerting service on the MPE will fix the issue 90% of the time; a reboot of the server resolves the rest.
-
I set up a condition on my ticketing system to alert me if it has not received an alert from SolarWinds. I have a heartbeat alert that fires every couple of minutes. If x heartbeats have not been received in y period of time, it sends me a call out.
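The "x in y" check on the ticketing side boils down to counting recent heartbeats. Here is a minimal sketch of that logic, assuming you can get the arrival timestamps of the heartbeat alerts; `heartbeat_missing` and its parameters are hypothetical names, not part of any ticketing tool.

```python
from datetime import datetime, timedelta
from typing import Iterable


def heartbeat_missing(
    received: Iterable[datetime],
    now: datetime,
    window: timedelta,
    min_expected: int,
) -> bool:
    """Return True if fewer than `min_expected` heartbeats arrived in the last `window`."""
    cutoff = now - window
    # Count only heartbeats strictly newer than the cutoff and not in the future.
    recent = [t for t in received if cutoff < t <= now]
    return len(recent) < min_expected
```

When this returns True, the ticketing system would send the call out.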
-
We didn't use domain admins, for security reasons. You just need an account with the right permissions. This can be set up via GPO, so thankfully you only have to do it once. You can also add an account to the Domain Controller equivalent of a local admin (Builtin\Administrators group).
-
4369 and 25672 are for RabbitMQ. APEs will need to communicate with the MPE (and the active and standby MPE with each other). Template firewall rule: SRC: MPE (active and standby), APE DST: MPE (active and standby) PRT: 4369, 25672
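The template rule can be expanded into individual source/destination/port entries with a short script. This is just a sketch of the expansion: the host names in the usage note are placeholders, and `firewall_rules` is a hypothetical helper, not a SolarWinds tool.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Ports from the template rule above.
RABBITMQ_PORTS = (4369, 25672)


def firewall_rules(
    mpe_hosts: Sequence[str], ape_hosts: Sequence[str]
) -> List[Tuple[str, str, int]]:
    """Expand SRC (MPEs and APEs) -> DST (MPEs) on the RabbitMQ ports.

    Rules where source and destination are the same host are skipped.
    """
    sources = list(mpe_hosts) + list(ape_hosts)
    return [
        (src, dst, port)
        for src, dst, port in product(sources, mpe_hosts, RABBITMQ_PORTS)
        if src != dst
    ]
```

For example, `firewall_rules(["mpe-active", "mpe-standby"], ["ape-1"])` yields one entry per source, MPE destination, and port, which can then be handed to whoever manages the firewall.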
-
Confirming your use case. If your use case is to have a single standard trigger for all alerts to make service-now alerts, I have not come up with a good way to do that with the out-of-the-box service-now integration (I had to write a custom script to do this). What is the focus of your alert (I want to alert on:) on the…
-
There are a few posts about similar issues. Does the condition happen when you reboot your main poller or database? Do you have HA or SQL Clustering (for the Orion DB)? 90% of the time restarting the alerting service will get the timed reset actions working again. If that fails a reboot of the server usually gets it going.
-
Not sure what data you will need. We have a large environment. MPE (main polling engine), APE (additional polling engines), AWS (additional web servers), and HA (high availability). API calls to the MPE work fine. The same API calls to the AWS fail. Based on other replies this sounds like a known defect save for PowerShell…
-
Bumping this as I am wondering about this as well. I have APEs in different domains and that works fine, but I have not had a client do it with additional web servers yet.
-
APEs need access to the database, as well as to the main poller on several ports (5671 for RabbitMQ) and to all the other APEs on 17777 (SWIS). Shamelessly stolen from the port requirements KB.
-
Sorry to jump to the basics but have you tried a controlled restart after the power issue? Bring up the DB, then your main polling engine (then any extra pollers, web servers, wpm, agents). Most alert delays I have experienced have been RCA'ed to the database (usually after cluster fail-over). Orion does not reconnect well…