Comments
-
While there are other issues, you do NEED to monitor the pollers as nodes in order for my version of the report to work.
-
What you are proposing certainly works for me. One thing to keep in mind: if you are using groups, and you have the same account in two groups, whichever group is higher (physically higher in the listing; you can move them up and down) determines the rights you will get. So it's not always "deny", it's just always "whatever I find…
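To make the "highest group in the listing wins" rule concrete, here is a minimal Python sketch. The group names, accounts, and rights strings are all hypothetical, and this only models the ordering rule described above, not how the product actually stores or evaluates accounts.

```python
from typing import Optional

# Hypothetical ordered group listing: index 0 is the top of the listing.
GROUPS = [
    {"name": "NetOps-ReadOnly", "members": {"alice", "bob"}, "rights": "read-only"},
    {"name": "NetOps-Admins", "members": {"bob", "carol"}, "rights": "admin"},
]


def effective_rights(account: str, groups: list) -> Optional[str]:
    """Return the rights of the first (highest-listed) group containing the account."""
    for group in groups:
        if account in group["members"]:
            return group["rights"]
    return None  # account is not a member of any group


# bob is in both groups, so the higher-listed group decides.
print(effective_rights("bob", GROUPS))
# Moving the admin group to the top changes the outcome for bob.
print(effective_rights("bob", list(reversed(GROUPS))))
```

The point of the sketch is that the result depends on list order, not on which group is more or less restrictive.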
-
And the Sorcerer's Apprentice broom. And the penguin(s)... I can't find the actual quote, but a teacher once said: "Those who do math problems and crossword puzzles in pen are either very confident or very foolish. The rest of us use pencil." to which I add: "But the ones who do those things in crayon are the ones I want…
-
I agree with designerfx's comments above - don't do too much at once. From your last set of answers, my order of changes would be: 1) get that database onto a separate box!! This, more than anything, is impacting your performance and will alleviate the strain you are experiencing right now. 2) as part of that, increase…
-
So I would DEFINITELY say that your database is the top potential slowdown. Not enough RAM and slower CPU (yeah, I know, I just called 2.4GHz slow). I also forgot to ask, but what kind of disk is the DB running on - how many spindles, RAID, etc.? Because this is an inherited system, there could be other issues - too many…
-
NICE work!
-
At the bottom of the alert trigger tab (second from the left) there's a duration drop down. That's where you set how long the condition has to be "true" before an alert is triggered.
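For anyone unsure what that duration setting does conceptually, here is a minimal Python sketch of a "condition must stay true for N seconds" check. It is a generic model with made-up sample data and a hypothetical function name, not the product's actual evaluation logic.

```python
def first_trigger_time(samples, duration_s):
    """Return the timestamp at which the condition has been continuously
    true for at least duration_s seconds, or None if that never happens.

    samples: iterable of (timestamp_seconds, condition_is_true), in time order.
    """
    true_since = None
    for ts, is_true in samples:
        if not is_true:
            true_since = None  # any "false" sample restarts the clock
            continue
        if true_since is None:
            true_since = ts
        if ts - true_since >= duration_s:
            return ts
    return None


# Hypothetical poll results taken every 120 seconds.
polls = [(0, True), (120, False), (240, True), (360, True), (480, True)]
print(first_trigger_time(polls, duration_s=240))
```

In this sample the brief blip at time 0 never fires, because the condition went false again before the duration elapsed.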
-
Interesting (which I'm using in the same way you use "amusing" - to mean "damn that's frustrating and not how I want to spend my day"). Can you list the OID(s) you are using, just so I can look at some of the specifics? Also, when you say you have to use "get table" is that because get or get-next don't work, or some other…
-
First, any query statement that is "where xxx is NOT EQUAL TO yyy" is more expensive than several OR statements, so you should avoid that if possible. Second, this is not a SolarWinds flaw. In the land of databases in general, the handling of NULL values is problematic, to say the least (Head Geek sqlrockstar has A LOT to…
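One common way NULLs trip people up with a not-equal filter can be shown with a small Python sketch. It uses an in-memory SQLite database with a hypothetical `nodes` table rather than the real product schema, so treat it as an illustration of general SQL NULL behavior only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, purely for illustration.
conn.execute("CREATE TABLE nodes (name TEXT, site TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [("rtr1", "NYC"), ("rtr2", "LON"), ("rtr3", None)],
)

# A NULL site is neither equal nor not-equal to 'NYC', so rtr3 is silently skipped.
not_equal = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM nodes WHERE site <> 'NYC' ORDER BY name"
    )
]

# Handling NULL explicitly brings rtr3 back.
with_nulls = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM nodes WHERE site <> 'NYC' OR site IS NULL ORDER BY name"
    )
]

print(not_equal)
print(with_nulls)
```

If a custom property was never filled in for some nodes, a "not equal" condition on it can quietly skip those nodes in the same way.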
-
MDRISKELL: A couple of things I noticed: First, I would change the field in the last line of your graphic to "Volume Percent Used", the same as the condition group above. Just for consistency's sake. You don't HAVE to, but it's the way I'd do it. That also may be why you can't compare the two fields (maybe a type mismatch)…
-
Just to expand on neomatrix1217's erudite observations: The first field (components with problems) will NEVER work, because that is an application-specific field. Your second alert is focused on individual components. The other fields may or may not populate depending on the component in question. When selecting your…
-
Feel free to mark my comment as the "correct" answer. And you are most sincerely welcome!
-
Is it really an issue of "if money is no object"? I mean, is a cheaper system that is fundamentally broken (or will assuredly break in the foreseeable future) somehow MORE cost effective than a more expensive solution which will actually work? This was the argument that jbiggley, afox ciulei and the rest faced at my…
-
No worries. It happens all the time.
-
Absolutely. As long as the router that is "in between" those two devices is exporting NetFlow data to your collector (i.e., the SolarWinds NTA box) you should be able to create a report showing the conversation from serverA to serverB. I don't have a handy copy of NTA right now so I can't get you specific details, but a…
-
IF the custom property is in the Application table, then you should be able to pick it from the list of variables (Insert Variable) and it will be called ${App_group}. If the custom property is in the nodes table, then your field will be a little more complicated. My "didn't try this first" guess is: $SQL{select…
-
I'd have to check to be sure, but in that case you are getting the last trigger time. If it went from bad to good and did an actual reset, that clears the alert out of the active alerts table. When it triggers again, it's a new alert. Unless I'm missing a nuance of how you have things set up.
-
You also may want to put it out there on the THWACK job board: GOT A JOB? NEED A JOB? GET A JOB.
-
Yup, that (clear, instead of "reset") is the one I meant. Working from memory at 11pm will produce inconsistent results. If that fixed your issue, I'm glad.
-
This is the problem with being old sometimes - you remember old things. I quoted what was true back when I looked into it, many moons ago. However, you are correct: see "Orion Fast Poll and Node Statuses Explained" (SolarWinds Worldwide, LLC. Help and Support). That said, the essence of what I was trying to say was still correct…
-
See my last update (right before this one in this thread). The key is that you won't see an option for "hard drive". You have to specify the generic UnDP fields, i.e.: PollerName is "blahblah", Value > 27, etc. Details in the other post.
-
I'm pretty sure you should see it in your customer portal now. It went live this morning.
-
Not even slightly. You can credit aLTeReGo and his band of merry devs for the man-ness. I just hit the problem before you did.
-
Yep, 50% of list is your typical option. However, remember that you don't HAVE to buy the same license level that you have in prod. So even if you have an NPM SLX (unlimited) license, you can ask for the lab pricing on NPM SL100, which is $2895 (but $1447.50 for a lab license).
-
This helps, but I am still curious - if you just walked up to the machine without any other information, could YOU tell if it was manually disabled versus actually honest-to-goodness down? It sounds like "no", but please tell me if that's a false presumption. That said, here's another option: Alert #1: update custom…
-
Let's start with my second-favorite question in monitoring: how do YOU know the service was disabled manually? What I mean is, imagine this scenario: You get a frantic call from a user, because "it" is down. You log onto the server, you type a command or two, and AHA! You see that the service has been manually disabled. So what…
-
Probably. The issue (if I recall correctly) is that getting SNMP and AIX to behave is tricky business and could require recompiling the SNMP agent if not the kernel itself. (yeah, we all love IBM soooo much!!). I would start by doing an SNMP walk on the machine itself and outputting the results to a text file and reviewing…
-
I know this is out on thwack somewhere, but briefly: 1) node is good. Polling is a single ping every 120 seconds (or whatever you have it set to be) 2) single ping fails. 3a) node state changes to "warning" 3b) poller goes into "rapid ping mode" (my term, not SolarWinds') sending 1 ping every 5 seconds until 10 consecutive…
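The first few steps above can be sketched in Python as follows. This only models a node that is currently "up" (steps 1 through 3b); what ends "rapid ping mode" is cut off in the comment, so it is deliberately left out. The function name and state labels are hypothetical, and the intervals are the ones quoted above.

```python
NORMAL_INTERVAL_S = 120  # step 1: "or whatever you have it set to be"
RAPID_INTERVAL_S = 5     # step 3b: one ping every 5 seconds


def after_ping_while_up(ping_ok: bool):
    """Return (new_state, seconds_until_next_ping) for a node currently 'up'.

    Only the up -> warning transition is modeled; leaving rapid ping mode
    is not, because the source text is truncated at that point.
    """
    if ping_ok:
        return "up", NORMAL_INTERVAL_S
    # Steps 2, 3a and 3b: one failed ping flips the node to "warning"
    # and switches the poller to the rapid ping interval.
    return "warning", RAPID_INTERVAL_S


print(after_ping_while_up(True))
print(after_ping_while_up(False))
```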
-
The quick answer to "why" is because the SNMP OID we're pulling enumerates them as such. And I'm sure there are a few SPECIFIC cases when you want to monitor them. But for the most part, I've never found a compelling reason.
-
I would actually make the delay longer - between 3 and 5 minutes. That way you get AT LEAST two polling cycles to confirm the box is really down before "that guy" gets a message. Alternatively you can create a custom property ("servicelevel") where "level0" indicates high-criticality, poll-every-60, and "level1" indicates…