adatole ✭✭✭✭✭

Comments

  • One of the reasons you have a setting in NPM for "device rediscovery" (set to 30 min by default) is for this. Every so often, a device will re-number the interfaces (this goes for NICs and also for disks). It doesn't change the name of the interface, or anything else. Just the internal numbering. So SolarWinds scans the…
  • It's interesting that you post this, because today my team saw the opposite - we had a box with 32 CPUs that was reporting 90% utilization, but it turned out only CPU 1 was pegged, and the rest were down at 1%.
  • Check your polling completion %, you are missing polling cycles. In my experience that could happen due to a few different reasons: * Overloaded polling engine. * The target is across a slow network connection, so some polls simply don't come back in time * The target device is overwhelmed with other tasks and can't…
  • There's no un-manage at the component level of a template/monitor. You have to go into the application monitor (meaning the specific application of a template on a node) and DISABLE that component (you don't have to delete it, just disable it). The difference is that unmanage would ostensibly have a start and end date.…
  • It's difficult to give a general answer to your question. The interfaces you want to monitor are largely based on the interfaces you care about, which hinges on you having a more-than-passing understanding of your environment. With that said, you can make SOME generalizations about interfaces based on device type. For…
  • I've already run them on the same box. The synergy between the two modules is just too significant NOT to do that. The additional load on the machine is negligible (assuming you have anything approaching a reasonable configuration). My pollers have typically used 4-8 CPU and 8-16 GB RAM. Also, depending on how large your…
  • If you have access to the database and can run a direct SQL query, use this: select AlertDefinitions.AlertName, AlertDefinitions.AlertDescription, AlertDefinitions.TriggerQuery, AlertDefinitions.ResetQuery, AlertDefinitions.ObjectType, ActionDefinitions.ActionType, ActionDefinitions.Title from AlertDefinitions join…
  • High level steps are: * Create the NOC view - the column layout, etc. * Create a new toolbar just for the NOC view. * Create a user. Make sure that user's settings include not timing out, and that the default view is the NOC view from step 1, and the default toolbar is the one you created in step 2. On the machine that…
  • Assuming your NPM server isn't running on a complete POS, 200 elements is not going to cause your server to break a sweat. SNMP is very VERY efficient like that. As for the WAP going down, you can alert if: * it has its own IP (ie: it's a node in its own right) * it is connected to a wireless controller and you are…
  • In my tradition, when someone of exceptional merit - intellectual achievement, moral grit, etc - dies, we understand this loss to the community (in the global sense) is a call to action. Our job is to consider - honestly and without embellishment - the character traits this person had, and how the loss of those traits will…
  • I can't emphasize enough the need for your SQL server disks to be configured for RAID 10. We have ~10,000 devices and are running NPM/SAM, and we're seeing upwards of 40,000 read/write operations per second. We've got it on a physical server, 192 GB RAM, 32 processors and the box is still struggling to keep up. ConfIO is…
  • A workaround (if you have SAM) is to mark this as an external node (ie: no ICMP) and then pull the SNMP counters you want manually via SAM. It's kind of tedious (because SAM's SNMP counters aren't the greatest) but if what you need is simple it may get you what you want.
  • I think this is right up patrick.hubbard's alley. In fact, I believe he has stuff like this sitting on his desk right now.
  • caption is the "name" field for the element in question (node, interface, volume). Since this is a volume alert, you have to specify "node.caption" or else you will get the volume caption. It SHOULD match the node you selected when you tested.
  • Well, color me stumped. It looks fine, it SHOULD work. On my system, it DOES work: * Created custom text field in node table "alertemail" * populated it with 1 or 2 email addresses (comma-separated) * created new node alert (when node status is equal to "up") * trigger action is to email to ${node.alertemail} * Tested and…
  • It's the "top 5" that's killing you. Try this one: select NodeID, Caption, <your email field> from Nodes where <your email field> IS NOT NULL
  • Capitalization in the field name? Spaces (always a bad thing)? I'm grasping at straws. Can you run a select statement on the db so we can see real data? Something like: use SolarWindsOrion; select top 5 NodeID, Caption, <your email field> from Nodes So we can see both the field names and the content that's in it?
  • I could make a case that the scripts below would solve your problem, increase your current capabilities, AND save you money in year two, if not year one. (and a VBScript version my team created that I need to upload here soon) These log monitors will let you do traditional text file scanning that you are asking about. Yes,…
  • For that kind of logic, your best bet is to create a separate web form with all that logic and functionality, and then use the Orion SDK to populate Orion when the user hits "enter" - including scanning and adding the node if you want it all as part of an "add node" operation.
  • I realize this is the very definition of "a day late and a dollar (or maybe an hour) short" but - with a nod to rob.hock - what will happen is that you will see a one-hour gap in your data. It will be as if no data was collected from 12am to midnight. If you have alerts which are time-sensitive (ie: if xx hasn't happened…
  • While this may have changed slightly in recent versions, what I understood it to do was: * Child node went down * SolarWinds notes that it is the child of *something* and injects a delay of 1.5 polling cycles before doing anything * after the delay, SolarWinds checks the child node's current state in the database…
  • There's another resource that only shows alerts. Customize the page and add that resource (and remove the other one) and you should be good to go.
  • For anyone who finds this in their searches, SolarWinds has JUST addressed it as part of the SAM 6.1 hotfix #1. A new version of the jobengine will (somehow - I haven't dug into it yet) allow for DST changes every year. I'm glad SolarWinds finally addressed this issue!
  • There are two answers, effectively "big" and "small". "Big" means purchasing Log & Event Manager (LEM, formerly TriGeo). This is an extremely full-featured module that can do a wide range of tasks that go far beyond monitoring a single text file for specific strings. "Small" means purchasing Server & Application…
  • We just set up and tested a logic set that seems to be working for both alerts and reports: select nodes.nodeid, nodes.caption from nodes left join (select CPULoad_Detail.nodeID, MAX(CPULoad_Detail.DateTime) as LastCPU from CPULoad_Detail group by CPULoad_Detail.NodeID) c1 on nodes.NodeID = c1.NodeID where DateDiff(mi,…
  • My experience is that - barring some malformed query that is killing your system - alert triggers are not the biggest load on the system. We've got ~10,000 devices, ~350 alerts (most running every minute, some at a longer interval) and we've never seen an issue with THOSE queries getting in the way. A good way to check is…
  • We have an additional web server on top of all those. As for time zone, EVERYTHING has to be on the same time zone. You can pick which one, but DO NOT set each server to the local time zone. See here for more info: http://thwack.solarwinds.com/thread/57866 As for your design, it looks fine except for the DB - please please…
  • Again, I asked the specific question about poller limits and was told there is none. The only limitation in terms of supportability is elements. Once you get beyond 110,000 support starts to give you grief.
  • Effectively, "as many as you want". NPM is not limited by pollers but by elements. Currently that limit sits at around 110,000 (that's the official number I got from sales about 8 months ago). I know that version 10.6 is looking to up that limit. We've got 10 pollers (1 primary, 9 additional) PLUS an additional web server,…
  • SolarWinds currently doesn't have the idea of a "probe" (ie: a small poller for small uses like this). However, remember that you CAN send messages into SolarWinds via trap or syslog. So if you have a remote box that is pinging your edge router and can send a syslog message to your core poller, you could use the Syslog…
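
As a rough illustration of the trap/syslog idea in the last comment, here is a minimal Python sketch of how a remote box could turn a failed ping into a syslog message aimed at the core poller. Everything named here is an assumption for illustration and not from the original post: the function names, the `edge-ping` tag, the local0/error priority, and the poller address in the usage note. It sends a bare BSD-style `<PRI>tag: text` datagram over UDP; whether your receiver accepts a message without a timestamp header, and how you match on it, depends on your setup.

```python
import socket


def build_syslog_packet(message, facility=16, severity=3, tag="edge-ping"):
    """Build a minimal BSD-style syslog datagram.

    facility 16 is local0 and severity 3 is "error"; the priority value
    is facility * 8 + severity. The tag is a hypothetical label chosen
    so the receiving side has something to match on.
    """
    pri = facility * 8 + severity
    return f"<{pri}>{tag}: {message}".encode("utf-8")


def send_syslog(packet, host, port=514):
    """Send one datagram to the syslog receiver (UDP, fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

From a scheduled job on the remote box you might call `send_syslog(build_syslog_packet("edge router unreachable"), "192.0.2.10")` after a failed ping, where `192.0.2.10` is a placeholder for the poller's address.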