mesverrum · Observability Architect · ✭✭✭✭✭

Comments

  • Interface packet loss is not a thing. I don't mean it isn't in Orion; I mean no such measurable concept exists in networking. They'll need to be clearer. Interfaces can have errors on them, in which case you can look at errors; interfaces can have discards (often these are on purpose, so they are usually a red herring),…
  • That's a painful way to do it; it should be fairly obvious that all the tables in Orion do something. It might be easier to run a DBCC check, figure out which tables contain the errors, and truncate those rather than exporting selected tables across and trying to patch up all the things that inevitably go wrong. To give you a…
  • In current releases Orion automatically builds a map of known connections for each node and just adds it as a tab on the left of the details view.
  • I find the SW agent to be a nuisance to deploy/maintain/upgrade/fix compared to WMI for Windows servers. I would never suggest using SNMP for Windows though. Since Linux servers don't have WMI and net-snmp is garbage, the agent is clearly the better choice there. In both cases an extremely important thing to keep in mind is…
  • VMAN doesn't really collect the data you are asking for anyway, so it's not relevant to this scenario. Basically you just need to sit down with the local VMware guy and ask them to write you scripts to get each of those metrics from PowerCLI, then set up a SAM monitor to run those scripts. Unfortunately nothing in SW…
  • All of those alerts leverage the fact that SolarWinds has built-in tables specifically for calculating those rates of change and forecasting them out. Since those tables don't exist for UnDPs, you basically need to do the math yourself with custom SQL. Depending on how deep you want to go into the statistics you could…
  • I throw away almost the entire default SAM dashboard in most cases. Very little of it is useful for my cases, so I just strip it down and use it as a place where the teams can search and jump off to specific dashboards for their apps or nodes. The company is so big that nobody even really wants a high level…
  • I've written pretty extensively on this topic and created a suite of tools in PowerShell to do this, along with a GitHub repo for sharing custom dashboards and widgets among the community. https://thwack.solarwinds.com/t5/NPM-Documents/ViewExporter-ps1/ta-p/519617
  • You can get really complex or really simple with this scenario. The easy answer is to look up the component ID of the service you want to restart, make a copy of the out-of-the-box service restart alert, edit it so it specifically checks your HTTP component, and instead of passing the alert action the ${componentid} variable you…
  • If you don't already have any spreadsheets of data to import then yes, you would just create the subnet manually and run a scan against it. SolarWinds can initiate a scan of the IPs as long as your servers can route to those IPs. I wouldn't call it an "external" scan though, since it is effectively just a ping sweep…
  • Assuming you mean for ticket generation, this is an out-of-the-box feature. https://support.solarwinds.com/SuccessCenter/s/article/How-to-Integrate-ServiceNow-with-the-Orion-Platform-Video If you mean integrating with things like the CMDB, then that's still something you would have to build custom via the REST APIs of…
  • When you say site, are you referring to a physical location or an AD object? I've built out trees for IPAM based on location, but that used some relatively advanced workflows. Most people don't seem to use them much, but IPAM has custom properties available at the IP and subnet levels. I ran a series of scripts that…
  • It can scan any amount of space, but once you hit your license cap of found addresses it stops storing the results for anything else. I don't know offhand if it actually stops scanning or just stops saving the results, but the effect is basically the same for us.
  • I have to say this is one of the most impressive UI improvements I have seen in a while. I feel like there was a LOT of customer feedback that was taken into this, kudos for sure.
  • Is the F5 an actual physical appliance or is it one of the VM-based deployments? VMs don't have hardware to track, but I have noticed Orion doesn't seem to be automatically sorting them out in the environments where I noticed this. If it is a VM you just have to disable the hardware sensor on those nodes.
  • The whole dependency scheme only takes into account up and down statuses, so it has no impact on any alert types that involve something going wrong but not taking the node down. Put simply, a down node is something we can't ping anymore, whereas a node can go critical for a ton of reasons, such as having problems with a…
  • Since most of that query looks like it came from one of my queries, I have some guesses on this. I don't have any switch stacks handy that I can mute to test, but you should try to confirm whether something is missing in the auditing event tables for stacked switches; change those last 2 joins into…
  • I keep hearing that it is supposed to be bidirectional, but I've not once seen that actually be the case.
  • If you really want it all gone: delete from worldmappoints; delete from worldmappointlabels;
  • For data inside the detailed retention period the min/max/avg values are all the same. They only start to diverge once the data gets rolled up into the hourly and daily intervals (after 7 days by default). So in your case the peak isn't "gone" so much as it is the same as the avg and is hiding behind it.
  • To your last question about how many times you have to fail, see this: https://support.solarwinds.com/SuccessCenter/s/article/Orion-Fast-Poll-and-Node-Statuses-Explained Assuming you haven't changed any of the relevant settings, it would be 12 failed pings in a row over the course of 2 minutes to get a node marked down.…
  • https://documentation.solarwinds.com/en/Success_Center/orionplatform/Content/Core-encrypt-database-connections-with-SSL.htm Is it possible that your Windows server does not have the same TLS settings enabled as the SQL server?
  • Some Cisco devices do that: they basically shuffle lines around between how they display the running config and what you see in show startup config. I'm not aware of any way to work around it, since Orion is just doing a diff between the two outputs, and if Cisco displays them differently then there's not much we can do about it.…
  • The existing NCM script engine doesn't parse the results; it just executes the given commands and saves the whole result. There is work in progress on expanding NCM to support Python scripts, but that's still at an early stage, so don't count on it in the immediate future. For cases similar to this I have done fairly complicated…
  • Did you clear out any existing active alerts when you changed the reset condition? It doesn't retroactively change the logic for alerts that were already active. When I make significant changes I often disable the alert, then when everything clears out I enable it again, or I go through and clear all active instances of the…
  • and e.EventTime < DATETRUNC('day',GETUTCDATE()) and e.EventTime > addday(-1,DATETRUNC('day',GETUTCDATE())) You might have to fuss around with ToLocal() depending on how you want to deal with that in your scenario.
  • This is the script I used as part of my talk at the SWUG last year on automating group building. https://github.com/Mesverrum/MyPublicWork/blob/master/AppAndAppRoleGroupBuilder.ps1 Obviously your specific group definitions would be different but basically all the pieces you need are in that sample, just make one group with…
  • I just delete that widget personally. I find it unhelpful as the concept of a startup/running config doesn't really apply to a big chunk of my gear.
  • The NetFlow collector is defined on the devices themselves, so you cannot control it at all from Orion. If you need to rebalance the load, what most people do is actually log into the devices and change their configs so that a subset of them send their data to different polling engines. But the more…
  • SNMP and WMI would not directly be able to tell you if a service is hung, only whether it is running or not. If the service has WMI performance counters associated with it, you might be able to see if the actions on the counters drop to zero. But realistically the best way to monitor if a service is not doing what that service is…
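
On the comment above about min/max/avg being identical inside the detailed retention period: here is a rough Python sketch of why the values only diverge after rollup. This is not Orion's actual rollup code; the function name rollup_hourly and the sample values are made up purely for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def rollup_hourly(samples):
    """Group (timestamp, value) samples by hour and compute min/max/avg.

    Hypothetical helper for illustration only, not an Orion API.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {
        hour: {"min": min(vals), "max": max(vals), "avg": mean(vals)}
        for hour, vals in sorted(buckets.items())
    }


# Made-up detailed samples: each poll stores a single value, so for any
# one detailed row min, max and avg are necessarily the same number.
detailed = [
    (datetime(2020, 1, 1, 10, 0), 10.0),
    (datetime(2020, 1, 1, 10, 10), 90.0),
    (datetime(2020, 1, 1, 10, 20), 20.0),
]

# After rollup the three polls collapse into one hourly row, and only
# then do min, max and avg differ from each other.
hourly = rollup_hourly(detailed)
row = hourly[datetime(2020, 1, 1, 10, 0)]
print(row)
```

With these samples the peak of 90 is visible as the max only in the rolled-up row; in the detailed rows there is nothing for it to stand apart from.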