Volume of alerts?
User Interface?
Something else?
SolarWinds NPM has a very user-friendly gui. There are a lot of tuneable items, to fine-tune your (network) monitoring to your wishes. Also the out-of-the-box installation is very fast. If you ask for the most annoying thing / what should be improved first, is in my opinion the "cripple" handling of snmp-traps, and integration / tuning of these traps in the NPM database. Now we have to do this by inserting items in the alert table of the NPM database, as an action on a snmptrap. Building this is very time-consuming and prone to errors. For me, the fault handling proces is the core-business of a NPM system. I was hoping for improvement of the snmp trap handling, instead of this, new features are introduced in NPM 11.0, like deep packet inspection. We would appreciate that more priority had been given to improvement and solving of current issues. Please don't see it as negative critic, you asked what possibly could be improved. Every day again, we like to work with SolarWinds NPM / NCM /NTA !!
Single WEB refresh for the whole Orion system
https://thwack.solarwinds.com/ideas/1059
Drives me nuts that the executives page refreshes every minute.
Pain when a presentation refreshes to the point that you cant use Orion for the graphs.... etc etc....
Currently, the interface for Sonar Network Discovery. Not enough filters, not enough options, not enough consistency with the other GUIs that are fantastic in NPM
Lack of an API for IPAM and UDT that supports create/update/delete operations.
The interface for creating dependencies. It's horrible. This is my idea of what it should look like:
Alerts and Gui very configurable.
Issues:
1. If I want my NOC guys to be able to Unmanage something, I have to give them admin rights.
2 Need improvement on SNMP Trap Alerts.
3 Setting up dependencies is the worst. CA Spectrum about 10 years ago had one of the best Dependency engines.
Outside of that, the goods way out play the bads.
This is my biggest annoyance, too. I really need to set up dependencies and keep putting it off because it's painfully time consuming.
Does not have a "bring me beer" function.
What if we had auto-dependency mapping in NPM? Assuming we did, how would you want to interact with it? EX: Overriding any mis-identified deps, alert supression, etc.
Dependencies, Dependencies, Dependencies...... FIX
Well, the dependency would be "if surface($mydesk) !contain $beer { function bringbeer($me) }.
Jeeze, you gotta spell it out for some people.
+1 for Dependencies
Talk to me more about dependencies. Imagine in a magical future world we have automatic dependency mapping. How would you see this working?
Just for clarification sake, your web console literally refreshes views every one minute? The default refresh interval is 5 minutes. (Settings > Web Console Settings > Page Refresh)
1. ask your customers to sketch out the standard, and the gnarly, pieces of their network on a 4*6 index card and mail them to you.
2. look at the pictures and see how you could determine the dependencies.
3. offer thwack points for each card.
It would take me less than 10 minutes to write the cards, and you could see how Dependancy mapping might work..
In my world for example if all the interfaces on the routers connected to the root bridge in a broadcast domain are down, then everything in the broadcast domain should be unreachable
If the root bridge in a broadcast domain is down, then its likely that all of the other nodes in the broadcast domain are unreachable.
[Our routers do not participate in spanning tree, the root bridge is supposed to be the one the routers are plugged into]
If the switch the UPS is plugged into is down then the UPS is unreachable
SAM should come with NPM and not be an add on module. Every Exchange or SQL server can take 50 licenses....that is just silly. 150 licenses....3 servers....what?
RT
We could sketch it out if you like:
https://www.youtube.com/watch?v=2x3nItYhIWg
I imagine it similar to a tracert. If the signal/snmp has to go through a device to get to the device being monitored then there should be a dependency option. I like nickzourdos representation of what the interface might look like. I have multiple nodes that pass through 3 or 4 routers/switches before getting to the poller simply because they are on different sites. I started creating dependencies but stopped because I had other things more pressing on my plate (veiled bacon reference...but I digress). I'm not sure how that would work in an SNMP world or if it's more a function of ICMP.
bsciencefiction.tv wrote: Outside of that, the goods way out play the bads.
bsciencefiction.tv wrote:
Totally agree with this.
But SNMP traps needs to be fixed. Dependencies for me doesn't need to be automatic but it should be nicer and easier to configure.
A lot can happen in 5 minutes and I cannot have an event list scroll further then the end of the screen on a single refresh. With over 10000 elements and many critical items and applications/processes, things can change quickly. So yes, 1 minute updates is where the big monitoring screens need to be, and therefore all executives get the same refresh rate.
We cannot afford to have a user call us with an issue that happened 3 or 4 minutes ago and we don't know about it because we are waiting for a 5 minute refresh. We need to be as real time as possible.... But on the other hand we have many users that need to refresh manually or very slow rates since they are just doing research or asset tasks.
At any given time there may be over 40 users using Orion. Should they all really have the same refresh rate?
So, yes, this is very annoying to me...
Oh and don't get me started on traps....... :-)
That is a bit annoying, especially when the discovery finds hundreds of nodes with thousands of interfaces and volumes....
I just run it against my management vlans to get the topology in Atlas and forget the rest.... That in itself in a chore....
wow... this is just a gripe session :-)
Agree the good well out weighs the bad....
But a good ole gripe session is healthy... nothing is perfect...
Most competitors would have a MUCH longer list in my opinion and experience....
Display the nodes graphically, then allow the user to move them as needed. Click save to create the dependency.
Or "import" dependencies from Atlas.
I agree nickzourdos has a nice clean idea....
don't mean to be rude, but that's what alerting is for, you should not rely on your display boards, there good for a everything is green, but that's it
I use groups for both parents and children and enable dynamic queries for group member ship,
all my dependencies are setup as this group is a parent of that group, makes life easy and simple
The amount of time it takes to perform upgrades.
Multiple systems, multiple modules, multiple pollers, multiple upgrades per year (not that I don't like the upgraded function!) become very time consuming.
Perhaps development could think of a way for all the modules to be packaged in a single certified install package - with the modules enabled via the licences installed.
It would be nice to have a regular upgrade cycle where I don't have to worry about module compatibility, and performing multiple time consuming upgrades.
Dave.
This would present an interesting challenge with devices that have ICMP disabled. As familyofcrowes said below it would be great to have Orion give it's best guess as to what the dependencies should be, and then allow us to shuffle things around and add nodes as needed.
Am I allowed to make a second submission?
The UnDP tool on the Orion server could be migrated to the web console. It would also be incredibly convenient to be able to add our own OIDs.... not sure why that hasn't been implemented yet.
Alert categories, or "Types of properties to monitor" in advanced alerting. For example, I create a group and say "I want an alert if any member of this group has disk utilisation over 90%.
Then I go to advanced alerting and find that I can create an alert based on volume conditions OR group conditions, but I cant combine the 2. Annoying
To be honest rob.hock,
I don't think any vendor has gotten the dependencies correct yet. Define correct: automated or user-friendly to configure, accurate, and self correcting. There are just to many variables. Layer2 dependencies, layer3 dependencies, logical application dependencies, etc., etc.... Some vendors has some pieces pretty close, but no one has it right yet, in my opinion.
Now, in my perfect world, what would I expect to see:
1. Automatic Layer2 dependency mapping...and I should never have to manually correct the tool if my network is built correctly. A connection is a connection, cabling and MACs don't lie. But this should also point out where the rogue hubs are located on the network.
2. Automatic Layer3 dependency mapping...and I should never have to manually correct the tool if my network is built correctly. Routing doesn't lie!
3. Automatic/Manual Application dependency mapping...this is the biggest nightmare scenario because of the all the variables. It would be nice for a tool to automatically map this stuff, and then we could use that data to correct the application logic, or manually tune the tool. But unless you have a packet sniffer (probe) at every network junction, I don't see how you can collect the necessary data to build this map automatically.
4. Automatic/Manual Application Service dependencies...the only way I can see this happening is if you list all the services on a server, and "checkbox" this that are part of the application. SW has this now, somewhat, with application template building.
This would be my starting point, and is only my opinion.
D
Report writer GUI seems unable to handle a report like I typically need. I can do about 70% via GUI and then add one too many conditions...CRASH. I've become accustomed to working around the issue by creating what I can and saving often. Then take the resulting report and expose underlying SQL. Continue adding additional logic via SLQ rather than GUI.
Have you played with the new web-based report writer?
Definitely agree on point 1. And to take it further, I think that they should allow for the fine tuning of permissions on nearly everything.
traceroute doesn't provide me with much information for dependencies -- at L3 there are at most there are three [pairs] of routers between the monitoring system and any end-point.
to give you some idea of the scale: 14 routers => 4500 switches => ( 400 UPS and 9200 thin wireless access points.)
L2 dependency mapping based on LLDP neighbors that have one (or fewer) interfaces would eliminate 2/3rds of the alerts during a power failure. (node down alerts for UPS and thin AP)
L2 dependency mapping based on a broadcast domain eliminate another 2/3 of the remaining alerts (on average, 3 switches per building).
So, we've already got an order of magnitude fewer alerts without even looking at the dependency on routers;
If I had an API to create and manage dependencies this is a trivial amount of programming -- I reckon I could give you a perl script in a couple of hours.
I dislike GUI-only ways of doing things because they are not scaleable to large sites.
Yes, I LOVE the new Custom Property editor and the fact that we can add filters and extra columns to the view to help filter down to the objects we want to bulk edit. What I don't like is the following:
- No ability to use custom properties from other object types to filter on
- For example:
- When trying to filter interfaces I want to be able to group by/filter on Node Custom Properties. Why can't I do that? Let me add Node columns into the view. Every interface is attached to a Node so that shouldn't be very hard programatically. Just a simple table join.
- In "Manage Nodes" there is NO FILTER. I can search OR I can group by, but not both... The columns are somewhat customizable but they don't offer filters. AHHHH!!! I'm currently working with a customer that has 6,000 nodes... Do you know how hard it is to bulk edit 6,000 nodes when you can only filter or group by one criteria??? It's really hard if that wasn't obvious. You know what's harder? Trying to bulk edit the 24,000 interfaces this customer has, especially since my next point:
- In "Manage Nodes" when "Interfaces" is selected at the top, all of the Group By options are Node properties... What?? Why?? Don't get me wrong, I want to be able to Group By node properties as well, but Interface Properties should be in there. I can't simply group the interfaces by Interface Type, or Administrative Status, or an Interface Custom Property, or any interface thing. The columns can be set to interface properties and I can sort and search on them, but I can't filter and, once again, you can't Group By on the search you made, so it's an either or situation which makes it just oh so frustrating... The only way to truly bulk edit the manage nodes area is to not do it at all and to go edit the SQL table manually, which should not be a thing we have to do for something that can easily be programmed in by Solarwinds.
Don't get me wrong, I love Orion and it is hands down the most flexible and easy to use monitoring software I've seen. I also love all of the new features they keep improving on. I just feel like these things mentioned above are relatively simple fixes and improvements that should have been done a long time ago. The web-based custom property editor with the improvements I mentioned in my first bullet point should be the de facto standard for all administrative search/grouping screens in Orion. I really hope they work on improving this soon.
No way to create permissions groups for non-AD/non-local users. Orion web users need to be assigned permissions individually every time a new account is created. No, AD or Windows local users will not work/are not scalable in my environment. Also, every time new features are added new permissions are automatically granted to every user in the system, regardless of their actual permission level. I found out today that the Quality of Experience feature just went ahead and decided to add itself to every one of my menu bars when I installed 11.0.
Look, I don't want Orion to automatically add items and permissions I don't need to custom-made menu bars or accounts - I created the custom menus and accounts because I was not satisfied with using the 'Guest' account and 'Default' menu bar. Go hog wild with those things, Solarwinds, I don't care. I wish I could understand what is so damn difficult about providing a group template for Orion web users when there's already one for AD/Windows users. I wish I could understand why it's so difficult for SW to realize that security is an actual thing and giving new permissions to accounts that do not need them is just bad security all around.
A little bit. There's a learning curve, but I'm working on it.
In a perfect world the simplicity of the old Report Writer would be presented the same way in the web interface (without Report Writer crashes).
Would like it to remember filters, or be able to define filter templates, so when you import from discovery scans you dont have to re-massage the filters to select what you want.
Main things that I find annoying as my Orion system has grown.
1. Web sluggishness - when going into a node details it can be rather sluggish.
2. Compliance Report Updating - would like to be able to update a single report
3. Executing Scripts/Downloading Configs. Would like an option to download configs on a set of nodes that just had a script ran against them.
4. (this might be fixed in new NCM, havent upgraded yet) Searching through configs using Web GUI is more often than not incomplete.
Nodes:
~1,000 nodes/~11,000 interfaces
Orion Systems:
1 Main Orion Server
2 Additional Pollers
Components:
NPM
NCM
IPAM
Netflow
Our web console was painfully slow until we realized our network was defaulting to named pipe. We changed our servers to alias and it is literally 1000 time faster.
Can you elaborate? My web console is also painfully slow...
I agree with nickzourdos -- details man, details! I know we have DB issues because of our recent growth but I'd love to make the console a little more peppy for things where it wasn't waiting for data from the DB
Licensing.
really, this should be easier.
YES YES YES bsciencefiction.tv, I too agree with nickzourdos and jbiggley...
DETAILS... MOAR DETAILS!!
please.
It irritates me to no end to waste precious real estate...
For example, just look at all that white space above, to the right, and below my window...
And why am I not able to adjust the size of this popup window... I mean, I cannot even see the full names of the columns.
Also, it really annoys me that I cannot easily customize resources to show only the details we need to see, or are relevant. (at least there has been some progress with this lately, as well as a few workarounds, but still not ideal...)
Oh... almost forgot about the messy page views, and lack of organization in many lists.
We have hundreds of different pages, and when we go in to edit or create new pages, we have a bunch to sift through.
It would be super helpful to be able to categorize our pages with custom properties, or something to help organize them.
Not sure if I am missing something or not on this one, but why am I unable to easily add existing pages to new tabs on views?
Also, when I assign a new user their views, those views are scattered all around there, no organization whatsoever.
While not the biggest as you guys have mentioned a few but one of the ones that always seems to be asked...
Manage Nodes - Volumes. Currently there is only an option for Nodes and Interfaces. Where is the volumes option?
The Network sonar discovery: we use it on a weekly basis, and the interface is dysfunctional, sluggish and bugged... has a lot of place for improvement.
My other annoying things are the limit on traps/syslog, the number of alerting engines (1 per type), the alert rules, worldmap...
I think currently the alert suppression is the most annoying thing to us. I would rather see a scheduler for maintenance suppressions, and something like an emergency suppression. Both with the ability to schedule suppressions for machines. This would make it easy for reports for us. We need to show the client when they were truely down due to an outage, versus how much downtime they incurred due to maintenance this year for example. Currently if we use the unmanage feature we have semi-skewed results since it will track that as up. No disruption. So the up time for those servers are 100%. That is fine and dandy, but when it comes to reporting on emergency server down, and we do not want alerts going off every 5 minutes, we have to unmanage, or suppress alerts via the parser. The parse is time consuming since we have to log on that device, place the suppression in there for each node. This will show the up time as skewed in the opposite direction.
I think it would be easier to just have "unmanage" for servers that you want in the system, but do not want information on currently. Then Scheduled Maintenance - for scheduled maintenance/downtime. Finally, I would then suggest a Critical/Down/Emergency work suppression. If these were all track-able in reports this would be amazing! We then could see the difference between the true up time of the servers, the maintenance, and the emergency work that had to be performed.
jbiggley, nickzourdos, wluther
We realized that in our implementation, that SQL server was defaulting to named pipe. If we pre-pended the TCP/IP alias and port number on a query in SQL manager the query was almost instantaneous where as the regular queries took seconds. Multiply that by the thousands of SQL transactions that Orion does and you get super slow.
use the command on each poller
C:\Windows\SysWOW64\cliconfg.exe
and
C:\Windows\System32\cliconfg.exe
for your 32 and 64 bit alias
It will bring up the following window.
Click on Alias Tab
Select TCP\IP
Enter the Database name for Server name
Uncheck Dynamically detemine port
Enter Port Number and Click Ok.
Do this on both 32 and 64 bit.
Once we did this,
Our Database was faster. Our DBA's quit complaining about us and our Webconsole, Network Discovery, List Resources, you name it was faster.
Even our most demanding webviews load instantaneously.
I hope this helps.
aLTeReGo, I do not know if this would be helpful to you guys on the Product Development side.
Reporting
Reporting in NPM Su-ucks and Blo-ows if you have no knowledge of SQL Query. It just Blo-ows if you do.
I prefer to do reports in Access than NPM.
This includes the new web-based reporting engine?
are you talking about this?
Correct sir - we still have a Win32-based report writer available as well.
deeper/expanded usefulness for groups. everytime I try to deploy it, I find it is not doing what I wanted it to do...which is group for alerting...
also...inability to resize windows that list things like templates, custom properties, etc.. see and vote for Feature Request here: https://thwack.solarwinds.com/ideas/2043
Let me pull it down man!!!
My biggest issue are with the interface utilization graphs. There is a current issue with our network with packet loss and unfortunately the graphs provided by S/W does not show enough granularity. Below are screen shots of the same interface one from SolarWinds and another from Cacti. The S/W graphs are not showing the drops in network traffic.
Orion already does this. This has to do with your settings and will do the exact same thing in Orion as in your Cacti chart (although the design will obviously look a bit different. Also, you are comparing a step chart in Orion to a line type chart in Cacti). For Orion to do this you have to make sure the sample intervals are smaller than the amount of downtime commonly experienced by your interfaces. By default most sample intervals on charts are set to 30 minutes. So, only connectivity drops longer than 30 minutes would show in a chart where the sample interval is set to 30 minutes.
Also notice where the Sample Interval entry is it tells you that Data within each Sample Interval is summarized, so if you have a default 30 Minute interval set on y our chart and within that 30 minutes there was 20 minutes of downtime on the interface then Orion will simply summarize the 10 minutes of uptime data it got into that whole 30 minute sample interval and would just show as one point of data. However, if you had the interval set to 5 minutes then you would have 6 points of data in that 30 minute period, 4 of which would be empty and 2 of which that would display data. That would result in missing sections on the resulting chart (depending on how zoomed out the chart is of course) just like your Cacti chart.
So, here is a chart where the intervals are at 1 minute and shows the last week of data. You'll see the spaces where the interface was not up. This is the same as the Cacti chart you displayed just with a different look. Depending on the type of Chart (this is a custom chart) there can be several ways to display it including Line, Area, Step (the one you showed), bar, and several others. Each one will have different characteristics. This is a pretty under utilized interface so it isn't quite as full as your Cacti chart, but all of my highly utilized interfaces don't have any downtime at the moment, so this is the best I can do. Still, the blank areas represent times when the interface was down (in this case it is just a user workstation that is turned on and off frequently).
Thanks,
Jordan Hume
Field Systems Engineer
Loop1 Systems, Inc.
Jordon,
Thanks for the reply. The Orion chart I provided was the default Percent Utilization which is a line chart. I've messed with all the graph properties to try to replicate the graph you provided to no avail. The issue with change the setting to a 1 minute samping is when I zoom into a particular time frame only the plot points appear not a nice looking graph. Any assistance is appreciated.
Tony Vispetto
you will see the dots/dashes whenever your sample size/time is less than your polling/stats time.
so, if your npm poller(s) only poll/get stats every 2 minutes, and your sample is set to every 1 minute, then you will get dots.
I would like to see NPM work better with both SNMPTraps and Syslog handling. Can you give me some examples of what you're doing and/or what you would like NPM to do for you?
1. Simple syslog message -> alert:
I Tag syslog messages in the syslog viewer and use custom SQL like this:
where nodeid in(select nodeid from syslogwhere messagetype='BGP-PREFIX-EXCEEDED'and datetime > dateadd(hour,-1,getdate()) )
where exists (select 1 from syslogWHERE syslog.nodeid=nodes.nodeidAND datetime > Dateadd(hour, -1, Getdate())AND messagetype='SSHD_LOGIN_ATTEMPTS_THRESHOLD')
where nodeid in(select nodeid from syslogwhere messagetype='KERN_ARP_DUPLICATE_ADDR'and datetime > dateadd(hour,-1,getdate()) )
2. Simple trap alerting:
WHERE nodeid IN (SELECT nodeid FROM traps WHERE TrapType = 'PowerNet-MIB:apc.0.17' AND DateTime > DateAdd(hour, -1, Sysdatetime()) AND Acknowledged=0 )
3. Complex trap alerting...
(I use colors on traps to tag them according to a field in the trap...)
this looks for traps of a specific type and severity that have not been cleared by a subsequent trap...
where exists(SELECT 1 FROM (SELECT a.nodeid, Substring(tag, 1, 20) AS g3alarmsAlarmNumber FROM traps a with (nolock) INNER JOIN trapvarbinds b with (nolock) ON a.trapid = b.trapid AND b.oidname = 'g3alarmsAlarmType' INNER JOIN trapvarbinds c with (nolock) ON a.trapid = c.trapid AND c.oidname = 'g3alarmsPort' INNER JOIN trapvarbinds d with (nolock) ON a.trapid = d.trapid AND d.oidname = 'g3alarmsMaintName' WHERE a.acknowledged = 0 AND a.colorcode = 65535 AND a. datetime >= Dateadd(hour, -4, Getdate()) AND b.oidvalue IN ( 'MIN', 'MAJ', 'WRN', 'SUP' ) AND NOT ( b.oidvalue = 'WRN' AND d.oidvalue IN ( 'ANL-16-LI', 'DIG-BD', 'MG-DCP' , 'SYS-LINK', 'PRI-CDR' ) )) active LEFT OUTER JOIN (SELECT Substring(tag, 1, 20) AS g3alarmsAlarmNumber FROM traps with (nolock) WHERE colorcode = 65280) cleared ON active.g3alarmsalarmnumber = cleared.g3alarmsalarmnumber WHERE cleared.g3alarmsalarmnumber IS NULL AND nodeid = nodes.nodeid)
[this is not that good,, since a node can only have one PBX alert on it, but I've not had time to revisit this]
4. trap -> modify the database
I run a vbscript to set the Thin AP state based on the incoming trap; this is faster than polling the controller
aside: note the style of 'in' or 'exists' this is important because the SQL performs significantly better under some circumstances.
also datetime > dateadd(hour,-1,getdate()) can use indexes on datetime against a constant rather than adding an hour to datetime on every row and seeing if it's in the future.
Network Atlas crashes if more than 2 users are connected simultaneously.
We need Offline Mode for Network Atlas.
Come on guys, it's embarrassing...
I really miss having the horizontal UI.
Having to click around inside the collapsing death temple is nauseating.
And, my non technical unsophisticated users cannot see their custom links easily. I have to retrain them and they have made the same points. They like the old bar with everything "there" in front of them.
Easy.
Now, not so much.