Implemented
over 1 year ago

SNMP Trap & Syslog Rules Overhaul

In my opinion, these two items have been neglected by SolarWinds for many years.  We use SNMP trapping extensively within my organization, and every rule we have to create is an arduous process.  Several aspects of both of these functions should be improved.

1.  Copy/Paste rule creation.  When we look at alerts we can take a similar alert and make a copy of it, altering the rule to suit our needs.  This option doesn't exist in the SNMP or Syslog rules.  Each rule must be built from scratch.  For example, I have multiple rules that are exactly the same, the only difference being one specific OID for Netscaler traps.  If the OID equals one of our web servers we send it to the web team...if it is one of our exchange servers we send it to our messaging team, and so on.  However, each of these rules has to be created manually.

2.  Import/Export actions.  In alerts you can import/export an action for use within another alert.  This functionality is missing from the Syslog/SNMP rules.

3.  Enhanced ordering.  At my last count I have 160+ SNMP rules.  These rules are ordered top down.  When I create a new rule it is placed at the bottom.  If I need that rule to go to the top I have to click my mouse 160 times to get it there (no wonder I've had carpal tunnel surgery on both hands).  A drag-and-drop feature would solve this issue.

My first three requests I would think should be relatively simple because these features exist today within other components of SW.  The 4th I assume would be a little trickier to accomplish.

4.  Treat SNMP/Syslog rules like alerts that must be acknowledged (if desired).  Right now if I get an SNMP trap that I would consider to be critical it sends an email. It is not treated like an alert that requires acknowledgement.  I understand this would be a much greater challenge because you would have to have well defined reset scenarios.
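
To make requests 1 and 3 concrete, here is a rough Python sketch of what "copy a rule and change one OID" and "move a rule straight to the top" could look like. This is not how Orion stores or exposes trap rules; `TrapRule`, `clone_rule`, `move_rule`, the field names, and any OIDs or addresses you feed it are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class TrapRule:
    """Hypothetical trap rule: match one OID value, email one team."""
    name: str
    match_oid: str   # value the trap varbind must equal
    email_to: str    # team mailbox that receives the notification


def clone_rule(rule: TrapRule, name: str, match_oid: str, email_to: str) -> TrapRule:
    """Copy an existing rule, changing only the fields that differ."""
    return replace(rule, name=name, match_oid=match_oid, email_to=email_to)


def move_rule(rules: List[TrapRule], name: str, position: int = 0) -> List[TrapRule]:
    """Return a new list with the named rule moved to `position` in one step."""
    index = next((i for i, r in enumerate(rules) if r.name == name), None)
    if index is None:
        raise ValueError(f"no rule named {name!r}")
    reordered = rules[:index] + rules[index + 1:]
    reordered.insert(position, rules[index])
    return reordered


def first_match(rules: List[TrapRule], oid: str) -> Optional[TrapRule]:
    """Rules are evaluated top down; the first matching rule wins."""
    for rule in rules:
        if rule.match_oid == oid:
            return rule
    return None
```

With something like this, the messaging-team rule is one `clone_rule` call on the web-team rule, and promoting a new rule past 160 others is one `move_rule` call instead of 160 clicks.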

I know that I am in the minority on this, as it seems that many other members of the community rely less on traps, but they are a part of our environment and they aren't going away.  As they continue to grow I will be forced to look for alternatives to SW in this space if SW doesn't evolve these areas.  I have been using SW for 6 years and I have seen little to no improvement in these two areas.  I had hoped that with the acquisition of Kiwi there would have been some nice improvements, but alas that isn't the case.

  • Seriously? not on the 10.6 roadmap?

  • I haven't seen a roadmap for 10.6 yet...do you have a link you can share?

  • Added a comment to that "what we're working on" thread. Nothing new, as per SOP @ SW.


    Peter   

  • SIGH....I'm very disappointed.  I'll move forward in finding an alternative to SW for trapping....Hopefully I don't find a full product replacement while I'm looking.

  • So my strategy is to set up a Linux server with Syslog-NG.  The problem is handling traps.  I found some code that will convert a trap into a syslog message and pump it into Syslog-NG.  In Syslog-NG I can filter noise messages to avoid further processing.  Then I can deliver the syslog message to a Simple Event Correlator (SEC) process.  This will allow the syslog server to send traps and/or syslog.  The problem is spoofing the source address of the messages sent to Orion so it likes it.  Otherwise, you could send an email to Alert Central.  I can also deliver the syslog directly to Orion after filtering.

    Is there a programmatic way to create Orion Events and Alerts?

    http://simple-evcorr.sourceforge.net/

  • Check out this sec rule base for Cisco and think of the possibilities...

    http://simple-evcorr.sourceforge.net/rulesets/cisco-syslog.sec

  • It was long ago suggested that SEC would be a good fit for our SNMP trap handling needs, and it has been implemented in a couple of corner cases by someone who's gone for 2 years, for a use case that will likely disappear in the next one or two.

    That being said, I've spent virtually no time at all with it, and don't possibly have the time to convert everything trap based at the moment.

    That also being said, SW needs to start looking around for ideas or hire a dev team who can handle these tasks they've been dodging forever to remain an integral part of monitoring. 

    99% of the visible features in the last two years have had minimal impact on my life. My systems were scaled for the old sizing charts, and growth has been moderate recently, but that's likely changing shortly.  Integrating piles of other projects into SW to handle the deficiencies and being able to shut down the Syslog and Trap receivers as well as overpriced Nagios (SAM) is likely the way to go, because the unified interface is decent for simplistic views of individual components.  The front page is decent as well, but the entire paradigm of how Syslog and Traps work and, honestly, even the licensing mechanism for SAM are pretty much completely broken.

    The enhancements in view restrictions and further integration with NCM are great stuff, and either Kiwi's lessons or others need to be grafted into NPM, preferably with the option to call it from a separate database because table size in NPM is an issue, even if you cache the entire tables in RAM.  I would simply merge the Kiwi front end into NPM and keep the DB calls separate for the Kiwi resources, if I wanted to make the fastest possible enhancement.  All the syslog resources could move by themselves and stay out of "SL_Orion" entirely, maybe even on a separate DB server if so desired, since SW support always alludes to the scale of the problem as the primary reason why the product fails so hard in the area.

    The API is another one of those projects that had promise, but where are the API calls to provision nodes (bypassing the SNMP/login checks and trusting the source information) as well as custom poller assignments to nodes and interfaces?  For that matter, we need an API call that can do a list resources and add new interfaces based on SQL criteria across interface table fields including custom properties.  A nice framework to start with, but not enough followup.  The same goes for alerting, more and more of which has been added to the web interface so that allegedly it can make it into the API.

    As an aside, the migration of stuff to the web interface (and then allegedly API) is ultimately good, important and a non-trivial amount of work, but in some ways it's too incremental, and the primary focus is killing the MSP customers.

    SW asks for this process, we keep pointing out the obvious (if you used the product for more than 100 nodes *ever* and tried to make sense of syslog and traps or needed to provision 150 new nodes asap and had to grind through it for a week), and SW Product still says and does nothing useful for these obvious and grinding sore points.  I cannot predict the future with 100% accuracy, but I can see quite clearly a strategic move to a platform that addresses these several critical points (Administration/API, Syslog, Traps, UnDP multiple-table handling) as being the proper business decision.

    And to be perfectly frank, working with vendors who continue to refuse to enhance their products for my primary use cases isn't really anything resembling good business either.  It's very one-sided.  We give money, use the product to run our network management, and give you guys feedback on the issues that cause us pain (time lost doing routine tasks that need attention, inability to gather data or present it in a meaningful manner, and scalability woes) and you guys do literally nothing for years.  I don't feel like I get a lot for my 10-11k a year, not to mention the other 10k a year that I have had the product (50k over 5 years), and that's just the tip of the iceberg on TCO in lost opportunities (alerting missed due to inadequacies of the product making it impossible to do so meaningfully) and time spent in the care and feeding.  I can say that for a good part of 3 years at a decent salary, it was 40% of my time until I automated some things, and some things (like grouping) have gotten ignored on some of the more recent devices.

    Grouping and Parent/Child relationships are another thing that needs to be able to happen via API.  An entire SW dev needs to be dedicated to getting the API up to snuff for provisioning.  For all of maybe two weeks.  Four if you make them go through a pile of approval process.  These are really cool features of the product that are a drag to implement because the provisioning time for each individual device or new interface can be literally 10 minutes, even setting all the custom properties with auto-scripts.

    Again, I'm reasonably sure this was a total waste of my total keystrokes in my life prior to getting carpal tunnel.. I shouldn't have wasted them on SW Product, if past experience is any sort of indicator.  I would be ecstatic to be wrong.

    Peter

  • Kudos!  Best, well written, resume I've seen in a long time too. ;-)  Carpal tunnel be damned, you made very valid points of discussion.
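
The workaround proposed earlier in this thread (turn each trap into a syslog line, drop the noise, hand the rest to a correlator) can be sketched roughly as below. This is only an illustration in Python, not the conversion code mentioned in that comment and not Syslog-NG or SEC configuration. The function names, the `snmptrap` tag, the message layout, and the choice of linkUp as "noise" are all assumptions; the line format follows the classic BSD syslog style (`<PRI>timestamp host tag: message`).

```python
import re
from datetime import datetime
from typing import Dict, Optional

# BSD-syslog month abbreviations, spelled out to avoid locale surprises.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Assumption: linkUp traps (standard OID 1.3.6.1.6.3.1.1.5.4) count as noise.
NOISE_PATTERNS = [re.compile(r"snmptrap: 1\.3\.6\.1\.6\.3\.1\.1\.5\.4\b")]


def trap_to_syslog(source_host: str, trap_oid: str, varbinds: Dict[str, str],
                   when: datetime, pri: int = 134) -> str:
    """Render a trap as one syslog line; 134 = facility local0, severity info."""
    timestamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    details = " ".join(f"{name}={value}" for name, value in varbinds.items())
    return f"<{pri}>{timestamp} {source_host} snmptrap: {trap_oid} {details}".rstrip()


def drop_noise(line: str) -> Optional[str]:
    """Return None for noise; otherwise pass the line on for correlation."""
    if any(pattern.search(line) for pattern in NOISE_PATTERNS):
        return None
    return line
```

Whatever survives `drop_noise` is what would be passed on to the correlator, or forwarded to Orion after filtering. Note that the original device name is kept in the host field of the message, which does not by itself solve the source-address spoofing problem raised in that comment.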
