Open for Voting
over 1 year ago

Replace NPM Syslog & Trap system with Kiwi Syslog

For many years now the Syslog and Trap system has been due for an overhaul.  At this point it is nothing more than an ugly scar on NPM, which is an otherwise fantastic product.  Years ago SolarWinds acquired Kiwi Syslog Server, which handles both Syslogs and Traps and is also a fantastic product.

I think the Syslog & Trap system that is currently in NPM should be permanently removed and replaced by Kiwi.  SolarWinds should just provide a free copy of Kiwi with every polling engine.  Of course, it would then make sense to provide some out-of-the-box integration between Kiwi and NPM.

It seems to me this is a much easier way to solve the legacy Syslog and Trap system issue in NPM versus completely rebuilding it; no need to reinvent the wheel when you already have a perfectly good wheel!

  • Hi

    Is it possible to see the part of the script that updates Orion?

  • This is clever.  Nice job tying the systems together.

  • I have two Kiwi servers and 4 Orion NPM pollers.

      All syslogs and traps go to one of the Kiwi servers, then:

       A. All syslogs and traps are saved to a file for each node on the Kiwi server, so we have a record of all syslogs and traps.

       B. Then, if the syslog or trap matches a rule (is a major issue), it is forwarded to the Orion system.

       Then, at the Orion NPM server:

          A. Orion uses its rules to tag the syslog or trap with a type (power, link, fault).

          B. If the syslog or trap is something that needs an alert, Orion runs a script that sets or removes an alertcode in the custom property (alertcode) for the node.

                   The alertcode is in the format XXXX:T

                          where XXXX is a code for the event, e.g. A000, A001

                              and :T is a time to live for this event, e.g. A001:10

                          So some events will have a code like A000 (this event has no time to live)

                            and some will have a code like A000:60 (this event has a time to live of 60 min)

                          And you can have more than one code by separating them with a comma, e.g. A000:33,A001:67

           C. The Orion system can set an alert for the node by checking whether the custom property (alertcode) contains the alertcode for this alert.

           D. And there is an alert in the Orion system that looks for a comma (,) in the alertcode, and then runs a script to count down all codes that have a time to live.
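
For readers who want to see the alertcode handling in concrete terms, here is a minimal Python sketch of the format and the countdown described above. It is not the poster's actual script (which was not shared). The function names are made up for illustration, and it assumes two things the post does not state: that the countdown runs once per minute, and that a code is dropped from the property once its time to live reaches zero.

```python
from typing import Dict, Optional


def parse_alertcodes(value: str) -> Dict[str, Optional[int]]:
    """Parse 'A000:33,A001:67' into {'A000': 33, 'A001': 67}.

    A code without ':T' (e.g. 'A000') maps to None, meaning no time to live.
    """
    codes: Dict[str, Optional[int]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        code, sep, ttl = item.partition(":")
        codes[code] = int(ttl) if sep and ttl.strip() else None
    return codes


def format_alertcodes(codes: Dict[str, Optional[int]]) -> str:
    """Inverse of parse_alertcodes: build the comma-separated property value."""
    return ",".join(
        code if ttl is None else f"{code}:{ttl}" for code, ttl in codes.items()
    )


def count_down(value: str, minutes: int = 1) -> str:
    """Decrement every time to live; keep codes that have none.

    Assumption: a code whose time to live reaches zero is removed.
    """
    remaining: Dict[str, Optional[int]] = {}
    for code, ttl in parse_alertcodes(value).items():
        if ttl is None:
            remaining[code] = None          # no time to live: stays until cleared
        elif ttl - minutes > 0:
            remaining[code] = ttl - minutes  # still active, less time left
    return format_alertcodes(remaining)
```

For example, `count_down("A000,A001:1")` leaves only `A000`, because `A001` has run out of time while `A000` has no time to live at all.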

    So with this setup I have:

       1. A log of all traps and syslogs from a device.

       2. All syslogs and traps of significance are sent to the Orion system so they can be seen on the Orion web page.

       3. Syslogs and traps that are critical can trigger an Orion alert via the alertcode.

       4. If the alertcode has no time to live, it will be active until it is cleared by someone or a trap or syslog clears it.

       5. If the alertcode has a time to live, the alert will be active until the time to live expires, but each new trap or syslog for this event will reset the time to live.
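
Points 4 and 5 could be modeled roughly as below. This is only a Python sketch of the lifecycle as described, not the script actually in use. `raise_event` and `clear_event` are hypothetical names, and the active codes are held in a plain dict (code mapped to remaining minutes, or `None` for no time to live) rather than in the Orion custom property.

```python
from typing import Dict, Optional

# code -> remaining minutes, or None when the code has no time to live
Active = Dict[str, Optional[int]]


def raise_event(active: Active, code: str, ttl: Optional[int] = None) -> Active:
    """Set a code; a repeat trap or syslog for the same event resets its time to live."""
    updated = dict(active)
    updated[code] = ttl
    return updated


def clear_event(active: Active, code: str) -> Active:
    """Remove a code, e.g. when a clearing trap arrives or someone clears it by hand."""
    return {c: t for c, t in active.items() if c != code}


active: Active = {}
active = raise_event(active, "A000")          # no time to live: stays until cleared
active = raise_event(active, "A001", ttl=60)  # expires after 60 minutes of silence
active["A001"] = 12                           # pretend the countdown has been running
active = raise_event(active, "A001", ttl=60)  # a new trap resets the time to live
active = clear_event(active, "A000")          # a clearing trap or a person removes A000
```

After these steps only `A001` is active, with its time to live back at 60.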

    ***************************************************************************************************************************

    Here is what I would like to see.

    Have an action added to the Kiwi Syslog Server to "Log to the Orion Database".

    This way I could have some of the traps and syslogs be displayed in the Orion system just like I have it now.

    And all the rules could be on the Kiwi system: some traps and syslogs would just be logged to a file, some would be displayed in Orion, and some would generate an Orion alert (via a script run on the Kiwi server to set the alertcode).

    This would remove the syslog and trap load from the Orion servers.

    I would only have one set of rules for syslogs and traps, where I have two sets now.
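
To make the "one set of rules" idea concrete, here is a small Python sketch of a single rule table that decides all three outcomes for a message. Everything in it is hypothetical: the `Rule` type, the patterns, the action names, and the codes are invented for illustration and do not correspond to any existing Kiwi or Orion feature.

```python
import re
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    pattern: re.Pattern          # what to look for in the message text
    show_in_orion: bool          # display the message in Orion
    alertcode: Optional[str]     # alertcode to set, or None for no alert


# Hypothetical rules; real ones would live on the Kiwi server.
RULES = [
    Rule(re.compile(r"power supply.*fail", re.I), True, "A000"),
    Rule(re.compile(r"link (up|down)", re.I), True, None),
]


def actions_for(message: str) -> List[str]:
    """Return the actions for one syslog or trap message.

    Every message is logged to a file; the first matching rule decides
    whether it is also shown in Orion and whether an alertcode is set.
    """
    actions = ["log_to_file"]
    for rule in RULES:
        if rule.pattern.search(message):
            if rule.show_in_orion:
                actions.append("log_to_orion")
            if rule.alertcode is not None:
                actions.append(f"set_alertcode:{rule.alertcode}")
            break
    return actions
```

With these example rules, a power supply failure would be logged, shown in Orion, and raise `A000`, a link change would be logged and shown, and anything else would only be logged.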

    *******************************************************************************************************************************

    And I think SolarWinds could easily do this by adding this new action to Kiwi Syslog Server.

  • I don't have anything I can share on this right now.

  • Yes...

    You are right, it is better... I just don't see them doing that...

    You can still join them in the "Message Center" if you like to see events and traps together in the same view.

    "Under the hood" Orion is still very much SNMP-poll-based software and you can't just change that overnight.

    I just hope the PM will give some information about what they are planning to do.

    cobrien​ ?

    adatole​ ?

    That is "real life networking": some elements support only SNMP traps.

    Trap/syslog software that cannot raise an alert in the main alert engine...?

    It's embarrassing... it's 100 times more important than everything in that list.

    https://thwack.solarwinds.com/docs/DOC-176899#start=300

  • Hello, it would be better to integrate syslog and traps into NPM-Web, so that all the events are saved into a database and can be related to each other....

  • Kiwi Syslog is a great tool and I use it just like Leon says!

    https://m.youtube.com/watch?v=8u61Faf6maI

    I need a simple Kiwi Syslog action that changes a CP (custom property), like TRAP Alert-YES/Active/yellow/red/2hot/flaps/nopower... whatever.

    It should happen as you forward the trap from Kiwi Syslog to NPM...

    That will generate an advanced alert.

    That would close 90% of the integration I have between traps and NPM.

    adatole Is that a more realistic approach to push traps to advanced alerts?

    Without advanced SQL?

  • I'm working on my employer to allow me to dedicate more time to getting more value out of our Orion deployment, too.  We're still woefully understaffed, and I can only dream of creating a new Systems Performance Strategist position to allow myself the luxury of learning every system and buying and deploying the appropriate Orion modules for it.

  • We are, in the grand scheme of things, a pretty small environment here and we use Kiwi. It is an amazing tool for trapping and logging. You can even use it for alerting. We have really only started toying with that aspect of it, but what I have figured out so far has been a great improvement. The best example so far is that our EMR system is set to dump its logs there. It also runs routine scheduled database checks on each server at set intervals. We have a total of 7 servers, and each runs the database check on a different day to space out the load. If a database check finishes with any errors, Kiwi sees it and an email goes out instantly to the appropriate people so they can act on it right away, instead of someone trying to remember to go in and look at the individual servers to see what happened, if anything. The status of the database check is buried in the system, and you have to go into each individual server to view it, so it's a huge time saver having this setup. It took forever to get them to trust it, but when one of the database checks came up with an error a few months ago, we knew about it within seconds, and we were able to phone it in and get it checked out. Normally this would not have been caught until the next day, at the very earliest. Now they want to expand on it, if only I had the time to dedicate to doing that right now lol.

  • I keep a legacy version of Kiwi running on a different server, and I configure my most critical infrastructure to send syslog/trap information to that server. 

    Then I leave an RDP session open to that resource and I dedicate a monitor 100% to it, 7x24.  From there I see every critical change that occurs to the network in real-time.  It gives me a big heads-up over traditional NPM alerts, the Help Desk, or users' notifications, and lets me see these items the instant they occur:

    • Firewall interface status changes
    • Core router interface changes
    • Distribution switch interface changes
    • EIGRP and BGP notifications
    • Power events affecting my core and distribution infrastructure
    • Port-channel and Virtual port-channel status changes
    • Speed/duplex changes on core and distribution interfaces
    • Link failures
    • Attempted console access by unauthorized users (includes failed password attempts)

    I'm very discriminating about what devices I allow to point to Kiwi, since all devices already report this info to NPM--but NPM's data is not as readily accessible as the dedicated real-time Kiwi screen.  No access-level devices report to Kiwi, which is why I can justify devoting a screen just for its use in reporting status of core and distribution resources.

    Better still, Kiwi is easy to customize to send custom alerts wherever they're needed.  If it receives a power event alert from anything, it forwards a message out to the teams that support electrical issues and power supplies and UPS's.  Temperature/environmental alerts go to a different group for evaluation and remediation--the Maintenance group loves the HVAC messages Kiwi forwards, since it allows them to catch what's fallen between the cracks.

    Kiwi's definitely a good tool--especially when your environment is big enough to justify it.  It's easy to deploy, simple to configure, and it just works without additional help or babysitting.

    I've read newer versions of Kiwi can do even more, and are one solution, along with LEM.