Closed

When we first acquired Kiwi, we ran a baseline performance and scale test comparing the existing Orion engine with the Kiwi engine. Orion actually performed better in that test, so we decided not to embed Kiwi.

Replace NPM Syslog & Trap system with Kiwi Syslog

For many years now the Syslog and Trap system has been due for an overhaul.  At this point it is nothing more than an ugly scar on NPM, which is an otherwise fantastic product.  Years ago SolarWinds acquired Kiwi Syslog Server, which handles both Syslogs and Traps and is also a fantastic product.

I think the Syslog & Trap system currently in NPM should be permanently removed and replaced by Kiwi.  SolarWinds should just provide a free copy of Kiwi with every polling engine.  Of course, it would then make sense to provide some out-of-the-box integration between Kiwi and NPM.

It seems to me this is a much easier way to solve the legacy Syslog and Trap system issue in NPM versus completely rebuilding it; no need to reinvent the wheel when you already have a perfectly good wheel!

Parents
  • Your suggestion makes sense for large deployments, but perhaps less so for smaller ones.

    I got by just fine with the NPM Syslog & Trap systems included with NPM, as long as I was judicious in selecting the types of messages and traps I wanted to receive.

    However, if you have Wireless Controllers or firewalls that are set to debug or informational levels of syslogging or trapping, they can easily overwhelm NPM's syslog & trapping system.  When I found that issue I began sending the data to Splunk, which we thought had been sized correctly for our environment.  And Splunk was promptly swamped with the data; our vendor was required to provide us with a significantly larger solution due to poor discovery of our environment.

    What's the size of your network environment--how many nodes and elements do you monitor, byrona?  How many firewalls and wireless controllers are in it?  What's the amount of data (in records per second or minute or hour) they're sending to your syslog and trap solutions?

  • rschroeder, while I don't have the numbers immediately available, I can tell you it's enough to overwhelm the system currently built into NPM.  However, it's more than just that; it's also a functionality problem.  The system that currently exists in NPM is one of the oldest parts of the product and it isn't very flexible.  Kiwi is a much better and more flexible product.  Removing another legacy component of Orion while providing better functionality to clients really is a win/win.  Even if you don't have a load issue, Kiwi will give you better functionality.

  • I keep a legacy version of Kiwi running on a different server, and I configure my most critical infrastructure to send syslog/trap information to that server. 

    Then I leave an RDP session open to that resource and I dedicate a monitor 100% to it, 7x24.  From there I see every critical change that occurs to the network in real-time.  It gives me a big heads-up over traditional NPM alerts, the Help Desk, or users' notifications, and lets me see these items the instant they occur:

    • Firewall interface status changes
    • Core router interface changes
    • Distribution switch interface changes
    • EIGRP and BGP notifications
    • Power events affecting my core and distribution infrastructure
    • Port-channel and Virtual port-channel status changes
    • Speed/duplex changes on core and distribution interfaces
    • Link failures
    • Attempted console access by unauthorized users (includes failed password attempts)

    I'm very discriminating about what devices I allow to point to Kiwi, since all devices already report this info to NPM--but NPM's data is not as readily accessible as the dedicated real-time Kiwi screen.  No access-level devices report to Kiwi, which is why I can justify devoting a screen just for its use in reporting status of core and distribution resources.

    Better still, Kiwi is easy to customize to send custom alerts wherever they're needed.  If it receives a power event alert from anything, it forwards a message out to the teams that support electrical issues, power supplies, and UPSs.  Temperature/environmental alerts go to a different group for evaluation and remediation--the Maintenance group loves the HVAC messages Kiwi forwards, since they let them catch what's fallen between the cracks.
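
    As a rough illustration of that kind of keyword-based forwarding, here is a minimal Python sketch. It is not Kiwi's rule engine or configuration format; the route names, keyword patterns, and email addresses are all hypothetical and only show the idea of mapping message content to the team that should receive it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    pattern: re.Pattern
    recipients: tuple


# Hypothetical routing rules: keywords and addresses are made up for illustration.
ROUTES = (
    Route(
        "power",
        re.compile(r"\b(power|ups|psu)\b", re.IGNORECASE),
        ("electrical-team@example.com",),
    ),
    Route(
        "environment",
        re.compile(r"\b(temperature|hvac|fan)\b", re.IGNORECASE),
        ("maintenance@example.com",),
    ),
)


def recipients_for(message: str) -> list:
    """Return the recipients whose route pattern matches the message text."""
    matched = []
    for route in ROUTES:
        if route.pattern.search(message):
            for recipient in route.recipients:
                if recipient not in matched:  # avoid duplicate addresses
                    matched.append(recipient)
    return matched
```

    A message that matches no route returns an empty list, which in this sketch simply means nothing is forwarded.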

    Kiwi's definitely a good tool--especially when your environment is big enough to justify it.  It's easy to deploy, simple to configure, and it just works without additional help or babysitting.

    I've read that newer versions of Kiwi can do even more, and that they are one possible solution, along with LEM.

  • We are, in the grand scheme of things, a pretty small environment here, and we use Kiwi. It is an amazing tool for trapping and logging. You can even use it for alerting. We have really only started toying with that aspect of it, but what I have figured out so far has been a great improvement. The best example so far is our EMR system, which is set to dump its logs there. It also runs routine scheduled database checks on each server at set intervals. We have a total of 7 servers, and each runs its database check on a different day to space out the load. If a database check finishes with any errors, Kiwi sees it and an email goes out instantly to the appropriate people so they can act on it right away, instead of someone having to remember to go in and look at the individual servers to see what happened, if anything. The status of the database check is buried in the system, and you have to go into each individual server to view it, so it's a huge time saver having this set up. It took forever to get them to trust it, but when one of the database checks came up with an error a few months ago, we knew about it within seconds, and we were able to phone it in and get it checked out. Normally this would not have been caught until the next day at the very earliest. Now they want to expand on it, if only I had the time to dedicate to doing that right now lol.
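
    The check-then-email pattern described above could be sketched roughly like this in Python. This is only an illustration under assumptions, not how Kiwi does it: the log format (error lines starting with "ERROR"), the server name, and the email addresses are all hypothetical, and the sketch only builds the message rather than sending it.

```python
from email.message import EmailMessage
from typing import Optional


def build_alert(server: str, check_output: str) -> Optional[EmailMessage]:
    """Build an alert email if the database check output contains error lines.

    Assumes (hypothetically) that each error is logged on a line that
    starts with "ERROR". Returns None when the check finished cleanly.
    """
    error_lines = [
        line for line in check_output.splitlines() if line.startswith("ERROR")
    ]
    if not error_lines:
        return None

    msg = EmailMessage()
    msg["Subject"] = f"Database check reported {len(error_lines)} error(s) on {server}"
    msg["From"] = "syslog-alerts@example.com"  # hypothetical sender
    msg["To"] = "dba-oncall@example.com"  # hypothetical recipient
    msg.set_content("\n".join(error_lines))
    return msg
```

    Sending the result would be a separate step (for example with smtplib against your own mail relay), which is left out here because it depends entirely on the environment.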

Children
  • I'm working on my employer to allow me to dedicate more time to getting more value out of our Orion deployment, too.  We're still woefully understaffed, and I can only dream of creating a new Systems Performance Strategist position to allow myself the luxury of learning every system and buying and deploying the appropriate Orion modules for it.