5 Replies Latest reply on Feb 15, 2012 11:16 AM by storn

    NPM feature request:  outage mash-up / correlator

    jott1

      Hello -

      Has there ever been thought about creating a Thwack mash-up for Orion systems?  Here is the concept:

       

      Problem:  Service providers (Internet, MPLS, private-line, etc.) don't always identify problems as quickly as Orion finds them.  It can sometimes take an hour to correlate your circuit outage with a carrier fault.

      Example:  Carrier A has a major CO issue in Chicago that impacts hundreds of DS-3 circuits.

      Need:  Some way to know if other customers are also impacted or if this problem is unique to your company (in which case it might be your network).  Think of all of the time spent trying to determine if the issue is internal or external.  Wouldn't it be great to know that other customers in your area are also impacted?

      Solution:  Community!  Orion NPM immediately sends de-identified outage data (time, circuit type, carrier, zip-code) to a centralized Thwack server.  The Thwack server collects this info and returns matching event information back to Orion to determine a correlation score.  The correlation gets stronger as more and more Orion systems report in.  This can also provide benefit as circuits begin to come back.

      Opt-in:  Of course sending this data would be optional and might only be used for certain circuits in an organization.
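To make the idea concrete, here is a minimal Python sketch of the opt-in, de-identified report and a simple correlation score. Everything in it is an assumption for illustration: the names `OutageReport`, `build_report` and `correlation_score`, the event dictionary keys, the 15-minute matching window, and the scoring formula are all hypothetical and are not part of Orion or Thwack.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Set


@dataclass(frozen=True)
class OutageReport:
    """De-identified outage record: only the four fields named in the post."""
    time: datetime
    circuit_type: str
    carrier: str
    zip_code: str


def build_report(event: dict, opted_in_circuits: Set[str]) -> Optional[OutageReport]:
    """Return a de-identified report, or None if the circuit is not opted in.

    The event keys used here are hypothetical, not an Orion schema.
    """
    if event["circuit_id"] not in opted_in_circuits:
        return None
    # Copy only the non-identifying fields; circuit id, node name, etc. are dropped.
    return OutageReport(
        time=event["time"],
        circuit_type=event["circuit_type"],
        carrier=event["carrier"],
        zip_code=event["zip_code"],
    )


def correlation_score(mine: OutageReport, others: List[OutageReport],
                      window: timedelta = timedelta(minutes=15)) -> float:
    """Score in [0, 1) that grows as more matching reports arrive.

    A report matches when it has the same carrier and ZIP code and falls
    within `window` of my outage. The n / (n + 1) formula is an assumed
    placeholder, not a statistically derived measure.
    """
    matches = sum(
        1 for r in others
        if r.carrier == mine.carrier
        and r.zip_code == mine.zip_code
        and abs(r.time - mine.time) <= window
    )
    return matches / (matches + 1)
```

With no matching reports the score is 0, which suggests the problem may be local; each additional matching report from another system pushes the score closer to 1.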

       

      I'd appreciate any feedback in terms of adding this to Orion and any additional ideas on ways to leverage this capability.

       

      thx

        • Re: NPM feature request:  outage mash-up / correlator
          MTorok

          Hi Jott1,

          I am very interested in hearing how the community feels about this. I'd like to see if there would be concern over sending the stripped information and if it would be utilized. Sort of a Down for Everyone or Just Me type of service, correct?

          Cool idea.

          Thanks,

          Michael

            • Re: NPM feature request:  outage mash-up / correlator
              byrona

              At a high level I think this idea is absolute genius!

              Like everything, the devil is in the details.  I would like to see details on how this would technically be achieved as that would determine how accurate and useful the data is.

              The model I picture is basically SolarWinds using all the participating Orion installations around the world as a distributed monitoring system for service providers where SolarWinds aggregates and correlates all of the data and provides it back to Orion customers via some form of dashboard.

              I think this would need to be done via pre-configured pollers and alerts so that both the data being polled and the alerts being sent to SolarWinds were normalized.
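As a rough illustration of what "normalized" could mean here, the Python sketch below maps free-form alert fields onto a fixed vocabulary before they are sent. The alias tables, field names, and status keywords are hypothetical placeholders; a real implementation would need a list maintained centrally.

```python
# Hypothetical alias tables; real ones would be maintained centrally.
CARRIER_ALIASES = {
    "carrier a": "CARRIER_A",
    "carrier-a": "CARRIER_A",
    "carriera": "CARRIER_A",
}
CIRCUIT_TYPES = {
    "ds3": "DS-3", "ds-3": "DS-3", "t3": "DS-3",
    "ds1": "DS-1", "ds-1": "DS-1", "t1": "DS-1",
}
DOWN_WORDS = {"down", "outage", "critical"}


def normalize_alert(raw: dict) -> dict:
    """Map a free-form alert onto a fixed set of field values."""
    carrier_key = raw["carrier"].strip().lower()
    circuit_key = raw["circuit_type"].strip().lower().replace(" ", "")
    status_key = raw["status"].strip().lower()
    return {
        # Unknown values fall back to an upper-cased form of the input.
        "carrier": CARRIER_ALIASES.get(carrier_key, carrier_key.upper()),
        "circuit_type": CIRCUIT_TYPES.get(circuit_key, circuit_key.upper()),
        # Keep only the 5-digit ZIP, dropping any ZIP+4 suffix.
        "zip_code": raw["zip_code"].strip()[:5],
        "status": "down" if status_key in DOWN_WORDS else "up",
    }
```

Two sites that describe the same circuit as "T3" and "DS-3" would then produce identical records, which is what makes the aggregated data comparable.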

              I also picture a new Orion resource that would essentially act as a dashboard for visualization of the data.

              If something like this were to get underway I would be more than happy to participate.  Our company is directly connected via several of the major service providers on the west coast of the US.

            • Re: NPM feature request:  outage mash-up / correlator
              jott1

              You guys are getting the concept!

              The need for this occurred to me after undersea fiber cables were damaged by Pacific Rim earthquakes a few years back.  At the time, our circuit latency to India jumped from 285 ms to 500 ms.  The carriers had very little information about what was going on and it was very difficult to understand the magnitude of the impact.

              The important point here is to leverage the SolarWinds community.  There are thousands of ambitious and creative network folks out there trying to solve problems.  This type of community wants to contribute (their data).

              Here are some additional points:

              - Orion Live?

              - access to Orion Live is part of your annual SolarWinds maintenance agreement

              - agreed that these would be pre-built alerts (or similar) to ensure standardized data

              - the alerts could be sent to Orion Live via HTTP or SMTP

              - the new Orion Live module would allow you to visualize the data via Google Maps

              - as a first step, Orion Live could be a combination of Orion alerting and an online visualization / correlation website...  the visualization / correlation portion could be integrated into Orion as a second phase

              - the Google Maps APIs would also be used to calculate correlation based on the lat/long proximity to other events

              - other metrics could be added in the future (latency, loss, jitter)

              - SolarWinds / Thwack could leverage the data and issue quarterly performance reports (top carriers, worst zip codes, etc.)
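For the lat/long proximity point in the list above, here is a minimal Python sketch of a distance-weighted score. Note that it uses the standard haversine great-circle formula computed locally rather than any Google Maps API call, and that the 50 km radius and the linear weighting are assumptions chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two lat/long points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_score(me, events, radius_km: float = 50.0) -> float:
    """Sum of per-event weights: 1.0 at distance 0, falling linearly to 0.0 at radius_km.

    `me` is a (lat, lon) tuple and `events` is an iterable of (lat, lon)
    tuples for outages reported by other systems. The radius and weighting
    are hypothetical tuning choices.
    """
    total = 0.0
    for lat, lon in events:
        distance = haversine_km(me[0], me[1], lat, lon)
        total += max(0.0, 1.0 - distance / radius_km)
    return total
```

Nearby outages contribute most of the score, while events outside the radius contribute nothing, so a Chicago outage would not be correlated with one in New York.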