16 Replies Latest reply: May 25, 2012 12:58 PM by familyofcrowes

Proactive Monitoring is the Future

chriswahl

With the vast number of objects that must be monitored and tracked in a modern server environment, sometimes it can be harder to simply understand the data than it is to gather it. If you've ever been bombarded by hundreds or thousands of alert-generated emails (I bet some of you are shaking your head right now), I'm sure you know what I mean. In essence: you have the data, but now what?

 

I think the current technology shift is to add a layer of intelligence with computer-aided analytics and algorithms, with the goal of achieving order out of the chaos. However, this seems like a stepping stone along the path to highly accurate proactive monitoring, such as predicting traffic flows, failure rates, and capacity limitations. This would effectively relegate reactive monitoring to "break / fix" type activities (someone accidentally unplugged the cord).

 

Where do you see the state of monitoring headed, along with both proactive and reactive monitoring?

 
  • Re: Proactive Monitoring is the Future
    oljokine

    Hi Chris,

     

    Nice to see that there are actually other people thinking about this as well =)

     

    I have been implementing tooling for this for two years now, and sadly this simply isn't a topic in IT at the moment. I think that, especially because of virtualization and cloud technologies, the role of IT is changing from reactive to proactive; it simply has to.

    Since reactive incidents (hardware failures etc.) are becoming rarer but also more drastic, the eye should be on cost/performance items, i.e. the key is to offer sufficient performance at a given cost. It doesn't have to be the best available performance, as it is in many cases today, but "sufficient". A modern IT department can be seen as a normal "production" part of the company that optimizes its performance based on the needs and cost factors given. This leads to monitoring and TRENDING user activity and capacity in general to meet the expectations set. User activity for the service, in terms of numbers and profile (heavy or light usage), trended over time gives us the information for capacity handling, whether increase or decrease. Unfortunately, in cloud-based services this is not currently possible, probably on purpose by the vendors: it's normal to pay the same monthly fee for all users regardless of usage profile. Of course there are many more factors, but in general this is how I see the situation...

    • Re: Proactive Monitoring is the Future
      fcaron

      We have thoughts along the same lines (see this similar thread, about networks, but completely applicable to apps and servers).

      I would love to hear your opinion about trending and baselining and which one seems the most important to you (see link above for what I mean by trending and baselining):

      - Trending is more of a capacity planning and problem anticipation analytic. It is typically what users call proactive monitoring. It is usually a mid to long term (30-90 days) prediction, used to anticipate a problem (lack of capacity) or the need to spend budget to keep capacity current with demand.

      - Baselining is more of an operational-type analytic, which can pinpoint unusual behavior within a few minutes (e.g. CPU for this server is currently not what it typically is for this hour of the day and day of the week). The manager should then look at the list of running processes and see whether something abnormal is happening.
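
      As a rough illustration of the baselining idea above, here is a minimal sketch that buckets history by day of week and hour of day and flags a reading that falls far from what is typical for its slot. The function names, the `tolerance` parameter, and the mean/standard-deviation test are assumptions made for this example, not how any particular product works.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def build_baseline(samples):
    """Group historical (timestamp, value) samples by (weekday, hour).

    Returns {(weekday, hour): (mean, stdev)} for slots with at least
    two samples; slots with less history are left out.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {
        slot: (mean(vals), stdev(vals))
        for slot, vals in buckets.items()
        if len(vals) >= 2
    }


def is_unusual(baseline, ts, value, tolerance=3.0):
    """True when value is more than tolerance * stdev away from the
    mean recorded for this weekday and hour."""
    slot = (ts.weekday(), ts.hour)
    if slot not in baseline:
        return False  # not enough history to judge this slot
    avg, spread = baseline[slot]
    # Guard against a zero spread so any deviation still registers.
    return abs(value - avg) > tolerance * max(spread, 1e-9)


# Hypothetical CPU % readings for four Mondays at 09:00.
history = [
    (datetime(2012, 5, 7, 9), 40),
    (datetime(2012, 5, 14, 9), 42),
    (datetime(2012, 5, 21, 9), 38),
    (datetime(2012, 5, 28, 9), 41),
]
baseline = build_baseline(history)
print(is_unusual(baseline, datetime(2012, 6, 4, 9), 95))
```

      The `tolerance` value is the calibration knob: set it too low and the baseline floods you with alerts, set it too high and it stays silent.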

       


      • Re: Proactive Monitoring is the Future
        oljokine

        Excellent point you have there.

        I actually hadn't thought this way before and you are quite right to separate these topics; I can give a couple of examples of both on our side.

        Our base monitoring platform is not SolarWinds but a rival product, for which we have written our own plugins for various corporate applications and platforms.

        Trending:

        In general we take 1, 3 and 6 months of data to produce estimates for 1, 3 and 6 month forecasts. We also state the current value and the "trend-based" current value, i.e. what it should have been in each case. This seems to work quite OK but obviously takes time to be fully enabled.

        Some (real-life) capacity management examples: VMware resourcing; backup time and size management; Lync conference online times (enables CBA to see whether Lync online makes sense in terms of costs); SharePoint document usage and search (to ensure document sharing is done via SPPS, not email); and project management: in SCCM we monitor a separate collection of corporate XP machines (declining as they get replaced by Windows 7) to estimate when the project is finished (the number reaches zero). We also provide trending to some ISVs running their SaaS platform on Amazon so they can see their capacity needs the same way. We also "connect back" the actual forecast values to be able to see the history of forecasting.
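
        The three-window approach can be sketched roughly as follows. This assumes daily samples, a plain least-squares line per window, and that each window is projected forward by its own length; the post does not spell out those details, and `fit_line` and `window_forecasts` are names made up for the example.

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = slope * x + intercept,
    with x taken as 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def window_forecasts(daily_values, windows=(30, 90, 180)):
    """For each lookback window (in days), fit a line to the most recent
    samples and project it the same number of days ahead.

    If fewer samples exist than the window asks for, all of them are used.
    """
    results = {}
    for days in windows:
        recent = daily_values[-days:]
        slope, intercept = fit_line(recent)
        last_x = len(recent) - 1
        results[days] = {
            "slope_per_day": slope,
            # "Trend-based" current value: where the line says we should be today.
            "trend_now": intercept + slope * last_x,
            "forecast": intercept + slope * (last_x + days),
        }
    return results


# Hypothetical series growing by exactly 2 units per day.
usage = [100 + 2 * i for i in range(180)]
for days, result in window_forecasts(usage).items():
    print(days, round(result["forecast"], 1))
```

        Comparing `trend_now` with the actual latest reading gives the "what it should have been" check mentioned above.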

        Baselining:

        In general we use 1, 7 and 30 day values the same way as in trending. We currently use this in the mentioned SaaS ISV case to ensure that their code (which is updated every week!) is of good quality and doesn't cause any performance issues. Frankly, we don't have too many applications for this at the moment.

         

        Hope this raises some debate and ideas to share....

      • Re: Proactive Monitoring is the Future
        chriswahl

        Baselining seems a bit more difficult to achieve, as it is a moving target that requires a lot of data to become remotely accurate. I think this holds especially true for corporations that have sporadic or seasonal usage patterns, service providers, or other "spikey" use cases.

         

        Trending definitely seems handier, and it is the current focus for a lot of shops. Getting a reasonably good guess at when capacity will be used up, given X days and Y changes, is quite useful for reporting up the chain.
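
        For illustration, here is a minimal sketch of that kind of estimate. It assumes steady linear growth plus a few known one-off changes; `days_until_full`, its parameters, and the numbers are hypothetical.

```python
def days_until_full(used, capacity, growth_per_day,
                    planned_changes=None, horizon_days=3650):
    """Walk forward day by day, applying steady growth plus one-off changes.

    planned_changes maps a day offset to a one-off change in usage
    (positive for a new workload, negative for a decommission).
    Returns the first day offset on which usage reaches capacity,
    or None if that does not happen within horizon_days.
    """
    planned_changes = planned_changes or {}
    for day in range(horizon_days + 1):
        one_offs = sum(delta for when, delta in planned_changes.items()
                       if when <= day)
        if used + growth_per_day * day + one_offs >= capacity:
            return day
    return None


# Hypothetical datastore: 600 of 1000 GB used, growing 4 GB per day.
print(days_until_full(600, 1000, 4))
# Same datastore with a 100 GB project landing on day 30.
print(days_until_full(600, 1000, 4, planned_changes={30: 100}))
```

        The growth rate would normally come from a trend fit over historical data rather than being typed in by hand.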

        • Re: Proactive Monitoring is the Future
          fcaron

          I mostly agree with you, chriswahl.

          My only comment is that trends need seasonality as well to be accurate, same as baselines. The same goes for accumulating enough data to be meaningful.

          But I agree that "spikey" behavior is difficult to take into account in a baseline (it's irrelevant in trending, which tends to use averages or, even better, percentiles); the baseline then tends to flood you with alerts (or doesn't send you enough, if any).

          The calibration is the tricky part in the baselining logic.

           

          As far as the trending, what logic do you see being required? Is linear regression sufficient?

           

          BTW, we have trending today (just go to a line chart and extend the reporting date to "the future").

          The points I was making above were related to some improvements that we have in mind for the current trending.

          Can you elaborate on what improvements to the current trending you have in mind?

          Oljokine has given some examples; anything specific in mind, chriswahl?

          • Re: Proactive Monitoring is the Future
            chriswahl

            Most helpful to me are the what-if type scenarios that can be played out for a trending / futures analysis. That's probably the most prevalent use case that I see the C-level execs ask for, essentially "how much more stuff can I cram in without expending capital", or a line of business asking for "X number more of Y virtual machine / database / whatever".
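
            A what-if question of the "how many more of Y fit" kind could be sketched like this. The `Resources` fields, the 20% reserve, and the sample numbers are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_ghz: float
    ram_gb: float
    disk_gb: float


def additional_units(capacity, used, per_unit, reserve=0.2):
    """How many more units shaped like per_unit fit on existing hardware
    while keeping a `reserve` fraction of each resource free.

    The answer is set by whichever resource runs out first.
    """
    fits = []
    for field in ("cpu_ghz", "ram_gb", "disk_gb"):
        usable = getattr(capacity, field) * (1 - reserve) - getattr(used, field)
        need = getattr(per_unit, field)
        if need > 0:
            fits.append(max(0, int(usable // need)))
    return min(fits) if fits else 0


# Hypothetical cluster and VM profile.
cluster = Resources(cpu_ghz=100, ram_gb=512, disk_gb=10000)
in_use = Resources(cpu_ghz=40, ram_gb=300, disk_gb=4000)
vm = Resources(cpu_ghz=2, ram_gb=8, disk_gb=100)
print(additional_units(cluster, in_use, vm))
```

            In this made-up case RAM is the limiting resource, which is the kind of detail that answers the "without expending capital" question.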

            • Re: Proactive Monitoring is the Future
              fcaron

              Makes a lot of sense.

            • Re: Proactive Monitoring is the Future
              familyofcrowes

              I have a "Metrics" meeting in 10 minutes to discuss this exact thing. They want me to automate it and provide a presentation, all from Orion. We currently use Excel to pull 95th percentile data and aggregate it monthly.
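
              The manual step described above (95th percentile, aggregated monthly) might look roughly like this if scripted. This is only a sketch: it uses the nearest-rank definition of a percentile, whereas spreadsheet percentile functions typically interpolate, so results can differ slightly. The function names and sample data are made up.

```python
from collections import defaultdict
from datetime import datetime


def percentile_95(values):
    """Nearest-rank 95th percentile: the smallest value that has at
    least 95% of the samples at or below it."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def monthly_p95(samples):
    """Aggregate (timestamp, value) samples into one 95th percentile
    per (year, month), returned in chronological order."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.year, ts.month)].append(value)
    return {month: percentile_95(vals) for month, vals in sorted(buckets.items())}


# Hypothetical utilisation samples: 20 in May, 2 in June.
samples = [(datetime(2012, 5, 1 + i), i + 1) for i in range(20)]
samples += [(datetime(2012, 6, 1), 10), (datetime(2012, 6, 2), 50)]
print(monthly_p95(samples))
```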

               

              It's time-consuming, but the CIO and CTO require this presentation quarterly. So I would be a hero if I could make it so the apps guys, server guys, and network guys can stop worrying about their metrics and just look at the presentation....

               

              Starting June 1, after I install LEM and UDT, that is my #1 project....

        • Re: Proactive Monitoring is the Future
          oljokine

          A couple of comments on this:

           

          I have found that having those three different trend bases gives you a pretty good view of what's happening. If all the lines are parallel, it's obvious that the change is constant; if not, it may be due to seasonal change.

           

          Building alerting might be a tricky task =) Maybe a condition across all trends could be a solution.
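
          One way to read the two ideas above, as a minimal sketch: treat the trend lines as "parallel" when their slopes sit close to their common average, and raise an alert only when every window's forecast crosses the threshold. The 25% tolerance, the function names, and the sample numbers are assumptions for illustration.

```python
def trends_agree(slopes, rel_tolerance=0.25):
    """True when every slope lies within rel_tolerance of the average
    slope, i.e. the trend lines are roughly parallel."""
    avg = sum(slopes) / len(slopes)
    if avg == 0:
        return all(s == 0 for s in slopes)
    return all(abs(s - avg) <= rel_tolerance * abs(avg) for s in slopes)


def should_alert(forecasts, threshold):
    """Alert only when the forecast from every trend window reaches
    the threshold, so a single noisy window cannot trigger it."""
    return all(f >= threshold for f in forecasts)


# Hypothetical slopes (units per day) from 1, 3 and 6 month windows.
print(trends_agree([2.0, 2.1, 1.9]))
print(trends_agree([2.0, 0.5, -1.0]))
# Hypothetical forecasts (% of capacity) against a 90% threshold.
print(should_alert([95, 102, 99], 90))
print(should_alert([95, 80, 99], 90))
```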