3 Replies Latest reply on Jul 22, 2016 6:41 AM by shaun_9999

    Is solarwinds often this messy?

    shaun_9999

      Or is it me?  Our infrastructure changes a lot through the day (SQL runs throughout the day, the hospital sends through images/patient data in large batches, etc.), which means we receive a lot of alerts; maybe one every minute.  That in turn means something has sent an email and it has then been placed into an event/alert on SolarWinds.

       

      I really enjoy the basis of monitoring solutions.  I had hardly any exposure to monitoring applications in my last position.  The one I have used was SCOM 2008/2012, and from memory (this was about three years ago) we only received emails in the morning after jobs had finished running, and then later in the morning if jobs had failed.  That was because vSphere/SCOM would monitor all of the other things that make up a VM: CPU, RAM, disks and networking.

       

      I understand that this is because of a different set of requirements, a different environment and lots of other variables that change how many emails you would receive, but it just seems like a minefield of alerts when something happens.  Is everyone else's SolarWinds like this, or is it because the SolarWinds I am using right now was installed out of the box and needs a lot more honing to run correctly for my company?

        • Re: Is solarwinds often this messy?
          shaun_9999

          Example - since midnight this morning we have received 60 emails.  It's now 10:29, so that is about 630 minutes, or roughly one every 10 minutes.  (I am aware that's different from one every minute, but we received one at 7:39 about a hardware alert down, then one at 7:40 about a h/w alert up for the same node.)  Would you not look at the alert if it's alerting?  Either the alert is redundant, in which case why is it alerting, or it needs investigating and the alert at least needs acknowledging.
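The arithmetic and the down/up pair above can be checked with a small script. This is only a rough sketch, not anything SolarWinds provides: the function names, the `node-a` label, the calendar date and the five-minute flap window are all made up for illustration, and the events would need to come from wherever you export your alert history.

```python
from datetime import datetime, timedelta

def mean_interval_minutes(start, now, email_count):
    """Average gap between emails over the period, in minutes."""
    elapsed = (now - start).total_seconds() / 60
    return elapsed / email_count

def find_flaps(events, window=timedelta(minutes=5)):
    """Return (node, down_time, up_time) for every 'down' that is followed
    by an 'up' on the same node within `window` (an assumed cut-off)."""
    last_down = {}
    flaps = []
    for ts, node, state in sorted(events):
        if state == "down":
            last_down[node] = ts
        elif state == "up" and node in last_down:
            down_ts = last_down.pop(node)
            if ts - down_ts <= window:
                flaps.append((node, down_ts, ts))
    return flaps

# Arbitrary date; only the times of day come from the post.
day = datetime(2016, 7, 22)
rate = mean_interval_minutes(day, day.replace(hour=10, minute=29), 60)

# Hypothetical node name standing in for the hardware alert at 7:39 / 7:40.
events = [
    (day.replace(hour=7, minute=39), "node-a", "down"),
    (day.replace(hour=7, minute=40), "node-a", "up"),
]
flaps = find_flaps(events)

print(f"one email every {rate:.1f} minutes on average")
print(f"{len(flaps)} flap(s) found")
```

Alerts that show up in the flap list are candidates for a longer trigger delay rather than for being ignored.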

          • Re: Is solarwinds often this messy?
            dmartzall

            You will usually want to tweak the thresholds to fit your environment some.  Has your instance been running for at least six months?  If so, you will be able to build a good baseline for how your environment performs on a daily basis.  If not, then you can start making small changes to bring the thresholds down.  You can also tweak whether you get an alert or whether it just changes on the dashboard.  Keep in mind while making these changes that you only want to reduce the non-actionable alerts.  Here is an example of each:

            Threshold changes

            w/six months of data - bandwidth utilization spikes to 95% every Sunday from 1 AM until 5 AM, but with that exception it's usually around 60%

                 In the alerts section you can suppress the bandwidth alerts on Sunday during that window, and at the same time you can change the threshold for all other times to, say, 70%

            w/a few weeks of data - say the bandwidth utilization alert is set to 35% utilization and you see your norm during your sample period is 40%

                 you are not going to want to bump it all the way to 65%, as that could see you missing a leading indicator
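The six-month example above could be sketched roughly like this. It is not SolarWinds code and uses no SolarWinds API: the function names are hypothetical, the samples are assumed to be (timestamp, utilization percent) pairs exported from a report, and taking the 95th percentile plus 10 points of headroom is just one assumed way of turning "usually around 60%" into "say 70%".

```python
from datetime import datetime
from statistics import quantiles

def in_suppression_window(ts):
    """True on Sundays from 01:00 up to (not including) 05:00."""
    return ts.weekday() == 6 and 1 <= ts.hour < 5

def suggest_threshold(samples, headroom=10.0, cap=95.0):
    """Suggest an alert threshold from (timestamp, utilization %) samples.

    Samples inside the suppression window are ignored so the known Sunday
    spike does not inflate the baseline. Headroom and cap are assumptions.
    """
    normal = [u for ts, u in samples if not in_suppression_window(ts)]
    if len(normal) < 2:
        raise ValueError("need at least two samples outside the window")
    p95 = quantiles(normal, n=20)[-1]  # last of 19 cut points = 95th percentile
    return min(p95 + headroom, cap)

def should_alert(ts, utilization, threshold):
    """Alert only outside the suppression window and above the threshold."""
    return not in_suppression_window(ts) and utilization > threshold

# Made-up data shaped like the example: ~60% normally, 95% in the Sunday window.
sunday = datetime(2016, 7, 24)
monday = datetime(2016, 7, 25)
samples = [(monday.replace(hour=h), 60.0) for h in range(24)]
samples += [(sunday.replace(hour=h), 95.0) for h in range(1, 5)]

threshold = suggest_threshold(samples)
print(threshold)
print(should_alert(sunday.replace(hour=2), 95.0, threshold))
print(should_alert(monday.replace(hour=2), 75.0, threshold))
```

With only a few weeks of data the same function still runs, but the result deserves less trust, which is the reason for moving the threshold in small steps rather than jumping straight to a high value.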

              • Re: Is solarwinds often this messy?
                shaun_9999

                This has been in place for maybe a year, possibly more.  I wanted to do this next week.  If I am being truthful, I wanted to start this today, as my boss wants all green on the dashboard (this includes applications, CPU, HDD and memory).  I tried to explain that I had run a report that gives us six months to a year of how a VM has been performing, and that we can start tweaking the thresholds for each CPU and then manage the alerts so that they reflect the new thresholds that have been set.  I wanted to start with a small number of servers (only 50 that have the same corresponding peak values for the year), run that, see how the servers perform for a week and then slowly change the thresholds as more information comes in.

                 

                The response was 'using resources isn't bad for a server', and then I was told to leave it for a future project.  I am certain right now that I am still correct in my thinking, and it has disgruntled me slightly.