61 Replies Latest reply on Aug 29, 2013 7:36 PM by superfly99

    How to Avoid "Monitoring Spam"

    SomeClown

      Large and small companies alike have it easy when it comes to network monitoring. Large companies can afford the biggest and best solution available, and an army of people to monitor every little twitch in the network.

       

      The Long Game.png

       

      Small companies, on the other hand, either don't monitor at all (this is the much-vaunted, "Hey… the Internet is down" approach--also called "Canary" monitoring), or only have a very small number of devices to keep an eye on.  What happens if you're in the middle somewhere?  What happens if you're big enough to have a staff, but not big enough to have one dedicated person assigned to sitting around watching for failures?

       

      If you are like me, you buy a great monitoring solution like SolarWinds' Network Performance Monitor (NPM). You do the initial installation, run through some wizards to get some basic monitoring up and going, then you start playing with the "nerd knobs" in the software, and boy does NPM have those in spades. NPM monitors everything from port states to power supplies, link errors to wireless authentications, and everything in between.  You can then send alerts to a distribution list that your team tracks, assign responsibilities for monitoring and escalation, and everyone on the team now has visibility into failures and problems anywhere on the network.

       

      Nirvana.  Bliss.  You have now become the Network Whisperer, a beacon of light in the darkness.  Except, now you have 150 email messages about a printer because that printer has been offline for a few hours waiting for a part and someone forgot to kill the monitoring for that device.  Easily fixed.  Ooh… 300 more emails… someone rebooted a server.

       

      I exaggerate a bit, but you see the problem.  Pretty soon you set up mail rules, you shove your monitoring messages into a folder, and your entire monitoring solution is reduced to an in-house spam-generation machine.  You find you don't actually know when stuff breaks because you're so used to the mail folder filling up that you ignore it all.  The only thing you've accomplished is the creation of another after-the-fact analysis tool.  When someone from accounting tells you that so-and-so system is down, you can quickly see that, yes, it is down.  Well, go you.

       

      I'll talk about how we work to solve this problem in my next post, but I'm curious how everyone here deals with these very real issues:

       

      * What do you monitor and why?

      * How do you avoid the "monitoring-spam" problem?

        • Re: How to Avoid "Monitoring Spam"
          RandyBrown

          2nd question first:  How to avoid the "monitoring-spam" problem?

          We only set up alerts (email, text message, flashing lights and sounds, etc.) for the events that really are worthy of that type of notification.  We also make sure that the duration an event must last before the alert actually triggers is something realistic.  We don't set up alerts for things like high CPU or memory utilization unless there is something very specific that we are looking for, because these are regularly occurring events and are considered normal in our environment (SQL and Exchange servers regularly consume most of the available memory, so it would be pointless to alert on high memory utilization for these servers). That said, we do still have some "monitoring-spam," but we work to keep it to a minimum by constantly adjusting alerts as needed.

           

          Back to the first question:  What do we monitor and why?

          We monitor:

          • Server/device up/down status (ICMP)
          • Disk space (based on custom properties set for volumes - we do not alert based on percentages)
          • Hardware component failures (fans, disk drives, power supplies, array controller batteries, etc.)
          • High datacenter temperature
          • HTTP failures for some specific web applications that we have in our environment
          • Application specific service (using SAM) failures
          • UPS output load
          • SSL certificates nearing expiration
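
The disk-space item in the list above (alerting on a per-volume custom property rather than a percentage) can be sketched roughly as follows. This is only an illustration in Python, not how Orion evaluates alerts: the `Volume` class, the `min_free_gb` field standing in for a custom property, and the 10% figure are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Volume:
    name: str
    size_gb: float
    free_gb: float
    min_free_gb: float  # hypothetical per-volume custom property

def low_space_percent(vol, min_free_pct=10.0):
    """Generic rule: alert when free space drops below a fixed percentage."""
    return 100.0 * vol.free_gb / vol.size_gb < min_free_pct

def low_space_custom(vol):
    """Per-volume rule: alert when free space drops below that volume's own floor."""
    return vol.free_gb < vol.min_free_gb

# A large archive volume: 8% free, but that is still 800 GB.
big = Volume("archive", size_gb=10000, free_gb=800, min_free_gb=200)
# A small boot volume: 12.5% free, but only 5 GB left.
small = Volume("boot", size_gb=40, free_gb=5, min_free_gb=8)

for vol in (big, small):
    print(vol.name, low_space_percent(vol), low_space_custom(vol))
```

With these made-up numbers the percentage rule fires on the large volume that still has plenty of room and stays quiet on the small one that is nearly full, while the per-volume floor does the opposite.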

           

          We try to keep it relatively simple and only alert on the things that we absolutely must know about right away.  A lot of other things are monitored and logged to the Orion event log for us to find if we are trying to track down a problem that we did not receive an alert about.

          • Re: How to Avoid "Monitoring Spam"
            bsciencefiction.tv

            1.     Network / server up/down

                    Disk space, either percentage or specific size based on criteria

                    Router temperature

                    URL

                    PDU voltage and amperage

                    Deep application monitoring (for example, we give our app dev team a real-time breakdown of what smart device, OS, and OS version is hitting our mobile web portal and what the first click is)

             

            2.      Work with the LOBs to make sure we are only monitoring valuable data.

                     Clear, well-defined rules for both alerting and alert actions.

            • Re: How to Avoid "Monitoring Spam"
              ecklerwr1

              A couple items I've had to deal with to keep the event stream from being flooded:

               

              Some versions of the NetApp snmpd agent constantly change the volume IDs around on NAS devices with many volumes.  This creates hundreds or thousands (if you have as many as I do) of "volume changed," "volume appeared," and "volume disappeared" messages.  (I've had to manually edit the table in the database so these events are not logged to the database and likewise don't show up in my event stream.)

               

              I've noticed that thresholds that might be set to defaults for physical servers... DON'T apply to virtual servers.  This means virtualized SQL servers and some others end up having most of the thresholds in SAM being wrong out of the box.

               

              The new hardware monitoring is nice... but some devices for some reason will pump out hardware up green messages ALL DAY LONG every polling cycle.  For devices that do this I turn hardware monitoring OFF.

               

              These are just three examples of my worst offenders, which if allowed will generate so many events that you can't see the forest for the trees. <--- This is here for a reason

                • Re: How to Avoid "Monitoring Spam"
                  SomeClown

                  I have seen the virtual vs. physical monitoring problems here as well.  It makes sense that you look for different things, but out of the box the virtual settings don't tend to work as well as you might think.  Or, they seem to be somewhat counter-intuitive (what are you monitoring when you monitor "physical memory" for instance).  Good points.

                • Re: How to Avoid "Monitoring Spam"
                  bsciencefiction.tv

                  Also kudos for the Satellite 5 reference to monitoring.

                  • Re: How to Avoid "Monitoring Spam"
                    rgeist

                    We monitor servers, switches, routers, access points, virtual machines, and more.  Depending on how important said nodes are, we vary our monitoring of other things like disk usage.  All of them get at least up/down.  Some of them get their interfaces monitored.

                     

                    Right now we have other means of alerting and aren't using alerts except for a few very important things.  We are working on dependencies so we can configure alerts to be reasonable then configure notifications from there.  If we didn't do it that way, our notifications would be the most annoying, un-useful tornado of emails ever.

                      • Re: How to Avoid "Monitoring Spam"
                        SomeClown

                        Yup.  We also have "other" monitoring solutions (some home-grown, some off-the-shelf), mostly in cases where the functionality is needed from one product and just doesn't exist in another.  For instance, we log everything to syslog and Splunk for sorting and after-the-fact analysis, but don't alert from there.  It gets even more important to control the monitoring-spam when you get multiple systems all wanting to churn out alerts.

                         

                        Thanks for the reply.

                      • Re: How to Avoid "Monitoring Spam"
                        zackm

                        We monitor everything. For network gear, we have a specific template per device (i.e., we only monitor certain interfaces per router, switch, firewall, etc.) that we poll with SNMP. For servers and storage arrays, we only use ICMP monitoring as we use a different solution for in-depth server monitoring.

                         

                        As for how to avoid the flood: we only alert on items after XX polling cycles, and we have a 24/7/365 alert management team that parses through alerts from our entire enterprise and assigns them to support queues as needed. (I guess we fall into the "really big company" category.)
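
Waiting a set number of polling cycles before alerting can be sketched as a small counter. This is a minimal Python illustration under assumed names (`ConsecutiveFailureAlert`, `required_failures`); it is not how NPM implements the setting, and the threshold of 3 is arbitrary.

```python
class ConsecutiveFailureAlert:
    """Fire once after a node has failed N polls in a row."""

    def __init__(self, required_failures):
        self.required_failures = required_failures
        self.failures = 0
        self.alerted = False

    def record_poll(self, is_up):
        """Return True only on the poll that crosses the threshold."""
        if is_up:
            # Any successful poll resets the streak and re-arms the alert.
            self.failures = 0
            self.alerted = False
            return False
        self.failures += 1
        if self.failures >= self.required_failures and not self.alerted:
            self.alerted = True
            return True
        return False

alert = ConsecutiveFailureAlert(required_failures=3)
# Two missed polls followed by recovery, then a real outage.
polls = [True, False, False, True, False, False, False, False]
fired = [alert.record_poll(p) for p in polls]
print(fired)
```

The short blip never alerts, and the real outage alerts exactly once. The trade-off is detection delay: a genuine failure is only reported a few polling intervals after it starts.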

                          • Re: How to Avoid "Monitoring Spam"
                            SomeClown

                            Did it take some trial-and-error to find the right number of poll cycles to wait before alerting?  Do you find that "down" events are slower to alert because of this, or does the team catch the real problems quickly enough?

                              • Re: How to Avoid "Monitoring Spam"
                                zackm

                                My team specifically handles the architecture, design, and maintenance of SolarWinds NPM and a few other monitoring tools. As such, the alert definitions were created with the guidance of the network architecture team. As a company, we provide managed hosting solutions, among a lot of other things, so our monitoring is heavily focused on SLAs and metrics around time to remediation for events.

                                 

                                Our alerts are what I would consider "living", in that they are always available for change at the discretion of the customer(s). However, our basic system of alerting can have an engineer looking at an issue within 5 minutes of an outage (which is pretty good for tens of thousands of devices). There are a few false positives, but there are "safeties" built into our systems and processes that dramatically decrease returning alerts on servers that are not in production or a switch that was decommissioned, etc.

                            • Re: How to Avoid "Monitoring Spam"
                              rharland2012

                              We monitor everything related to our deliverables - routers, switches, all servers P and V, hosts for those servers, storage, UPS, cooling, application performance for apps, SQL, Oracle, etc., etc.

                               

                              We avoid spam thusly - we set up our NPM environment a year ago, and I was the sole recipient of email alerting during our pilot. I got it to a level I thought appropriate and added my boss and our help desk manager, and let that cook for a few days, after which we tuned cycles before notification, etc., as well as the appropriate granularity of our polling. It's an ongoing thing and will continue to be. When our partners in EMEA and Asia asked me to add tons of things, I performed the same process in an accelerated manner, with the latency add of overseas transit factored in. Also, dependencies are great and make things elegant.

                              • Re: How to Avoid "Monitoring Spam"
                                Aforsythe

                                I pay attention to every alert that gets e-mailed. It can be a little tedious to tweak thresholds and dependencies, especially if the tool you are using doesn't correlate for you, but it's possible with the right set of tools. It's just as easy to get lost in the spam as it is to collect the infinite amount of data you'll never use. And it's just as easy to narrow down the spam as it is to aggregate older data, you just need to know what you're looking at and what you want to get out of it.

                                 

                                The biggest problem I see most people struggling with is getting started. Some network devices can send out thousands of messages per hour, and you may have dozens of them. Then there are servers (we all know how chatty the event logs can be), plus all of the different monitors. It can get overwhelming very quickly when you're first implementing a monitoring solution, or heck, even after you've had it for a while.

                                 

                                 

                                I tell people to take it slow and just remember, it all starts with these two questions for each alert and once the ball gets rolling you'll get behind it.

                                 

                                1. Do I need to know about this?

                                2. Do I need to know about this at 2am?

                                • Re: How to Avoid "Monitoring Spam"
                                  byrona

                                  I think this is a thin line to walk.  We are a service provider, so we need to monitor a lot of things on a lot of gear as part of the service that we provide.  What I have found is that if you aren't getting any spam from the monitoring system, then you probably aren't monitoring enough and you are probably missing things.  If you get too much spam, your NOC guys start ignoring everything, assuming it's spam.  If your environment is a good size and has any rate of change, then keeping the monitoring system on that thin line is a constant effort no matter what software you are using, and that needs to be considered and expected as part of the solution.

                                    • Re: How to Avoid "Monitoring Spam"
                                      SomeClown

                                      Yeah, and I purposely avoided drawing too many conclusions from the service provider world.  You've got a whole different set of issues (SLA, multi-tenant, etc.) to deal with on that side of the fence.  I can imagine that monitoring there is not only important in the same way as in the enterprise, but also for billing purposes.

                                    • Re: How to Avoid "Monitoring Spam"
                                      andrethegiant

                                      Monitoring needs a deep knowledge base and really good cross-relations.

                                      Users need services, but operators and technical people need to know which system is affected.

                                      IMHO for each environment you need specific tools (or specific "adapters") in order to provide a better monitorability.

                                      • Re: How to Avoid "Monitoring Spam"
                                        lwpeters

                                        We do not really have an overarching group that monitors all alerts.  We have isolated our alerting to the affected admins.  We have worked with each admin, or group, to determine what alerts are useful.

                                        • Re: How to Avoid "Monitoring Spam"
                                          Kurt H

                                          We monitor everything that is going on with our network except for desktop computers. All servers, network equipment, infrastructure, power, etc. are monitored on a constant basis. We have alerts sent out when circuits go down, servers go down, or any component within devices has a problem. The alerts are sent to specific groups that deal with the particular item in question. We are not a big organization, but with a lot of the SolarWinds tools that we have, we are able to monitor like a big company or organization.

                                          • Re: How to Avoid "Monitoring Spam"
                                            wluther

                                            we monitor everything... as in, we monitor the monitor that monitors the monitor that is monitoring those monitors...

                                            but we do not alert on everything...

                                             

                                            we are just now getting around to cleaning things up, removing interfaces/nodes as they alarm.

                                             

                                             

                                            as for the spam problem... well, at least it's not for cheap online prescriptions...

                                            • Re: How to Avoid "Monitoring Spam"
                                              cahunt

                                              We do not monitor access switch interfaces, other than uplinks to our distribution boxes. Any new alert gets just me in the email line until the alert is fine-tuned. Some alerts are for devices that other departments use and have plugged into the network, so just they and I get the alert.  Network-related alerts go to a lot of people, all the way up (they like to know). But remedial items, like UPS batteries and some redundant hardware items, go to the Operations team to be handled. We use the heck out of UnDPs for status.

                                              Implementation of trap and syslog alerts has been ramped up to cover all ends of a situation. At the same time, triggers in the syslog highlight specific events on nodes that trigger alerts to just me and a few folks. Sometimes a really granular syslog or trap alert helps me to maintain the 'Big Brother' status and keeps people looking over their shoulder.

                                                • Re: How to Avoid "Monitoring Spam"
                                                  SomeClown

                                                  Not a bad strategy at all: moving the alerts out of "IT" at the macro level and compartmentalizing.  Do you do that to the level where folks outside of IT (say, the financial controller) get alerts?  ERP system has a problem, for instance, so the finance team gets a courtesy notice as well as the server team?

                                                    • Re: How to Avoid "Monitoring Spam"
                                                      cahunt

                                                      It is usually a specific request, or someone complaining about the connectivity for their devices that communicate back to a server that they use for status. When it doesn't work, seeing an email from Orion saying it is not working keeps them informed. Be aware!!! This does NOT prevent phone calls; it causes more, and emails too. In one case, we have key lock boxes that connect to the network and are managed (PW and key status) through a software package. Cheap Chinese electronics caused the devices to dump their TCP/IP stacks on a large multicast network. Even once isolated, the devices have issues if you do not restart the box, plug it in, and reconnect at the server in the 'correct' way. So they got a lot of disconnects, but now (after several months of calls and emails) they understand the alert, and know to check their systems if a disconnect shows and my alert did not trigger. Now I hardly get any calls unless a box moves or a new one goes in (but it took several months of taking those calls and re-explaining to get there).

                                                       

                                                      Another case is our Telehealth group; they want monitoring on projectors and the digital control consoles for each room. That, too, will be a specific alert to their group, with a CC to me and my partner in crime so we are informed.  Alerts may be as simple as "projector bulb has XXXXXXXX life hours, it needs to be replaced," or as serious as "Room Control Console XXX in Room ABC.1234 is not connected (or unreachable)."  Of course, with this type of stuff it depends on how robust the MIB table is and how much information I can give them beyond up/down.

                                                       

                                                      Even more so, access layer type stuff goes mainly to our operations team, whereas a Distribution or Core switch issue will go to everyone (which includes the engineers).

                                                       

                                                      I have another NOC-type view that highlights how we separate these items, using two AJAX views... I will see if I can post that on the NOC view request page before this time tomorrow.

                                                      Just an addendum to watching green change to yellow and red, and knowing whether or not there is an alert that you or the Network team should even care about.

                                                       

                                                      I am working with our Rx Group to fill in the app and device monitoring that our 'Other' alerting system, run by the DC people, really isn't watching. Also, the key points of contact and a proper, non-cryptic email suit the suits better than a bunch of #'s and characters that even our team has to decode. But of course most of the Rx alerts will be going to that group.

                                                       

                                                      **** With this, on a critical system, I will create an alert for our Help/Service Desk with mild information that they can use to inform callers about an outage seconds to minutes after it happens.  * This gives us an extra minute or two to get that Network Event Notification email out to all the groups, which has the full outage and tech deployment details for fixing the issue.

                                                  • Re: How to Avoid "Monitoring Spam"
                                                    michael stump

                                                    Initially, I like to have NPM discover EVERYTHING. When I'm trying to wrap my brain around a new infrastructure, I need to know what's out there. Servers, switches, storage, hypervisors, even desktops and printers. This way I get a sense of scale.

                                                     

                                                    Desktops are the first to go, though. I don't want to know how many times the visitor center PC reboots in a day. Printers usually go, too. I'll let them send traps if they run into trouble, but I certainly don't need to monitor physical memory utilization of a printer on a constant basis. In the end, it's routers, switches, servers, and storage that I care about.

                                                     

                                                    On monitoring-spam: I agree with earlier posts that tuning your thresholds is a great way to reduce duplicate alerts from any NMS. A 5 second spike in CPU shouldn't trip any alarms. But 95% cpu utilization for 15 minutes might warrant some diagnostics. In these cases, I let NPM run with the defaults for a month or two to establish a baseline for performance, then go back and start tweaking alert thresholds.
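
The distinction above between a 5-second spike and 95% CPU for 15 minutes amounts to checking how long a threshold has been continuously exceeded. A minimal Python sketch, assuming samples arrive as `(timestamp_seconds, value)` pairs in time order (the function name and data layout are made up for illustration, not taken from NPM):

```python
def sustained_breach(samples, threshold, min_duration_s):
    """Return True if the value has stayed above `threshold` for at least
    `min_duration_s` seconds, counting back from the most recent sample."""
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # start of the current run above threshold
        else:
            breach_start = None    # any dip below the threshold resets the run
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= min_duration_s

# A short spike: one high sample surrounded by normal ones.
spike = [(0, 40), (5, 99), (10, 45)]
# Sixteen one-minute samples at 97% CPU (0 s through 900 s).
sustained = [(t, 97) for t in range(0, 901, 60)]

print(sustained_breach(spike, threshold=95, min_duration_s=900))
print(sustained_breach(sustained, threshold=95, min_duration_s=900))
```

The threshold and duration would come from whatever baseline the first month or two of data suggests; 95 and 900 here simply mirror the numbers in the post.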

                                                    • Re: How to Avoid "Monitoring Spam"
                                                      wbrown

                                                      Our approach sounds pretty much the same as that mentioned by RandyBrown above.

                                                      We only configure alerts for items we want to get an alert for, and that alert is configured to send to the group that wants it.

                                                      The first thing we did when we started configuring alerts was to disable all of them.

                                                       

                                                      There are some items we want to know about but we don't want an alert generated.  For these items we have a report that is available for review every morning.  This report is a link on our default summary view.

                                                      • Re: How to Avoid "Monitoring Spam"
                                                        jgherbert

                                                        Interestingly (for me at least), I was reading a detailed postmortem of an outage recently - a link to which I have been totally unable to find now so I could share it - where one of the extenders for the outage was that the manager was receiving alert emails (hundreds of them as I recall) saying there was a problem, but had set up a rule that pushed all the alerts into a folder, and he therefore did not notice them until it was already many hours past when he needed to know...

                                                        • Re: How to Avoid "Monitoring Spam"
                                                          Network_Guru

                                                          Monitoring is the easy part, alerting efficiently is difficult.

                                                          We use the "knee-jerk" approach to setting up alerts which takes months and sometimes years to alert on all critical outages.

                                                          After setting basic alerts, we wait until there is some critical degradation or outage.

                                                          When management asks why they weren't alerted, a new custom alert is born.

                                                           

                                                          Also here are two methods used to control who gets spammed.

                                                           

                                                          1. Send to custom Outlook distribution lists. When someone goes on vacation, they can be removed from the DL and re-added when they return,
                                                            rather than manually removing & re-adding them from multiple Orion alerts.
                                                          2. Use an Orion Custom Property to determine users/DLs receiving e-mails for each device. I have 3 Custom Properties:
                                                            E_mail_alert_To_addr, E_mail_alert_CC_addr, E_mail_alert_BCC_addr
                                                            I just have to add these variables in the alert to use this property for the recipients:
                                                            ${E_mail_alert_To_addr}
                                                            ${E_mail_alert_CC_addr}
                                                            ${E_mail_alert_BCC_addr}


                                                          This allows users to add/remove themselves from alerts using the edit/manage nodes GUI in Orion (self serve).
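
The idea behind method 2 (the alert holds only variables, and each node supplies its own addresses) can be illustrated outside Orion. The sketch below uses Python's standard `string.Template`, which happens to share the `${name}` syntax. Orion performs its own substitution, so this only mimics the concept, and the addresses are made up.

```python
from string import Template

# Hypothetical per-node custom property values (example addresses only).
node_properties = {
    "E_mail_alert_To_addr": "netops@example.com",
    "E_mail_alert_CC_addr": "helpdesk@example.com",
    "E_mail_alert_BCC_addr": "",  # nobody on BCC for this node
}

# The alert definition holds only variables, never literal addresses.
alert_fields = {
    "to": "${E_mail_alert_To_addr}",
    "cc": "${E_mail_alert_CC_addr}",
    "bcc": "${E_mail_alert_BCC_addr}",
}

def resolve_recipients(fields, properties):
    """Fill each recipient field from the node's properties, dropping empty ones."""
    resolved = {}
    for field, pattern in fields.items():
        address = Template(pattern).safe_substitute(properties).strip()
        if address:
            resolved[field] = address
    return resolved

recipients = resolve_recipients(alert_fields, node_properties)
print(recipients)
```

Because the alert itself never changes, adding or removing a recipient is an edit to one node's properties rather than to every alert that mentions that person.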

                                                          • Re: How to Avoid "Monitoring Spam"
                                                            802jr

                                                            At a previous job we monitored every system just for up/down status. As the IT staff, we were used to the alerts, and if (more like when) something went down we knew right away if it was a false alert: we waited a couple of minutes and received the "back online" alert. Our problem was not the system periodically spamming us with false positives; it was the micromanaging boss, who also wanted the alerts for every division within IT and would constantly ask us to look at systems that were other divisions' issues. What we really needed was a spam filter for him. If we changed the settings on what alerts he received and he did not get an alert someone else got, he would blow a gasket.

                                                             

                                                            Dividing who gets what alerts is an essential part of managing the IT beast. Staying within the boundaries that are assigned to each division within IT is also an important part of avoiding the monitoring spam.

                                                              • Re: How to Avoid "Monitoring Spam"
                                                                SomeClown

                                                                Yeah, the boss filter can be a problem depending on the structure and size of the organization.

                                                                 

                                                                I'm definitely a fan of more granularity than just up/down in monitoring, but every organization is going to be different in that regard.  Sounds like you had a system for everything *but* the boss.  Maybe that's a good feature request. 

                                                              • Re: How to Avoid "Monitoring Spam"
                                                                mcam

                                                                We have recently switched our monitoring toolset to SolarWinds and are hoping to use the trending to help us adjust our alerting to more closely match our environment.

                                                                We are also going to try out Alert Central to tweak the outbound alerts.

                                                                 

                                                                There are also a couple of specific monitoring apps we have that insist on using alert storms based on their polling - most annoying.

                                                                 

                                                                Love the Satellite5 reference - not enough "Who" around here

                                                                • Re: How to Avoid "Monitoring Spam"
                                                                  jgherbert

                                                                  One thing I have seen (with varying degrees of success) is event aggregators that attempt to do root cause analysis and suppress downstream events.

                                                                   

                                                                  e.g. if you lose a hub WAN link and are thus unable to reach 10 sites over that connection, you don't really want to receive the notifications that the WAN is down and 10 sites are also apparently down; better to receive a single notification saying that the WAN is down, and this is affecting reachability to 10 sites. Similar capabilities were tied in to the NOC systems so that when a particular element failed, the NOC would know which customers were impacted, and thus could proactively notify them of the issue.

                                                                   

                                                                  There are a number of approaches to automated root cause out there, and it can be complex, but if you can get it right and tie it into other data sources to pull added intelligence for your alerts, you can make great steps towards minimizing the number of alerts hitting the users who get notified.
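
The hub WAN example above (one notification for the link, listing the sites behind it, instead of eleven separate ones) can be sketched as a simple roll-up. This is a toy Python illustration, not any particular product's root-cause engine: the `parent_of` map is hypothetical topology data, and it assumes each node has at most one upstream parent and that the map contains no loops.

```python
def roll_up(down, parent_of):
    """Group unreachable nodes under their topmost unreachable ancestor.

    down      -- set of node names currently unreachable
    parent_of -- dict mapping a node to its single upstream parent
    Returns a dict: root cause -> sorted list of affected downstream nodes.
    """
    notifications = {}
    for node in down:
        root = node
        # Climb upstream for as long as the parent is also down.
        while parent_of.get(root) in down:
            root = parent_of[root]
        affected = notifications.setdefault(root, [])
        if root != node:
            affected.append(node)
    return {root: sorted(nodes) for root, nodes in notifications.items()}

# Ten sites reached over one hub WAN link.
sites = [f"site{i}" for i in range(1, 11)]
parent_of = {site: "hub-wan" for site in sites}

# The hub link fails, so every site behind it looks down as well.
result = roll_up({"hub-wan", *sites}, parent_of)
for root, affected in result.items():
    print(f"{root} is down, affecting reachability to {len(affected)} sites")
```

A site that fails on its own while the hub link is up still produces its own notification, with an empty list of affected nodes.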

                                                                    • Re: How to Avoid "Monitoring Spam"
                                                                      SomeClown

                                                                      That's a really good idea for a feature addition to NPM.  Create sites, tie the sites together, create rules around site alerting.  I would think it might be easy to define a "site" structure and have all alerting inside that entity follow a set of rules.

                                                                       

                                                                      Mostly I'm thinking out loud here as to what would be useful to the customer base of Solarwinds.  A lot of the really cool features and things like you describe above are relegated to the larger enterprises who can afford to roll custom solutions, or programmatically enhance existing extensible products.  Solarwinds could add some great functionality and saleability to NPM by giving some of that to everyone.

                                                                        • Re: How to Avoid "Monitoring Spam"
                                                                          jgherbert

                                                                          Absolutely. I don't know how much capability SW NPM has in terms of grouping alerts or rolling them up in some way, so I threw it out there anyway. Even on a site basis though, if you can't reach the WAN router for a site (e.g. you know the link has gone down), it would be really neat if you could tell NPM that (some list of monitored objects) all sit behind that single WAN router, and if the router's down, implicitly everything behind it will be too, and to generate one big alert rather than 101 little ones for the 100 things behind the router.

                                                                           

                                                                          Once you go beyond that level, or you have multiple paths to a site, it gets much more complex - now you have to check if, say, BOTH WAN routers are down, and only then suppress alerts behind that failed 'edge'. Tricky, and hard to maintain manually over time.
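
The "BOTH WAN routers" check can be sketched roughly like this in Python. It only looks one hop upstream, and the names (`suppressed_nodes`, `alerts_to_send`, `upstreams`) are hypothetical, not anything NPM exposes.

```python
from typing import Dict, List, Set


def suppressed_nodes(upstreams: Dict[str, List[str]], down: Set[str]) -> Set[str]:
    """Down nodes whose every listed upstream device is also down.

    `upstreams` maps a node to all the edge devices it can be reached through.
    A node with no listed upstreams is never suppressed.
    """
    return {
        node
        for node in down
        if upstreams.get(node) and all(up in down for up in upstreams[node])
    }


def alerts_to_send(upstreams: Dict[str, List[str]], down: Set[str]) -> List[str]:
    """Down nodes that should still raise an alert, in sorted order."""
    return sorted(down - suppressed_nodes(upstreams, down))


if __name__ == "__main__":
    paths = {"site-sw": ["wan-a", "wan-b"]}
    # One router down: the switch is still reachable via wan-b, so its alert stands.
    print(alerts_to_send(paths, {"wan-a", "site-sw"}))
    # Both routers down: the switch alert is suppressed behind the failed edge.
    print(alerts_to_send(paths, {"wan-a", "wan-b", "site-sw"}))
```

The logic is short; the hard part, as noted above, is keeping the `upstreams` data accurate by hand as the network changes.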

                                                                            • Re: How to Avoid "Monitoring Spam"
                                                                              rharland2012

                                                                              Dependencies 'sort of' produce this behavior (one alert instead of a ton when a site gateway/router becomes unresponsive), but maybe we're talking about a different kind of desired config/behavior here.

                                                                              The key word is indeed that these things need to be done manually. For example, you've got dual WAN routers and build your dependencies by putting the two routers in their own little group and building the dependency on the availability of the group...great. If you have to do that a thousand times, though, I can see where it gets chunk-style over time.

                                                                                • Re: How to Avoid "Monitoring Spam"
                                                                                  jgherbert

                                                                                  Great info, rharland2012, and thank you for sharing it! It's good to know there's at least some level of ability to wrap things up in groups, for example. I agree though about the issue of manual configuration and the pain that goes with that. For example, one place I worked tied in to a database that tracked all the devices that major applications should be traversing, and would alert not only on a fault, but could also tell you which applications might be adversely impacted by that failure. The problem was that every routing change then required you to check that paths had not changed in that database, and to maintain them as the architecture expanded. Great idea, but tiresome to keep accurate...

                                                                                  • Re: How to Avoid "Monitoring Spam"
                                                                                    cahunt

                                                                                    Dependencies are great, but if you are like us, then Distribution, Core and Routers have a higher-capacity UPS than our access layer. So power goes out and the access layer fails first due to the UPS batteries being depleted. It still fires those Access layer alerts (mainly node down in this case) and then triggers the Uplink/Interface Errors from the Distro.

                                                                                    And if the distro is in the same room; wait a few minutes and that node down alert will go off.
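
One way to picture a workaround for that ordering problem is a hold-down window: hold each child alert for a while and drop it if the parent also fails before the window closes. This is only a sketch in Python with made-up names (`alerts_after_hold_down`, `down_at`, `hold_down`), not a description of how NPM dependencies behave, and the obvious cost is that real access layer failures are reported later.

```python
from typing import Dict, List, Optional


def alerts_after_hold_down(
    down_at: Dict[str, float],
    parent: Dict[str, Optional[str]],
    hold_down: float,
) -> List[str]:
    """Nodes that should still alert once the hold-down window has elapsed.

    `down_at` maps each down node to the time (in seconds) it was seen down.
    A child's alert is dropped if its parent went down no later than
    `hold_down` seconds after the child did.
    """
    alerts = []
    for node, went_down in down_at.items():
        upstream = parent.get(node)
        parent_down = down_at.get(upstream) if upstream is not None else None
        if parent_down is not None and parent_down <= went_down + hold_down:
            continue  # parent failed inside the window: treat the child as a symptom
        alerts.append(node)
    return sorted(alerts)


if __name__ == "__main__":
    tree = {"access-1": "distro-1", "access-2": "distro-1", "distro-1": None}
    seen = {"access-1": 0.0, "access-2": 30.0, "distro-1": 240.0}
    print(alerts_after_hold_down(seen, tree, hold_down=300.0))
    print(alerts_after_hold_down(seen, tree, hold_down=60.0))
```

In the sample data the distro drops four minutes after the first access switch, so a five-minute window collapses everything into the one distro alert, while a one-minute window lets all three through.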

                                                                                      • Re: How to Avoid "Monitoring Spam"
                                                                                        SomeClown

                                                                                        All of our datacenters are fully redundant for power, cooling, etc., so that's not much of an issue.  If all outside power fails, the batteries hold for the 15 seconds or so that it takes the gentrans to kick power over to the generator.  That said, I can see where different environments aren't going to be as resilient and choices have to be made.

                                                                                          • Re: How to Avoid "Monitoring Spam"
                                                                                            cahunt

                                                                                            Servers in the DC. We are so big that not every building has its own BU Gen. Access layer interface monitoring only happens in special cases for statistics, or a setup where they have a PC running a robot, or machine or other critical device (but this is monitoring; I rarely set up alerts for access interfaces). Our naming convention and hierarchy allow for interface alerting on specific devices very easily and leave the Access layer off. The Access layer changes so dang much in our case that it would be useless and generate too many questions about triggered alerts, and things showing red. Most of my monitoring occurs outside our Data Centers, but the tool those guys use is so dang cryptic that I get requests from other department service providers to give them something understandable.  DCs have full power TO THE MAX!.... power issues reside in our MDR/IDRs mainly and we are underway to get better power redundancy in place. Most places have normal and an Emergency power setup....most.

                                                                                          • Re: How to Avoid "Monitoring Spam"
                                                                                            jgherbert

                                                                                            *laughs* I can see the challenge there. I hope your Orion server isn't in the access layer though! ;-)

                                                                                  • Re: How to Avoid "Monitoring Spam"
                                                                                    Network_Guru

                                                                                    Seems like we are kind of flogging a dead horse when it comes to suppression of alerts.
                                                                                    This was discussed 4 years ago here.

                                                                                    Now that NPM supports groups, this should be easier to do.

                                                                                    • Re: How to Avoid "Monitoring Spam"
                                                                                      Network_Guru

                                                                                      Just to complete this discussion here's a link to a document discussing Groups and Dependencies:
                                                                                      http://www.solarwinds.com/documentation/Orion/docs/Groupsanddependencies.pdf

                                                                                      • Re: How to Avoid "Monitoring Spam"
                                                                                        superfly99

                                                                                        We monitor all switches, routers and wireless devices. Basic monitoring consists of up/down status. As mentioned, groups can reduce the number of emails we get for one particular event. From there, I have to fine-tune the alerts. Sometimes it's a case of something happened, but I was not alerted. Other times, I can see the need to be alerted for a particular event. Solarwinds keeps expanding what we can alert on, making life much easier. For example, temp monitoring used to be done either via syslogs or custom pollers, but now it is built in. Every now and then I go through the alerts and clean them out just so it is easier to see what is being alerted on.