26 Replies Latest reply on Oct 3, 2013 11:55 AM by Kevin Rak

    When monitoring service uptime and availability, do you use geographically diverse monitoring stations? Can you have too many?


      In dealing with availability monitoring, how many different geographic angles do you prefer to analyze from? Are there pros and cons to having too many or too few?


      I recently had a downtime problem with a cabinet that I lease for customers as well as my own hosted services. I monitor the uptime of the services running in my cabinet from multiple points throughout the world. During a recent outage, I could see via multiple historical traceroute logs where the problem appeared to be. The checks all started from monitoring nodes outside of my datacenter with monitoring traffic moving inward.


      Traceroute traffic was starring out at a certain edge device for the datacenter, and it was doing so for all geographic checks. I was happy to have such incontrovertible evidence to present concerning the downtime. However, later on I noticed some peculiar downtimes that were only affecting certain monitoring stations. Las Vegas lost visibility into my cabinet for a minute, with all traffic dying at one of the upstream providers for my datacenter. Days later, Paris, France couldn’t access services in my cabinet for a few minutes, with traffic also dying at a hop belonging to one of my datacenter’s providers.
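
      The distinction above (every station failing at the same hop versus a single station losing visibility) can be sketched roughly in Python. This is only an illustration under stated assumptions: `CheckResult`, `classify`, `common_last_hop`, the 50% threshold, and the station and hop names are all hypothetical, not part of any particular monitoring product.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckResult:
    station: str             # hypothetical monitoring station name
    reachable: bool          # did this station reach the service?
    last_hop: Optional[str]  # last hop that answered the traceroute, if known


def classify(results: List[CheckResult], threshold: float = 0.5) -> str:
    """Label one round of checks as 'ok', 'local', or 'widespread'.

    'widespread' means more than `threshold` of the stations failed,
    which points at the datacenter side; 'local' means only a minority
    failed, which points at a path problem near those stations.
    """
    failed = [r for r in results if not r.reachable]
    if not failed:
        return "ok"
    if len(failed) / len(results) > threshold:
        return "widespread"
    return "local"


def common_last_hop(results: List[CheckResult]) -> Optional[str]:
    """Return the hop where most failed traceroutes stopped, if any."""
    hops = Counter(r.last_hop for r in results if not r.reachable and r.last_hop)
    if not hops:
        return None
    return hops.most_common(1)[0][0]


# Assumed example data: every station dies at the same edge device.
outage = [
    CheckResult("las-vegas", False, "dc-edge-1"),
    CheckResult("paris", False, "dc-edge-1"),
    CheckResult("tokyo", False, "dc-edge-1"),
]

# Assumed example data: only one station loses visibility.
blip = [
    CheckResult("las-vegas", False, "upstream-a"),
    CheckResult("paris", True, None),
    CheckResult("tokyo", True, None),
]

print(classify(outage), common_last_hop(outage))
print(classify(blip), common_last_hop(blip))
```

      The threshold is a judgment call. A stricter rule (for example, requiring every station to fail) would be slower to flag a real outage but would ignore more of the single-station blips.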


      When monitoring from a vast array of nodes, you start to see how much of a living entity the Internet can appear to be. Latencies morph and shift across the globe, and downtimes for certain localities can quickly come and go, sometimes too briefly to detect without up-to-the-second checks (which are a bit excessive, if you ask me).


      Monitoring from multiple external nodes can also cause something of a mania. I mean, do I really need to check from 56 different external locations? With that number of localities, you start to get alerts for spurious, transient, and local interruptions that aren’t your fault. I've quickly been swept up in fretting over strange local network issues at monitoring stations and tracking down gremlins that weren't even mine. And anyway, do clients or business leaders really need to be impressed that badly unless they’re Facebook? Wouldn't they be satisfied with just a dozen geographic monitors?


      When you’re setting up checks for your services, how many geographically diverse monitors do you like to have? Is there a point when you can have too many? What have you discovered in this realm of uptime monitoring, service availability, and overall transaction time checking? Can you have too much of a good thing?