Let me first recall some basics. Your company’s website is available to customers and partners only as long as your name-server is available to answer DNS queries for the server IP address(es) associated with the company’s domain name (your_company.com). The answer your name-server sends in response to a DNS query includes a time-to-live (TTL) value, in seconds: the length of time another name-server may hold that answer in its cache and treat it as reliable.
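Each answer record in a DNS response carries its TTL as a 32-bit count of seconds (RFC 1035, section 4.1.3). As a minimal sketch, the Python below walks a raw response packet and reads that field from each answer record. The function names are hypothetical, and the code assumes a well-formed packet rather than validating one.

```python
import struct


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length == 0:                 # root label: end of the name
            return offset + 1
        if length & 0xC0 == 0xC0:       # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1 + length            # ordinary label: length byte + label bytes


def answer_ttls(msg: bytes) -> list[tuple[int, int]]:
    """Return (record type, TTL in seconds) for each answer record in a DNS response."""
    # Fixed 12-byte header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    _id, _flags, qdcount, ancount, _nscount, _arcount = struct.unpack("!6H", msg[:12])
    offset = 12
    for _ in range(qdcount):            # skip the echoed question section
        offset = skip_name(msg, offset) + 4   # QTYPE + QCLASS
    results = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        # TYPE (2 bytes), CLASS (2), TTL (4), RDLENGTH (2)
        rtype, _rclass, ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        results.append((rtype, ttl))
        offset += 10 + rdlength         # skip the fixed fields and the RDATA
    return results
```

For an A-record answer with a one-hour TTL, `answer_ttls` would return `[(1, 3600)]`, since type 1 is A.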


When your customer types your_company.com into a browser, the browser asks the local name-server, the one serving the network the customer’s request originates from, to resolve the domain name to an IP address, so that the browser can then ask the appropriate web server to send over your company’s homepage. If the local name-server cannot answer immediately from its cache, it queries an internet root name-server, which does not give the answer outright but directs the local name-server to the appropriate top tier (top-level domain, or TLD) name-server (.com, .net, .org, etc.). The TLD name-server does not hold the answer either; it refers the local name-server to the authoritative name-server for your_company.com, that is, your name-server, which returns the IP address. The local name-server relays that address to your customer’s browser, and the browser is finally able to request the relevant webpage from the web server running behind your_company.com. Eight messages are exchanged before the browser can even request the first webpage; usually it all happens within a fraction of a second.
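To make the message count concrete, here is a toy simulation of that lookup. The server names, the zone data, and the address 192.0.2.10 (from a range reserved for documentation) are all hypothetical, and no real network traffic is involved. As in real DNS, the TLD server in this sketch answers with a referral to the domain’s authoritative name-server, which is the server that actually returns the address.

```python
# Hypothetical in-memory stand-ins for the three tiers of name-servers.
# Each server either answers with an address or refers the asker onward.
ZONES = {
    "root": {"referrals": {"com": "tld-com"}},
    "tld-com": {"referrals": {"your_company.com": "ns1.your_company.com"}},
    "ns1.your_company.com": {"answers": {"your_company.com": "192.0.2.10"}},
}


def ask(server: str, name: str) -> tuple[str, str]:
    """One query/response exchange: return ("answer", ip) or ("referral", server)."""
    zone = ZONES[server]
    if name in zone.get("answers", {}):
        return "answer", zone["answers"][name]
    for suffix, next_server in zone.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server
    raise LookupError(f"{server} knows nothing about {name}")


def browser_lookup(name: str, cache: dict, log: list) -> str:
    """Resolve name as a browser and its local name-server would, logging every message."""
    log.append(("browser", "local", f"query {name}"))
    if name in cache:                   # local name-server answers from its cache
        ip = cache[name]
    else:                               # otherwise iterate from the root down
        server = "root"
        while True:
            log.append(("local", server, f"query {name}"))
            kind, value = ask(server, name)
            log.append((server, "local", f"{kind} {value}"))
            if kind == "answer":
                ip = value
                break
            server = value              # follow the referral one tier down
        cache[name] = ip
    log.append(("local", "browser", f"answer {ip}"))
    return ip
```

With an empty cache, `browser_lookup` logs eight messages; a repeat lookup served from the local name-server’s cache logs only two. (This toy cache ignores TTLs entirely.)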


Apart from having your name-server include a TTL with its answers to DNS queries, and hoping that the name-servers in other domains are configured to respect TTL values, there is nothing you can do to influence the speed with which DNS facilitates the delivery of the first webpage from your site.
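The caching behaviour you are relying on can be sketched as a small TTL-respecting cache, of the kind a well-behaved name-server in another domain keeps. The `TTLCache` class is hypothetical, and its injectable clock exists only so expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Cache DNS answers and discard each one once its TTL has elapsed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}   # name -> (ip, expiry time)

    def put(self, name: str, ip: str, ttl: int) -> None:
        """Store an answer together with the moment it stops being reliable."""
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        """Return the cached IP, or None if absent or expired (the caller must re-query)."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires = entry
        if self._clock() >= expires:
            del self._entries[name]     # stale: honour the TTL and forget the answer
            return None
        return ip
```

The TTL is a trade-off: a long TTL means fewer repeat lookups for returning visitors but slower propagation when you change a record, and a short TTL reverses that balance.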


But latency in delivering your web content is a minor problem compared with failing to return any content at all because your domain’s name-server is unavailable. In that case, instead of a homepage that at worst loads slowly, the customer gets an error message saying that the web server cannot be found.


Redundancy Ensures Availability

Most likely, if your company does business through its website, all production servers sit in a third-party datacenter with the infrastructure to guarantee the availability of power and cooling. In other words, that part of the availability formula is usually outsourced.

Since serving web content depends on your name-server, having two name-servers is much better than having one. You can set up the two name-servers as equally authoritative for your domain, so that the query load is roughly balanced between them while both are up, and each name-server can handle all queries if the other goes down.
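Round-robin rotation with failover can be sketched from the point of view of a resolver choosing which of your two name-servers to ask. The `NameServerPool` class and the server names are hypothetical, and real resolvers apply their own selection strategies (many favour whichever server has been answering fastest), so this illustrates the principle rather than any particular resolver.

```python
import itertools


class NameServerPool:
    """Rotate queries across equally authoritative name-servers, skipping any that are down."""

    def __init__(self, servers: list[str]):
        self._servers = list(servers)
        self._order = itertools.cycle(range(len(self._servers)))
        self.down: set[str] = set()     # servers currently failing health checks

    def pick(self) -> str:
        # Try each server at most once per pick, in rotation order.
        for _ in range(len(self._servers)):
            server = self._servers[next(self._order)]
            if server not in self.down:
                return server
        raise RuntimeError("no authoritative name-server is reachable")
```

While both servers are up, picks alternate between them; once one is marked down, every pick goes to the survivor.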


But if your primary datacenter goes down (and this happens often enough to be a concern, despite its infrastructural redundancies), redundant name-servers housed in that same datacenter are of no help to you. So the best practice is to keep redundant name-servers, along with your production web content, in two different datacenters.
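A quick sanity check along these lines is to confirm that your name-servers’ addresses do not all fall in the same network. In the sketch below, sharing a /24 prefix is an assumed, rough proxy for sharing a datacenter; it cannot prove that two servers sit in different facilities. The threshold, the function names, and the (documentation-range, IPv4) addresses are all hypothetical.

```python
import ipaddress
from collections import defaultdict


def group_by_network(ns_addresses: dict[str, str], prefix_len: int = 24) -> dict:
    """Group name-server names by the IPv4 network (of prefix_len bits) their address falls in."""
    groups = defaultdict(list)
    for name, addr in ns_addresses.items():
        # strict=False lets a host address stand in for its enclosing network.
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[network].append(name)
    return dict(groups)


def spans_multiple_networks(ns_addresses: dict[str, str], prefix_len: int = 24) -> bool:
    """True if the name-servers are spread over at least two distinct networks."""
    return len(group_by_network(ns_addresses, prefix_len)) >= 2
```

A pair of servers at 192.0.2.10 and 192.0.2.11 would fail this check, while adding a second pair in 198.51.100.0/24 would pass it.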


Monitoring What is Available

Let’s assume that you have set up the name-servers in the recommended double redundancy (a pair of name-servers in each of two datacenters). You now have a highly available DNS operation.


Now you need a process for maintaining the records on your name-servers that tell your customers’ browsers which web servers to contact for various resources. Since human error is most common in configuration file updates, you need to know when the DNS records on your name-servers are modified and whether the modifications are accurate.
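One way to get that visibility is to compare what your name-servers actually serve against a reviewed list of the records they should serve, and to flag any change in the zone’s SOA serial number, which by convention is incremented on every zone change so that secondary servers know to pick it up. The `audit` function below is a hypothetical sketch; it assumes you have already collected the observed records, for example by querying each name-server with a tool such as dig, which is not shown.

```python
Records = dict[tuple[str, str], set[str]]   # (name, record type) -> set of values


def audit(prev_serial: int, cur_serial: int, expected: Records, observed: Records) -> list[str]:
    """Return human-readable alerts describing how the served zone differs from expectations."""
    alerts = []
    if cur_serial != prev_serial:       # someone changed the zone since the last check
        alerts.append(f"zone modified: SOA serial {prev_serial} -> {cur_serial}")
    for key in sorted(expected.keys() | observed.keys()):
        name, rtype = key
        want = expected.get(key, set())
        got = observed.get(key, set())
        for value in sorted(want - got):    # records that should be served but are not
            alerts.append(f"missing: {name} {rtype} {value}")
        for value in sorted(got - want):    # records served that nobody reviewed
            alerts.append(f"unexpected: {name} {rtype} {value}")
    return alerts
```

A serial change with no missing or unexpected records means the zone was edited and still matches the reviewed list; missing or unexpected records are the ones to investigate.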


dnsstuff.com is an excellent resource for tools related to exploring and maintaining the DNS piece of your web service operation.