On 09 September 2012, Go Daddy and all the domains for which it provides DNS went down for five hours or more. Thousands of sites went dark, some of them associated with businesses and many of them the primary means of doing business.

This protracted outage offers a good opportunity to discuss two related points of IT practice: the risks of outsourcing DNS for your website(s), and the need for tight control over configuration updates to network devices.

What Happened?

In the name of the hacker group Anonymous, someone claimed responsibility for coordinating a distributed denial-of-service (DDoS) attack against Go Daddy. Since Go Daddy had stood alone among its industry peers in declaring support for the internet-policing Stop Online Piracy Act (SOPA), one might think Anonymous would seek to make a political example of it.

Indeed, if Anonymous insurgents or botnets had aimed High Orbit Ion Cannon (HOIC) software at Go Daddy's DNS servers, the resulting flood of traffic could plausibly overwhelm even an infrastructure that can handle 10 billion DNS queries a day.
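
For scale, that daily figure works out to an average of roughly 116,000 queries per second. This is a back-of-the-envelope average only; it says nothing about how much peak headroom sits above it:

$$\frac{10^{10}\ \text{queries/day}}{86{,}400\ \text{s/day}} \approx 1.16 \times 10^{5}\ \text{queries/s}$$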

 

Yet before the outage on September 9th, Go Daddy had already reversed its position and come out against SOPA, seemingly removing the motive for hacktivist mischief. And in this case, a review of the relevant logs showed no traffic spikes.

Go Daddy itself claims that corrupted routing tables caused the outage. Assuming that's true, let's first look at the routing footprint to the godaddy.com web servers. I'm located in the middle of the US. When I trace the route to godaddy.com, I see it pass through datacenters in the DC metro, Atlanta, New Jersey, Dallas, Colorado, and Phoenix (where Go Daddy owns two large datacenters). The Go Daddy DNS servers CNS1.SECURESERVER.NET and CNS3.SECURESERVER.NET are located in New Jersey, and CNS2.SECURESERVER.NET is located in Phoenix.

Some hops in the route to godaddy.com, probably those in the DC metro, involve ICANN root servers. But let's say that at least 10 hops occur within Go Daddy's network as its name servers sort the query and send the user's browser to the appropriate www.godaddy.com web server farm. And keep in mind that, in terms of DNS, the same route applies to any of the millions of websites that Go Daddy hosts on the more than 30,000 servers in its datacenters, and to the millions more for which it resolves DNS requests.

The point here is that even if the route to the web servers crossed 10 different Go Daddy routing points, one would expect a routing table issue replicated across all affected devices to be resolved fairly easily, within an hour. The exception is when the IT team's repository of routing tables is such a mess that it takes a protracted forensic analysis of recent config changes to find the source of the problem and fix it without causing new ones.
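
That forensic step goes much faster when every device's config is saved as a text snapshot after each change. Here is a minimal sketch, assuming such snapshots exist. It diffs the last known-good copy against the current one and keeps only the changed lines. The device name and route lines are illustrative placeholders, not Go Daddy's actual configuration.

```python
import difflib


def config_diff(before: str, after: str, device: str) -> list[str]:
    """Unified diff between two saved config snapshots of the same device."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"{device}@previous",
            tofile=f"{device}@current",
            lineterm="",
        )
    )


def changed_lines(diff: list[str]) -> list[str]:
    """Return only the added (+) and removed (-) config lines."""
    # A non-empty unified diff opens with the ---/+++ file headers; skip them.
    # Hunk markers start with "@@" and context lines with a space, so the
    # prefix test keeps just the real changes.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    # Hypothetical snapshots of one router; addresses are documentation ranges.
    before = (
        "hostname edge-rtr-01\n"
        "ip route 0.0.0.0 0.0.0.0 192.0.2.1\n"
        "ip route 198.51.100.0 255.255.255.0 192.0.2.5\n"
    )
    after = before.replace("192.0.2.5", "192.0.2.99")
    for line in changed_lines(config_diff(before, after, "edge-rtr-01")):
        print(line)
```

Run across every affected device, a diff like this turns "what changed recently?" into a short list of candidate lines rather than a forensic exercise.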

 

Covering Bases

Taking Go Daddy at its word, the outage confirms that an IT team must always know the history of the config changes made to its network devices. And to make that history a reliable single source of truth, the process of changing network configurations must include an approval step.
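
Here is a minimal sketch of such a change log, assuming a simple in-memory store. A change can be applied only after someone other than its author approves it, and every applied change is kept with a timestamp. The `ConfigChange` and `ChangeLog` names are hypothetical, and pushing the config to a real device is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConfigChange:
    """One proposed change to a device config (hypothetical record layout)."""

    device: str
    author: str
    diff: str
    approved_by: Optional[str] = None
    applied_at: Optional[datetime] = None


class ChangeLog:
    """Append-only history of applied changes, gated by a second-person approval."""

    def __init__(self) -> None:
        self._history: list[ConfigChange] = []

    def approve(self, change: ConfigChange, approver: str) -> None:
        # Require a reviewer other than the author.
        if approver == change.author:
            raise ValueError("authors cannot approve their own changes")
        change.approved_by = approver

    def apply(self, change: ConfigChange) -> None:
        if change.approved_by is None:
            raise PermissionError(f"unapproved change to {change.device}")
        # Pushing the config to the real device would happen here; this
        # sketch only records the change with a UTC timestamp.
        change.applied_at = datetime.now(timezone.utc)
        self._history.append(change)

    def history(self, device: str) -> list[ConfigChange]:
        """Every applied change for one device, oldest first."""
        return [c for c in self._history if c.device == device]


if __name__ == "__main__":
    log = ChangeLog()
    change = ConfigChange(
        "edge-rtr-01", "alice", "+ip route 198.51.100.0 255.255.255.0 192.0.2.99"
    )
    log.approve(change, "bob")
    log.apply(change)
    print(len(log.history("edge-rtr-01")))
```

In practice the history would live in version control or a change-management system rather than in memory. The design point is that the approval check and the record of what changed happen in the same step, so nothing reaches a device without leaving a trace.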

 

We can also infer from the fact of Go Daddy's extended downtime the importance of having a clear triage process for troubleshooting network events.
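
One way to make triage explicit is to list the checks in the order the team will run them, from the lowest layer up, and to stop at the first one that fails. The sketch below assumes that each check is a zero-argument function returning `True` when that layer looks healthy. The check names and stubbed results are hypothetical placeholders for real probes.

```python
from typing import Callable, Optional

# Each check is (layer name, probe); a probe returns True when that layer looks healthy.
Check = tuple[str, Callable[[], bool]]


def triage(checks: list[Check]) -> Optional[str]:
    """Run checks in the agreed order and return the first failing layer, or None."""
    for name, probe in checks:
        if not probe():
            return name  # later probes are not run
    return None


if __name__ == "__main__":
    # Stubbed probes standing in for real tests (ping, config comparison, dig, HTTP).
    checks: list[Check] = [
        ("links up", lambda: True),
        ("routing tables match last approved config", lambda: False),
        ("name servers answer", lambda: False),
        ("web servers respond", lambda: True),
    ]
    print(triage(checks))
```

In the stubbed run, the name servers also fail, but triage reports the routing check. Stopping at the lowest failing layer points the team at the probable cause rather than at its downstream symptoms.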

 

We are left with the question of the risks and benefits of outsourcing DNS to a big provider like Go Daddy. As I have discussed in a different article, architecting DNS for high availability requires multiple redundant name servers, preferably located in different datacenters. In that regard, Go Daddy in particular would seem to have that base covered.
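
A quick way to audit that base is to map each of a domain's name servers to the datacenter it runs in and check both counts. The sketch below assumes that you already know those locations, for example from a trace like the one above. The function name and the thresholds are hypothetical.

```python
def redundancy_problems(
    ns_sites: dict[str, str], min_servers: int = 2, min_sites: int = 2
) -> list[str]:
    """List ways a domain's name-server set falls short of basic redundancy.

    ns_sites maps each name server's host name to the datacenter it runs in.
    """
    problems = []
    if len(ns_sites) < min_servers:
        problems.append(f"{len(ns_sites)} name server(s), want at least {min_servers}")
    sites = set(ns_sites.values())
    if len(sites) < min_sites:
        problems.append(f"{len(sites)} datacenter(s), want at least {min_sites}")
    return problems


if __name__ == "__main__":
    # Locations as observed in this article's trace (two in New Jersey, one in Phoenix).
    go_daddy = {
        "cns1.secureserver.net": "New Jersey",
        "cns2.secureserver.net": "Phoenix",
        "cns3.secureserver.net": "New Jersey",
    }
    print(redundancy_problems(go_daddy))  # -> []
```

By this measure, Go Daddy passes, and that is exactly the limit of the check. Spreading name servers across datacenters does not protect against a fault, like corrupted routing tables, that is replicated across one provider's whole network.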

 

Your alternative is to set up your own DNS and rely on Go Daddy or another registrar only to secure your use of the domain name. In that case, however, you must make sure that the registrar properly sets up the records that point to your DNS: the NS delegation, plus any glue A records your name servers need. During the Go Daddy outage, at least one domain holder experienced downtime even though Go Daddy did not manage DNS for the domain. So the lesson when registering a domain is to always confirm that the registrar sets up the DNS records for your domain the way you expect and, if possible, to devise tests that verify it.
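
Here is one such test, sketched under the assumption that you run your own name servers. It compares the name servers you expect against the ones actually published for the domain. One way to collect the published list is the output of `dig +short NS example.com`, which prints one name per line. The domain and server names are hypothetical placeholders, and running `dig` itself is left outside the sketch.

```python
def normalize(host: str) -> str:
    """Compare DNS names case-insensitively and without the trailing root dot."""
    return host.strip().rstrip(".").lower()


def delegation_mismatch(expected: set[str], dig_output: str) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) name servers.

    dig_output is text shaped like `dig +short NS <domain>` output: one name per line.
    """
    want = {normalize(h) for h in expected}
    got = {normalize(line) for line in dig_output.splitlines() if line.strip()}
    return want - got, got - want


if __name__ == "__main__":
    # Hypothetical domain whose owner runs ns1/ns2.example.net themselves.
    expected = {"ns1.example.net", "ns2.example.net"}
    observed = "ns1.example.net.\nNS2.EXAMPLE.NET.\n"
    print(delegation_mismatch(expected, observed))
```

Run on a schedule, a check like this flags a registrar-side change to your delegation before your users do.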