I tried to go to mysql.com and Planet MySQL over my lunch break at OSCON 2009, only to find both websites down. From conversations with fellow Drizzle colleagues, it seems they have been down for some time.
What does your site look like when your system is unavailable or down?
This is a question I ask clients. What redundancy do you have in place for DNS, for a site unavailable page, for a static copy of content?
I learned my first personal lesson several years ago when my server and 9,000 others at The Planet were unavailable for at least 40 hours after an explosion and fire at a data center. While I had copies of my site and shared hosting options elsewhere, all of my DNS was also in the same unavailable data center. This was definitely a shortcoming of the hosting provider at the time.
For any commercial site, it is important that you have at least geographical redundancy for DNS. Let’s use mysql.com as an example investigation.
Identify DNS records
$ dig mysql.com
; <<>> DiG 9.4.3-P1 <<>> mysql.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63421
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
;; QUESTION SECTION:
;mysql.com. IN A
;; ANSWER SECTION:
mysql.com. 2839 IN A 213.136.52.29
;; AUTHORITY SECTION:
mysql.com. 72 IN NS ns7.sun.com.
mysql.com. 72 IN NS ns8.sun.com.
mysql.com. 72 IN NS ns1.sun.com.
mysql.com. 72 IN NS ns2.sun.com.
;; ADDITIONAL SECTION:
ns1.sun.com. 86045 IN A 192.18.128.11
ns2.sun.com. 86075 IN A 192.18.99.5
ns7.sun.com. 86085 IN A 192.18.43.15
ns8.sun.com. 86093 IN A 192.18.43.12
;; Query time: 2 msec
;; SERVER: 10.10.16.2#53(10.10.16.2)
;; WHEN: Wed Jul 22 14:18:11 2009
;; MSG SIZE rcvd: 183
I am definitely no expert in networking, but my understanding is that your defined DNS servers hold your primary information, which resolvers worldwide then query and cache.
These servers are up and running. Note that ping is only a rough check: a missing ping response would not by itself mean a server is unavailable.
mactaz:~ rbradfor$ ping -c 1 ns1.sun.com
PING ns1.sun.com (192.18.128.11): 56 data bytes
64 bytes from 192.18.128.11: icmp_seq=0 ttl=242 time=66.891 ms
--- ns1.sun.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 66.891/66.891/66.891/0.000 ms
mactaz:~ rbradfor$ ping -c 1 ns2.sun.com
PING ns2.sun.com (192.18.99.5): 56 data bytes
64 bytes from 192.18.99.5: icmp_seq=0 ttl=239 time=58.879 ms
--- ns2.sun.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 58.879/58.879/58.879/0.000 ms
mactaz:~ rbradfor$ ping -c 1 ns7.sun.com
PING ns7.sun.com (192.18.43.15): 56 data bytes
64 bytes from 192.18.43.15: icmp_seq=0 ttl=244 time=3.921 ms
--- ns7.sun.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 3.921/3.921/3.921/0.000 ms
mactaz:~ rbradfor$ ping -c 1 ns8.sun.com
PING ns8.sun.com (192.18.43.12): 56 data bytes
64 bytes from 192.18.43.12: icmp_seq=0 ttl=244 time=4.076 ms
--- ns8.sun.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 4.076/4.076/4.076/0.000 ms
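Since ping only tests ICMP reachability, a more direct check is to ask each name server an actual DNS question. The following is a minimal sketch using only the Python standard library, not a full resolver. The function names `build_query` and `answers_dns` are my own for illustration, and the sketch assumes the server accepts plain UDP queries on port 53.

```python
import socket
import struct


def build_query(name, query_id=0x1234, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Header: id, flags (0 = standard query, recursion not requested),
    # one question, and zero answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends the name.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def answers_dns(server_ip, name, timeout=3.0, query_id=0x1234):
    """Return True if server_ip sends back a DNS response to a query for name."""
    query = build_query(name, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            data, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    if len(data) < 12:  # shorter than a DNS header, so not a valid reply
        return False
    resp_id, flags = struct.unpack("!HH", data[:4])
    # The QR bit (top bit of flags) marks a response; the id must match our query.
    return resp_id == query_id and bool(flags & 0x8000)
```

A call such as `answers_dns("192.18.128.11", "mysql.com")` would test the first server listed above. That address comes from the 2009 output, so it may no longer answer.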
They even appear to be in different locations, which is good.
$ traceroute 192.18.128.11
traceroute to 192.18.128.11 (192.18.128.11), 64 hops max, 40 byte packets
1 10.10.0.1 (10.10.0.1) 1.575 ms 0.882 ms 1.538 ms
2 10.10.16.2 (10.10.16.2) 0.329 ms 0.366 ms 0.376 ms
3 gateway.above.net (209.133.114.1) 1.567 ms 0.785 ms 0.863 ms
4 ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161) 1.386 ms 1.567 ms 1.214 ms
5 xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178) 2.177 ms 1.907 ms 1.873 ms
6 above-att.sjc7.us.above.net (64.125.12.118) 5.361 ms 3.927 ms 3.717 ms
7 cr2.sffca.ip.att.net (12.123.15.162) 66.434 ms 66.523 ms 66.694 ms
8 cr2.la2ca.ip.att.net (12.122.31.133) 67.472 ms 66.008 ms 65.632 ms
9 cr2.dlstx.ip.att.net (12.122.28.177) 66.003 ms 66.372 ms 66.723 ms
10 cr1.attga.ip.att.net (12.122.28.173) 66.472 ms 66.001 ms 66.908 ms
11 gar1.chlnc.ip.att.net (12.122.141.77) 66.139 ms 65.835 ms 65.892 ms
12 12.125.220.10 (12.125.220.10) 67.209 ms 66.569 ms 66.529 ms
13 cltea-ns-1.sun.com (192.18.128.11) 66.357 ms 66.756 ms 66.386 ms
mactaz:~ rbradfor$ traceroute 192.18.99.5
traceroute to 192.18.99.5 (192.18.99.5), 64 hops max, 40 byte packets
1 10.10.0.1 (10.10.0.1) 1.159 ms 0.763 ms 0.704 ms
2 10.10.16.2 (10.10.16.2) 0.298 ms 0.303 ms 0.290 ms
3 gateway.above.net (209.133.114.1) 0.637 ms 0.784 ms 0.937 ms
4 ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161) 1.513 ms 1.743 ms 1.746 ms
5 xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178) 2.066 ms 1.417 ms 4.144 ms
6 above-att.sjc7.us.above.net (64.125.12.118) 3.835 ms 3.374 ms 4.001 ms
7 cr2.sffca.ip.att.net (12.123.15.162) 56.427 ms 56.191 ms 55.553 ms
8 cr1.dvmco.ip.att.net (12.122.28.54) 55.819 ms 55.508 ms 55.442 ms
9 gar1.dvmco.ip.att.net (12.122.144.37) 55.429 ms 55.406 ms 55.401 ms
10 12.125.159.146 (12.125.159.146) 59.293 ms 59.501 ms 59.237 ms
11 192.18.101.249 (192.18.101.249) 58.936 ms 59.099 ms 60.184 ms
12 brm-ea-ns-1.Sun.COM (192.18.99.5) 60.090 ms 59.285 ms 59.289 ms
mactaz:~ rbradfor$ traceroute 192.18.43.15
traceroute to 192.18.43.15 (192.18.43.15), 64 hops max, 40 byte packets
1 10.10.0.1 (10.10.0.1) 1.070 ms 0.639 ms 0.639 ms
2 10.10.16.2 (10.10.16.2) 0.323 ms 0.238 ms 0.242 ms
3 gateway.above.net (209.133.114.1) 1.524 ms 2.697 ms 0.615 ms
4 ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161) 1.463 ms 1.510 ms 1.922 ms
5 xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178) 7.735 ms 2.136 ms 66.881 ms
6 xe-0-0-0.mpr3.sjc7.us.above.net (64.125.27.85) 1.744 ms 3.131 ms 1.874 ms
7 * above-level3.sjc7.us.above.net (64.125.13.242) 49.976 ms 2.078 ms
8 ae-11-69.car1.SanJose1.Level3.net (4.68.18.3) 124.861 ms 206.837 ms 5.631 ms
9 SUN-MICROSY.car1.SanJose1.Level3.net (4.53.16.50) 3.182 ms 3.579 ms 3.348 ms
10 192.18.44.18 (192.18.44.18) 4.168 ms 4.611 ms 4.146 ms
11 * * *
12 * * *
13 * *^C
mactaz:~ rbradfor$ traceroute 192.18.43.12
traceroute to 192.18.43.12 (192.18.43.12), 64 hops max, 40 byte packets
1 10.10.0.1 (10.10.0.1) 1.206 ms 0.818 ms 0.879 ms
2 10.10.16.2 (10.10.16.2) 0.348 ms 0.485 ms 0.465 ms
3 gateway.above.net (209.133.114.1) 10.055 ms 1.911 ms 1.775 ms
4 ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161) 1.278 ms 0.963 ms 1.307 ms
5 xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178) 2.243 ms 2.004 ms 2.041 ms
6 * xe-0-0-0.mpr3.sjc7.us.above.net (64.125.27.85) 2.016 ms 2.104 ms
7 above-level3.sjc7.us.above.net (64.125.13.242) 2.143 ms 1.471 ms 2.106 ms
8 ae-41-99.car1.SanJose1.Level3.net (4.68.18.195) 2.970 ms 3.103 ms ae-31-89.car1.SanJose1.Level3.net (4.68.18.131) 2.876 ms
9 SUN-MICROSY.car1.SanJose1.Level3.net (4.53.16.50) 3.054 ms 3.414 ms 2.925 ms
10 192.18.44.18 (192.18.44.18) 3.721 ms 3.643 ms 3.622 ms
11 scaea-ns-1.sun.com (192.18.43.12) 4.350 ms 3.905 ms 4.188 ms
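As a rough cross-check on the traceroutes, the name server addresses can also be grouped by network prefix. This is only a sketch: the function name `group_by_network` and the choice of a /24 prefix are my own assumptions, and distinct subnets hint at, but do not prove, geographic separation.

```python
import ipaddress
from collections import defaultdict


def group_by_network(servers, prefix_len=24):
    """Group name server IPv4 addresses by their enclosing network."""
    groups = defaultdict(list)
    for name, ip in servers.items():
        # strict=False lets us pass a host address and get its network back
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(network)].append(name)
    return dict(groups)


# Addresses taken from the dig output earlier in this post
servers = {
    "ns1.sun.com": "192.18.128.11",
    "ns2.sun.com": "192.18.99.5",
    "ns7.sun.com": "192.18.43.15",
    "ns8.sun.com": "192.18.43.12",
}

for network, names in sorted(group_by_network(servers).items()):
    print(network, ", ".join(names))
```

With these four addresses, ns7 and ns8 fall into the same /24, which is consistent with both traceroutes passing through the same San Jose router, while ns1 and ns2 sit in separate networks.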
A traceroute to mysql.com shows that the web server sits outside the Sun network where the DNS servers are located.
$ traceroute 213.136.52.29
traceroute to 213.136.52.29 (213.136.52.29), 64 hops max, 40 byte packets
1 10.10.0.1 (10.10.0.1) 1.243 ms 0.750 ms 0.844 ms
2 10.10.16.2 (10.10.16.2) 0.397 ms 0.353 ms 0.413 ms
3 gateway.above.net (209.133.114.1) 1.254 ms 1.021 ms 0.976 ms
4 ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161) 1.448 ms 0.933 ms 14.524 ms
5 * xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178) 1.734 ms 2.025 ms
6 sjo-bb1-link.telia.net (213.248.94.29) 2.001 ms 1.942 ms 2.212 ms
7 nyk-bb2-link.telia.net (80.91.254.176) 75.310 ms 81.628 ms 75.063 ms
8 kbn-bb2-link.telia.net (80.91.254.90) 175.072 ms 175.445 ms 174.846 ms
9 s-bb2-pos7-0-0.telia.net (213.248.65.30) 181.580 ms 181.930 ms 182.126 ms
10 s-b3-link.telia.net (80.91.253.226) 184.610 ms 198.216 ms 184.766 ms
11 bahnhof-110262-s-b3.c.telia.net (213.248.97.42) 182.919 ms 185.830 ms 184.827 ms
12 * * *
13 tsic2-gw.bahnhof.net (85.24.151.133) 186.588 ms 186.847 ms 188.352 ms
14 tsic3-gw.bahnhof.net (85.24.151.135) 183.782 ms 183.355 ms 184.660 ms
15 pio-dr1.pio-dr2.bahnhof.net (85.24.151.7) 186.142 ms 186.809 ms 186.723 ms
16 mysql-gw-sec-c.bahnhof.net (85.24.153.74) 183.821 ms 183.793 ms 183.597 ms
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
For such a significant open source product, I’m surprised by this level of complete unavailability, without even a site unavailable page.
NOTE: Further update. It’s been reported that the site has now been down for 8+ hours.
Chris Barber says
According to @mysql about 5 hours ago via Twitter, “There is a power outage at our data center in Sweden. We’re working to get the websites up and running soon.”
Nils says
The problem is, changing DNS will take time for the changes to propagate to resolving nameservers at providers. These might cache aggressively. That means you have to know *in advance* how long the outage is going to be to determine whether you switch to another IP address with a static site. Then, when the systems are back up you’ll have the same problem getting the users back on the site that’s operable. Others might still get the old IP from their provider’s resolvers (or own resolvers, in days of internet censorship…) and be directed to the broken site.
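To put rough numbers on this caching concern, here is a small sketch of the arithmetic. It assumes resolvers honor the record's TTL, which aggressive caches may not, so treat the results as a lower bound. The function name `stale_windows` and the timeline values are hypothetical.

```python
from datetime import datetime, timedelta


def stale_windows(switch_time, restore_time, ttl_seconds):
    """Return the latest times a TTL-honoring resolver could serve outdated records.

    The first value is when the last cached copy of the broken address expires
    after switching to the static site; the second is when the last cached copy
    of the static address expires after switching back.
    """
    ttl = timedelta(seconds=ttl_seconds)
    return switch_time + ttl, restore_time + ttl


# Hypothetical timeline: switch to a static page at 14:00, switch back at 22:00,
# with a one-hour TTL on the A record.
switch = datetime(2009, 7, 22, 14, 0)
restore = datetime(2009, 7, 22, 22, 0)
for moment in stale_windows(switch, restore, 3600):
    print(moment)
```

The shorter the TTL set before an outage, the smaller both windows become, at the cost of more queries reaching the authoritative servers during normal operation.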