mysql.com and related sites are down

I tried to visit mysql.com and Planet MySQL over my lunch break at OSCON 2009, only to find the websites down. From conversations with fellow Drizzle colleagues, it seems they have been down for some time.

What does your site look like when your system is unavailable or down?

This is a question I ask clients. What redundancy do you have in place for DNS, for a site unavailable page, for a static copy of content?

I learned my first personal lesson several years ago at The Planet, when my server and 9,000 others were unavailable for at least 40 hours due to an explosion and fire at a data center. While I had copies of my site and shared hosting options elsewhere, all DNS was in the same unavailable data center. This was definitely a shortcoming of the hosting provider at the time.

For any commercial site, it is important that you at least have geographical redundancy for DNS. Let’s use mysql.com as an example investigation.

Identify DNS records

$ dig mysql.com

; <<>> DiG 9.4.3-P1 <<>> mysql.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63421
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;mysql.com.			IN	A

;; ANSWER SECTION:
mysql.com.		2839	IN	A	213.136.52.29

;; AUTHORITY SECTION:
mysql.com.		72	IN	NS	ns7.sun.com.
mysql.com.		72	IN	NS	ns8.sun.com.
mysql.com.		72	IN	NS	ns1.sun.com.
mysql.com.		72	IN	NS	ns2.sun.com.

;; ADDITIONAL SECTION:
ns1.sun.com.		86045	IN	A	192.18.128.11
ns2.sun.com.		86075	IN	A	192.18.99.5
ns7.sun.com.		86085	IN	A	192.18.43.15
ns8.sun.com.		86093	IN	A	192.18.43.12

;; Query time: 2 msec
;; SERVER: 10.10.16.2#53(10.10.16.2)
;; WHEN: Wed Jul 22 14:18:11 2009
;; MSG SIZE  rcvd: 183

I am definitely no expert in networking, but my understanding is that your defined DNS servers hold your primary information, which is then picked up and cached by resolving servers worldwide.
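
To pull just the nameserver names out of dig output like the above, a small filter is enough. This is a minimal sketch, assuming a POSIX shell and awk; the function name ns_from_dig is my own and not part of dig.

```shell
# ns_from_dig: read full dig output on stdin and print the host name of
# every NS record, one per line, without the trailing dot.
# (Hypothetical helper name; it only parses text and makes no DNS queries.)
ns_from_dig() {
    awk '$3 == "IN" && $4 == "NS" { sub(/\.$/, "", $5); print $5 }'
}
```

Run it as: dig NS mysql.com | ns_from_dig. Running dig +short NS mysql.com gives a similar list without any parsing, though with the trailing dots left on.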

These servers are up and running, although having no ping response is not in itself an indicator that a server is unavailable.

mactaz:~ rbradfor$ ping -c 1 ns1.sun.com
PING ns1.sun.com (192.18.128.11): 56 data bytes
64 bytes from 192.18.128.11: icmp_seq=0 ttl=242 time=66.891 ms

--- ns1.sun.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 66.891/66.891/66.891/0.000 ms
mactaz:~ rbradfor$ ping -c 1 ns2.sun.com
PING ns2.sun.com (192.18.99.5): 56 data bytes
64 bytes from 192.18.99.5: icmp_seq=0 ttl=239 time=58.879 ms

--- ns2.sun.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 58.879/58.879/58.879/0.000 ms
mactaz:~ rbradfor$ ping -c 1 ns7.sun.com
PING ns7.sun.com (192.18.43.15): 56 data bytes
64 bytes from 192.18.43.15: icmp_seq=0 ttl=244 time=3.921 ms

--- ns7.sun.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 3.921/3.921/3.921/0.000 ms
mactaz:~ rbradfor$ ping -c 1 ns8.sun.com
PING ns8.sun.com (192.18.43.12): 56 data bytes
64 bytes from 192.18.43.12: icmp_seq=0 ttl=244 time=4.076 ms

--- ns8.sun.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 4.076/4.076/4.076/0.000 ms
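
Ping only exercises ICMP, so a DNS-level check is a useful complement: ask each nameserver directly for the record. A minimal sketch, assuming dig is installed; the function names and the two-second timeout are my own choices for illustration.

```shell
# first_ipv4: print the first line on stdin that looks like an IPv4 address.
# dig +short can print error text (for example a timeout notice) on stdout,
# so the answer is filtered rather than trusted as-is.
first_ipv4() {
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' | head -n 1
}

# check_ns NAMESERVER DOMAIN
# Query one nameserver directly for the A record of DOMAIN.
# Prints "ok <nameserver> <address>" or "FAIL <nameserver>".
check_ns() {
    ns="$1"
    domain="$2"
    addr=$(dig +short +time=2 +tries=1 "@$ns" "$domain" A 2>/dev/null | first_ipv4)
    if [ -n "$addr" ]; then
        echo "ok $ns $addr"
    else
        echo "FAIL $ns"
        return 1
    fi
}
```

For example: for ns in ns1.sun.com ns2.sun.com ns7.sun.com ns8.sun.com; do check_ns "$ns" mysql.com; done. A nameserver that drops ICMP but still answers this query is doing its job.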

They even appear to be in different locations, which is good.

$ traceroute 192.18.128.11
traceroute to 192.18.128.11 (192.18.128.11), 64 hops max, 40 byte packets
 1  10.10.0.1 (10.10.0.1)  1.575 ms  0.882 ms  1.538 ms
 2  10.10.16.2 (10.10.16.2)  0.329 ms  0.366 ms  0.376 ms
 3  gateway.above.net (209.133.114.1)  1.567 ms  0.785 ms  0.863 ms
 4  ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161)  1.386 ms  1.567 ms  1.214 ms
 5  xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178)  2.177 ms  1.907 ms  1.873 ms
 6  above-att.sjc7.us.above.net (64.125.12.118)  5.361 ms  3.927 ms  3.717 ms
 7  cr2.sffca.ip.att.net (12.123.15.162)  66.434 ms  66.523 ms  66.694 ms
 8  cr2.la2ca.ip.att.net (12.122.31.133)  67.472 ms  66.008 ms  65.632 ms
 9  cr2.dlstx.ip.att.net (12.122.28.177)  66.003 ms  66.372 ms  66.723 ms
10  cr1.attga.ip.att.net (12.122.28.173)  66.472 ms  66.001 ms  66.908 ms
11  gar1.chlnc.ip.att.net (12.122.141.77)  66.139 ms  65.835 ms  65.892 ms
12  12.125.220.10 (12.125.220.10)  67.209 ms  66.569 ms  66.529 ms
13  cltea-ns-1.sun.com (192.18.128.11)  66.357 ms  66.756 ms  66.386 ms
mactaz:~ rbradfor$ traceroute 192.18.99.5
traceroute to 192.18.99.5 (192.18.99.5), 64 hops max, 40 byte packets
 1  10.10.0.1 (10.10.0.1)  1.159 ms  0.763 ms  0.704 ms
 2  10.10.16.2 (10.10.16.2)  0.298 ms  0.303 ms  0.290 ms
 3  gateway.above.net (209.133.114.1)  0.637 ms  0.784 ms  0.937 ms
 4  ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161)  1.513 ms  1.743 ms  1.746 ms
 5  xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178)  2.066 ms  1.417 ms  4.144 ms
 6  above-att.sjc7.us.above.net (64.125.12.118)  3.835 ms  3.374 ms  4.001 ms
 7  cr2.sffca.ip.att.net (12.123.15.162)  56.427 ms  56.191 ms  55.553 ms
 8  cr1.dvmco.ip.att.net (12.122.28.54)  55.819 ms  55.508 ms  55.442 ms
 9  gar1.dvmco.ip.att.net (12.122.144.37)  55.429 ms  55.406 ms  55.401 ms
10  12.125.159.146 (12.125.159.146)  59.293 ms  59.501 ms  59.237 ms
11  192.18.101.249 (192.18.101.249)  58.936 ms  59.099 ms  60.184 ms
12  brm-ea-ns-1.Sun.COM (192.18.99.5)  60.090 ms  59.285 ms  59.289 ms
mactaz:~ rbradfor$ traceroute 192.18.43.15
traceroute to 192.18.43.15 (192.18.43.15), 64 hops max, 40 byte packets
 1  10.10.0.1 (10.10.0.1)  1.070 ms  0.639 ms  0.639 ms
 2  10.10.16.2 (10.10.16.2)  0.323 ms  0.238 ms  0.242 ms
 3  gateway.above.net (209.133.114.1)  1.524 ms  2.697 ms  0.615 ms
 4  ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161)  1.463 ms  1.510 ms  1.922 ms
 5  xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178)  7.735 ms  2.136 ms  66.881 ms
 6  xe-0-0-0.mpr3.sjc7.us.above.net (64.125.27.85)  1.744 ms  3.131 ms  1.874 ms
 7  * above-level3.sjc7.us.above.net (64.125.13.242)  49.976 ms  2.078 ms
 8  ae-11-69.car1.SanJose1.Level3.net (4.68.18.3)  124.861 ms  206.837 ms  5.631 ms
 9  SUN-MICROSY.car1.SanJose1.Level3.net (4.53.16.50)  3.182 ms  3.579 ms  3.348 ms
10  192.18.44.18 (192.18.44.18)  4.168 ms  4.611 ms  4.146 ms
11  * * *
12  * * *
13  * *^C
mactaz:~ rbradfor$ traceroute 192.18.43.12
traceroute to 192.18.43.12 (192.18.43.12), 64 hops max, 40 byte packets
 1  10.10.0.1 (10.10.0.1)  1.206 ms  0.818 ms  0.879 ms
 2  10.10.16.2 (10.10.16.2)  0.348 ms  0.485 ms  0.465 ms
 3  gateway.above.net (209.133.114.1)  10.055 ms  1.911 ms  1.775 ms
 4  ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161)  1.278 ms  0.963 ms  1.307 ms
 5  xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178)  2.243 ms  2.004 ms  2.041 ms
 6  * xe-0-0-0.mpr3.sjc7.us.above.net (64.125.27.85)  2.016 ms  2.104 ms
 7  above-level3.sjc7.us.above.net (64.125.13.242)  2.143 ms  1.471 ms  2.106 ms
 8  ae-41-99.car1.SanJose1.Level3.net (4.68.18.195)  2.970 ms  3.103 ms ae-31-89.car1.SanJose1.Level3.net (4.68.18.131)  2.876 ms
 9  SUN-MICROSY.car1.SanJose1.Level3.net (4.53.16.50)  3.054 ms  3.414 ms  2.925 ms
10  192.18.44.18 (192.18.44.18)  3.721 ms  3.643 ms  3.622 ms
11  scaea-ns-1.sun.com (192.18.43.12)  4.350 ms  3.905 ms  4.188 ms
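
A rough way to summarise the traces above is to count how many distinct network prefixes the nameserver addresses fall into. This is only a sketch: the function name is hypothetical, and a shared /24 prefix is merely a hint that two servers sit together, not proof of physical location.

```shell
# distinct_prefixes: read IPv4 addresses on stdin (one per line) and print
# the number of distinct /24 prefixes (first three octets) among them.
distinct_prefixes() {
    cut -d. -f1-3 | sort -u | awk 'END { print NR }'
}
```

Fed the four addresses from the dig output (192.18.128.11, 192.18.99.5, 192.18.43.15 and 192.18.43.12), this prints 3, since ns7 and ns8 share the 192.18.43 prefix.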

A traceroute of mysql.com shows the site is hosted outside the Sun network, which is where at least the DNS servers are.

$ traceroute 213.136.52.29
traceroute to 213.136.52.29 (213.136.52.29), 64 hops max, 40 byte packets
 1  10.10.0.1 (10.10.0.1)  1.243 ms  0.750 ms  0.844 ms
 2  10.10.16.2 (10.10.16.2)  0.397 ms  0.353 ms  0.413 ms
 3  gateway.above.net (209.133.114.1)  1.254 ms  1.021 ms  0.976 ms
 4  ge-11-0-2.er1.sjc2.us.above.net (64.124.196.161)  1.448 ms  0.933 ms  14.524 ms
 5  * xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178)  1.734 ms  2.025 ms
 6  sjo-bb1-link.telia.net (213.248.94.29)  2.001 ms  1.942 ms  2.212 ms
 7  nyk-bb2-link.telia.net (80.91.254.176)  75.310 ms  81.628 ms  75.063 ms
 8  kbn-bb2-link.telia.net (80.91.254.90)  175.072 ms  175.445 ms  174.846 ms
 9  s-bb2-pos7-0-0.telia.net (213.248.65.30)  181.580 ms  181.930 ms  182.126 ms
10  s-b3-link.telia.net (80.91.253.226)  184.610 ms  198.216 ms  184.766 ms
11  bahnhof-110262-s-b3.c.telia.net (213.248.97.42)  182.919 ms  185.830 ms  184.827 ms
12  * * *
13  tsic2-gw.bahnhof.net (85.24.151.133)  186.588 ms  186.847 ms  188.352 ms
14  tsic3-gw.bahnhof.net (85.24.151.135)  183.782 ms  183.355 ms  184.660 ms
15  pio-dr1.pio-dr2.bahnhof.net (85.24.151.7)  186.142 ms  186.809 ms  186.723 ms
16  mysql-gw-sec-c.bahnhof.net (85.24.153.74)  183.821 ms  183.793 ms  183.597 ms
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *

For such a significant open source product, I’m surprised by this level of complete unavailability, without even a site unavailable page.

NOTE: Further update. It’s been reported that the site has now been down for 8+ hours.

Comments

  1. says

    According to @mysql about 5 hours ago via Twitter, “There is a power outage at our data center in Sweden. We’re working to get the websites up and running soon.”

  2. says

    The problem is that changing DNS takes time, because the changes have to propagate to resolving nameservers at providers. These might cache aggressively. That means you have to know *in advance* how long the outage is going to be to determine whether you switch to another IP address with a static site. Then, when the systems are back up, you’ll have the same problem getting users back to the site that’s operable. Others might still get the old IP from their provider’s resolvers (or own resolvers, in days of internet censorship…) and be directed to the broken site.