SPOF Internet

A SPOF (Single Point of Failure) is the bane of technologists. Avoiding a SPOF generally requires redundancy, and redundancy has a cost, often more than a business is prepared to pay. In the database field I see this regularly, and I advise clients on how to improve availability and potentially avoid disasters that can affect their business.

Today, at approximately 10:30am, the Con Edison work crew in front of my home (digging a 5″ deep trench down the road) severed multiple Time Warner Cable fibre connections. ($#&*, and the lack of ownership in correcting this in a timely manner is another story.) No Internet, no ability to work actively with clients (which I was doing), etc.

As an individual who works from home, I have recognized this SPOF and have redundancy in place: a Verizon MiFi HotSpot, normally used for travel, which doubles as a backup when my home Internet is down. The moral here is that one level of redundancy is often not enough. Just as MySQL replication is not a backup solution, only one part of one, my backup was in maintenance mode (in this case, loaned to a good overseas friend who is visiting and traveling in the US). Disaster often strikes unexpectedly and often causes multiple failures; this is an example of a cascading failure in my Internet redundancy procedures.

The fact that I am writing this post shows that I have a second backup: a portable WiFi hotspot on my T-Mobile phone. It’s not great, but it is an emergency.

This is not a satisfactory solution long term. The first repair estimate I was given was September 17th. Even after I stressed that this was unsatisfactory, the second estimate is still unacceptable for my business.

Amen to co-working spaces in New York, and the ability to work at a location for a fixed daily, weekly or monthly cost. It pays to know where they are, and to have an informal relationship in place for such an emergency. While inconvenient, I can take a laptop and have power and Internet to work for some core hours in the day, though again this is far less than ideal.

For those who have actually read this far, the moral of the story is this: what is your plan when this happens to your business connection, or to your primary MySQL database server? What is acceptable downtime, and how will you correct issues outside of your control (e.g. an explosion in a data center taking out 15,000 servers, which has happened to me, or damage to the 5 servers you have, all in a single rack)? With practically every client, there is no defined plan in the event of a disaster. There should be.
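
To put a rough number on "acceptable downtime", here is a minimal Python sketch of how stacking redundant paths changes expected downtime. The availability figures and the path names in `paths` are made-up values for illustration, not measurements of any provider, and the calculation assumes the paths fail independently. As the loaned MiFi shows, that assumption is often optimistic, so treat the result as a best case.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def combined_availability(availabilities):
    """Chance that at least one redundant path is up, assuming independent failures."""
    return 1 - prod(1 - a for a in availabilities)


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year for a given availability (0.0 to 1.0)."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical availability figures, for illustration only (not measured).
paths = [
    ("home cable connection", 0.99),
    ("MiFi hotspot", 0.95),
    ("phone hotspot", 0.90),
]

# Show how each additional layer of redundancy changes the yearly downtime.
for count in range(1, len(paths) + 1):
    names = ", ".join(name for name, _ in paths[:count])
    total = combined_availability([a for _, a in paths[:count]])
    print(f"{names}: {total:.5f} available, "
          f"about {downtime_hours_per_year(total):.1f} hours down per year")
```

With these made-up numbers, a single 99% connection works out to roughly 87.6 hours of downtime a year. The same arithmetic applies to a primary MySQL server and its standby, with the same caveat: a backup that is loaned out, untested, or sitting in the same rack does not behave like an independent path.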