What questions do you ask?

When you have to evaluate a MySQL system and its environment, what questions do you ask to gather critical information about that environment and to assess the success and viability of the business? You don’t have to be a consultant to ask these questions; ask them of your own environment. Do any of the answers shock or concern you?

I was prompted to write about this by a conversation with a colleague about “accepting risk”. His comment was, “every IT server on the planet is vulnerable regardless of best practices.”

Here is my list of questions for you, based on my immediate response to this discussion:

Technology

  • What is your full technology stack, i.e. Operating System, Database, Application Server, Development Language(s) and other essential components?
  • What are the versions of these technologies?
  • What new technologies or versions of existing technologies are you presently evaluating?

Disaster Recovery

  • What is your Backup and Recovery strategy including your Database, Application and Administration?
  • Have you tested your Backup and Recovery strategy?
  • Have you really tested your Backup and Recovery strategy from end to end? How long ago? How long did it take?
  • What RAID do you run? Have you physically verified that? When did you confirm you are not running in a degraded RAID situation?
  • What does your website look like when it’s unavailable? What is the physical content on the website? Let’s pretend your entire data center is unavailable for 40 hours.
  • Have you ever had a major disaster? What did you learn from this experience?
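
The degraded-RAID question can be partly answered by a script rather than by memory. The following is a minimal sketch that assumes Linux software RAID (md), where array status is exposed in /proc/mdstat; hardware RAID controllers need their own vendor tools and are not covered here. The function name degraded_arrays is hypothetical, chosen only for this illustration.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose member status shows a missing disk.

    Assumes the usual /proc/mdstat layout: a line such as
    "md0 : active raid1 ..." followed by a status line ending in a
    marker like [UU] (healthy) or [U_] (one member missing).
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        device = re.match(r"^(md\w+)\s*:", line)
        if device:
            current = device.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None  # only the first status line belongs to the array
    return degraded


if __name__ == "__main__":
    # Reading the real file only works on a Linux host using md RAID.
    try:
        with open("/proc/mdstat") as f:
            print(degraded_arrays(f.read()))
    except FileNotFoundError:
        print("/proc/mdstat not found; this host may not use md RAID")
```

Running something like this from a scheduled job and alerting on a non-empty result turns “when did you last confirm?” into “a few minutes ago”.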

Development Processes

  • Do you use a version control system? Which one? Is everything under version control?
  • Do you have a controlled and reproducible build and release process? Is it automated in any way?
  • What are your levels of testing? Unit testing for code? Regression testing for new features? Volume testing? End-user testing?
  • Do you have a proper test environment (which is not production) where you can accurately evaluate production software and production problems?

Infrastructure

  • How do you know when there is a problem with your site? Do you have monitoring and alert notification in place?
  • When you have a performance problem, can you evaluate if it is new, recurring or a gradually worsening problem?
  • What are your two biggest performance problems right now? What are the specific details of each problem? “My website is slow” is not an answer.
  • Can you roll out new features without taking your website down for general use?

Business Viability

  • How long would a customer stay with your site if it was unavailable?
  • Can your clients be satisfied with the Twitter “Fail Whale” approach or will they leave?
  • When will your system crash under load? Do you know this figure? What is the load today and the projections to this failure point?
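
The projection in that last question is simple arithmetic once you have three numbers. Here is a rough sketch, assuming load grows by a fixed percentage each period (compound growth) and that you have already measured a failure point through load testing. The function name and all figures are hypothetical; real growth is rarely this smooth.

```python
import math


def periods_until_capacity(current_load, capacity, growth_rate):
    """Estimate how many periods remain until load reaches capacity.

    current_load and capacity use the same unit (for example, requests
    per second). growth_rate is the fractional growth per period, so
    0.10 means 10% per month if the period is a month.
    """
    if current_load <= 0 or capacity <= 0:
        raise ValueError("current_load and capacity must be positive")
    if current_load >= capacity:
        return 0.0  # already at or past the failure point
    if growth_rate <= 0:
        return math.inf  # load is flat or shrinking; never reached
    # Solve current_load * (1 + growth_rate) ** n == capacity for n.
    return math.log(capacity / current_load) / math.log(1 + growth_rate)


if __name__ == "__main__":
    # Hypothetical figures: 1,000 req/s today, failure at 2,000, 10% monthly growth.
    months = periods_until_capacity(1000, 2000, 0.10)
    print(f"About {months:.1f} months until the failure point")
```

With those example figures the answer is a little over seven months, which is the kind of number you want to know well before the crash rather than after it.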

Given more time I’d probably revise the list, but this was just an initial response.

This post is part of 31 Days to Build a Better Blog: Write a List Post.