Clearly define your uptime needs

In writing about Performance and Scalability I referenced a quote that I have provided in a number of presentations regarding a valuable interaction with a client. All software architects and managers need to clearly understand this for their own sites in order to enable technical resources to deliver a highly scalable solution.

Development Manager:  We need a maintenance window for software upgrades and new releases.
CTO:  No Downtime.
Development Manager: But we need this to fix problems and improve performance.
CTO:  No Downtime.
Consultant (aka Ronald Bradford):  Mr CTO. What is your definition of no downtime?
CTO:  We serve pages, we serve ads.
Consultant: We can do that.

Asking the right question about the uptime requirements completely changed the architecture needed to meeting these specific high availability needs.

It is important to know with this major TV network client the answer was not updating content, selling merchandise or enabling customers to comment. Each of these needs requires a different approach to high availability.

Tagged with: Site Reliability

Related Posts

More CPUs or Newer CPUs

In a CPU-bound database workload, regardless of price, would you scale-up or scale-new? What if price was the driving factor, would you scale-up or scale-new? I am using as a baseline the first available AWS Graviton2 processor for RDS (r6g).

Read more

An Interesting Artifact with AWS RDS Aurora Storage

As part of using public datasets with my own Benchmarking Suite I wanted upsize a dataset for larger volume testing. I have always used the INFORMATION_SCHEMA.TABLES data_length and index_length columns as a sufficiently accurate measurement for actual disk space used.

Read more

How long does it take the ReadySet cache to warm up?

During my setup of benchmarking I run a quick test-sysbench script to ensure my configuration is right before running an hour+ duration test. When pointing to a Readyset cache where I have cached the 5 queries used in the sysbench test, but I have not run any execution of the SQL, throughput went up 10x in 5 seconds.

Read more