MYSQL Conference – Scaling and High Availablilty Architectures Tutorial

Ronald Bradford
April 23, 2007

My first tutorial today at MySQL Conference 2007 is Scaling and High Availablilty Architectures by Jeremy Cole and Eric Bergen of Proven Scaling.

Basic Tenets

While not discussed, the premise is to Cache Everything. MemCache is a key component to any scalable system.

Lifetime of a scalable system

Using the analogy from a newborn child Jeremy stepped us through the categories Newborn, Toddler, Teenager, Late teens to 20s, Adult.

In Late teens to 20s, is where most systems die a slow death, he termed “the awkward stage”. This is where scalability is critical, and a meltdown for example can ruin you. Downtime is also just not acceptable for your user community.

When your Adult you need to perfect the ability to deploy incremental changes to your application seamlessly.

As the system grows, optimizations changes that may have worked are now affecting your system. It’s important to revisit during each stage.

Partitioning

Most applications mainly implement a horizontal partitioning model. Different components of your systemcan be scaled by a “partition key”. The different models include fixed “Hash key” partitioning, Dynamic “directory” partitioning, Partition by “group” and partition by “user”.

The Dynamic “directory” is a lot harder to implement, but is ultimately more scalable.

One of Partitioning Difficulties, is inter-partition interactions. Ultimately the solution is duplicating meta-data or duplicating data. Overall reporting is also more difficult. What if we want average for users per location, if we partition by users. Most systems user driven and partition by user. A newer strategy is to partition by group.

For implementing a Fixed Hash Key partitioning.

Divide data into B buckets
Divide the B buckets over M machines

You define 1024 physical buckets (can then the easily dividable) 0-1023 (user_id % 1024). Coded then by range to physical machines, 0-255, 256-511, 512-767, 768-1023. The plus side is very easy to implement, you can always derive where something is. The biggest problems is scalability, e.g. going from 4 machines to 5. You also don’t have any fine grained control over buckets.

For Dynamic Directory partitioning you maintain a database of mappings to partitions. A user can be easily moved at a later date in a much finer grain. MySQL Cluster is designed for this type of application. It is not necessary however, a well configured Innodb Hardware solution with memcache can easily provide the same functionality. The only writes are new users, or update partition keys, with a lot of reads.

HiveDB

This open source product implements a “standard” partition-by-key MySQL system written in Java.
Many organizations have a somewhat similar built system, but this is an example of something that’s been open sourced.

More information at www.hivedb.org .

The Hive API language should be the only code that should be re-written to be application development language (e.g. PHP,Ruby) when needed.

High Availability

The obvious goals.

Avoid downtime due to failures.
No single point of failure.
Extremely fast failover.
No dependency of DNS changes.
No Dependency on code changes.
Painless and seamless failover.
Fail-back must be just as painless.

The overall objective is speed.

Understanding MySQL Replication is important to understanding HA options.

MySQL Replication is Master-Slave One Way asynchronous replication.

Slave requests binary logs from last position.
Master sends binary logs up to current time.
Master keeps sending binary logs in real-time.

Dual Master provides an easy configuration to fail over, it doesn’t provide benefits in throughput. Can help solve online schema changes without downtime. Assuming existing queries will perform both pre and post schema. (set-sql-bin-log=0 for the session is the tip). There are a number of caveats.

Ultimately for High Availability you have a trade off, data loss (minuet) to scalability.

Tagged with: General MySQL MySQL Conference &Amp; Expo 2007

MySQL and Heatwave Summit Presentation

Ronald Bradford
April 30, 2025

Last week I had the opportunity to speak at the MySQL and Heatwave Summit in San Francisco. I discussed the impact of the new MySQL 8.0 default caching_sha2_password authentication, replacing the mysql_native_password authentication that was the default for approximately 20 of the 30 years that MySQL has existed.

Readyset QueryPilot Announcement

Ronald Bradford
April 22, 2025

At the MySQL and Heatwave Summit 2025 today, Readyset announced a new data systems architecture pattern named Readyset QueryPilot . This architecture which can front a MySQL or PostgreSQL database infrastructure, combines the enterprise-grade ProxySQL and Readyset caching with intelligent query monitoring and routing to help support applications scale and produce more predictable results with varied workloads.

More CPUs or Newer CPUs

Ronald Bradford
April 2, 2025

In a CPU-bound database workload, regardless of price, would you scale-up or scale-new? What if price was the driving factor, would you scale-up or scale-new? I am using as a baseline the first available AWS Graviton2 processor for RDS (r6g).