My first tutorial today at MySQL Conference 2007 is Scaling and High Availablilty Architectures by Jeremy Cole and Eric Bergen of Proven Scaling.
Basic Tenets
While not discussed, the premise is to Cache Everything. MemCache is a key component to any scalable system.
Lifetime of a scalable system
Using the analogy from a newborn child Jeremy stepped us through the categories Newborn, Toddler, Teenager, Late teens to 20s, Adult.
In Late teens to 20s, is where most systems die a slow death, he termed “the awkward stage”. This is where scalability is critical, and a meltdown for example can ruin you. Downtime is also just not acceptable for your user community.
When your Adult you need to perfect the ability to deploy incremental changes to your application seamlessly.
As the system grows, optimizations changes that may have worked are now affecting your system. It’s important to revisit during each stage.
Partitioning
Most applications mainly implement a horizontal partitioning model. Different components of your systemcan be scaled by a “partition key”. The different models include fixed “Hash key” partitioning, Dynamic “directory” partitioning, Partition by “group” and partition by “user”.
The Dynamic “directory” is a lot harder to implement, but is ultimately more scalable.
One of Partitioning Difficulties, is inter-partition interactions. Ultimately the solution is duplicating meta-data or duplicating data. Overall reporting is also more difficult. What if we want average for users per location, if we partition by users. Most systems user driven and partition by user. A newer strategy is to partition by group.
For implementing a Fixed Hash Key partitioning.
- Divide data into B buckets
- Divide the B buckets over M machines
You define 1024 physical buckets (can then the easily dividable) 0-1023 (user_id % 1024). Coded then by range to physical machines, 0-255, 256-511, 512-767, 768-1023. The plus side is very easy to implement, you can always derive where something is. The biggest problems is scalability, e.g. going from 4 machines to 5. You also don’t have any fine grained control over buckets.
For Dynamic Directory partitioning you maintain a database of mappings to partitions. A user can be easily moved at a later date in a much finer grain. MySQL Cluster is designed for this type of application. It is not necessary however, a well configured Innodb Hardware solution with memcache can easily provide the same functionality. The only writes are new users, or update partition keys, with a lot of reads.
HiveDB
This open source product implements a “standard” partition-by-key MySQL system written in Java.
Many organizations have a somewhat similar built system, but this is an example of something that’s been open sourced.
More information at www.hivedb.org .
The Hive API language should be the only code that should be re-written to be application development language (e.g. PHP,Ruby) when needed.
High Availability
The obvious goals.
- Avoid downtime due to failures.
- No single point of failure.
- Extremely fast failover.
- No dependency of DNS changes.
- No Dependency on code changes.
- Painless and seamless failover.
- Fail-back must be just as painless.
The overall objective is speed.
Understanding MySQL Replication is important to understanding HA options.
MySQL Replication is Master-Slave One Way asynchronous replication.
- Slave requests binary logs from last position.
- Master sends binary logs up to current time.
- Master keeps sending binary logs in real-time.
Dual Master provides an easy configuration to fail over, it doesn’t provide benefits in throughput. Can help solve online schema changes without downtime. Assuming existing queries will perform both pre and post schema. (set-sql-bin-log=0 for the session is the tip). There are a number of caveats.
Ultimately for High Availability you have a trade off, data loss (minuet) to scalability.