Replication++, Replication 2.0, Replication.Next

There has been some discussion recently regarding the death of MySQL Read Replication starting with Brian Aker and then Farhan, Arjen and Paul have all chimed in. Whatever you want to call it, the next generation of replication approach is clearly on the agenda of the industry leaders and pack followers. We should take a programmatic look however and ask ourselves a few questions. Such as:

  1. The Use — What is/was MySQL Replication used for?
  2. The Reasons — Why was it used?
  3. The Problems — Why is there a need for something better, different or improved?

Read Scalability

  1. The Use — Clearly the most commonly known method of MySQL Replication is for Read Scalability. The ability for many more reads to occur against the data then writes. I call it the YouTube effect, thousands/millions of reads (i.e. view a video) to a single write (i.e. upload of a video)
  2. The Reasons — Database servers could not support the read/write load. Applications were poorly designed for sharding data. The database was being used for a function when it wasn’t really needed. The MySQL Server can’t scale to a massive number of cores.
  3. The Problems — MySQL Replication suffers from the asynchronous condition. A successful transaction on the Master, is not confirmed with a Slave before acknowledgment to the client. Google has made inroads in their own environment to improve the asynchronous nature with the SemiSyncReplication patch, now submitted back to the community. The number one effect of asynchronous replication is lag, and again poorly developed applications simply can’t support successful scale-out. Other problems include locking contention when not using a transactional engine.

Backups

  1. The Use — A Replication slave is used to perform a database backup, designed to not affect production usage.
  2. The Reasons — MySQL does not support a single “unbreakable” online backup solution. Period. There are partial/incomplete implementations, dependent on storage engines and data volume. There is a Roadmap for a solution in MySQL 6.0 (finally), but it’s way too early to tell what features will there be, will you need to pay for some of them, will it just work, let alone who is going to upgrade to 6.0 for this.
  3. The Problems — There exists no consistent method for backup across all storage engines. There is no compatibility for backup/point in time recovery (e.g. binary logs/redo logs) across different master/slave servers. No mirrored binary logs is also a problem, something also in the Google Patch.

Failover

  1. The Use — You need Business Continuity or Revenue Continuity Solutions (RCS) as discussed by Marc Simony. You need to ensure the maximum uptime possible with a failover infrastructure when some component (software or hardware) fails and your production database is not accessible.
  2. The Reasons — MySQL provides no Online DDL. In a 24×7 large volume, changing functionality application this provides an unnecessary complexity for applying changes easily.
  3. The Problems — When correctly configured and applied in a controlled situation, failover can work, provided your data is not too large, and your traffic is not too much. It simply doesn’t have a guarantee to work without data loss in an uncontrolled (i.e. disaster recovery) situation.

Others

There are more reasons, something to write about at a later time. These would include.

  • Testing
  • Alternative data access path (e.g. Data Warehousing)
  • Upgrade Path/Migration Path

Options

To handle Read Scalability. Why select something from the database (or various level of caches in the database call when it can be managed by the Application). Memcache is the obvious buzz word here. With tighter database integration (such as the patches and commands FaceBook has) this could be a winner.

Backups is a tough one, without practical storage engine agnostic consistent solutions I don’t feel the backup conundrum will be resolved anytime soon.

To handle a successful failover you need a guarantee that the data is consistent. A Synchronous Replication solution will solve this. MySQL Cluster is a synchronous solution, however it is impractical for basically every application that is already running with MySQL Replication.

What other options exist? Food for thought.

About Memcache

A little on Memcache for those still in the dark ages. Many large sites such as FaceBook,Fotolog, Wikipedia and Slashdot are strong proponents of Memcached. Created originally for Live Journal Memcached has become the defacto standard in this level of application data caching.

The description from the source.

memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Comments

  1. says

    I see you’re focusing on problem of replication being Async – of course as you’re going to solve this problem,
    but do not be mistaken – this is not the only problem which limits read scalability. The data size is another one.

    Say you had 10GB data and couple of 16GB slaves. Now you got 100GB data and 10 times more reads. Will 20 Slaves handle it ? Most likely not because you week 20 copies of same data on disk and in the cache…

    Though it will surely be good combination with Sharding.