MySQL Conference – Google

MySQL: The Real Grid Database

Introduction

  • Can’t work on performance problems until we solve the availability
  • We want MySQL to fix our problems first.

The problem

  • Deploy a DBMS for a workload with
    • too many queries
    • to many transactions
    • to much data

A well known solution

  • deploy a grid database
  • -use many replicas to scale read performance
    -shard your data over many master to scale write performance
    -sharding is easy, resharding is hard

  • availability and manageability trump performance
  • – make it easy to run many severs
    – unbretable aggregate perfomance

  • we describe problems that matter to us.
  • The grid database approach

    • Deploy a large number of small servers
    • use highly redundant commodity components
    • added capacity has a low incremental cost
    • not much capacity lost when a server fails
    • support many servers with a few DBAs

    Managability
    Make it easy to do the tasks that must be done. Reduce, Reduce.
    Make all tasks scriptable
    Why does it mater, support hundreds of servers, spend time solving more interesting problems. You generally have lots of problems to solve.

    Underutilize your severs
    Require less maintenance
    Requre less tuning
    tolerate load spikes better
    tolerate bad query plans better

    In a Perfect World
    Short running queries
    uses kill mistake and runaway queries
    accounts new use to many connections
    query plans are good
    new apps increase database workload by a small amount
    only appropiate date is stored in the database

    Reality

    • Long running transactions, create replication delays everywhere
    • servers with round robin DNS aliases make queries hard to find
    • applications create more connections where the database is slow
    • some storage engines use sampling to get query plan statistics
    • new applications create new database performance problems
    • applications use the database as long as rows are never deleted
    • many long running queries on replicas

    Solutions

    • Improve your ability to respond because prevention is impossible
    • Need tools to make monitoring easier
    • determine what is happening across servers
    • detemine what happened in the past

    Mantra

    • Monitor everything you can, and archive as long as possible. (vmstat 5 secs, iostat, mysql error logs)
    • You will need these to reconstruct failures
    • save as much as possible
    • script as much as possible

    Monitoring Matters

    • Display what is happening
    • -which table, account or statemsns caused most of the load
      -many fast queries can be as much a problem as one slow query

    • Record what happending
    • –archivce show status counters somweere
      – query data from the archive
      — visualise data from the archivce

    • record queries that have been run
    • — archive show processlist output (do every 30 seconds)
      — support queries on this archive

    • All of this much scale to an environment with many servers

    Monitoring Tools

    • Display counters and rate change for counters
    • aggregate values over many servers
    • visualize and rang results
    • display results over time

    Google mpgrep tools

    New Commands
    We changed mysql, three new commands
    SHOW USER _STATISTICS
    SHOW TABLE STATISTICS
    SHOW INDEX STATISTICS

    Per Account Activity
    USER_STATISTICS
    seconds executing commands
    number of rows fetched and changed
    total connections
    number of select/updates/other/commits/rollback/binlog bytes written.

    TABLE STATISTICS
    number of rows fetched/changed

    INDEX STATISTICS
    display number of rows fetched per index
    helps find indexes that are never used

    available in code.google.com in 4.0, porting to 5.0

    MySQL High Availability

    • Great options
      • Cluster
      • Replication
      • Middelware — e.g. continum
      • DRBD
    • We need some features right now
    • we are committed to innodb and mysql replication
    • *a lot of appplicaton code works on this
      *our tools and processed support this

    • We favor commodity hardware

    There are all great features but we are much more limited in what we can use.
    Management want to know we don’t loose transactions, not loose some transactions.

    Desired HA Functionality

    • Zero transaction loss on failures of a master
    • minimal downtime on failures of a master
    • reasonable cost in performance and dollars
    • fast and automatic failover to local or remove server
    • no changes to our programming model
      • does it support MVCC
      • does it support long running transactions (5 mins – populate temp table then use to update another table, changing rows), 5 mins on master, causes 5 mins on slave, causes code to failover from slaves to master

    • replication and reporting are concurrent on a slave

    MVCC must have update concurrent with query.

    Failures happen everywhere
    OS – kernal oom or panic (older 2.4 32 bit systems)
    mysqld – caused also by code we added
    disk, misdirected write, corrupt write (love innodb checksums)
    file system – inconsisted after unplanned hardware reboot (use ext2)
    server – bad RAM
    lan, switch – lose
    Rack – reboot
    Data center – power loss, overheading, lightning, fire
    People – things get killed or rebooted by mistake ( a typo can take out the wrong server, when names differ by a character or a digit)

    ext2 and 4.0 are great, there are the same generation.
    Trying not to use RAID, not battery backed raid etc, we try work around with software solutions. We do use RAID 0, but we also try software solution.
    When we have the right HA solution, we won’t need RAID.

    Mark. “Yes, Google programmers have bugs. Not me personally, it was my predecessor.”

    HA Features we want in MySQL
    Synchronous replication as an option
    a product that watches a master and initiates a failover
    archives of the master binlogs stored elsewhere
    state stored in the filesytstem t obe consistent after a crash
    . innodb and mysql dictionaries can get out of sync
    .replicatoin state on a slave can get out of sync

    We could not wait
    Features we added to MySQL 4.0.26
    We can do things a lot faster
    . we have more developers lying around
    . Our needs as specific, not a general product solution

    Transactional replications for slaves
    semi-synchronous replication
    mirrored binlogs
    fast and automated failover

    Transactional Replication
    Replication state on a slave is stored in files
    slave sql thread commits to storage engines and then updates a file
    a crash between the two can make replication state inconsistent
    transactional replication
    MySQL can solve this in the future by storing replication state in tables

    Semi-synchronous replication
    Block return from commit on a master until at least one slave has acknowledged receipt of
    slave io thread acknowledges receipt after buffering the changes
    modified mysql replication protocol to support acknowledgments
    conifuration options
    where to the master uses it
    where a slave used it
    how long the maser waits for an acknowledgement

    can run a server with some semi-sync replication slaves and some regulare replication salves
    this can be worked with any storage engines that supports commit, but we only use innodb

    * This is how we guarantee to management for Zero Transaction Loss.

    Latency single stream 1ms, multi-stream 10ms. This is acceptable for us.

    The MySQL Replication Protocol

    • The current replication protocol is efficient
    • a slaves makes one request

    Replication Acknowledgment

      Slaves register as semi-sync or async at connect time
      prepend flag bytes to all replication events sent to semi-sync clients
      the master sends the flag bytes to request acknowledged for replication events that represent the end of the transaction
      the slave use the existing connection for acknowledgments

    Mirrored Binlogs
    mysql does not provide a way to maintain a copy of a master’s binlog on a replica. By copy we me a file of same name and equivalent byte for byte.
    Hierarachial replication works much better where a slave can disconnect from one replication proxy and reconnect to another with adjusting binlog offsets.
    Hot backups taken before a failover and difficult to use after a failover

    Mirrored Binlog Implementions
    Slave IO threads write their own relay log and a copy of the bin log
    all events but the rotate log event are written

    After failover, start a new binlog on new master

    Fast Failover

    Slaves use a hostname, rather then an IP
    You can’t enable the binlog dynamically (in 4.0)
    Added new SQL STATEMENTS that does
    disconnect users with SUPER privilege
    disable new connections
    enable the bin log
    enable connections from all users

    Automatic failover
    Something must decided that a master has failed
    Something must choose the new master

    Q: What keeps up from moving to 5.0?
    A: Queries don’t parse (Joins)

    Data sets, 8GB servers, 50-100GB’s