MySQL Conference – Google

MySQL: The Real Grid Database

Introduction

  • Can’t work on performance problems until we solve the availability problems
  • We want MySQL to fix our problems first.

The problem

  • Deploy a DBMS for a workload with
    • too many queries
    • too many transactions
    • too much data

A well known solution

  • deploy a grid database
    - use many replicas to scale read performance
    - shard your data over many masters to scale write performance
    - sharding is easy, resharding is hard

  • availability and manageability trump performance
    - make it easy to run many servers
    - unbeatable aggregate performance

  • we describe the problems that matter to us

The grid database approach

  • Deploy a large number of small servers

  • use highly redundant commodity components

  • added capacity has a low incremental cost

  • not much capacity lost when a server fails

  • support many servers with a few DBAs

Manageability
Make it easy to do the tasks that must be done. Reduce the number of tasks that must be done.
Make all tasks scriptable
Why does it matter? You can support hundreds of servers and spend time solving more interesting problems. You generally have lots of problems to solve.

Underutilize your servers
Require less maintenance
Require less tuning
tolerate load spikes better
tolerate bad query plans better

In a Perfect World
Short running queries
users kill mistakes and runaway queries
accounts never use too many connections
query plans are good
new apps increase database workload by a small amount
only appropriate data is stored in the database

Reality

  • Long running transactions create replication delays everywhere
  • servers with round robin DNS aliases make queries hard to find
  • applications create more connections when the database is slow
  • some storage engines use sampling to get query plan statistics
  • new applications create new database performance problems
  • applications use the database as a log, and rows are never deleted
  • many long running queries on replicas

Solutions

  • Improve your ability to respond because prevention is impossible
  • Need tools to make monitoring easier
  • determine what is happening across servers
  • determine what happened in the past

Mantra

  • Monitor everything you can, and archive it as long as possible (vmstat every 5 seconds, iostat, MySQL error logs)
  • You will need these to reconstruct failures
  • save as much as possible
  • script as much as possible

Monitoring Matters

  • Display what is happening
    - which table, account or statement caused most of the load
    - many fast queries can be as much a problem as one slow query

  • Record what happened
    - archive SHOW STATUS counters somewhere
    - query data from the archive
    - visualize data from the archive

  • Record queries that have been run
    - archive SHOW PROCESSLIST output (every 30 seconds; see the sketch below)
    - support queries on this archive

  • All of this must scale to an environment with many servers
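
A minimal sketch of what such a processlist archive might look like; the table layout, names, and the example query are illustrative assumptions, not Google's actual schema:

```sql
-- Hypothetical archive table for periodic SHOW FULL PROCESSLIST snapshots.
CREATE TABLE processlist_archive (
  sample_time DATETIME    NOT NULL,
  server_name VARCHAR(64) NOT NULL,
  id          BIGINT      NOT NULL,
  user        VARCHAR(16),
  host        VARCHAR(64),
  db          VARCHAR(64),
  command     VARCHAR(16),
  time_secs   INT,
  state       VARCHAR(64),
  info        TEXT,                 -- the statement text
  INDEX (sample_time),
  INDEX (server_name, sample_time)
);

-- A collector runs SHOW FULL PROCESSLIST on every server (e.g. every
-- 30 seconds) and inserts one row per thread.

-- Example question the archive can answer: which accounts were running
-- the most statements across the fleet during an incident window?
SELECT user, COUNT(*) AS samples
FROM processlist_archive
WHERE sample_time BETWEEN '2007-04-25 10:00:00' AND '2007-04-25 10:10:00'
  AND command <> 'Sleep'
GROUP BY user
ORDER BY samples DESC;
```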

Monitoring Tools

  • Display counters and rate change for counters
  • aggregate values over many servers
  • visualize and rank results
  • display results over time
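
To make "rate change for counters" concrete, here is a rough illustration assuming SHOW STATUS samples are archived into a table; all names are hypothetical, and the self-join keeps it compatible with old MySQL versions that lack window functions:

```sql
-- Hypothetical table of archived SHOW STATUS samples.
CREATE TABLE status_archive (
  sample_time   DATETIME    NOT NULL,
  server_name   VARCHAR(64) NOT NULL,
  variable_name VARCHAR(64) NOT NULL,
  counter_value BIGINT      NOT NULL,
  PRIMARY KEY (server_name, variable_name, sample_time)
);

-- Queries per second on every server over a one-minute window, computed
-- as the delta of the cumulative Questions counter between two samples.
SELECT cur.server_name,
       (cur.counter_value - prev.counter_value) / 60 AS questions_per_second
FROM status_archive AS cur
JOIN status_archive AS prev
  ON  prev.server_name   = cur.server_name
  AND prev.variable_name = cur.variable_name
  AND prev.sample_time   = DATE_SUB(cur.sample_time, INTERVAL 60 SECOND)
WHERE cur.variable_name = 'Questions'
  AND cur.sample_time   = '2007-04-25 10:05:00'
ORDER BY questions_per_second DESC;
```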

Google mypgrep tool

New Commands
We changed MySQL to add three new commands:
SHOW USER_STATISTICS
SHOW TABLE_STATISTICS
SHOW INDEX_STATISTICS

Per Account Activity
USER_STATISTICS
seconds executing commands
number of rows fetched and changed
total connections
number of selects, updates, other commands, commits, rollbacks, and binlog bytes written

TABLE_STATISTICS
number of rows fetched/changed

INDEX_STATISTICS
display number of rows fetched per index
helps find indexes that are never used

Available at code.google.com for MySQL 4.0; being ported to 5.0.
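
A hedged example of how these commands might be used; the exact output columns (and whether the patch spells them with underscores) may differ from what is shown here:

```sql
-- Per-account activity: which accounts are doing the most work?
SHOW USER_STATISTICS;

-- Per-table activity: which tables are read and written the most?
SHOW TABLE_STATISTICS;

-- Per-index activity: rows fetched per index. Indexes that never appear
-- here (or report zero rows fetched) are candidates for dropping, after
-- cross-checking against the full index list from SHOW INDEX FROM tbl.
SHOW INDEX_STATISTICS;
```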

MySQL High Availability

  • Great options

    • Cluster
    • Replication
    • Middleware, e.g. Continuent
    • DRBD
  • We need some features right now

  • we are committed to InnoDB and MySQL replication
    - a lot of application code works on this
    - our tools and processes support this

  • We favor commodity hardware

These are all great features, but we are much more limited in what we can use.
Management wants to know that we don't lose any transactions, not that we lose only some transactions.

Desired HA Functionality

  • Zero transaction loss on failures of a master

  • minimal downtime on failures of a master

  • reasonable cost in performance and dollars

  • fast and automatic failover to a local or remote server

  • no changes to our programming model

    • does it support MVCC? does it support long-running transactions? (e.g. 5 minutes to populate a temp table and then use it to update another table, changing rows; 5 minutes on the master causes a 5-minute delay on the slave, which causes code to fail over from slaves to the master)

  • replication and reporting are concurrent on a slave

MVCC must allow updates to run concurrently with queries, as sketched below.
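
A hedged illustration of why a reporting slave needs this concurrency; the orders table and values are hypothetical:

```sql
-- Session 1 on a reporting slave: a long-running report.
START TRANSACTION;
SELECT customer_id, SUM(amount)
FROM orders                -- hypothetical table
GROUP BY customer_id;      -- may run for minutes

-- Session 2 at the same time: the replication SQL thread applies changes.
UPDATE orders SET amount = amount + 10 WHERE order_id = 42;

-- With InnoDB's MVCC the UPDATE does not wait for the report, and the
-- report reads a consistent snapshot. With a table-locking engine the
-- UPDATE would block behind the report and replication would fall behind.
COMMIT;
```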

Failures happen everywhere
OS – kernel OOM or panic (older 2.4 32-bit systems)
mysqld – also caused by code we added
disk – misdirected writes, corrupt writes (we love InnoDB checksums)
file system – inconsistent after unplanned hardware reboot (we use ext2)
server – bad RAM
LAN, switch – lost connectivity
Rack – reboot
Data center – power loss, overheating, lightning, fire
People – things get killed or rebooted by mistake (a typo can take out the wrong server when names differ by a character or a digit)

ext2 and MySQL 4.0 are great; they are the same generation.
We try not to use RAID (no battery-backed RAID, etc.) and work around it with software solutions. We do use RAID 0, but we also try software solutions.
When we have the right HA solution, we won’t need RAID.

Mark. “Yes, Google programmers have bugs. Not me personally, it was my predecessor.”

HA Features we want in MySQL
Synchronous replication as an option
a product that watches a master and initiates a failover
archives of the master binlogs stored elsewhere
state stored in the filesystem to be consistent after a crash
- InnoDB and MySQL dictionaries can get out of sync
- replication state on a slave can get out of sync

We could not wait
Features we added to MySQL 4.0.26
We can do things a lot faster
- we have more developers lying around
- our needs are specific, not a general product solution

Transactional replication for slaves
semi-synchronous replication
mirrored binlogs
fast and automated failover

Transactional Replication
Replication state on a slave is stored in files
slave sql thread commits to storage engines and then updates a file
a crash between the two can make replication state inconsistent
transactional replication updates the replication state as part of the commit, so a crash cannot separate the two
MySQL can solve this in the future by storing replication state in tables
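
Upstream MySQL did eventually take this route, years after this talk: from 5.6 the slave's position can be stored in an InnoDB table so it commits atomically with the applied transaction. A sketch of that later configuration, shown only for comparison (it is not part of the Google 4.0 patch described here):

```sql
-- MySQL 5.6+: keep replication state in tables instead of files, so the
-- position in mysql.slave_relay_log_info is updated in the same
-- transaction as the applied changes. The slave must be stopped to change it.
STOP SLAVE;
SET GLOBAL relay_log_info_repository = 'TABLE';
SET GLOBAL master_info_repository    = 'TABLE';
START SLAVE;
```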

Semi-synchronous replication
Block return from commit on a master until at least one slave has acknowledged receipt of the changes
slave io thread acknowledges receipt after buffering the changes
modified mysql replication protocol to support acknowledgments
configuration options
whether the master uses it
whether a slave uses it
how long the master waits for an acknowledgement

can run a server with some semi-sync replication slaves and some regular replication slaves
this works with any storage engine that supports commit, but we only use InnoDB

  • This is how we back the zero-transaction-loss guarantee we make to management.

Latency: single stream 1 ms, multi-stream 10 ms. This is acceptable for us.
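
This design later went upstream as MySQL's semi-synchronous replication plugin (5.5 and later); the variables below are from that later plugin rather than the original Google 4.0 patch, but they map directly onto the configuration options listed above:

```sql
-- Master side: turn semi-sync on and bound how long a commit waits for an
-- acknowledgment before falling back to asynchronous replication.
-- (Requires the semisync master/slave plugins to be installed first.)
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;   -- milliseconds

-- Slave side: opt this slave into sending acknowledgments.
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```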

The MySQL Replication Protocol

  • The current replication protocol is efficient
  • a slave makes one request

Replication Acknowledgment

Slaves register as semi-sync or async at connect time
prepend flag bytes to all replication events sent to semi-sync clients
the master sends the flag bytes to request acknowledgment for replication events that represent the end of a transaction
the slave uses the existing connection for acknowledgments

Mirrored Binlogs
MySQL does not provide a way to maintain a copy of a master’s binlog on a replica. By copy we mean a file with the same name that is equivalent byte for byte.
Hierarchical replication works much better when a slave can disconnect from one replication proxy and reconnect to another without adjusting binlog offsets.
Hot backups taken before a failover are difficult to use after a failover.

Mirrored Binlog Implementation
Slave IO threads write their own relay log and a copy of the binlog
all events but the rotate log event are written

After failover, start a new binlog on new master

Fast Failover

Slaves use a hostname, rather than an IP
You can’t enable the binlog dynamically (in 4.0)
Added a new SQL statement that does the following:
disconnect users with SUPER privilege
disable new connections
enable the bin log
enable connections from all users
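
The slave side of such a failover can be sketched with standard statements; the host name, replication account, and binlog coordinates below are placeholders (with mirrored binlogs the old offsets can often be reused as-is):

```sql
-- On each remaining slave, once the new master has enabled its binlog:
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST     = 'db1-master',      -- DNS alias now pointing at the new master
  MASTER_USER     = 'repl',            -- placeholder replication account
  MASTER_LOG_FILE = 'binlog.000001',   -- first binlog started on the new master
  MASTER_LOG_POS  = 4;                 -- start of that binlog
START SLAVE;
```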

Automatic failover
Something must decide that a master has failed
Something must choose the new master

Q: What keeps us from moving to 5.0?
A: Queries don’t parse (joins).

Data sets: servers have 8 GB of RAM; data sets are 50-100 GB.

Tagged with: Databases, General, MySQL, MySQL Conference & Expo 2007
