Introduction
- Can’t work on performance problems until we solve the availability
- We want MySQL to fix our problems first.
The problem
- Deploy a DBMS for a workload with
- too many queries
- to many transactions
- to much data
A well known solution
-
deploy a grid database -use many replicas to scale read performance
-shard your data over many master to scale write performance
-sharding is easy, resharding is hard -
availability and manageability trump performance – make it easy to run many severs
– unbretable aggregate perfomance -
we describe problems that matter to us. The grid database approach
-
Deploy a large number of small servers
-
use highly redundant commodity components
-
added capacity has a low incremental cost
-
not much capacity lost when a server fails
-
support many servers with a few DBAs
Managability
Make it easy to do the tasks that must be done. Reduce, Reduce.
Make all tasks scriptable
Why does it mater, support hundreds of servers, spend time solving more interesting problems. You generally have lots of problems to solve.
Underutilize your severs
Require less maintenance
Requre less tuning
tolerate load spikes better
tolerate bad query plans better
In a Perfect World
Short running queries
uses kill mistake and runaway queries
accounts new use to many connections
query plans are good
new apps increase database workload by a small amount
only appropiate date is stored in the database
Reality
- Long running transactions, create replication delays everywhere
- servers with round robin DNS aliases make queries hard to find
- applications create more connections where the database is slow
- some storage engines use sampling to get query plan statistics
- new applications create new database performance problems
- applications use the database as long as rows are never deleted
- many long running queries on replicas
Solutions
- Improve your ability to respond because prevention is impossible
- Need tools to make monitoring easier
- determine what is happening across servers
- detemine what happened in the past
Mantra
- Monitor everything you can, and archive as long as possible. (vmstat 5 secs, iostat, mysql error logs)
- You will need these to reconstruct failures
- save as much as possible
- script as much as possible
Monitoring Matters
-
Display what is happening -which table, account or statemsns caused most of the load
-many fast queries can be as much a problem as one slow query -
Record what happending –archivce show status counters somweere
– query data from the archive
— visualise data from the archivce -
record queries that have been run — archive show processlist output (do every 30 seconds)
— support queries on this archive -
All of this much scale to an environment with many servers
Monitoring Tools
- Display counters and rate change for counters
- aggregate values over many servers
- visualize and rang results
- display results over time
Google mpgrep tools
New Commands
We changed mysql, three new commands
SHOW USER _STATISTICS
SHOW TABLE STATISTICS
SHOW INDEX STATISTICS
Per Account Activity
USER_STATISTICS
seconds executing commands
number of rows fetched and changed
total connections
number of select/updates/other/commits/rollback/binlog bytes written.
TABLE STATISTICS
number of rows fetched/changed
INDEX STATISTICS
display number of rows fetched per index
helps find indexes that are never used
available in code.google.com in 4.0, porting to 5.0
MySQL High Availability
-
Great options
- Cluster
- Replication
- Middelware — e.g. continum
- DRBD
-
We need some features right now
-
we are committed to innodb and mysql replication *a lot of appplicaton code works on this
*our tools and processed support this -
We favor commodity hardware
There are all great features but we are much more limited in what we can use.
Management want to know we don’t loose transactions, not loose some transactions.
Desired HA Functionality
-
Zero transaction loss on failures of a master
-
minimal downtime on failures of a master
-
reasonable cost in performance and dollars
-
fast and automatic failover to local or remove server
-
no changes to our programming model
-
does it support MVCC does it support long running transactions (5 mins – populate temp table then use to update another table, changing rows), 5 mins on master, causes 5 mins on slave, causes code to failover from slaves to master
-
-
replication and reporting are concurrent on a slave
MVCC must have update concurrent with query.
Failures happen everywhere
OS – kernal oom or panic (older 2.4 32 bit systems)
mysqld – caused also by code we added
disk, misdirected write, corrupt write (love innodb checksums)
file system – inconsisted after unplanned hardware reboot (use ext2)
server – bad RAM
lan, switch – lose
Rack – reboot
Data center – power loss, overheading, lightning, fire
People – things get killed or rebooted by mistake ( a typo can take out the wrong server, when names differ by a character or a digit)
ext2 and 4.0 are great, there are the same generation.
Trying not to use RAID, not battery backed raid etc, we try work around with software solutions. We do use RAID 0, but we also try software solution.
When we have the right HA solution, we won’t need RAID.
Mark. “Yes, Google programmers have bugs. Not me personally, it was my predecessor.”
HA Features we want in MySQL
Synchronous replication as an option
a product that watches a master and initiates a failover
archives of the master binlogs stored elsewhere
state stored in the filesytstem t obe consistent after a crash
. innodb and mysql dictionaries can get out of sync
.replicatoin state on a slave can get out of sync
We could not wait
Features we added to MySQL 4.0.26
We can do things a lot faster
. we have more developers lying around
. Our needs as specific, not a general product solution
Transactional replications for slaves
semi-synchronous replication
mirrored binlogs
fast and automated failover
Transactional Replication
Replication state on a slave is stored in files
slave sql thread commits to storage engines and then updates a file
a crash between the two can make replication state inconsistent
transactional replication
MySQL can solve this in the future by storing replication state in tables
Semi-synchronous replication
Block return from commit on a master until at least one slave has acknowledged receipt of
slave io thread acknowledges receipt after buffering the changes
modified mysql replication protocol to support acknowledgments
conifuration options
where to the master uses it
where a slave used it
how long the maser waits for an acknowledgement
can run a server with some semi-sync replication slaves and some regulare replication salves
this can be worked with any storage engines that supports commit, but we only use innodb
- This is how we guarantee to management for Zero Transaction Loss.
Latency single stream 1ms, multi-stream 10ms. This is acceptable for us.
The MySQL Replication Protocol
- The current replication protocol is efficient
- a slaves makes one request
Replication Acknowledgment
Slaves register as semi-sync or async at connect time
prepend flag bytes to all replication events sent to semi-sync clients
the master sends the flag bytes to request acknowledged for replication events that represent the end of the transaction
the slave use the existing connection for acknowledgments
Mirrored Binlogs
mysql does not provide a way to maintain a copy of a master’s binlog on a replica. By copy we me a file of same name and equivalent byte for byte.
Hierarachial replication works much better where a slave can disconnect from one replication proxy and reconnect to another with adjusting binlog offsets.
Hot backups taken before a failover and difficult to use after a failover
Mirrored Binlog Implementions
Slave IO threads write their own relay log and a copy of the bin log
all events but the rotate log event are written
After failover, start a new binlog on new master
Fast Failover
Slaves use a hostname, rather then an IP
You can’t enable the binlog dynamically (in 4.0)
Added new SQL STATEMENTS that does
disconnect users with SUPER privilege
disable new connections
enable the bin log
enable connections from all users
Automatic failover
Something must decided that a master has failed
Something must choose the new master
Q: What keeps up from moving to 5.0?
A: Queries don’t parse (Joins)
Data sets, 8GB servers, 50-100GB’s