The MySQL Table Checksum, part of the MySQL Toolkit (having to be renamed soon) is an invaluable community tool for use with MySQL. Most sites or installations of any volume will use MySQL Replication for one purpose or another, be it for read scalability, backups, testing, upgrading etc.
Why is it needed?
There are two primary compelling reasons. First, MySQL replication is an asynchronous process and there is no absolute guarantee that the Master Database and the Slave Database are the same (By definition that can be different). Second, MySQL does not provide any tools relating to checking, managing, reporting differences. Luckily the community has addressed this present lack of product feature in current versions of MySQL. Special thanks to Baron who has made this happen.
Should I be concerned?
Yes, you should. While some people will not like that I’ve made this alarming comment, a lot of organizations use MySQL Replication for backups. It would be unprofessional to be ignorant to make the assumption.
How to I get it?
Download MySQL Table Checksum. It is recommended that you download all components in the toolkit as there are many valuable tools then just this one.
MySQL Table Checksum will require Perl which is generally always installed with any Linux distribution by default (For Windows you need to do yourself). You will also require the Perl Modules DBI and DBD::mysql. Use CPAN for easy installation of these modules.
These are the steps I did (NOTE: this is a specific version, you should always check for newer versions)
wget http://internap.dl.sourceforge.net/sourceforge/mysqltoolkit/mysqltoolkit-1204.tar.gz tar xvfz mysqltoolkit-1204.tar.gz cd mysqltoolkit-1204/bin ./mysql-table-checksum
If everything is installed correctly, you should see.
Usage: mysql-table-checksum [OPTION]... HOST [HOST...] Errors in command-line arguments: * No hosts specified. mysql-table-checksum checksums MySQL tables efficiently on one or more HOSTs. Each HOST is specified as a DSN and missing values are inherited from the first HOST. If you specify multiple HOSTs, the first is assumed to be the master. For more details, please use the --help option, or try 'perldoc mysql-table-checksum' for complete documentation.
How do I use it?
Start with the documentation, RTFM
perldoc mysql-table-checksum
There are a number of ways to run MySQL Table Checksum and different means of using arguments etc. This is what I do.
By Default:
./mysql-table-checksum u=root,p=sakila,h=localhost DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG xxx table1 1 localhost InnoDB NULL 1678710928 0 0 NULL NULL xxx table2 1 localhost InnoDB NULL 3023415523 0 0 NULL NULL xxx table3 1 localhost InnoDB NULL 1692517818 0 0 NULL NULL xxx table4 1 localhost InnoDB NULL 0 0 0 NULL NULL xxx table5 1 localhost InnoDB NULL 2295061143 0 0 NULL NULL xxx table6 1 localhost InnoDB NULL 2238111875 0 0 NULL NULL xxx table7 1 localhost InnoDB NULL 823770692 0 0 NULL NULL xxx table8 1 localhost InnoDB NULL 2313561225 3 0 NULL NULL xxx table9 1 localhost InnoDB NULL 3524358173 0 0 NULL NULL xxx table10 1 localhost InnoDB NULL 0 0 0 NULL NULL
The output will provide a checksum via the MySQL CHECKSUM function. As this doesn’t provide a row count, the COUNT column is NULL.
A better method is to use the –replicate function. This provides the results into a Database Table (handy for lots of things) as well as getting table counts. This requires a pre-requisite table. For this example I’ve added it to the test database schema.
$ mysql -uroot -psakila test mysql > CREATE TABLE checksum ( db char(64) NOT NULL, tbl char(64) NOT NULL, chunk int NOT NULL, boundaries char(64) NOT NULL, this_crc char(40) NOT NULL, this_cnt int NOT NULL, master_crc char(40) NULL, master_cnt int NULL, ts timestamp NOT NULL, PRIMARY KEY (db, tbl, chunk) );
My execution for this using this table
./mysql-table-checksum u=root,p=sakila,h=localhost --replicate=test.checksum DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG xxx table1 1 localhost InnoDB 2236 cdd6689dbb23ff547540561c4815c717b3d01bf3 0 NULL NULL NULL xxx table2 1 localhost InnoDB 3157 f4d538a7a83168acb2cf3374a6edc4949809e723 0 NULL NULL NULL xxx table3 1 localhost InnoDB 99 daa0d8403feb200beac5db6456e92c86de6a9b84 0 NULL NULL NULL xxx table4 1 localhost InnoDB 0 NULL 0 NULL NULL NULL xxx table5 1 localhost InnoDB 1006 118b996e10f76153b786479b15f134c08cd21b62 0 NULL NULL NULL xxx table6 1 localhost InnoDB 2499397 6a60e07f45d5980bb5de3bd75e1b2ce1e5be25b8 29 NULL NULL NULL xxx table7 1 localhost InnoDB 11879 00929c2d6b9278f585b6b7346816bb7a0e483b9e 0 NULL NULL NULL xxx table8 1 localhost InnoDB 675 e8e17f9ed15986965ff678ca0c26cd5397ea272d 0 NULL NULL NULL xxx table9 1 localhost InnoDB 1730498 373e7fb3fec77b1d0edcfb6a5e28619e9c91d5e4 25 NULL NULL NULL xxx table10 1 localhost InnoDB 8557931 7913803bf65bbcf18e679255c657dba53045b88e 99 NULL NULL NULL
The screen output is great, I actually run the following command keeping the output. See later for more info,
./mysql-table-checksum u=root,p=sakila,h=localhost > checksum.mysql4.replicate.`date +%y%m%d.%H%M` &
The table contents are a little different.
mysql> select * from checksum limit 1G *************************** 1. row *************************** db: xxx tbl: table20 chunk: 1 boundaries: 1=1 this_crc: f7c4d19f6342105f97a4b2770ee7b9094e57a6f2 this_cnt: 2236 master_crc: f7c4d19f6342105f97a4b2770ee7b9094e57a6f2 master_cnt: 2236 ts: 2007-11-08 15:37:16
What is most impressive is that in a Master/Slave environment these commands are all replicated, and a simple comparison of data in this table on each slave will confirm if your database is in sync. The docs also confirm, the structure of the tables are not compared so a slave may indeed have some changes.
Migration?
I’ve been able to use the checksum to test a MySQL 4 to MySQL 5 migration. I’ve been able to confirm via checksums as an initial verification that the data has indeed been loaded correctly.
The issue was using the log files for comparison between to instances is the TIME column value changed. As the format of the rows is spaced, it was not possible to easily cut as each schema name was a different length. A small inconvience for the benefit in the results.
I have found however that optimizing the schema in 5, e.g. INT for BIGINT and TIMESTAMP for DATETIME did cause the checksums to fail.
Conclusion
This is a quick introduction, this tool has a lot to offer and is only one in a whole toolkit.