Q: Does MySQL support ACID? A: Yes

I was recently asked this question by an experienced academic at the NY Oracle Users Group event I presented at.

Does MySQL support ACID? (ACID is a set of properties essential for a relational database to perform transactions, i.e. a discrete unit of work.)

Yes, MySQL fully supports ACID, that is Atomicity, Consistency, Isolation and Duration. (*)

This is contrary to the first Google response found searching this question which for reference states “The standard table handler for MySQL is not ACID compliant because it doesn’t support consistency, isolation, or durability”.

The question is however not a simple Yes/No because it depends on timing within the MySQL product’s lifecycle and the version/configuration used in deployment. What is also *painfully* necessary is to understand why this question would even be asked of the most popular open source relational database.

MySQL has a unique characteristic of supporting multiple storage engines. These engines enabling varying ways of storing and retrieving data via the SQL interface in MySQL and have varying features for supporting transactions, locking, index strategies, compression etc. The problem is that the default storage engine from version 3.23 (1999) to 5.1 (2010) was MyISAM, a non-transactional engine, and hence the first point of confusion.

The InnoDB storage engine has been included and supported from MySQL 3.23. This is a transactional engine supporting ACID properties. However, not all of the default settings in the various MySQL versions have fully meet all ACID needs, specifically the durability of data. This is the second point of confusion. Overtime other transactional storage engines in MySQL have come and gone. InnoDB has been there since the start so there is no excuse to not write applications to fully support transactions. The custodianship of Oracle Corporation starting in 2010 quickly corrected this *flaw* by ensuring the default storage engine in MySQL 5.5 is InnoDB. But the damage to the ecosystem that uses MySQL, that is many thousands of open source projects, and the resources that work with MySQL has been done. Recently working on a MySQL 5.5 production system in 2016, the default engine was specifically defined in the configuration defined as MyISAM, and some (but not all tables) were defined using MyISAM. This is a further conversation as to why, is this a upgrade problem? Are there legacy dependencies with applications? Are the decision makers and developers simply not aware of the configuration? Or, are developers simply not comfortable with transactions?

Like other anti-reasonable MySQL defaults the unaware administrator or developer could consider MySQL as supporting ACID properties, however until detailed testing with concurrency and error conditions not realize the impact of poor configuration settings.

The damage of having a non-transactional storage engine as the default for over a decade has created a generation of professionals and applications that abuses one of the primary usages of a relational database, that is a transaction, i.e. to product a unit for work that is all or nothing. Popular open source projects such as WordPress, Drupal and hundreds more have for a long time not supported transactions or used InnoDB. Mediawiki was at least one popular open source project that was proactive towards InnoDB and transaction usage. The millions of plugins, products and startups that build on these technologies have the same flaws.

Further confusion arises when an application uses InnoDB tables but does not use transactions, or the application abuses transactions, for example 3 different transactions that should really be 1.

While newer versions of MySQL 5.6 and 5.7 improve default configurations, until these versions a more commonly implemented non-transactional use in a relational database will remain. A recent Effective MySQL NYC Meetup survey showed that installations of version 5.0 still exist, and that few have a policy for a regular upgrade cadence.

What is FTS_BEING_DELETED.ibd

I currently have on a MySQL 5.6 database using innodb_file_per_table the following individual tablespace file.

schema/FTS_00000000000001bb_BEING_DELETED.ibd

The schema is all InnoDB tables, and there ARE NO Full Text Indexes. I cannot comment on if a developer has tried to create one previously.
I am none the wiser in explaining the ongoing use of these files, or if it can be/should be deleted.

On closer inspection there are infact a number of FTS files.

$ ls -al FTS*
-rw-r----- 1 mysql mysql 98304 Jan 29 16:21 FTS_00000000000001bb_BEING_DELETED_CACHE.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:20 FTS_00000000000001bb_BEING_DELETED.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:26 FTS_00000000000001bb_CONFIG.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:21 FTS_00000000000001bb_DELETED_CACHE.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:00 FTS_00000000000001bb_DELETED.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:20 FTS_00000000000001c7_BEING_DELETED_CACHE.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:26 FTS_00000000000001c7_BEING_DELETED.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:21 FTS_00000000000001c7_CONFIG.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:20 FTS_00000000000001c7_DELETED_CACHE.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:20 FTS_00000000000001c7_DELETED.ibd

Any MySQL gurus with knowledge to share, and for the benefit of others that Internet search at a later time.

Related articles included Overview and Getting Started with InnoDB FTS and Difference between InnoDB FTS and MyISAM FTS but do not mention file specifics.

The article InnoDB Full-text Search in MySQL 5.6 (part 1) provides more insight that these files remain even if a full text index was created and has since being removed. It is not clear from the filename which tables these files relate to.

SQL_MODE and storage engines

I was again reminded why setting SQL_MODE is so important in any new MySQL environment. While performing benchmark tests on parallel backup features with a common InnoDB tablespace and per file tablespace, I inadvertently missed an important step in the data migration. The result was the subsequent test that performed data population worked without any issues however there was no data in any InnoDB tables.

These are the steps used in the migration of InnoDB tables from a common tablespace model to a per-table tablespace model.

  1. Dump all InnoDB tables
  2. Drop all InnoDB tables
  3. Shutdown MySQL
  4. Change the my.cnf to include innodb-file-per-table
  5. Remove the InnoDB ibdata1 tablespace file
  6. Remove the InnoDB transactional log files
  7. Start MySQL
  8. Verify the error log
  9. Create and load new InnoDB tables

However, step 6 was not performed correctly due to a sudo+shell wildcard issue. The result was MySQL started, and tables were subsequently created incorrectly. What should have happened was:

mysql> CREATE TABLE `album` (
    ->   `album_id` int(10) unsigned NOT NULL,
    ->   `artist_id` int(10) unsigned NOT NULL,
    ->   `album_type_id` int(10) unsigned NOT NULL,
    ->   `name` varchar(255) NOT NULL,
    ->   `first_released` year(4) NOT NULL,
    ->   `country_id` smallint(5) unsigned DEFAULT NULL,
    ->   PRIMARY KEY (`album_id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ERROR 1286 (42000): Unknown table engine 'InnoDB'

However, because by default MySQL will fallback to the legacy default of MyISAM, no actual error occurred. In order for this to produce an error, an appropriate SQL_MODE is necessary.

mysql> SET GLOBAL sql_mode='NO_ENGINE_SUBSTITUTION';

A check of the MySQL error log shows why InnoDB was not available.

120309  0:59:36  InnoDB: Starting shutdown...
120309  0:59:40  InnoDB: Shutdown completed; log sequence number 0 1087119693
120309  0:59:40 [Note] /usr/sbin/mysqld: Shutdown complete

120309  1:00:16 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-10-190-238-14-bin' to avoid this problem.
120309  1:00:16 [Note] Plugin 'FEDERATED' is disabled.
120309  1:00:16  InnoDB: Initializing buffer pool, size = 500.0M
120309  1:00:16  InnoDB: Completed initialization of buffer pool
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
120309  1:00:16  InnoDB: Setting file ./ibdata1 size to 64 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Error: all log files must be created at the same time.
InnoDB: All log files must be created also in database creation.
InnoDB: If you want bigger or smaller log files, shut down the
InnoDB: database and make sure there were no errors in shutdown.
InnoDB: Then delete the existing log files. Edit the .cnf file
InnoDB: and start the database again.
120309  1:00:17 [ERROR] Plugin 'InnoDB' init function returned error.
120309  1:00:17 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
120309  1:00:17 [Note] Event Scheduler: Loaded 0 events
120309  1:00:17 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.58-1ubuntu1-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)

NOTE: This was performed on Ubuntu using the standard distro MySQL version of MySQL 5.1.

As previously mentioned, SQL_MODE may not be perfect, however what features do exist warrant correctly configuration your MySQL environment not to use the default.

More Information.

Leveraging the InnoDB Plugin

Beginning with MySQL 5.1 as an additional plugin and included by default in MySQL 5.5 the InnoDB plugin includes many performance improvements. To leverage the support of new file formats however a very important setting is necessary.

#my.cnf
[mysqld]
innodb_file_per_table

The use of innodb_file_per_table with an existing system or during an upgrade to 5.1 or 5.5 requires a complete reload of your database to use effectively. In summary.

  • Backup all InnoDB tables via mysqldump
  • Drop InnoDB tables
  • Verify InnoDB not used
  • Stop MySQL
  • Enable innodb_file_per_table & simplified innodb_data_file_path (if applicable)
  • Remove ibdata? files
  • Start MySQL
  • Create Tables
  • Reload Data
  • Verify InnoDB Operation
    • The primary reason is we are moving from using a common tablespace to a tablespace per table. InnoDB wil not shrink the common tablespace so this process is necessary in order to purge the diskspace currently being used. You should also reduce your innodb_data_file_path options if specified. For example if currently set to :

      innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
      

      I would suggest you change to

      innodb_data_file_path = ibdata1:100M:autoextend
      

      InnoDB still requires this common tablespace, however now each table has it’s own disk file the volume required is signficantly less.

Five reasons to upgrade to MySQL 5.5

Updated Nov 2011. Check out my Reasons to use MySQL 5.5 Presentation for more in-depth information about installing/configuring and using MySQL 5.5

I have been looking forward to the general availability (GA) release of MySQL 5.5 since is was publically announced in September that we would see this in 2010. While I already have a production client with 5.5.7rc, the badge of general availability is a great way to promote why environments should consider moving to using MySQL 5.5. Here is my quick short list of why I’d promote moving to MySQL 5.5.

1. Improved integration

The first significant improvement is that InnoDB is now again firmly a default included storage engine. The InnoDB plugin 1.1.x is now the builtin version of the engine, not a plugin version. Also the 1.1.x version has continued improvements over the 1.0.x version available as an included but not enabled plugin in current MySQL 5.1.x versions. Removing the complexity for end users over the choice of InnoDB and the necessary configuration changes is a great simplification. The introduction in the InnoDB plugin 1.0.x of a new file format (known as Barracuda) and additional new row formats such as dynamic and compressed should also be considered for improved I/O performance. Greater eduction and customer comparison results are still needed here.

2. Improved performance

“Improved scalability on multi-core CPUs” is the tag line Oracle touts and while the marketing graphs do show improved performance with huge numbers of threads, it is the continued work at addressing the support for large cores that is the key to have MySQL perform on newer H/W. MySQL was always known as the software that runs on low cost commodity H/W, however it also needs to work well on new hardware architectures.

InnoDB has many improvements, you can see a long list at 13.7.7. Performance and Scalability Enhancements. I have seen an upgrade from 5.0 to 5.5 significantly reduce contention of queries. The help of multiple buffer pool instances, multiple rollback segments, additional threads and functional mutexes have all reduced high throughput mutex contention.

3. Improved recoverability

While I have yet to use Semisynchronous replication in a production situation I see this as a continued progression to improving the recoverability and resilience of MySQL. This was work first seen by the Google Patch in MySQL 5.0 several years ago. The lack of a single unbreakable backup strategy is the number 1 overlooked feature in MySQL as reported by customer advisory groups in the past. The move to making InnoDB the default storage engine is another small step to improved recoverability in a default MySQL setting.

4. Improved instrumentation

A lack of detailed instrumentation rates number 2 on my list of areas of improvement needed with MySQL. With 5.5 there are more continued improvements. While some of these are very technical for high end analysis (e.g. the performance_schema), simple improvements in common instrumentation can help you identify triggers for poor performance. There is a lot more information available in the INFORMATION_SCHEMA. Here is a great example via SHOW PROCESSLIST that shows a 4 second lock in the Query Cache (QC) that until now was not possible to summarize easily what happens in the Query Cache black box.

... | Query   |    4 | Waiting for query cache lock | REPLACE INTO statistics_w SET IP = ...

Identification is the first necessary step to continued improvement.

My work with the PERFORMANCE_SCHEMA has been limited to experimenting such as described by Tracking IO with performance_schema however a recent discussion with Mark Leith highlighted that with the performance_schema it will be possible to determine the exact size of disk based temporary tables created by the MySQL kernel for example.

5. Improved development methodology

We have heard a lot of noise from a vocal few that Oracle will not do right by MySQL, the product, the community, the OEM providers. While I believe it is too early to tell the long term plans of MySQL under Oracle, I believe that action speaks louder than words. Oracle has made a number of commitments publically, Oracle are investing in resources to improve the development of the product and It’s clear the targeting of the Microsoft OS highlights MySQL has a future as a SQL Server competitor. Oracle brings a great wealth of experience, resources and processes and hopefully that will be invested into MySQL.

There is a good list of What Is New in MySQL 5.5 as part of the current development, features I’ve not mention include SIGNAL/RESIGNAL, change in the default storage engine, partitioning additions and pluggable MySQL authentication capability.

The MySQL documentation team also do a great job. An example of new work is a great comparison of Options and Variables from all 5.x versions shown on a single page for comparison.

The way forward with MySQL

Oracle as steward of MySQL is here to stay. While people still rumble about it, everybody has to get over it. For the end consumer little will change, for clients that use MySQL Enterprise Support that are affected by the changes in policy, many other viable support options exist. What is unknown and my single point of issue with the EU 10 point Statement is what will happen with OEM providers. This affects the financial viability of a number of providers and while I’m not directly involved I do not want to see this good work lost to the MySQL community.

Points of uncertainty include what features will be developed next, will they be pay only features, will they be available to enterprise customers and not community customers, what is the frequency of community releases etc. These still need good PR from Oracle for the MySQL community. For those concerned, Oracle has made continued investment in acquired RDBMS products including BerkeleyDB and InnoDB and RDB. Oracle RDB for example was acquired in 2004 and is still being actively developed and supported. As long as MySQL is profitable, we will see continued releases.

About the Author

Ronald Bradford is a well respected expert in the MySQL community. Ronald is internationally recognized as an Oracle ACE Director in MySQL, the highest industry recognition and is also a published author of Expert PHP and MySQL. He is available for MySQL consulting work now.

MySQL 5.5 and transaction management

Announced at MySQL Sunday was the Release Candidate edition of MySQL 5.5.6. Also noted by Geert where he points out the default storage engine is now InnoDB.

However, for those from a background other then MySQL there is still a gotcha.

mysql> show global variables like 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+

Unlike Oracle for example, the default autocommit is on.

Doing some other boundary conditions, it is no longer possible to disable InnoDB on startup which you would of course expect.

$ bin/mysqld_safe --skip-innodb &
$ tail error.log

101003 15:33:32 [Note] Plugin 'InnoDB' is disabled.
101003 15:33:32 [ERROR] Unknown/unsupported storage engine: InnoDB
101003 15:33:32 [ERROR] Aborting

MyISAM however can’t be removed and can’t be disabled. This is a question I’ve been asked by Oracle resources.

101003 15:34:55 [ERROR] /Users/rbradfor/mysql/mysql-5.5.6-rc-osx10.5-x86_64/bin/mysqld: ambiguous option '--skip-myisam' (--skip-myisam-block-size)
101003 15:34:55 [ERROR] Parsing options for plugin 'MyISAM' failed.
101003 15:34:55 [ERROR] Failed to initialize plugins.
101003 15:34:55 [ERROR] Aborting
mysql> select table_schema, table_name from information_schema.tables where engine='myisam';
+--------------------+---------------------------+
| table_schema       | table_name                |
+--------------------+---------------------------+
| information_schema | COLUMNS                   |
| information_schema | EVENTS                    |
| information_schema | PARAMETERS                |
| information_schema | PARTITIONS                |
| information_schema | PLUGINS                   |
| information_schema | PROCESSLIST               |
| information_schema | ROUTINES                  |
| information_schema | TRIGGERS                  |
| information_schema | VIEWS                     |
| mysql              | columns_priv              |
| mysql              | db                        |
| mysql              | event                     |
| mysql              | func                      |
| mysql              | help_category             |
| mysql              | help_keyword              |
| mysql              | help_relation             |
| mysql              | help_topic                |
| mysql              | host                      |
| mysql              | ndb_binlog_index          |
| mysql              | plugin                    |
| mysql              | proc                      |
| mysql              | procs_priv                |
| mysql              | servers                   |
| mysql              | tables_priv               |
| mysql              | time_zone                 |
| mysql              | time_zone_leap_second     |
| mysql              | time_zone_name            |
| mysql              | time_zone_transition      |
| mysql              | time_zone_transition_type |
| mysql              | user                      |
+--------------------+---------------------------+
30 rows in set (0.06 sec)

10x Performance Improvements in MySQL – A Case Study

The slides for my presentation at FOSDEM 2010 are now available online at slideshare. In this presentation I describe a successful client implementation with the result of 10x performance improvements. My presentation covers monitoring, reviewing and analyzing SQL, the art of indexes, improving SQL, storage engines and caching.

The end result was a page load improvement from 700+ms load time to a a consistent 60ms.

Calculating your database size

I generally use the following MySQL INFORMATION_SCHEMA (I_S) query to Calculate Your MySQL Database Size. This query and most others that access the MySQL INFORMATION_SCHEMA can be very slow to execute because they are not real tables and are not governed by physical data, memory buffers and indexes for example but rather internal MySQL data structures.

Mark Leith indicates in his post on innodb_stats_on_metadata that Innodb performs 8 random(ish) dives in to the index, when anybody accesses any of SHOW TABLE STATUS, SHOW INDEX, INFORMATION_SCHEMA.TABLES,INFORMATION_SCHEMA.STATISTICS for InnoDB tables. This can have an effect on performance, especially with a large number of Innodb tables, and a poor ratio of innodb_buffer_pool_size to disk data+index footprint.

What is even more incredible is when the result of this apparently harmless query causes the mysqld process to actual crash with a core dump due to these random index dives. The following core dump analysis highlights my query as the cause of the problem. This has happened now at least twice in for recent core crashes on a production environment.

(gdb) bt
#0 0x000000327280b6b2 in pthread_kill () from ./lib64/libpthread.so.0
#1 0x000000000055b136 in handle_segfault ()
#2 
#3 0x00000000007e1c21 in rec_get_offsets_func ()
#4 0x0000000000766007 in btr_estimate_number_of_different_key_vals ()
#5 0x000000000070d4c2 in dict_update_statistics_low ()
#6 0x000000000061fa84 in ha_innobase::info ()
#7 0x0000000000636972 in fill_schema_charsets ()
#8 0x0000000000639a66 in get_all_tables ()
#9 0x0000000000634633 in get_schema_tables_result ()
#10 0x00000000005bde37 in JOIN::exec ()
#11 0x00000000005bf7a7 in mysql_select ()
#12 0x00000000005c0127 in handle_select ()
#13 0x000000000056fcf0 in mysql_execute_command ()
#14 0x0000000000574c83 in mysql_parse ()
#15 0x00000000005751a0 in dispatch_command ()
#16 0x0000000000576483 in do_command ()
#17 0x0000000000577002 in handle_one_connection ()
#18 0x0000003272806367 in start_thread () from ./lib64/libpthread.so.0
#19 0x0000003271cd30ad in clone () from ./lib64/libc.so.6
Cannot access memory at address 0x3271cd3040

This is an information_schema query that caused innodb to open a table.
This is totally normal. On first open, innodb tables get automatically  analyzed.
This analyze process crashed in innodb.

This exact query *provoked* a crash:

(gdb) x/1s 0x00002aaabc961dd0
0x2aaabc961dd0: "SELECT table_schema,table_name,engine,row_format,
table_rows, avg_row_length,
(data_length+index_length)/1024/1024 as total_mb,
(data_length)/1024/1024 as data_mb,
(index_length)/1024/1024 as index_mb,
CURDATE() AS today
FROM information_schema.tables
WHERE table_schema=@schema
ORDER BY 7 DESC"

The issue however is which table is the problem? How widespread is the corruption. Would an ALTER TABLE ENGINE=Innodb rebuild the table and eliminate the problem. Would an ANALYZE on an Innodb table identify the problem? (I doubt this second point). The problem however is even more significant due to the actual system. The largest single table of this 1TB database is 500GB. The impact of performing the ALTER, the time to undertake this blocking operation, the increase in the Innodb data file that can’t be reclaimed are just two factors that the inexperienced may fall victim of.

A saying I use is “Disaster is inevitable”. In this situation the disaster appears to not be significant but the ramifications due to the lack of appropriate and expert architectural design considerations to correct the problem are.

Is your environment capable of supporting this maintenance requirement? If not, then is the decision maker in your organization worried enough to seek the expert advice to address pro actively or will it be too late.

Understanding Innodb Transaction Isolation

The MySQL Innodb storage engine provides ACID compliance, the ‘I’ being isolation. There are four states defined in MySQL with the tx_isolation system variable, READ-UNCOMMITTED, READ-COMMITTED, REPEATABLE-READ and SERIALIZABLE.

Generally MySQL installations do not modify the default value of tx_isolation = REPEATABLE-READ, however I have seen with a number of clients the default value has been changed to READ-COMMITTED. When I attempt to identify the reason why, I have always received the same reason. Oracle uses a default transaction isolation of READ-COMMITTED. See how Oracle Manages Data Concurrency and Consistency for more information.

However, while the literal string is the same, the actual implementation of READ-COMMITTED in Oracle more closely represents the REPEATABLE-READ in MySQL.

The following demonstrates what you can expect to see between the operation of these two modes.

REPEATABLE-READ
Session 1 Session 2
DROP TABLE IF EXISTS transaction_test;
CREATE TABLE transaction_test(
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  val  VARCHAR(20) NOT NULL,
  created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(id)
) ENGINE=InnoDB DEFAULT CHARSET latin1;

INSERT INTO transaction_test(val) VALUES ('a'),('b'),('c');
SELECT @@global.tx_isolation, @@session.tx_isolation;
+-----------------------+------------------------+
| @@global.tx_isolation | @@session.tx_isolation |
+-----------------------+------------------------+
| REPEATABLE-READ       | REPEATABLE-READ        |
+-----------------------+------------------------+
START TRANSACTION;
SELECT * FROM transaction_test;
+----+-----+---------------------+
| id | val | created             |
+----+-----+---------------------+
|  1 | a   | 2009-09-21 00:19:43 |
|  2 | b   | 2009-09-21 00:19:43 |
|  3 | c   | 2009-09-21 00:19:43 |
+----+-----+---------------------+
SELECT SLEEP(20);
START TRANSACTION;
INSERT INTO transaction_test(val) VALUES ('x'),('y'),('z');
SELECT * FROM transaction_test;
+----+-----------------+---------------------+
| id | val             | created             |
+----+-----------------+---------------------+
|  1 | a               | 2009-09-21 00:19:43 |
|  2 | b               | 2009-09-21 00:19:43 |
|  3 | c               | 2009-09-21 00:19:43 |
|  4 | x               | 2009-09-21 00:21:00 |
|  5 | y               | 2009-09-21 00:21:00 |
|  6 | z               | 2009-09-21 00:21:00 |
+----+-----------------+---------------------+
COMMIT;
INSERT INTO transaction_test(val) VALUES (@@session.tx_isolation);
SELECT * FROM transaction_test;
+----+-----------------+---------------------+
| id | val             | created             |
+----+-----------------+---------------------+
|  1 | a               | 2009-09-21 00:19:43 |
|  2 | b               | 2009-09-21 00:19:43 |
|  3 | c               | 2009-09-21 00:19:43 |
|  7 | REPEATABLE-READ | 2009-09-21 00:21:01 |
+----+-----------------+---------------------+

COMMIT;
READ-COMMITTED
SET SESSION tx_isolation='READ-COMMITTED';
TRUNCATE TABLE transaction_test;
INSERT INTO transaction_test(val) VALUES ('a'),('b'),('c');
SELECT @@global.tx_isolation, @@session.tx_isolation;
+-----------------------+------------------------+
| @@global.tx_isolation | @@session.tx_isolation |
+-----------------------+------------------------+
| REPEATABLE-READ       | READ-COMMITTED         |
+-----------------------+------------------------+
START TRANSACTION;
SELECT * FROM transaction_test;
+----+-----+---------------------+
| id | val | created             |
+----+-----+---------------------+
|  1 | a   | 2009-09-23 22:49:44 |
|  2 | b   | 2009-09-23 22:49:44 |
|  3 | c   | 2009-09-23 22:49:44 |
+----+-----+---------------------+
SELECT SLEEP(20);
START TRANSACTION;
INSERT INTO transaction_test(val) VALUES ('x'),('y'),('z');
SELECT * FROM transaction_test;
+----+-----+---------------------+
| id | val | created             |
+----+-----+---------------------+
|  1 | a   | 2009-09-23 22:49:44 |
|  2 | b   | 2009-09-23 22:49:44 |
|  3 | c   | 2009-09-23 22:49:44 |
|  4 | x   | 2009-09-23 22:52:38 |
|  5 | y   | 2009-09-23 22:52:38 |
|  6 | z   | 2009-09-23 22:52:38 |
+----+-----+---------------------+
COMMIT;
INSERT INTO transaction_test(val) VALUES (@@session.tx_isolation);
SELECT * FROM transaction_test;
+----+----------------+---------------------+
| id | val            | created             |
+----+----------------+---------------------+
|  1 | a              | 2009-09-23 22:49:44 |
|  2 | b              | 2009-09-23 22:49:44 |
|  3 | c              | 2009-09-23 22:49:44 |
|  4 | x              | 2009-09-23 22:52:38 |
|  5 | y              | 2009-09-23 22:52:38 |
|  6 | z              | 2009-09-23 22:52:38 |
|  7 | READ-COMMITTED | 2009-09-23 22:56:10 |
+----+----------------+---------------------+

COMMIT;

As you can see, under READ-COMMITTED your result set can change during the transaction. However, how practical is this example in an actual application.

In what circumstances would you consider using READ-COMMITTED? Is there an improvement in locking contention that can lead to less deadlock contention? What is the overhead in other areas?

Harrison writes in My Favorite New Feature of MySQL 5.1: Less InnoDB Locking that best locking out of InnoDB in 5.1 will be with READ-COMMITTED. Note that as mentioned, the impact is a change in replication mode that may have a more dramatic effect.

Heikki Tuuri comments in Understanding InnoDB MVCC that using READ-COMMITTED should help in a specific locking issue.

I am still unclear of the specific benefits in general terms for all environments. Review of the The InnoDB Transaction Model and Locking and specifically Consistent Nonlocking Reads provides “With READ COMMITTED isolation level, each consistent read within a transaction sets and reads its own fresh snapshot” which indicates that for certain workloads the reduced locking is a benefit.

Every environment is different and ultimately the actual transaction statements will determine what options and benefits work best.

My favorite MySQL data type – DECIMAL(31,0)

It may seem hard to believe, but I have seen DECIMAL(31,0) in action on a production server. Not just in one column, but in 15 columns just in the largest 4 tables of one schema. The column was being used to represent a integer primary or foreign key column.

In a representative production instance (one of a dozen plus distributed production database servers) the overall database footprint was decreased from ~10 GB to ~2 GB, a 78% saving. In total, 15 columns across just 4 tables were changed from DECIMAL(31,0) to INT UNSIGNED.

One single table > 5GB was reduced to under 1GB (a 81% saving). This being my record for any GB+ tables in my time working with the MySQL database.

Had this server for example had 4GB of RAM, and say 2.5GB allocated to the innodb_buffer_pool_size, this one change moved the system from requiring more consistent disk access (4x data to memory) to being able to store all data in memory. Tests showed a clear improvement in Innodb buffer pool reads and hit ratio.

Today’s lesson as described in my 2008 conference presentation Top 20 design tips for data architects is, choose the right integer data type for your data.

InnoDB I_S.tables.table_rows out by a factor of 100x

I’ve always believed that the MySQL Information_schema.tables.table_rows figure for Innodb tables to be while approximate, approximately accurate.

Today I found that the figures varied on one table from 10x to 100x wrong.

Before performing an ALTER I always verify sizes for reference.

+---------+------------+--------+------------+------------+----------------+------------+------------+-----------+------------+
| table_s | table_name | engine | row_format | table_rows | avg_row_length | total_mb   | data_mb    | index_mb  | today      |
+---------+------------+--------+------------+------------+----------------+------------+------------+-----------+------------+
| XXXXX   | s_a        | InnoDB | Compact    |     208993 |           7475 |  1491.5312 |  1490.0156 |    1.5156 | 2009-09-09 |

mysql> alter table s_a modify col VARCHAR(255);
Query OK, 23471 rows affected (4 min 26.23 sec)

After

+---------+------------+--------+------------+------------+----------------+------------+------------+-----------+------------+
| XXXXX   | s_a        | InnoDB | Compact    |    2407063 |            629 |  1447.0312 |  1445.5156 |    1.5156 | 2009-09-09 |

I was so caught out by this. The table reported 200k rows, but the alter returned 23k, that’s like 10x out.
I ran my query again, and the second time I got.

+---------+------------+--------+------------+------------+----------------+------------+------------+-----------+------------+
| XXXXX   | s_a        | InnoDB | Compact    |      21813 |          69487 |  1447.0312 |  1445.5156 |    1.5156 | 2009-09-09 |

This is closer to what I’d expected, 22k verses 23k.

I have to say, while I have always treated the data and index size as accurate, I now how little confidence in the table_rows any more.

Setting up sysbench with MySQL & Drizzle

Sysbench is a open source product that enables you to perform various system benchmarks including databases. Drizzles performs regression testing of every trunk revision with a branched version of sysbench within Drizzle Automation.

A pending branch https://code.launchpad.net/~elambert/sysbench/trunk_drizzle_merge by Eric Lambert now enables side by side testing with MySQL and Drizzle. On a system running MySQL and Drizzle I was able install this sysbench branch with the following commands.

cd bzr
bzr branch lp:~elambert/sysbench/trunk_drizzle_merge
cd trunk_drizzle_merge/
./autogen.sh
./configure
make
sudo make install

Running the default lua tests supplied required me to ensure drizzle was in my path and that I created the ‘sbtest’ schema. I’ll be sure it add that checking to my future developed benchmark scripts.

$ cd sysbench/tests/db
$ sysbench --test=insert.lua --db_driver=drizzle prepare
sysbench v0.4.10:  multi-threaded system evaluation benchmark

FATAL: unable to connect to Drizzle server: 23
FATAL: error 0: Unknown database 'sbtest'
FATAL: failed to execute function `prepare': insert.lua:7: Failed to connect to the database
$ drizzle -e "create schema sbtest"
$ sysbench --test=insert.lua --db_driver=drizzle prepare
sysbench v0.4.10:  multi-threaded system evaluation benchmark

Creating table 'sbtest'...

And running produces the following results.

$ sysbench --num-threads=1 --test=insert.lua --db_driver=drizzle run
sysbench v0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Threads started!

OLTP test statistics:
    queries performed:
        read:                            0
        write:                           10000
        other:                           0
        total:                           10000
    transactions:                        0      (0.00 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 10000  (879.68 per sec.)
    other operations:                    0      (0.00 per sec.)

Test execution summary:
    total time:                          11.3678s
    total number of events:              10000
    total time taken by event execution: 11.3354s
    per-request statistics:
         min:                                  0.32ms
         avg:                                  1.13ms
         max:                                 68.74ms
         approx.  95 percentile:               2.41ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   11.3354/0.0

Rerunning the prepare also lacked some auto cleanup to allow for automated re-running.

$ sysbench --test=insert.lua --db_driver=drizzle prepare
Creating table 'sbtest'...
ALERT: Drizzle Query Failed: 1050:Table 'sbtest' already exists
FATAL: failed to execute function `prepare': insert.lua:57: Database query failed

For MySQL

$ sysbench --test=insert.lua --db_driver=mysql --mysql_table_engine=innodb prepare
sysbench v0.4.10:  multi-threaded system evaluation benchmark

Creating table 'sbtest'...

Unfortunately this doesn’t actually create the table in the right storage engine, I had to hack the code to ensure I was comparing InnoDB in each test.

$ sysbench --num-threads=1 --test=insert.l
ua --db_driver=mysql run
sysbench v0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Threads started!

OLTP test statistics:
    queries performed:
        read:                            0
        write:                           10000
        other:                           0
        total:                           10000
    transactions:                        0      (0.00 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 10000  (897.67 per sec.)
    other operations:                    0      (0.00 per sec.)

Test execution summary:
    total time:                          11.1399s
    total number of events:              10000
    total time taken by event execution: 11.1084s
    per-request statistics:
         min:                                  0.27ms
         avg:                                  1.11ms
         max:                                252.63ms
         approx.  95 percentile:               2.48ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   11.1084/0.00


Armed with a working environment I can now write some more realistic production like tests in Lua.

Understanding InnoDB MVCC

Multi versioning concurrency control (MVCC) is a database design theory that enables relational databases to support concurrency, or more simply multiple user access to common data in your database.

In MySQL the InnoDB storage engine provides MVCC, row-level locking, full ACID compliance as well as other features.

In my understanding of database theory, access to modify independent sections of unique data (e.g. UPDATE) under MVCC should fully support concurrency. I have however experienced a level of exclusive locking under Innodb.

I wanted to clearly document this situation so I could then seek the advice of the guru’s in InnoDB Internals such as Mark Callaghan, Percona and the Innodb development team for example. I’m happy to say I’m not a MySQL expert in every aspect of MySQL, specifically internals where I have not had the detailed time to read the code, and understanding all internal workings.

The situation

Single table updates on a range of rows by primary keys are being blocked by other similar operations on the same table yet the set of data for each query is effectively unique.

Reproducing the problem

$ mysql -u -p test
drop table if exists numbers;
create table numbers (id int unsigned not null primary key, f1 int not null, f2 int not null) engine=innodb;

delimiter $$

drop procedure if exists fill_numbers $$
create procedure fill_numbers(in p_max int)
deterministic
begin
  declare counter int default 1;
  truncate table numbers;
  insert into numbers values (1,1,1);
  while counter < p_max
  do
      insert into numbers (id,f1, f2)
          select id + counter, counter + f1, id - f2
          from numbers;
      select count(*) into counter from numbers;
      select counter;
  end while;
end $$
delimiter ;

call fill_numbers(2000000);

In two separate threads I execute similar statements on different ranges of the primary key.

--thread 1
start transaction;
update numbers
set f2 = f2 +200
where id between 1 and 1000000;
commit;

--thread 2
start transaction;
update numbers
set f2 = f2 +300
where id between 1000001 and 2000000;
commit;

And in a third thread we can monitor the transactions inside Innodb.

-- thread 3
show engine innodb statusG

During the update process, the following error can be observed.

---TRANSACTION 0 7741, ACTIVE 20 sec, process no 2159, OS thread id 1188534592 fetching rows, thread declared inside InnoDB 275
mysql tables in use 1, locked 1
2007 lock struct(s), heap size 292848, 1001862 row lock(s), undo log entries 999858
MySQL thread id 918563, query id 16802707 localhost root Updating
update numbers set f2 = f2 +300 where id between 1000001 and 2000000
---TRANSACTION 0 7740, ACTIVE 21 sec, process no 2159, OS thread id 1178949952 fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2008 lock struct(s), heap size 292848, 1002005 row lock(s), undo log entries 1000000
MySQL thread id 918564, query id 16802694 localhost root Updating
update numbers set f2 = f2 +200 where id between 1 and 1000000
------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 16052 n bits 568 index `PRIMARY` of table `test`.`numbers` trx id 0 7740 lock_mode X waiting
Record lock, heap no 256 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 000f4241; asc   BA;; 1: len 6; hex 000000001e3d; asc      =;; 2: len 7; hex 00000033630110; asc    3c  ;; 3: len 4; hex 800f4241; asc   BA;; 4: len 4; hex 80050584; asc     ;;

The problem has been reproduced on various different MySQL versions and different hardware including, 5.0.67, 5.0.81 and 5.1.25.

What is causing the problem?

  • Is it a bug? No.
  • Is my understanding of MVCC theory incorrect? Maybe.
  • Is it InnoDB’s implementation of MVCC incomplete. No. Heikki and his team have a far greater understanding of data theory then most database experts
  • Is it the MySQL kernel interfering with the InnoDB storage engine? No, this is not possible as the MySQL kernel has passed the queries to InnoDB, and InnoDB is handling these threads independently.
  • Is it a gap locking issue, a problem that can cause deadlocks when inserting data in a high concurrency situation? Not likely as the data is inserted in primary key, i.e. auto increment order, and there are no gaps.
  • Is it related to InnoDB access method via the primary key, where InnoDB uses a clustered index to store the primary key. Given the data is physically in primary key order, this clustered index would in theory reduce possible locking.
  • Is it related to the page size of indexes, e.g. the 16k index page, effectively causing a page level lock for overlapping index data? My understanding is that InnoDB supports row level locking, and MVCC should cater for this.
  • Is is related to the ranges of primary keys being adjacent, i.e. 1,000,000 and 1,000,001. Not likely as I can reproduce the problem not using adjacent ranges.
  • Is it some weird interaction to managing the undo space of the transactions in the Innodb buffer pool?
  • Is it some weird interaction with marking/locking the dirty pages in the Innodb buffer pool of modified pages?
  • Is it some weird interaction with logging the successful Innodb transaction to the redo logs.

I’ve listed these points more as an information exercise for all those that have less understanding of the problem to see my though process.

Additional testing can definitely be performed. Additional analysis of InnoDB internals with SHOW ENGINE INNODB STATUS such as spin waits, OS waits (context switches), looking at Mutexes with SHOW ENGINE INNODB MUTEX can be undertaken.

My hope and request is that this has been observed by others and that a simple hybrid solution exists.