Understanding InnoDB MVCC

Multi versioning concurrency control (MVCC) is a database design theory that enables relational databases to support concurrency, or more simply multiple user access to common data in your database.

In MySQL the InnoDB storage engine provides MVCC, row-level locking, full ACID compliance as well as other features.

In my understanding of database theory, access to modify independent sections of unique data (e.g. UPDATE) under MVCC should fully support concurrency. I have however experienced a level of exclusive locking under Innodb.

I wanted to clearly document this situation so I could then seek the advice of the guru’s in InnoDB Internals such as Mark Callaghan, Percona and the Innodb development team for example. I’m happy to say I’m not a MySQL expert in every aspect of MySQL, specifically internals where I have not had the detailed time to read the code, and understanding all internal workings.

The situation

Single table updates on a range of rows by primary keys are being blocked by other similar operations on the same table yet the set of data for each query is effectively unique.

Reproducing the problem

$ mysql -u -p test
drop table if exists numbers;
create table numbers (id int unsigned not null primary key, f1 int not null, f2 int not null) engine=innodb;

delimiter $$

drop procedure if exists fill_numbers $$
create procedure fill_numbers(in p_max int)
deterministic
begin
  declare counter int default 1;
  truncate table numbers;
  insert into numbers values (1,1,1);
  while counter < p_max
  do
      insert into numbers (id,f1, f2)
          select id + counter, counter + f1, id - f2
          from numbers;
      select count(*) into counter from numbers;
      select counter;
  end while;
end $$
delimiter ;

call fill_numbers(2000000);

In two separate threads I execute similar statements on different ranges of the primary key.

--thread 1
start transaction;
update numbers
set f2 = f2 +200
where id between 1 and 1000000;
commit;

--thread 2
start transaction;
update numbers
set f2 = f2 +300
where id between 1000001 and 2000000;
commit;

And in a third thread we can monitor the transactions inside Innodb.

-- thread 3
show engine innodb statusG

During the update process, the following error can be observed.

---TRANSACTION 0 7741, ACTIVE 20 sec, process no 2159, OS thread id 1188534592 fetching rows, thread declared inside InnoDB 275
mysql tables in use 1, locked 1
2007 lock struct(s), heap size 292848, 1001862 row lock(s), undo log entries 999858
MySQL thread id 918563, query id 16802707 localhost root Updating
update numbers set f2 = f2 +300 where id between 1000001 and 2000000
---TRANSACTION 0 7740, ACTIVE 21 sec, process no 2159, OS thread id 1178949952 fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2008 lock struct(s), heap size 292848, 1002005 row lock(s), undo log entries 1000000
MySQL thread id 918564, query id 16802694 localhost root Updating
update numbers set f2 = f2 +200 where id between 1 and 1000000
------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 16052 n bits 568 index `PRIMARY` of table `test`.`numbers` trx id 0 7740 lock_mode X waiting
Record lock, heap no 256 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 000f4241; asc   BA;; 1: len 6; hex 000000001e3d; asc      =;; 2: len 7; hex 00000033630110; asc    3c  ;; 3: len 4; hex 800f4241; asc   BA;; 4: len 4; hex 80050584; asc     ;;

The problem has been reproduced on various different MySQL versions and different hardware including, 5.0.67, 5.0.81 and 5.1.25.

What is causing the problem?

  • Is it a bug? No.
  • Is my understanding of MVCC theory incorrect? Maybe.
  • Is it InnoDB’s implementation of MVCC incomplete. No. Heikki and his team have a far greater understanding of data theory then most database experts
  • Is it the MySQL kernel interfering with the InnoDB storage engine? No, this is not possible as the MySQL kernel has passed the queries to InnoDB, and InnoDB is handling these threads independently.
  • Is it a gap locking issue, a problem that can cause deadlocks when inserting data in a high concurrency situation? Not likely as the data is inserted in primary key, i.e. auto increment order, and there are no gaps.
  • Is it related to InnoDB access method via the primary key, where InnoDB uses a clustered index to store the primary key. Given the data is physically in primary key order, this clustered index would in theory reduce possible locking.
  • Is it related to the page size of indexes, e.g. the 16k index page, effectively causing a page level lock for overlapping index data? My understanding is that InnoDB supports row level locking, and MVCC should cater for this.
  • Is is related to the ranges of primary keys being adjacent, i.e. 1,000,000 and 1,000,001. Not likely as I can reproduce the problem not using adjacent ranges.
  • Is it some weird interaction to managing the undo space of the transactions in the Innodb buffer pool?
  • Is it some weird interaction with marking/locking the dirty pages in the Innodb buffer pool of modified pages?
  • Is it some weird interaction with logging the successful Innodb transaction to the redo logs.

I’ve listed these points more as an information exercise for all those that have less understanding of the problem to see my though process.

Additional testing can definitely be performed. Additional analysis of InnoDB internals with SHOW ENGINE INNODB STATUS such as spin waits, OS waits (context switches), looking at Mutexes with SHOW ENGINE INNODB MUTEX can be undertaken.

My hope and request is that this has been observed by others and that a simple hybrid solution exists.

Killing my softly with QUERY

The MySQL KILL command as the name suggests kills queries that are running.

After identifying the Id using the SHOW PROCESSLIST command, the User of the connection/thread or a database user with SUPER privileges can execute KILL [id]; to remove the connection/thread.

However, there is an ability to kill just the query that is being executed rather the entire connection. The default when not specified is to kill the connection, however you can optional specify the CONNECTION or QUERY keywords.

For example, below is an interactive test.

Thread 1:

mysql> select sleep(10);

Thread 2:

mysql> show processlist;
+----+------+-----------+------+---------+------+-----------+------------------+
| Id | User | Host      | db   | Command | Time | State     | Info             |
+----+------+-----------+------+---------+------+-----------+------------------+
| 23 | root | localhost | NULL | Query   |    2 | executing | select sleep(10) |
| 24 | root | localhost | NULL | Query   |    0 | NULL      | show processlist |
+----+------+-----------+------+---------+------+-----------+------------------+
2 rows in set (0.00 sec)

mysql> kill query 23;

Thread 1:
Notice, the query exits, in less then 10 seconds, but the connection is still valid.

+-----------+
| sleep(10) |
+-----------+
|         1 |
+-----------+
1 row in set (7.27 sec)

mysql> select sleep(10);

Thread 2:

mysql> show processlist;
+----+------+-----------+------+---------+------+-----------+------------------+
| Id | User | Host      | db   | Command | Time | State     | Info             |
+----+------+-----------+------+---------+------+-----------+------------------+
| 23 | root | localhost | NULL | Query   |    4 | executing | select sleep(10) |
| 24 | root | localhost | NULL | Query   |    0 | NULL      | show processlist |
+----+------+-----------+------+---------+------+-----------+------------------+
2 rows in set (0.00 sec)

mysql> kill 23;
Query OK, 0 rows affected (0.00 sec)

Thread 1:
Query is killed and connection is lost. An attempt to run the command again within the interactive client causes the system to re-get a connection.

ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql> select sleep(10);
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    25
Current database: *** NONE ***

+-----------+
| sleep(10) |
+-----------+
|         0 |
+-----------+
1 row in set (10.00 sec)

Percona Performance Conference Talk

My final presentation during the 2009 MySQL Conference and Expo week was with the Percona Performance Conference on the topic of The Ideal Performance Architecture. My talk included discussions on Technology, Disk, Memory, Indexes, SQL and Data.

Updated 09/18/09
you can now see video of the event at Percona TV.

Extending the MySQL Data Landscape

Learn how to extend your existing MySQL based website to leverage the power of MySQL variants, AWS cloud based MySQL deployments and RDBMS alternatives. Evaluate how to integrate and use these different various technologies such as MySQL based variations KickFire, a column based optimization and InfoBright, a data warehousing solution. Understand the means of approach towards data synchronization between various database solutions in your business.

At the MySQL Meetup in New York this month, I spoke on “Extending the MySQL Data Landscape“. A MySQL centric view on an earlier work, “The Data Landscape” which I presented at a recent GoDaddy Tech Day.

,

A 5.1 QEP nicety – Using join buffer

I was surprised to find yesterday when using MySQL 5.1.26-rc with a client I’m recommending 5.1 to, some information not seen in the EXPLAIN plan before while reviewing SQL Statements.

Using join buffer

+----+-------------+-------+--------+---------------+--------------+---------+------------------------+-------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key          | key_len | ref                    | rows  | Extra                                        |
+----+-------------+-------+--------+---------------+--------------+---------+------------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | lr    | ALL    | NULL          | NULL         | NULL    | NULL                   |  1084 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | ca    | ref    | update_check  | update_check | 4       | XXXXXXXXXXXXXXXXX      |     4 | Using where; Using index                     |
|  1 | SIMPLE      | ce    | ALL    | NULL          | NULL         | NULL    | NULL                   | 13319 | Using where; Using join buffer               |
|  1 | SIMPLE      | co    | eq_ref | PRIMARY       | PRIMARY      | 4       | XXXXXXXXXXXXXXXXX      |     1 | Using where                                  |
+----+-------------+-------+--------+---------------+--------------+---------+------------------------+-------+----------------------------------------------+
4 rows in set (0.00 sec)
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.26-rc |
+-----------+
1 row in set (0.00 sec)

Sergey Petrunia of the MySQL Optimizer team writes about this in Use of join buffer is now visible in EXPLAIN.

MySQL Monitoring 101 – Graph your results

The problem

Hands up those that don’t monitor their production MySQL web server. I’m a little surprised by this, but I’ve visited several clients that have absolutely no monitoring other then “the customers will tell us when something is wrong”. The lack of system monitoring is one of the topics in my book “101 ways to screw up a successful startup”.

Why is monitoring important? First it can tell you when something is wrong, most monitoring systems introduce some level of constraints that trigger notifications via email, SMS or red flashing screens. Second, and I consider more important, is it allows you to analyze change and compare results over time. Let’s say you added more memory to your server, and then remembered to also increase the MySQL buffers appropriately. How much improvement did it make? Rather then “it seems faster”, you can have hard and fast numbers to back it up.

The Monitoring

If you have zero monitoring, you need to implement at least the following. Create the following script and run daily at midnight via cron. I’ve made the script as simple as can be.

#!/bin/sh
SCRIPT_NAME="monitor"
DATETIME=`date +%Y%m%d.%H%M`
HOSTNAME=`hostname -s`
LOG_DIR="/tmp"

vmstat 5 17280 > $LOG_DIR/os.vmstat.5.$HOSTNAME.$DATETIME.log &
iostat -x 5 17280 > $LOG_DIR/os.iostat.5.$HOSTNAME.$DATETIME.log &
exit 0

If you don’t have iostat installed, you will find iostat/sar as part of the sysstat package that most distributions will have generally available.

The results

vmstat output of relative idle system.

$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 304332 289484 435052    0    0     3     4  206  177  1  0 98  0
 0  0      0 304108 289496 435072    0    0     0    45  411  736  5  0 95  0
 0  0      0 304116 289500 435072    0    0     0    13  395  716  3  0 97  0

vmstat of system under disk load

$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 234784 289844 516800    0    0     3     4  206  177  1  0 98  0
 1  1      0 101636 289988 645552    0    0     1 33190  448  626 35  5 31 29
 1  2      0  33092 280888 723276    0    0     4 17233  469  616 24  4  6 66
 1  0      0  62876 177444 797056    0    0     2 14846  837  938 33  5 31 30
 1  1      0  33252 168440 834064    0    0     1 30376  969  904 31  6 22 41

iostat of system under load

 iostat -x 5
Linux 2.6.22-14-generic (newyork)       02/16/2008

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.31    0.00    0.29    0.15    0.00   98.26

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.83     5.82    0.71    0.94    13.04    53.56    40.53     0.09   52.60   2.08   0.34
sr0               0.00     0.00    0.00    0.00     0.00     0.00    10.40     0.00  237.00 235.00   0.01

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          41.30    0.00    5.70   29.00    0.00   24.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  5153.60    0.60   75.80     4.80 42774.40   559.94    81.57 1354.93  13.09 100.00
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          30.63    0.00    4.10   61.36    0.00    3.90

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  3850.00    0.20   60.00     1.60 39763.20   660.54    83.68 1450.90  13.86  83.44
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.10    0.00    3.80   42.30    0.00   31.80

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  7645.80    0.40   66.80     3.20 44009.60   654.95   100.34 1192.33  14.14  95.04
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

The analysis

The below graph shows CPU usage over a few hours. What was only observed via the graph was a spike of CPU that occurred every 5 minutes. In fact it occurred just on one server of several servers, and this enabled analysis to find a rouge cron job running every 5 minutes.

The below graph shows CPU idle usage over a 24 hour period. The colors represent different usage patterns such as during the day, batch processing overnight etc.

The below graph shows IO utilization over the same 24 hour period.

NOTE: Color’s added manually to highlight points.
These past two graphs highlight a number of things that require further analysis, but as in this example, it only shows part of a bigger picture. For example only CPU idle is graphed, not Wait I/O for example which as you can see from the I/O is excessive at times.

Still, these graphs were created from grep’ing the appropriate data into a CSV file, loading into Open Office, and click just the graph button, about a 60 second process for each type of data. This alone gave a better picture of what was happening then no picture.

What’s next

This level of monitoring is impractical for production systems, especially if you are monitoring more then one system. You need to monitoring more things, compare different results at the same time, and get better results at hand instantly. In my next topic “Monitoring 201″ I’ll talk about a number of easy and practical options.

UltimateLAMP

This product is no longer available. This was created over 6 years ago and software is too out of date.


As I discussed earlier in A picture can tell a thousand words, I outlined briefly what the intention of UltimateLAMP was for. Let me spill the beans so to speak.

What is UltimateLAMP?

UltimateLAMP is a fully functional environment allowing you to easily try and evaluate a number of LAMP stack software products without requiring any specific setup or configuration of these products. UltimateLAMP runs as a Virtual Machine with VMware Player (FREE). This demonstration package also enables the recording of all user entered information for later reference, indeed you will find a wealth of information already available within a number of the Product Recommendations starting with the supplied Documentation.

My executive punch line with the “right” buzz words is:


You have heard of all the hype in Open Source with lowering Total Cost of
Ownership (TCO) or Total Economic Impact (TEI)? Evaluate Open Source now
in your organistion at no cost or risk with this software package.

What are the uses for UltimateLAMP?







Well, in a nutshell UltimateLAMP allows for instant exposure of Open Source products that use MySQL. You have to remember my goal here is to promote MySQL. Unfortunately, as with any database the features alone will not get the new sale, it’s the applicability to an organisation, and with Open Source the wealth of existing and developing applications that exist can. I don’t necessarily like this approach. Indeed many open source products are poorly designed and can be poor choices in a large scale enterprise solution, but the flip side is, where else can you start.

The benefits of selling a MediaWiki for example due to the success and scalability with Wikipedia is great. So it’s important that the product list includes proven products and currently developing products (rather then stale ones). This is something that the community can definitely provide valuable feedback on to help in this selection.

Other then becoming a CD used as a drink coaster, I feel the potential is here already to provide a copy to people, even install it on a managers computer. You can’t break the software, so why not install it for your non-computer user friends/family. The goal is to move up to executive management however I feel the exposure to the general community first will greatly help.

How it came about

There is some history to this idea. Here are some of the highlights.

  • Late 2005, several planning sessions with Jon and Morgan about a more practical Open Source Contribution user group lead to obviously LAMP stack products for simplicity and exposure. This lead to exposure of LAMP stack products to more of the general person and split from the original intended goal, but was a great idea.
  • Early 2006, futher discussions of how MySQL could get exposure and traction into organisations. For myself professional, how could I promote in industry sectors that I work in.
  • In April 2006, the MySQL Users Conference with discussions of this idea with others and the positive feedback
  • In May 2006, the VMware Appliance Challenge was the possible exposure and deadline needed for me to “Just Do It”. Originally the idea was intended as a Live CD, but in some ways a virtual machine is just as good.

Where to from now!

Well, supply and demand. I don’t know if anybody else has a particular use, or will even download to use it, or market it.

I could see the potential for MySQL User Groups to get behind my idea, and enable members to filter this into known organisations. I could see for targeted opportunities/events, CD’s or information could be distributed. With the support and backing of MySQL AB, I could see the opportunity for even a breakfast CEO/CTI/CIO introduction or other format of meeting the ultimate intended audience.

I could see the potential that an organisation or entity could provide free hosting (30 days) to an organisation that pointed a 3rd level domain to the provider (e.g. wiki.acme.com). It’s like the honeypot, if it takes off after 30 days, the company will either want to pay for hosting, or what to move it. And that’s also fine, organisation provides a MySQL dump, and links to documented installation instructions, or perhaps a sale of services for initial installation/customisation/training can be made.

What can I do?

There is always a list of things that can be done. For now the greatest thing I can ask for is feedback. The good, the bad and the ugly. It’s find to get the comments to say, “That’s great”, or “Good job” or “I can use this”, it’s just as important to get the comments that are proactive in what’s not good. I would value any feedback. Please feel free to Download UltimateLAMP

On my immediate ToDo List or even partially complete is:

  • Document VMPlayer installation instructions for Windows/Linux (partial).
  • Add more product sample content.
  • Add more mediawiki content about the product, like customisation options, references to specific documentation, or other online working examples.
  • Documenting the installation/creation instructions for individual products.
  • Figure out a better way for users to contribute content that get’s rolled back into the Virtual Machines. For the mediawiki, I could see a public online copy, but for other products it could become harder
  • Optimise VM image (removing unnecessary OS stuff), removing product language support (not ideal), but my goal is to provide a 2 CD pack. The first CD has VMplayer in Win/Linux/RPM formats and the default VMware BrowserAppliance (All software from VMWare). The second CD is UltimateLAMP. Combined in a DVD 2x case along with perhaps a small booklet of a few pages, would enable this product to potential move to a commercial state.
  • Investigating other products

Should anybody wishing to help, leave a comment, that way I can see somebody is reading this and so can others.

Related Post:

UltimateLAMP Passwords