Understanding InnoDB MVCC

Multi versioning concurrency control (MVCC) is a database design theory that enables relational databases to support concurrency, or more simply multiple user access to common data in your database.

In MySQL the InnoDB storage engine provides MVCC, row-level locking, full ACID compliance as well as other features.

In my understanding of database theory, access to modify independent sections of unique data (e.g. UPDATE) under MVCC should fully support concurrency. I have however experienced a level of exclusive locking under Innodb.

I wanted to clearly document this situation so I could then seek the advice of the guru’s in InnoDB Internals such as Mark Callaghan, Percona and the Innodb development team for example. I’m happy to say I’m not a MySQL expert in every aspect of MySQL, specifically internals where I have not had the detailed time to read the code, and understanding all internal workings.

The situation

Single table updates on a range of rows by primary keys are being blocked by other similar operations on the same table yet the set of data for each query is effectively unique.

Reproducing the problem

$ mysql -u -p test
drop table if exists numbers;
create table numbers (id int unsigned not null primary key, f1 int not null, f2 int not null) engine=innodb;

delimiter $$

drop procedure if exists fill_numbers $$
create procedure fill_numbers(in p_max int)
deterministic
begin
  declare counter int default 1;
  truncate table numbers;
  insert into numbers values (1,1,1);
  while counter < p_max
  do
      insert into numbers (id,f1, f2)
          select id + counter, counter + f1, id - f2
          from numbers;
      select count(*) into counter from numbers;
      select counter;
  end while;
end $$
delimiter ;

call fill_numbers(2000000);

In two separate threads I execute similar statements on different ranges of the primary key.

--thread 1
start transaction;
update numbers
set f2 = f2 +200
where id between 1 and 1000000;
commit;

--thread 2
start transaction;
update numbers
set f2 = f2 +300
where id between 1000001 and 2000000;
commit;

And in a third thread we can monitor the transactions inside Innodb.

-- thread 3
show engine innodb statusG

During the update process, the following error can be observed.

---TRANSACTION 0 7741, ACTIVE 20 sec, process no 2159, OS thread id 1188534592 fetching rows, thread declared inside InnoDB 275
mysql tables in use 1, locked 1
2007 lock struct(s), heap size 292848, 1001862 row lock(s), undo log entries 999858
MySQL thread id 918563, query id 16802707 localhost root Updating
update numbers set f2 = f2 +300 where id between 1000001 and 2000000
---TRANSACTION 0 7740, ACTIVE 21 sec, process no 2159, OS thread id 1178949952 fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2008 lock struct(s), heap size 292848, 1002005 row lock(s), undo log entries 1000000
MySQL thread id 918564, query id 16802694 localhost root Updating
update numbers set f2 = f2 +200 where id between 1 and 1000000
------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 16052 n bits 568 index `PRIMARY` of table `test`.`numbers` trx id 0 7740 lock_mode X waiting
Record lock, heap no 256 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 000f4241; asc   BA;; 1: len 6; hex 000000001e3d; asc      =;; 2: len 7; hex 00000033630110; asc    3c  ;; 3: len 4; hex 800f4241; asc   BA;; 4: len 4; hex 80050584; asc     ;;

The problem has been reproduced on various different MySQL versions and different hardware including, 5.0.67, 5.0.81 and 5.1.25.

What is causing the problem?

  • Is it a bug? No.
  • Is my understanding of MVCC theory incorrect? Maybe.
  • Is it InnoDB’s implementation of MVCC incomplete. No. Heikki and his team have a far greater understanding of data theory then most database experts
  • Is it the MySQL kernel interfering with the InnoDB storage engine? No, this is not possible as the MySQL kernel has passed the queries to InnoDB, and InnoDB is handling these threads independently.
  • Is it a gap locking issue, a problem that can cause deadlocks when inserting data in a high concurrency situation? Not likely as the data is inserted in primary key, i.e. auto increment order, and there are no gaps.
  • Is it related to InnoDB access method via the primary key, where InnoDB uses a clustered index to store the primary key. Given the data is physically in primary key order, this clustered index would in theory reduce possible locking.
  • Is it related to the page size of indexes, e.g. the 16k index page, effectively causing a page level lock for overlapping index data? My understanding is that InnoDB supports row level locking, and MVCC should cater for this.
  • Is is related to the ranges of primary keys being adjacent, i.e. 1,000,000 and 1,000,001. Not likely as I can reproduce the problem not using adjacent ranges.
  • Is it some weird interaction to managing the undo space of the transactions in the Innodb buffer pool?
  • Is it some weird interaction with marking/locking the dirty pages in the Innodb buffer pool of modified pages?
  • Is it some weird interaction with logging the successful Innodb transaction to the redo logs.

I’ve listed these points more as an information exercise for all those that have less understanding of the problem to see my though process.

Additional testing can definitely be performed. Additional analysis of InnoDB internals with SHOW ENGINE INNODB STATUS such as spin waits, OS waits (context switches), looking at Mutexes with SHOW ENGINE INNODB MUTEX can be undertaken.

My hope and request is that this has been observed by others and that a simple hybrid solution exists.

Comments

  1. Øystein Grøvlen says

    MVCC does not really apply here. MVCC is a mechanism that allows transactions to read older versions of a row when the current version is not yet committed. For updates, InnoDB uses traditional two-phase locking. Since your threads are updating they will not have any advantage of MVCC.

    Many databases systems will perform lock escalation when the number of row locks on a table gets high. That is, row locks will be converted into a table level lock. However, according to the manual, Innodb does not do lock escalation so that should not be the problem here.

    From the innodb status output, it seems like the transaction has completed all its updates when it waits for the lock. (100000 undo log entries has been generated). Hence, it looks to me like it tries to check the next record which is locked by the other transaction. However, that does not explain that you still get this with non-adjacent ranges.

  2. Gordon Irish says

    Ronald

    Just for interest, maybe try it with Oracle. I wonder if the the big Oracle database exhibits the same behavior.

    You can download Oracle Express Edition from otn.oracle.com, it’s free and fully functional for small databases tests like this. It’s the same Oracle database kernel code as Oracle’s big enterprise database.

    Gord Irish

  3. says

    Ronald,

    thread 1 is waiting for an exclusive lock on record 1,000,001 (== 0xf4241).

    When MySQL asks the engine to scan rows between 1 and 1,000,000, it does not tell the engine the end point of the range. The engine will therefore read, and lock, also row 1,000,001.

    You could try transaction isolation level READ COMMITTED in 5.1.xx. Or set innodb_locks_unsafe_for_binlog = 1 in 5.1.xx. Then InnoDB should read the latest committed version of row 1,000,001, and return it to MySQL. MySQL then determines that the row is not in the range requested by the query. I have not tested this specific case, but it should cause no lock wait, because MySQL notices that the range has ended, as 1,000,001 does not belong to the range, and MySQL should not issue further lock requests.

  4. Brandon Bates says

    Just a thought, but I read somewhere that procedures or functions or both will lock the whole table. I don’t know if this is true or not, but I have run into some anecdotal evidence suggesting it is.

  5. igaz says

    Heikki Tuuri commented:

    “When MySQL asks the engine to scan rows between 1 and 1,000,000, it does not tell the engine the end point of the range. The engine will therefore read, and lock, also row 1,000,001.”

    Huh? Isn’t the “end point of the range” = 1 000 000? That is how I would interpret “between 1 and 1 000 000″