Understanding more InnoDB MVCC

As I wrote earlier in Understanding InnoDB MVCC, I am trying to understand why InnoDB takes a lock on data during an UPDATE when I do not expect it to.

Not wanting to go digging through the InnoDB source code, I’m endeavouring to use the existing MySQL monitoring to understand the problem better. In this case, I’m going to investigate SHOW ENGINE INNODB STATUS first. An old but still relevant article for reference is SHOW INNODB STATUS walk through. The High Performance MySQL book is also a good starting reference.
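For anyone following along, all of the raw output below comes from repeated runs of the status command in the mysql client (the \G terminator gives the readable vertical format):

mysql> show engine innodb status\G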

I’ve just installed MySQL 5.1.36 on a new 64-bit Linux laptop for this test.
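The test uses the numbers table from the earlier post. As a sketch only, a compatible definition would look like the following; the column names come from the queries and the record dump later in this post, but the exact types are my assumption:

create table numbers (
  id int unsigned not null primary key,
  f1 int unsigned not null,
  f2 int unsigned not null
) engine=innodb;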

For now, all I’ve done is ensure that innodb_buffer_pool_size is large enough to hold the entire table in memory.
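A quick way to verify the table fits is to compare its data and index footprint against the configured pool (the schema name test is taken from the lock output later in this post):

mysql> select data_length + index_length as total_bytes
    ->   from information_schema.tables
    ->  where table_schema = 'test' and table_name = 'numbers';

mysql> show global variables like 'innodb_buffer_pool_size';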

So what happens during the 21.5 seconds these queries took to run?

-- thread 1

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> update numbers
    -> set f2 = f2 +200
    -> where id between 1 and 1000000; commit;
Query OK, 1000000 rows affected (21.50 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0
Query OK, 0 rows affected (0.02 sec)

-- thread 2

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> update numbers
    -> set f2 = f2 +300
    -> where id between 1000001 and 2000000; commit;
Query OK, 1000000 rows affected (20.06 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0
Query OK, 0 rows affected (0.02 sec)
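The locking behaviour under investigation also depends on the transaction isolation level. Nothing beyond the buffer pool was changed for this test, so both sessions ran at the server default of REPEATABLE-READ, which can be confirmed with:

mysql> select @@global.tx_isolation, @@tx_isolation;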

I captured 1,718 unique samples of SHOW ENGINE INNODB STATUS during the test. Probably not the best approach, but it did highlight some things.
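The sampling itself was nothing sophisticated; a minimal sketch of this kind of capture loop follows (the file name is illustrative, and a tight loop like this samples many times per second):

$ while true; do
>   mysql -e 'show engine innodb status\G' >> innodb_status.log
> done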

The overall state of the main InnoDB thread cycled through states including sleeping, making checkpoint, flushing log, waiting for server activity, purging and reserving kernel mutex:

     49 Main thread process no. 5489, id 139872150796624, state: flushing log
    709 Main thread process no. 5489, id 139872150796624, state: making checkpoint
      1 Main thread process no. 5489, id 139872150796624, state: purging
      2 Main thread process no. 5489, id 139872150796624, state: reserving kernel mutex
    956 Main thread process no. 5489, id 139872150796624, state: sleeping
      1 Main thread process no. 5489, id 139872150796624, state: waiting for server activity
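The leading number on each line is simply how many of the captured samples showed that state, tallied along these lines (the log file name is assumed from the capture sketch above):

$ grep 'Main thread process' innodb_status.log | sort | uniq -c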

I was surprised to see making checkpoint here. I’ve run this monitoring on two separate servers, both running 5.1.x, and it occurred both times. On this test machine, I also saw an increase in the flush list and in pending checkpoint writes, which confirms it. I’ve tried in the past to monitor this closely, so this test will actually be useful for a different problem analysis at a later time.
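Something to confirm next time: checkpoint activity is tied to the proportion of dirty pages in the buffer pool, and both the limit and the current page counts are directly visible without sampling:

mysql> show global variables like 'innodb_max_dirty_pages_pct';
mysql> show global status like 'Innodb_buffer_pool_pages%';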

Some 3,800 OS waits (that is, context switches) indicate that critical sections of code had to wait on a mutex. The mutexes listed in the Semaphores section occurred in multiple areas, including:

      4 Mutex at 0x1684b08 created file mem/mem0pool.c line 206, lock var 0
     61 Mutex at 0x7f369033e2b8 created file srv/srv0srv.c line 875, lock var 0
      1 Mutex at 0x7f36903408b8 created file fil/fil0fil.c line 1313, lock var 1
     22 Mutex at 0x7f3690340cb8 created file buf/buf0buf.c line 597, lock var 0
      5 Mutex at 0x7f36903410e0 created file btr/btr0sea.c line 139, lock var 0
      1 Mutex at 0x7f36903410e0 created file btr/btr0sea.c line 139, lock var 1
     21 Mutex at 0x7f3690341910 created file log/log0log.c line 738, lock var 0
     85 Mutex at 0x7f3690341910 created file log/log0log.c line 738, lock var 1
      8 Mutex at 0x7f36903508c0 created file trx/trx0rseg.c line 147, lock var 0
    232 Mutex at 0x7f36903508c0 created file trx/trx0rseg.c line 147, lock var 1
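As an aside, these per-mutex OS wait counters can also be read directly, without parsing the status output; each row reports the mutex creation file and line together with its os_waits count:

mysql> show engine innodb mutex;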

One transaction waited only a little more than 1 second, as indicated by 58 samples.

---TRANSACTION 0 1327, ACTIVE 22 sec, process no 5489, OS thread id 139872177113424 fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2008 lock struct(s), heap size 292848, 1002005 row lock(s), undo log entries 1000000
MySQL thread id 66, query id 11435 localhost root Updating
update numbers
set f2 = f2 +200
where id between 1 and 1000000
------- TRX HAS BEEN WAITING 1 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 2612 n bits 568 index `PRIMARY` of table `test`.`numbers` trx id 0 1327 lock_mode X waiting
Record lock, heap no 256 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 000f4241; asc   BA;; 1: len 6; hex 000000000530; asc      0;; 2: len 7; hex 00000000f20110; asc        ;; 3: len 4; hex 800f4241; asc   BA;; 4: len 4; hex 8005032c; asc    ,;;

And the blocker in this instance, while this lock waited to be granted, continued to acquire row locks and create undo log entries. That was to be expected. It is worth noting that field 0 of the waited-for record above is the primary key, and hex 000f4241 decodes to 1,000,001: the first id of the other thread’s range.

963 lock struct(s), heap size 292848, 979851 row lock(s), undo log entries 977891
1979 lock struct(s), heap size 292848, 988001 row lock(s), undo log entries 986025
1989 lock struct(s), heap size 292848, 993013 row lock(s), undo log entries 991027

I double-checked to ensure no other queries were queued inside InnoDB.

      4 0 queries inside InnoDB, 0 queries in queue
    303 1 queries inside InnoDB, 0 queries in queue
   1411 2 queries inside InnoDB, 0 queries in queue
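The queries inside InnoDB figure relates to innodb_thread_concurrency; when that variable is non-zero it caps how many threads run inside InnoDB at once, with the rest counted in the queue. With only two active sessions the queue unsurprisingly stays empty:

mysql> show global variables like 'innodb_thread_concurrency';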

Only 12 of the 1,718 samples showed any pending asynchronous I/O writes; however, there was a higher number of pending buffer pool and log syncs. This is all to be expected.

This quick inspection, especially at 1am, has not given me any answers. The more information I gather, the greater the need for a better understanding becomes.

Comments

  1. Øystein Grøvlen says

    That a checkpoint will be initiated is not very surprising with your load. AFAIK, InnoDB will start a checkpoint if the percentage of pages that are dirty is above a specified limit. Since you are updating close to every record in your table, I believe that limit has been exceeded.