Archive for July 15th, 2009

Never let your binlog directory fill up

Wednesday, July 15th, 2009

Recently with a client while running a number of disaster recovery tests I came across a nasty situation which was not part of the original plan and provided a far worse disaster situation then expected.

I should preface this condition with some of the environment conditions.

  • MySQL 5.0 Enterprise 5.0.54
  • RHEL 5 64bit
  • Master and 2 Slaves
  • MySQL Data and MySQL Binary Logs/MySQL Error Logs are on separate disk partitions

While running stress tests under high load, we tested the filling of partition containing the logs. This partition included the binary log and MySQL error log.

The observed output was.

  • An error message was written to the MySQL error log. See below.
  • Application throughput dropped, but did not stop.
  • Binary logs stopped occurring.
  • MySQL proactively stopped logging but continued to process transactions.

The end result was:

  • The error log was ultimately truncated after reboot, so if the information was not captured while this was in action, this important messages would be lost.
  • The primary slave used for read scalability and the secondary slave used for backups are now unusable.
  • The backup and recovery process using slaves and point in time recovery via binary logs is not unusable.
  • The three backup methods in use for the client are ineffective. It was necessary to disable access to the Master, take a full backup, and then sync the slaves from this copy.
090710 19:01:25 [ERROR] /opt/mysql/bin/mysqld: Disk is full writing '/mysqllog/binlog/hostname-3306-bin.000020'
     (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
090710 19:01:46 [ERROR] Error writing file '/mysqllog/slow_log/hostname_3306_slow_queries.log' (errno: 1)
090710 19:02:25 [ERROR] Error writing file '/mysqllog/binlog/hostname-3306-bin' (errno: 28)
090710 19:02:25 [ERROR] Could not use /mysqllog/binlog/hostname-3306-bin for logging (error 28).
    Turning logging off for the whole duration of the MySQL server process. 
    To turn it on again: fix the cause, shutdown the MySQL server and restart it.

Updated

I discuss in detail the options for the MySQL error log including recommendations for the MySQL error log file location in Monitoring MySQL – The error log

Getting wireless working on Ubuntu Macbook

Wednesday, July 15th, 2009

I run Ubuntu 9.04 Januty on my Macbook. Previously installing Ubuntu 8.10, wireless worked automatically, for 9.04 it did not.

This is what I did to fix it.

  1. Verify your Macbook is seeing the Broadcom controller. See below for the lspci command, and expected output.
  2. Goto System -> Administration -> Hardware Drivers. The Broadcom STA wireless driver is activated, deactivate it.
  3. Add to /etc/modules a line with wl
  4. Reboot
  5. Goto System -> Administration -> Hardware Drivers. Activate the Broadcom STA wireless driver.
  6. Reboot
  7. Wireless now operational.
$ lspci
...
02:00.0 Network controller: Broadcom Corporation BCM4328 802.11a/b/g/n (rev 03)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. Marvell Yukon 88E8058 PCI-E Gigabit Ethernet Controller (rev 13)
04:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61)

For reference, modprobe wl does not return any output in my environment, yet wireless works fine.

References:

Understanding more InnoDB MVCC

Wednesday, July 15th, 2009

As I had written earlier in Understanding InnoDB MVCC, I am trying to understand why InnoDB is taking a lock on data during an UPDATE when I do not expect this to happen.

Not wanting to go looking at the InnoDB source code to understand the problem, I’m endeavouring to possibly use existing MySQL monitoring to try and understand the problem better. In this case, I’m going to investigate SHOW ENGINE INNODB STATUS first. An old but still relevant article for reference is SHOW INNODB STATUS walk through. The MySQL High Performance book is also a good starting reference.

I’ve just installed 5.1.36 on a new Linux 64 bit laptop for this test.

For now all I’ve done is ensure the innodb_buffer_pool_size is sufficient to hold the entire table in memory.

So what happens in the 21.5 seconds these queries took to run.

–thread 1

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> update numbers
    -> set f2 = f2 +200
    -> where id between 1 and 1000000; commit;
Query OK, 1000000 rows affected (21.50 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0
Query OK, 0 rows affected (0.02 sec)

–thread 2

Query OK, 0 rows affected (0.00 sec)

mysql> update numbers
    -> set f2 = f2 +300
    -> where id between 1000001 and 2000000; commit;
Query OK, 1000000 rows affected (20.06 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0
Query OK, 0 rows affected (0.02 sec)

In SHOW ENGINE INNODB STATUS I sampled 1718 unique times. Probably not the best approach but it did highlight some things.

The overall state of the main Innodb thread process cycled though states including sleeping, making checkpoint, flushing log, waiting for server activity, purging and reserving kernel mutex.

     49 Main thread process no. 5489, id 139872150796624, state: flushing log
    709 Main thread process no. 5489, id 139872150796624, state: making checkpoint
      1 Main thread process no. 5489, id 139872150796624, state: purging
      2 Main thread process no. 5489, id 139872150796624, state: reserving kernel mutex
    956 Main thread process no. 5489, id 139872150796624, state: sleeping
      1 Main thread process no. 5489, id 139872150796624, state: waiting for server activity

I was surprised to see making checkpoint here. I’ve actually run this monitoring on two separate servers, both running 5.1.x and both times this occured. On this test machine, I also saw an increase in the flush list, and pending checkpoint writes to confirm this. I’ve tried in the past to monitor this closely, so this test actually will be useful for a different problem analysis at a later time.

Some 3800 OS Waits (that’s context switches) indicates some critical code need to use a mutex. These mutexes listed in the Semaphores section occured in multiple areas including:

      4 Mutex at 0x1684b08 created file mem/mem0pool.c line 206, lock var 0
     61 Mutex at 0x7f369033e2b8 created file srv/srv0srv.c line 875, lock var 0
      1 Mutex at 0x7f36903408b8 created file fil/fil0fil.c line 1313, lock var 1
     22 Mutex at 0x7f3690340cb8 created file buf/buf0buf.c line 597, lock var 0
      5 Mutex at 0x7f36903410e0 created file btr/btr0sea.c line 139, lock var 0
      1 Mutex at 0x7f36903410e0 created file btr/btr0sea.c line 139, lock var 1
     21 Mutex at 0x7f3690341910 created file log/log0log.c line 738, lock var 0
     85 Mutex at 0x7f3690341910 created file log/log0log.c line 738, lock var 1
      8 Mutex at 0x7f36903508c0 created file trx/trx0rseg.c line 147, lock var 0
    232 Mutex at 0x7f36903508c0 created file trx/trx0rseg.c line 147, lock var 1

One transaction waited only a little more then 1 second, indicated by 58 samples.

---TRANSACTION 0 1327, ACTIVE 22 sec, process no 5489, OS thread id 139872177113424 fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2008 lock struct(s), heap size 292848, 1002005 row lock(s), undo log entries 1000000
MySQL thread id 66, query id 11435 localhost root Updating
update numbers
set f2 = f2 +200
where id between 1 and 1000000
------- TRX HAS BEEN WAITING 1 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 2612 n bits 568 index `PRIMARY` of table `test`.`numbers` trx id 0 1327 lock_mode X waiting
Record lock, heap no 256 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 000f4241; asc   BA;; 1: len 6; hex 000000000530; asc      0;; 2: len 7; hex 00000000f20110; asc        ;; 3: len 4; hex 800f4241; asc   BA;; 4: len 4; hex 8005032c; asc    ,;;

And the blocker in this instance while waiting for this lock to be released was continuing to get row locks, and create undo log entries. That was to be expected.

963 lock struct(s), heap size 292848, 979851 row lock(s), undo log entries 977891
1979 lock struct(s), heap size 292848, 988001 row lock(s), undo log entries 986025
1989 lock struct(s), heap size 292848, 993013 row lock(s), undo log entries 991027

I doubled checked to ensure no other queries were inside the InnoDB queue.

      4 0 queries inside InnoDB, 0 queries in queue
    303 1 queries inside InnoDB, 0 queries in queue
   1411 2 queries inside InnoDB, 0 queries in queue

Only 12 of the 1718 samples showed any Pending asynchronous I/O writes, however there was a higher amount of pending buffer pool and log syncs. This is all to be expected.

This quick inspection, especially at 1am has not given me any answers. With greater information, the need for better understanding is required.

Understanding InnoDB MVCC

Wednesday, July 15th, 2009

Multi versioning concurrency control (MVCC) is a database design theory that enables relational databases to support concurrency, or more simply multiple user access to common data in your database.

In MySQL the InnoDB storage engine provides MVCC, row-level locking, full ACID compliance as well as other features.

In my understanding of database theory, access to modify independent sections of unique data (e.g. UPDATE) under MVCC should fully support concurrency. I have however experienced a level of exclusive locking under Innodb.

I wanted to clearly document this situation so I could then seek the advice of the guru’s in InnoDB Internals such as Mark Callaghan, Percona and the Innodb development team for example. I’m happy to say I’m not a MySQL expert in every aspect of MySQL, specifically internals where I have not had the detailed time to read the code, and understanding all internal workings.

The situation

Single table updates on a range of rows by primary keys are being blocked by other similar operations on the same table yet the set of data for each query is effectively unique.

Reproducing the problem

$ mysql -u -p test
drop table if exists numbers;
create table numbers (id int unsigned not null primary key, f1 int not null, f2 int not null) engine=innodb;

delimiter $$

drop procedure if exists fill_numbers $$
create procedure fill_numbers(in p_max int)
deterministic
begin
  declare counter int default 1;
  truncate table numbers;
  insert into numbers values (1,1,1);
  while counter < p_max
  do
      insert into numbers (id,f1, f2)
          select id + counter, counter + f1, id - f2
          from numbers;
      select count(*) into counter from numbers;
      select counter;
  end while;
end $$
delimiter ;

call fill_numbers(2000000);

In two separate threads I execute similar statements on different ranges of the primary key.

--thread 1
start transaction;
update numbers
set f2 = f2 +200
where id between 1 and 1000000;
commit;

--thread 2
start transaction;
update numbers
set f2 = f2 +300
where id between 1000001 and 2000000;
commit;

And in a third thread we can monitor the transactions inside Innodb.

-- thread 3
show engine innodb status\G

During the update process, the following error can be observed.

---TRANSACTION 0 7741, ACTIVE 20 sec, process no 2159, OS thread id 1188534592 fetching rows, thread declared inside InnoDB 275
mysql tables in use 1, locked 1
2007 lock struct(s), heap size 292848, 1001862 row lock(s), undo log entries 999858
MySQL thread id 918563, query id 16802707 localhost root Updating
update numbers set f2 = f2 +300 where id between 1000001 and 2000000
---TRANSACTION 0 7740, ACTIVE 21 sec, process no 2159, OS thread id 1178949952 fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2008 lock struct(s), heap size 292848, 1002005 row lock(s), undo log entries 1000000
MySQL thread id 918564, query id 16802694 localhost root Updating
update numbers set f2 = f2 +200 where id between 1 and 1000000
------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 16052 n bits 568 index `PRIMARY` of table `test`.`numbers` trx id 0 7740 lock_mode X waiting
Record lock, heap no 256 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 000f4241; asc   BA;; 1: len 6; hex 000000001e3d; asc      =;; 2: len 7; hex 00000033630110; asc    3c  ;; 3: len 4; hex 800f4241; asc   BA;; 4: len 4; hex 80050584; asc     ;;

The problem has been reproduced on various different MySQL versions and different hardware including, 5.0.67, 5.0.81 and 5.1.25.

What is causing the problem?

  • Is it a bug? No.
  • Is my understanding of MVCC theory incorrect? Maybe.
  • Is it InnoDB’s implementation of MVCC incomplete. No. Heikki and his team have a far greater understanding of data theory then most database experts
  • Is it the MySQL kernel interfering with the InnoDB storage engine? No, this is not possible as the MySQL kernel has passed the queries to InnoDB, and InnoDB is handling these threads independently.
  • Is it a gap locking issue, a problem that can cause deadlocks when inserting data in a high concurrency situation? Not likely as the data is inserted in primary key, i.e. auto increment order, and there are no gaps.
  • Is it related to InnoDB access method via the primary key, where InnoDB uses a clustered index to store the primary key. Given the data is physically in primary key order, this clustered index would in theory reduce possible locking.
  • Is it related to the page size of indexes, e.g. the 16k index page, effectively causing a page level lock for overlapping index data? My understanding is that InnoDB supports row level locking, and MVCC should cater for this.
  • Is is related to the ranges of primary keys being adjacent, i.e. 1,000,000 and 1,000,001. Not likely as I can reproduce the problem not using adjacent ranges.
  • Is it some weird interaction to managing the undo space of the transactions in the Innodb buffer pool?
  • Is it some weird interaction with marking/locking the dirty pages in the Innodb buffer pool of modified pages?
  • Is it some weird interaction with logging the successful Innodb transaction to the redo logs.

I’ve listed these points more as an information exercise for all those that have less understanding of the problem to see my though process.

Additional testing can definitely be performed. Additional analysis of InnoDB internals with SHOW ENGINE INNODB STATUS such as spin waits, OS waits (context switches), looking at Mutexes with SHOW ENGINE INNODB MUTEX can be undertaken.

My hope and request is that this has been observed by others and that a simple hybrid solution exists.