Ronald Bradford
MySQL Expert

MySQL Expert Ronald Bradford shares valuable input in MySQL Performance Tuning, MySQL Scalability and general MySQL Help from his two decades of working with MySQL, Oracle, Ingres and development technologies.

Posts Tagged ‘InnoDB’

10x Performance Improvements in MySQL – A Case Study

Sunday, February 7th, 2010

The slides for my presentation at FOSDEM 2010 are now available online at slideshare. In this presentation I describe a successful client implementation with the result of 10x performance improvements. My presentation covers monitoring, reviewing and analyzing SQL, the art of indexes, improving SQL, storage engines and caching.

The end result was a page load improvement from 700+ms load time to a a consistent 60ms.

Calculating your database size

Friday, September 25th, 2009

I generally use the following MySQL INFORMATION_SCHEMA (I_S) query to Calculate Your MySQL Database Size. This query and most others that access the MySQL INFORMATION_SCHEMA can be very slow to execute because they are not real tables and are not governed by physical data, memory buffers and indexes for example but rather internal MySQL data structures.

Mark Leith indicates in his post on innodb_stats_on_metadata that Innodb performs 8 random(ish) dives in to the index, when anybody accesses any of SHOW TABLE STATUS, SHOW INDEX, INFORMATION_SCHEMA.TABLES,INFORMATION_SCHEMA.STATISTICS for InnoDB tables. This can have an effect on performance, especially with a large number of Innodb tables, and a poor ratio of innodb_buffer_pool_size to disk data+index footprint.

What is even more incredible is when the result of this apparently harmless query causes the mysqld process to actual crash with a core dump due to these random index dives. The following core dump analysis highlights my query as the cause of the problem. This has happened now at least twice in for recent core crashes on a production environment.

(gdb) bt
#0 0x000000327280b6b2 in pthread_kill () from ./lib64/libpthread.so.0
#1 0x000000000055b136 in handle_segfault ()
#2 
#3 0x00000000007e1c21 in rec_get_offsets_func ()
#4 0x0000000000766007 in btr_estimate_number_of_different_key_vals ()
#5 0x000000000070d4c2 in dict_update_statistics_low ()
#6 0x000000000061fa84 in ha_innobase::info ()
#7 0x0000000000636972 in fill_schema_charsets ()
#8 0x0000000000639a66 in get_all_tables ()
#9 0x0000000000634633 in get_schema_tables_result ()
#10 0x00000000005bde37 in JOIN::exec ()
#11 0x00000000005bf7a7 in mysql_select ()
#12 0x00000000005c0127 in handle_select ()
#13 0x000000000056fcf0 in mysql_execute_command ()
#14 0x0000000000574c83 in mysql_parse ()
#15 0x00000000005751a0 in dispatch_command ()
#16 0x0000000000576483 in do_command ()
#17 0x0000000000577002 in handle_one_connection ()
#18 0x0000003272806367 in start_thread () from ./lib64/libpthread.so.0
#19 0x0000003271cd30ad in clone () from ./lib64/libc.so.6
Cannot access memory at address 0x3271cd3040 

This is an information_schema query that caused innodb to open a table.
This is totally normal. On first open, innodb tables get automatically  analyzed.
This analyze process crashed in innodb. 

This exact query *provoked* a crash: 

(gdb) x/1s 0x00002aaabc961dd0
0x2aaabc961dd0: "SELECT table_schema,table_name,engine,row_format,
table_rows, avg_row_length,
(data_length+index_length)/1024/1024 as total_mb,
(data_length)/1024/1024 as data_mb,
(index_length)/1024/1024 as index_mb,
CURDATE() AS today
FROM information_schema.tables
WHERE table_schema=@schema
ORDER BY 7 DESC"

The issue however is which table is the problem? How widespread is the corruption. Would an ALTER TABLE ENGINE=Innodb rebuild the table and eliminate the problem. Would an ANALYZE on an Innodb table identify the problem? (I doubt this second point). The problem however is even more significant due to the actual system. The largest single table of this 1TB database is 500GB. The impact of performing the ALTER, the time to undertake this blocking operation, the increase in the Innodb data file that can’t be reclaimed are just two factors that the inexperienced may fall victim of.

A saying I use is “Disaster is inevitable”. In this situation the disaster appears to not be significant but the ramifications due to the lack of appropriate and expert architectural design considerations to correct the problem are.

Is your environment capable of supporting this maintenance requirement? If not, then is the decision maker in your organization worried enough to seek the expert advice to address pro actively or will it be too late.

Understanding Innodb Transaction Isolation

Thursday, September 24th, 2009

The MySQL Innodb storage engine provides ACID compliance, the ‘I’ being isolation. There are four states defined in MySQL with the tx_isolation system variable, READ-UNCOMMITTED, READ-COMMITTED, REPEATABLE-READ and SERIALIZABLE.

Generally MySQL installations do not modify the default value of tx_isolation = REPEATABLE-READ, however I have seen with a number of clients the default value has been changed to READ-COMMITTED. When I attempt to identify the reason why, I have always received the same reason. Oracle uses a default transaction isolation of READ-COMMITTED. See how Oracle Manages Data Concurrency and Consistency for more information.

However, while the literal string is the same, the actual implementation of READ-COMMITTED in Oracle more closely represents the REPEATABLE-READ in MySQL.

The following demonstrates what you can expect to see between the operation of these two modes.

REPEATABLE-READ
Session 1 Session 2
DROP TABLE IF EXISTS transaction_test;
CREATE TABLE transaction_test(
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  val  VARCHAR(20) NOT NULL,
  created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(id)
) ENGINE=InnoDB DEFAULT CHARSET latin1;

INSERT INTO transaction_test(val) VALUES ('a'),('b'),('c');
SELECT @@global.tx_isolation, @@session.tx_isolation;
+-----------------------+------------------------+
| @@global.tx_isolation | @@session.tx_isolation |
+-----------------------+------------------------+
| REPEATABLE-READ       | REPEATABLE-READ        |
+-----------------------+------------------------+
START TRANSACTION;
SELECT * FROM transaction_test;
+----+-----+---------------------+
| id | val | created             |
+----+-----+---------------------+
|  1 | a   | 2009-09-21 00:19:43 |
|  2 | b   | 2009-09-21 00:19:43 |
|  3 | c   | 2009-09-21 00:19:43 |
+----+-----+---------------------+
SELECT SLEEP(20);
START TRANSACTION;
INSERT INTO transaction_test(val) VALUES ('x'),('y'),('z');
SELECT * FROM transaction_test;
+----+-----------------+---------------------+
| id | val             | created             |
+----+-----------------+---------------------+
|  1 | a               | 2009-09-21 00:19:43 |
|  2 | b               | 2009-09-21 00:19:43 |
|  3 | c               | 2009-09-21 00:19:43 |
|  4 | x               | 2009-09-21 00:21:00 |
|  5 | y               | 2009-09-21 00:21:00 |
|  6 | z               | 2009-09-21 00:21:00 |
+----+-----------------+---------------------+
COMMIT;
INSERT INTO transaction_test(val) VALUES (@@session.tx_isolation);
SELECT * FROM transaction_test;
+----+-----------------+---------------------+
| id | val             | created             |
+----+-----------------+---------------------+
|  1 | a               | 2009-09-21 00:19:43 |
|  2 | b               | 2009-09-21 00:19:43 |
|  3 | c               | 2009-09-21 00:19:43 |
|  7 | REPEATABLE-READ | 2009-09-21 00:21:01 |
+----+-----------------+---------------------+

COMMIT;
READ-COMMITTED
SET SESSION tx_isolation='READ-COMMITTED';
TRUNCATE TABLE transaction_test;
INSERT INTO transaction_test(val) VALUES ('a'),('b'),('c');
SELECT @@global.tx_isolation, @@session.tx_isolation;
+-----------------------+------------------------+
| @@global.tx_isolation | @@session.tx_isolation |
+-----------------------+------------------------+
| REPEATABLE-READ       | READ-COMMITTED         |
+-----------------------+------------------------+
START TRANSACTION;
SELECT * FROM transaction_test;
+----+-----+---------------------+
| id | val | created             |
+----+-----+---------------------+
|  1 | a   | 2009-09-23 22:49:44 |
|  2 | b   | 2009-09-23 22:49:44 |
|  3 | c   | 2009-09-23 22:49:44 |
+----+-----+---------------------+
SELECT SLEEP(20);
START TRANSACTION;
INSERT INTO transaction_test(val) VALUES ('x'),('y'),('z');
SELECT * FROM transaction_test;
+----+-----+---------------------+
| id | val | created             |
+----+-----+---------------------+
|  1 | a   | 2009-09-23 22:49:44 |
|  2 | b   | 2009-09-23 22:49:44 |
|  3 | c   | 2009-09-23 22:49:44 |
|  4 | x   | 2009-09-23 22:52:38 |
|  5 | y   | 2009-09-23 22:52:38 |
|  6 | z   | 2009-09-23 22:52:38 |
+----+-----+---------------------+
COMMIT;
INSERT INTO transaction_test(val) VALUES (@@session.tx_isolation);
SELECT * FROM transaction_test;
+----+----------------+---------------------+
| id | val            | created             |
+----+----------------+---------------------+
|  1 | a              | 2009-09-23 22:49:44 |
|  2 | b              | 2009-09-23 22:49:44 |
|  3 | c              | 2009-09-23 22:49:44 |
|  4 | x              | 2009-09-23 22:52:38 |
|  5 | y              | 2009-09-23 22:52:38 |
|  6 | z              | 2009-09-23 22:52:38 |
|  7 | READ-COMMITTED | 2009-09-23 22:56:10 |
+----+----------------+---------------------+

COMMIT;

As you can see, under READ-COMMITTED your result set can change during the transaction. However, how practical is this example in an actual application.

In what circumstances would you consider using READ-COMMITTED? Is there an improvement in locking contention that can lead to less deadlock contention? What is the overhead in other areas?

Harrison writes in My Favorite New Feature of MySQL 5.1: Less InnoDB Locking that best locking out of InnoDB in 5.1 will be with READ-COMMITTED. Note that as mentioned, the impact is a change in replication mode that may have a more dramatic effect.

Heikki Tuuri comments in Understanding InnoDB MVCC that using READ-COMMITTED should help in a specific locking issue.

I am still unclear of the specific benefits in general terms for all environments. Review of the The InnoDB Transaction Model and Locking and specifically Consistent Nonlocking Reads provides “With READ COMMITTED isolation level, each consistent read within a transaction sets and reads its own fresh snapshot” which indicates that for certain workloads the reduced locking is a benefit.

Every environment is different and ultimately the actual transaction statements will determine what options and benefits work best.

My favorite MySQL data type – DECIMAL(31,0)

Friday, September 18th, 2009

It may seem hard to believe, but I have seen DECIMAL(31,0) in action on a production server. Not just in one column, but in 15 columns just in the largest 4 tables of one schema. The column was being used to represent a integer primary or foreign key column.

In a representative production instance (one of a dozen plus distributed production database servers) the overall database footprint was decreased from ~10 GB to ~2 GB, a 78% saving. In total, 15 columns across just 4 tables were changed from DECIMAL(31,0) to INT UNSIGNED.

One single table > 5GB was reduced to under 1GB (a 81% saving). This being my record for any GB+ tables in my time working with the MySQL database.

Had this server for example had 4GB of RAM, and say 2.5GB allocated to the innodb_buffer_pool_size, this one change moved the system from requiring more consistent disk access (4x data to memory) to being able to store all data in memory. Tests showed a clear improvement in Innodb buffer pool reads and hit ratio.

Today’s lesson as described in my 2008 conference presentation Top 20 design tips for data architects is, choose the right integer data type for your data.

InnoDB I_S.tables.table_rows out by a factor of 100x

Wednesday, September 9th, 2009

I’ve always believed that the MySQL Information_schema.tables.table_rows figure for Innodb tables to be while approximate, approximately accurate.

Today I found that the figures varied on one table from 10x to 100x wrong.

Before performing an ALTER I always verify sizes for reference.

+---------+------------+--------+------------+------------+----------------+------------+------------+-----------+------------+
| table_s | table_name | engine | row_format | table_rows | avg_row_length | total_mb   | data_mb    | index_mb  | today      |
+---------+------------+--------+------------+------------+----------------+------------+------------+-----------+------------+
| XXXXX   | s_a        | InnoDB | Compact    |     208993 |           7475 |  1491.5312 |  1490.0156 |    1.5156 | 2009-09-09 |

mysql> alter table s_a modify col VARCHAR(255);
Query OK, 23471 rows affected (4 min 26.23 sec)

After

+---------+------------+--------+------------+------------+----------------+------------+------------+-----------+------------+
| XXXXX   | s_a        | InnoDB | Compact    |    2407063 |            629 |  1447.0312 |  1445.5156 |    1.5156 | 2009-09-09 |

I was so caught out by this. The table reported 200k rows, but the alter returned 23k, that’s like 10x out.
I ran my query again, and the second time I got.

+---------+------------+--------+------------+------------+----------------+------------+------------+-----------+------------+
| XXXXX   | s_a        | InnoDB | Compact    |      21813 |          69487 |  1447.0312 |  1445.5156 |    1.5156 | 2009-09-09 |

This is closer to what I’d expected, 22k verses 23k.

I have to say, while I have always treated the data and index size as accurate, I now how little confidence in the table_rows any more.

Setting up sysbench with MySQL & Drizzle

Thursday, July 23rd, 2009

Sysbench is a open source product that enables you to perform various system benchmarks including databases. Drizzles performs regression testing of every trunk revision with a branched version of sysbench within Drizzle Automation.

A pending branch https://code.launchpad.net/~elambert/sysbench/trunk_drizzle_merge by Eric Lambert now enables side by side testing with MySQL and Drizzle. On a system running MySQL and Drizzle I was able install this sysbench branch with the following commands.

cd bzr
bzr branch lp:~elambert/sysbench/trunk_drizzle_merge
cd trunk_drizzle_merge/
./autogen.sh
./configure
make
sudo make install

Running the default lua tests supplied required me to ensure drizzle was in my path and that I created the ’sbtest’ schema. I’ll be sure it add that checking to my future developed benchmark scripts.

$ cd sysbench/tests/db
$ sysbench --test=insert.lua --db_driver=drizzle prepare
sysbench v0.4.10:  multi-threaded system evaluation benchmark

FATAL: unable to connect to Drizzle server: 23
FATAL: error 0: Unknown database 'sbtest'
FATAL: failed to execute function `prepare': insert.lua:7: Failed to connect to the database
$ drizzle -e "create schema sbtest"
$ sysbench --test=insert.lua --db_driver=drizzle prepare
sysbench v0.4.10:  multi-threaded system evaluation benchmark

Creating table 'sbtest'...

And running produces the following results.

$ sysbench --num-threads=1 --test=insert.lua --db_driver=drizzle run
sysbench v0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Threads started!

OLTP test statistics:
    queries performed:
        read:                            0
        write:                           10000
        other:                           0
        total:                           10000
    transactions:                        0      (0.00 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 10000  (879.68 per sec.)
    other operations:                    0      (0.00 per sec.)

Test execution summary:
    total time:                          11.3678s
    total number of events:              10000
    total time taken by event execution: 11.3354s
    per-request statistics:
         min:                                  0.32ms
         avg:                                  1.13ms
         max:                                 68.74ms
         approx.  95 percentile:               2.41ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   11.3354/0.0

Rerunning the prepare also lacked some auto cleanup to allow for automated re-running.

$ sysbench --test=insert.lua --db_driver=drizzle prepare
Creating table 'sbtest'...
ALERT: Drizzle Query Failed: 1050:Table 'sbtest' already exists
FATAL: failed to execute function `prepare': insert.lua:57: Database query failed

For MySQL

$ sysbench --test=insert.lua --db_driver=mysql --mysql_table_engine=innodb prepare
sysbench v0.4.10:  multi-threaded system evaluation benchmark

Creating table 'sbtest'...

Unfortunately this doesn’t actually create the table in the right storage engine, I had to hack the code to ensure I was comparing InnoDB in each test.

$ sysbench --num-threads=1 --test=insert.l
ua --db_driver=mysql run
sysbench v0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Threads started!

OLTP test statistics:
    queries performed:
        read:                            0
        write:                           10000
        other:                           0
        total:                           10000
    transactions:                        0      (0.00 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 10000  (897.67 per sec.)
    other operations:                    0      (0.00 per sec.)

Test execution summary:
    total time:                          11.1399s
    total number of events:              10000
    total time taken by event execution: 11.1084s
    per-request statistics:
         min:                                  0.27ms
         avg:                                  1.11ms
         max:                                252.63ms
         approx.  95 percentile:               2.48ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   11.1084/0.00

Armed with a working environment I can now write some more realistic production like tests in Lua.

Understanding InnoDB MVCC

Wednesday, July 15th, 2009

Multi versioning concurrency control (MVCC) is a database design theory that enables relational databases to support concurrency, or more simply multiple user access to common data in your database.

In MySQL the InnoDB storage engine provides MVCC, row-level locking, full ACID compliance as well as other features.

In my understanding of database theory, access to modify independent sections of unique data (e.g. UPDATE) under MVCC should fully support concurrency. I have however experienced a level of exclusive locking under Innodb.

I wanted to clearly document this situation so I could then seek the advice of the guru’s in InnoDB Internals such as Mark Callaghan, Percona and the Innodb development team for example. I’m happy to say I’m not a MySQL expert in every aspect of MySQL, specifically internals where I have not had the detailed time to read the code, and understanding all internal workings.

The situation

Single table updates on a range of rows by primary keys are being blocked by other similar operations on the same table yet the set of data for each query is effectively unique.

Reproducing the problem

$ mysql -u -p test
drop table if exists numbers;
create table numbers (id int unsigned not null primary key, f1 int not null, f2 int not null) engine=innodb;

delimiter $$

drop procedure if exists fill_numbers $$
create procedure fill_numbers(in p_max int)
deterministic
begin
  declare counter int default 1;
  truncate table numbers;
  insert into numbers values (1,1,1);
  while counter < p_max
  do
      insert into numbers (id,f1, f2)
          select id + counter, counter + f1, id - f2
          from numbers;
      select count(*) into counter from numbers;
      select counter;
  end while;
end $$
delimiter ;

call fill_numbers(2000000);

In two separate threads I execute similar statements on different ranges of the primary key.

--thread 1
start transaction;
update numbers
set f2 = f2 +200
where id between 1 and 1000000;
commit;

--thread 2
start transaction;
update numbers
set f2 = f2 +300
where id between 1000001 and 2000000;
commit;

And in a third thread we can monitor the transactions inside Innodb.

-- thread 3
show engine innodb status\G

During the update process, the following error can be observed.

---TRANSACTION 0 7741, ACTIVE 20 sec, process no 2159, OS thread id 1188534592 fetching rows, thread declared inside InnoDB 275
mysql tables in use 1, locked 1
2007 lock struct(s), heap size 292848, 1001862 row lock(s), undo log entries 999858
MySQL thread id 918563, query id 16802707 localhost root Updating
update numbers set f2 = f2 +300 where id between 1000001 and 2000000
---TRANSACTION 0 7740, ACTIVE 21 sec, process no 2159, OS thread id 1178949952 fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2008 lock struct(s), heap size 292848, 1002005 row lock(s), undo log entries 1000000
MySQL thread id 918564, query id 16802694 localhost root Updating
update numbers set f2 = f2 +200 where id between 1 and 1000000
------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 16052 n bits 568 index `PRIMARY` of table `test`.`numbers` trx id 0 7740 lock_mode X waiting
Record lock, heap no 256 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 000f4241; asc   BA;; 1: len 6; hex 000000001e3d; asc      =;; 2: len 7; hex 00000033630110; asc    3c  ;; 3: len 4; hex 800f4241; asc   BA;; 4: len 4; hex 80050584; asc     ;;

The problem has been reproduced on various different MySQL versions and different hardware including, 5.0.67, 5.0.81 and 5.1.25.

What is causing the problem?

  • Is it a bug? No.
  • Is my understanding of MVCC theory incorrect? Maybe.
  • Is it InnoDB’s implementation of MVCC incomplete. No. Heikki and his team have a far greater understanding of data theory then most database experts
  • Is it the MySQL kernel interfering with the InnoDB storage engine? No, this is not possible as the MySQL kernel has passed the queries to InnoDB, and InnoDB is handling these threads independently.
  • Is it a gap locking issue, a problem that can cause deadlocks when inserting data in a high concurrency situation? Not likely as the data is inserted in primary key, i.e. auto increment order, and there are no gaps.
  • Is it related to InnoDB access method via the primary key, where InnoDB uses a clustered index to store the primary key. Given the data is physically in primary key order, this clustered index would in theory reduce possible locking.
  • Is it related to the page size of indexes, e.g. the 16k index page, effectively causing a page level lock for overlapping index data? My understanding is that InnoDB supports row level locking, and MVCC should cater for this.
  • Is is related to the ranges of primary keys being adjacent, i.e. 1,000,000 and 1,000,001. Not likely as I can reproduce the problem not using adjacent ranges.
  • Is it some weird interaction to managing the undo space of the transactions in the Innodb buffer pool?
  • Is it some weird interaction with marking/locking the dirty pages in the Innodb buffer pool of modified pages?
  • Is it some weird interaction with logging the successful Innodb transaction to the redo logs.

I’ve listed these points more as an information exercise for all those that have less understanding of the problem to see my though process.

Additional testing can definitely be performed. Additional analysis of InnoDB internals with SHOW ENGINE INNODB STATUS such as spin waits, OS waits (context switches), looking at Mutexes with SHOW ENGINE INNODB MUTEX can be undertaken.

My hope and request is that this has been observed by others and that a simple hybrid solution exists.