Calculating your database size

I generally use the following MySQL INFORMATION_SCHEMA (I_S) query to Calculate Your MySQL Database Size. This query and most others that access the MySQL INFORMATION_SCHEMA can be very slow to execute because they are not real tables and are not governed by physical data, memory buffers and indexes for example but rather internal MySQL data structures.

Mark Leith indicates in his post on innodb_stats_on_metadata that Innodb performs 8 random(ish) dives in to the index, when anybody accesses any of SHOW TABLE STATUS, SHOW INDEX, INFORMATION_SCHEMA.TABLES,INFORMATION_SCHEMA.STATISTICS for InnoDB tables. This can have an effect on performance, especially with a large number of Innodb tables, and a poor ratio of innodb_buffer_pool_size to disk data+index footprint.

What is even more incredible is when the result of this apparently harmless query causes the mysqld process to actual crash with a core dump due to these random index dives. The following core dump analysis highlights my query as the cause of the problem. This has happened now at least twice in for recent core crashes on a production environment.

(gdb) bt
#0 0x000000327280b6b2 in pthread_kill () from ./lib64/libpthread.so.0
#1 0x000000000055b136 in handle_segfault ()
#2 
#3 0x00000000007e1c21 in rec_get_offsets_func ()
#4 0x0000000000766007 in btr_estimate_number_of_different_key_vals ()
#5 0x000000000070d4c2 in dict_update_statistics_low ()
#6 0x000000000061fa84 in ha_innobase::info ()
#7 0x0000000000636972 in fill_schema_charsets ()
#8 0x0000000000639a66 in get_all_tables ()
#9 0x0000000000634633 in get_schema_tables_result ()
#10 0x00000000005bde37 in JOIN::exec ()
#11 0x00000000005bf7a7 in mysql_select ()
#12 0x00000000005c0127 in handle_select ()
#13 0x000000000056fcf0 in mysql_execute_command ()
#14 0x0000000000574c83 in mysql_parse ()
#15 0x00000000005751a0 in dispatch_command ()
#16 0x0000000000576483 in do_command ()
#17 0x0000000000577002 in handle_one_connection ()
#18 0x0000003272806367 in start_thread () from ./lib64/libpthread.so.0
#19 0x0000003271cd30ad in clone () from ./lib64/libc.so.6
Cannot access memory at address 0x3271cd3040

This is an information_schema query that caused innodb to open a table.
This is totally normal. On first open, innodb tables get automatically  analyzed.
This analyze process crashed in innodb.

This exact query *provoked* a crash:

(gdb) x/1s 0x00002aaabc961dd0
0x2aaabc961dd0: "SELECT table_schema,table_name,engine,row_format,
table_rows, avg_row_length,
(data_length+index_length)/1024/1024 as total_mb,
(data_length)/1024/1024 as data_mb,
(index_length)/1024/1024 as index_mb,
CURDATE() AS today
FROM information_schema.tables
WHERE table_schema=@schema
ORDER BY 7 DESC"

The issue however is which table is the problem? How widespread is the corruption. Would an ALTER TABLE ENGINE=Innodb rebuild the table and eliminate the problem. Would an ANALYZE on an Innodb table identify the problem? (I doubt this second point). The problem however is even more significant due to the actual system. The largest single table of this 1TB database is 500GB. The impact of performing the ALTER, the time to undertake this blocking operation, the increase in the Innodb data file that can’t be reclaimed are just two factors that the inexperienced may fall victim of.

A saying I use is “Disaster is inevitable”. In this situation the disaster appears to not be significant but the ramifications due to the lack of appropriate and expert architectural design considerations to correct the problem are.

Is your environment capable of supporting this maintenance requirement? If not, then is the decision maker in your organization worried enough to seek the expert advice to address pro actively or will it be too late.

My favorite MySQL data type – DECIMAL(31,0)

It may seem hard to believe, but I have seen DECIMAL(31,0) in action on a production server. Not just in one column, but in 15 columns just in the largest 4 tables of one schema. The column was being used to represent a integer primary or foreign key column.

In a representative production instance (one of a dozen plus distributed production database servers) the overall database footprint was decreased from ~10 GB to ~2 GB, a 78% saving. In total, 15 columns across just 4 tables were changed from DECIMAL(31,0) to INT UNSIGNED.

One single table > 5GB was reduced to under 1GB (a 81% saving). This being my record for any GB+ tables in my time working with the MySQL database.

Had this server for example had 4GB of RAM, and say 2.5GB allocated to the innodb_buffer_pool_size, this one change moved the system from requiring more consistent disk access (4x data to memory) to being able to store all data in memory. Tests showed a clear improvement in Innodb buffer pool reads and hit ratio.

Today’s lesson as described in my 2008 conference presentation Top 20 design tips for data architects is, choose the right integer data type for your data.

Percona Performance Conference Talk

My final presentation during the 2009 MySQL Conference and Expo week was with the Percona Performance Conference on the topic of The Ideal Performance Architecture. My talk included discussions on Technology, Disk, Memory, Indexes, SQL and Data.

Updated 09/18/09
you can now see video of the event at Percona TV.