Part 2 – Simple lessons in improving scalability

Given the popular response from my first lesson in improving scalability where I detailed simple ways to eliminate unnecessary SQL, let me share another common bottleneck with MySQL scalability that can be instantly overcome.

Analyzing the writes that occur on a system can expose obvious potential bottlenecks. The MySQL Binary Log is a wealth of information that can be mined. Simple DML Counts per table can be achieved by a single line command.

Let’s look at the following example output of a production system:

mysqlbinlog /path/to/mysql-bin.000999 |  
   grep -i -e "^update" -e "^insert" -e "^delete" -e "^replace" -e "^alter"  | 
   cut -c1-100 | tr '[A-Z]' '[a-z]' |  
   sed -e "s/t/ /g;s/`//g;s/(.*$//;s/ set .*$//;s/ as .*$//" | sed -e "s/ where .*$//" |  
   sort | uniq -c | sort -nr

Of the approx 100,000 DML statements we get the following breakdown.

  55283 update sessions
  25204 insert into sessions
  12610 update items
  10536 insert into item_categories
   7532 update users
   5168 delete from item_categories
 

More then 50% of the statements that are written to the binary log and therefore replicated are INSERT’s into the sessions table. A further 25% are UPDATE’s to the same table. This represents 75% of DML statements in just the two most frequent statements.

What is disappointing is that these statements do not belong in MySQL. This is an example of when MySQL is being abused for a purpose where other products are more suited. While there is the argument in using MySQL for storing data, the impact in MySQL memory management, backup/recovery and slave replication throughput and lag can significantly impact scalability of your important MySQL data.

What is observed here is session management where a key value store product should be used as an alternative. In most circumstances it is likely this information is not even required to be persisted. The obvious replacement here is using memcached. If you do wish to persist this data there is an ever increasing list of products including Redis, Tokyo Cabinet/Kyoto Cabinent, Membrain, Voldemort etc that are specifically designed as a key-value store. Even the popular noSQL MongoDB can be easily substituted to perform as a key-value session manager with the added benefits of being a more fully functional product for other purposes.

This is often a common mistake when you use a framework such as Ruby on Rails (RoR) or PHP Code Igniter and many others.

Optimizing UPDATE and DELETE statements

Updated Nov 2011. Check out my latest book on Optimizing SQL Statements for more information. MySQL 5.6.2 also now provides an EXPLAIN syntax for UPDATE and DELETE statements natively.

While most people look at performance optimizations for SELECT statements, UPDATE and DELETE statements are often overlooked. These can benefit from the principles of analyzing the Query Execution Plan (QEP). You can only run an EXPLAIN on a SELECT statement, however it’s possible to rewrite an UPDATE or DELETE statement to perform like a SELECT statement.

To optimize an UPDATE, look at the WHERE clause. If you are using the PRIMARY KEY, no further analysis is necessary. If you are not, it is of benefit to rewrite your UPDATE statement as a SELECT statement and obtain a QEP as previously detailed to ensure optimal indexes are used. For example:

UPDATE t
SET	c1 = ‘x’, c2 = ‘y’, c3 = 100
WHERE c1 = ‘x’
AND	d = CURDATE()

You can rewrite this UPDATE statement as a SELECT statement for using EXPLAIN:

EXPLAIN SELECT c1, c2, c3 FROM	t WHERE c1 = ‘x’ AND	d = CURDATE()

You should now apply the same principles as you would when optimizing SELECT statements.

Simple lessons in improving scalability

It can be very easy to improve scalability with a MySQL server by a few simple rules. Here is one of them.

“The most efficient way to improve an SQL statement is to eliminate it”

There are numerous ways to eliminate SQL statements, however before I give a classic example that I’ve observed again with a client, let me explain the basic premise of why this improves scalability?

The MySQL kernel can only physically process a certain number of SQL statements for a given time period (e.g. per second). Regardless of the type of machine you have, there is a physical limit. If you eliminate SQL statements that are unwarranted and unnecessary, you automatically enable more important SQL statements to run. There are numerous other downstream affects, however this is the simple math. To run more SQL, reduce the number of SQL you need to run.

Here is the output of a small sample of analyzed TCP/IP packets via mk-query-digest.

# Rank Query ID           Response time Calls R/Call Apdx V/M   Item
# ==== ================== ============= ===== ====== ==== ===== ==========
#    1 0xD631CB919867DB50  0.0436 47.3%    92 0.0005 1.00  0.00 SELECT TTDOD
#    2 0x04FE01C5B31FD305  0.0258 27.9%   329 0.0001 1.00  0.00 ADMIN PING
#    3 0x93321857BCD8E771  0.0229 24.8%    36 0.0006 1.00  0.00 SELECT TTD

There are many problems here including the Row at a Time (RAT) nature of the SQL, the excessive pings however that’s a topic for another time. Let us look at the first statement.

SELECT `Date` FROM TTDOD WHERE ID = 9999;

That seems a simple enough query however let’s look at the table.

mysql> select count(*) from TTDOD;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

In this case, the query will NEVER return any rows because the table is currently empty. Sure this may change in the future, however as this is more an exception processing situation the simple act of managing the knowledge this table rarely has any rows, and building a solution to inform the application of this can completely eliminate the need for this query to ever be executed.

FYI, the above sample is from less then 2 seconds of sampling. Removing the first query reduces the number of queries executed in this time slice by 20%. Regardless of whether this is typical load or load during a batch job the principle stands. We have not even started to look at what we can do with the next query.

MySQL 5.5.9

You blink and there is a new version. I have not seen an Planet MySQL release as yet about this new version. Release Notes.

I’d like to say I installed it, but I downloaded the Linux – Generic 2.6 (x86, 64-bit), TAR file, only to find it contains 6 rpm files. #fail, I’m using Ubuntu.

You have to scroll to the bottom of the list (another stupid thing for a generic binary choice) to get Linux – Generic 2.6 (x86, 64-bit), Compressed TAR Archive. Double #fail

Leveraging the InnoDB Plugin

Beginning with MySQL 5.1 as an additional plugin and included by default in MySQL 5.5 the InnoDB plugin includes many performance improvements. To leverage the support of new file formats however a very important setting is necessary.

#my.cnf
[mysqld]
innodb_file_per_table

The use of innodb_file_per_table with an existing system or during an upgrade to 5.1 or 5.5 requires a complete reload of your database to use effectively. In summary.

  • Backup all InnoDB tables via mysqldump
  • Drop InnoDB tables
  • Verify InnoDB not used
  • Stop MySQL
  • Enable innodb_file_per_table & simplified innodb_data_file_path (if applicable)
  • Remove ibdata? files
  • Start MySQL
  • Create Tables
  • Reload Data
  • Verify InnoDB Operation
    • The primary reason is we are moving from using a common tablespace to a tablespace per table. InnoDB wil not shrink the common tablespace so this process is necessary in order to purge the diskspace currently being used. You should also reduce your innodb_data_file_path options if specified. For example if currently set to :

      innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
      

      I would suggest you change to

      innodb_data_file_path = ibdata1:100M:autoextend
      

      InnoDB still requires this common tablespace, however now each table has it’s own disk file the volume required is signficantly less.

Interesting MySQL 5.5 upgrade gotcha

Today I discovered an interesting upgrade problem with a client migrating from MySQL 5.0 to 5.5. The client who is undertaking the upgrade reported that MySQL 5.5 did not support the DECIMAL(18,5) data type. I easily confirmed this not to be the case:

mysql> drop table if exists x;
mysql> create table x (col1 DECIMAL(18,5));
Query OK, 0 rows affected (0.01 sec)

Delving more into the issue in question, I looked at the complete CREATE TABLE statement, recreating the syntax.

mysql> drop table if exists x;
mysql> create table x ( MinValue DECIMAL(18,5));
Query OK, 0 rows affected (0.00 sec)

No problem there.

mysql> drop table if exists x;
mysql> create table x (Department INT NOT NULL, MinValue DECIMAL(18,5) NULL, MaxValue DECIMAL(18,5) NULL);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'MaxValue DECIMAL(18,5) NULL)' at line 1

I could immediately see the problem because if you look closely at the error message it starts with the word “MaxValue”, most likely this is a reserved word. This was easily confirmed with:

mysql> create table x (Department INT NOT NULL, MinValue DECIMAL(18,5) NULL, `MaxValue` DECIMAL(18,5) NULL);
Query OK, 0 rows affected (0.00 sec)

And a confirmation with the 5.5 Reference Manual does indeed show there are new reserved words in MySQL 5.5. These are:

  • GENERAL
  • IGNORE_SERVER_IDS
  • MASTER_HEARTBEAT_PERIOD
  • MAXVALUE
  • RESIGNAL SIGNAL
  • SLOW

Higher Availability (HA) starts with two database servers

Many early startups that use a single server for all services or a single database server for their website talk about how they would like to achieve higher availability with MySQL. This is not possible without at least two database servers. Using MySQL replication you can then support higher availability in several varying capacities. An additional MySQL database server can satisfy several infrastructure needs including:

  • A primary hot backup
  • A datasource for performing backups
  • Read scalability infrastructure
  • A reporting server
  • A benchmarking server
  • A fail-over/fail back master environment

MySQL replication is very easy to configure and deploy, a task that takes < 30 minutes for an experienced MySQL DBA. However, altering your backup and recovery strategy, modifying your application to support read/write splitting or implementing a fail-over/fail back strategy (also known as a MySQL Pairs implementation) are more complex tasks for implementing a higher availability MySQL solution.

About the Author

Ronald Bradford is a well respected industry expert in the MySQL community. Ronald is internationally recognized as an Oracle ACE Director in MySQL, a published author of Expert PHP and MySQL and the all-time top individual contributor of MySQL blog content at Planet MySQL.

Do you use PHPMyAdmin?

If so then were is it installed on your publicly accessible website. If the location is where the documentation states not to put it, or in other popular locations then you can easily become open to an attack. The following are apache logs of a simple hacker test to find a potential security hole on a new IP public address for a client.

My advice is never have PHPMyAdmin accessible by default. You should use HTTP authentication, firewall rules and additional security practices all to protect any level of access to your data.

78.111.81.180 - - [02/Feb/2011:05:29:24 -0500] "GET //phpmyadmin/ HTTP/1.1" 401 290 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
78.111.81.180 - - [02/Feb/2011:05:29:24 -0500] "GET //phpMyAdmin/ HTTP/1.1" 401 290 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
78.111.81.180 - - [02/Feb/2011:05:29:24 -0500] "GET //pma/ HTTP/1.1" 401 290 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
78.111.81.180 - - [02/Feb/2011:05:29:24 -0500] "GET //dbadmin/ HTTP/1.1" 401 290 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
78.111.81.180 - - [02/Feb/2011:05:29:25 -0500] "GET //myadmin/ HTTP/1.1" 401 290 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
78.111.81.180 - - [02/Feb/2011:05:29:25 -0500] "GET //phppgadmin/ HTTP/1.1" 401 290 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
78.111.81.180 - - [02/Feb/2011:05:29:26 -0500] "GET //PMA/ HTTP/1.1" 401 290 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
78.111.81.180 - - [02/Feb/2011:05:29:26 -0500] "GET //admin/ HTTP/1.1" 401 290 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
78.111.81.180 - - [02/Feb/2011:05:29:26 -0500] "GET //MyAdmin/ HTTP/1.1" 401 290 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"

If you are an administrator, 78.111.81.180 should be added to your blacklist permanently.
I should also state I do not use PHPMyAdmin on public servers, and also note the 401 response.