Simple lessons in improving scalability

It can be very easy to improve scalability with a MySQL server by a few simple rules. Here is one of them.

“The most efficient way to improve an SQL statement is to eliminate it”

There are numerous ways to eliminate SQL statements, however before I give a classic example that I’ve observed again with a client, let me explain the basic premise of why this improves scalability?

The MySQL kernel can only physically process a certain number of SQL statements for a given time period (e.g. per second). Regardless of the type of machine you have, there is a physical limit. If you eliminate SQL statements that are unwarranted and unnecessary, you automatically enable more important SQL statements to run. There are numerous other downstream affects, however this is the simple math. To run more SQL, reduce the number of SQL you need to run.

Here is the output of a small sample of analyzed TCP/IP packets via mk-query-digest.

# Rank Query ID           Response time Calls R/Call Apdx V/M   Item
# ==== ================== ============= ===== ====== ==== ===== ==========
#    1 0xD631CB919867DB50  0.0436 47.3%    92 0.0005 1.00  0.00 SELECT TTDOD
#    2 0x04FE01C5B31FD305  0.0258 27.9%   329 0.0001 1.00  0.00 ADMIN PING
#    3 0x93321857BCD8E771  0.0229 24.8%    36 0.0006 1.00  0.00 SELECT TTD

There are many problems here including the Row at a Time (RAT) nature of the SQL, the excessive pings however that’s a topic for another time. Let us look at the first statement.

SELECT `Date` FROM TTDOD WHERE ID = 9999;

That seems a simple enough query however let’s look at the table.

mysql> select count(*) from TTDOD;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

In this case, the query will NEVER return any rows because the table is currently empty. Sure this may change in the future, however as this is more an exception processing situation the simple act of managing the knowledge this table rarely has any rows, and building a solution to inform the application of this can completely eliminate the need for this query to ever be executed.

FYI, the above sample is from less then 2 seconds of sampling. Removing the first query reduces the number of queries executed in this time slice by 20%. Regardless of whether this is typical load or load during a batch job the principle stands. We have not even started to look at what we can do with the next query.

Tagged with: Databases MySQL

Related Posts

Successful MySQL Scalability Presentation

Last night I was the invited guest at the SF MySQL Meetup . In my presentation “Successful MySQL Scalability” I talked about a set of principles to ensure appropriate system architecture, data availability and best practices to build an ideal solution for your business.

Read more

Speaking at Surge Scalability 2010 – Baltimore, MD

I will be joining a great list of quality speakers including John Allspaw, Theo Schlossnagle, Rasmus Lerdorf and Tom Cook at Surge 2010 in Baltimore, Maryland on Thu 30 Sep, and Fri Oct 1st 2010.

Read more

Five reasons to upgrade to MySQL 5.5

Updated Nov 2011. Check out my Reasons to use MySQL 5.5 Presentation for more in-depth information about installing/configuring and using MySQL 5.5 I have been looking forward to the general availability (GA) release of MySQL 5.

Read more