Understanding when EXPLAIN is not using an index as intended

When reading a MySQL Query Execution Plan (QEP) produced by the EXPLAIN command, generally one of the first observations is to validate an index is being used per table (i.e. per row of output). In MySQL, this is observed with the key column.

In the following two simple single table examples we see the use of the PRIMARY key. To the untrained eye this may lead to assume that the right index is being used.

Example 1

+----+-------------+----------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table          | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+----------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | txxxxxxxxxxxx  | index | NULL          | PRIMARY | 4       | NULL |  100 | Using where |
+----+-------------+----------------+-------+---------------+---------+---------+------+------+-------------+

Example 2

+----+-------------+------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | txxxxxxxxx | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+------------+-------+---------------+---------+---------+-------+------+-------+

However this is not entirely true. While used, it is not as intended. A MySQL index can be used during three different stages of the query execution, the JOIN/WHERE, the GROUP BY and the ORDER BY. This type of use can be seen by looking at the Index Hint Syntax which enables you to suggest or force an index at these three various stages.

Without looking at the SQL statements for the above plans, the first giveaway is the possible_keys column which indicates the indexes considered when evaluating the JOIN/WHERE of your SQL statement. The absence in the first query informs you that no index was used. The use of the PRIMARY key in the first query is the result of an ORDER BY syntax. If we remove this ORDER BY we see the true possible execution.

+----+-------------+----------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+----------------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | txxxxxxxxxxxx  | ALL  | NULL          | NULL | NULL    | NULL | 262827 | Using where |
+----+-------------+----------------+------+---------------+------+---------+------+--------+-------------+

The first query also includes a LIMIT and hence gives the perception that only a small number of rows are processed. In relational theory you would surmise this query is not efficient. Unfortunately as this example was on version MySQL 5.5, it was not possible to use the optimizer_trace functionality in MySQL 5.6 to delve deeper into understanding what decisions MySQL would take.

This example is not to say that you should add an index. This is one possible outcome in optimizing the SQL statement, however there are advantages and disadvantages. Each SQL statement should be reviewed in conjunction with it’s usage, the overall table structure(s) and all other SQL statements that utilize applicable tables.

Tagged with: MySQL

Producing Alternative Means statistics with SQL

MySQL’s built-in AVG() computes the arithmetic mean — the sum divided by the count. That is the right default for many questions, but it is not always the right measure of central tendency.

Extending MySQL Capabilities with UDFs, Plugins and Components

MySQL offers three different approaches to extending the SQL capabilities with the default product you download and install. These are: User Defined Function (UDF) MySQL Manual MySQL Plugin MySQL Manual MySQL Component MySQL Manual For the purposes of this post I will be using the current LTS version MySQL 8.

Producing One-Sample Z-Test statistics with SQL

The one-sample Z-test determines whether a sample mean differs significantly from a known population mean when the population standard deviation is also known. It is the appropriate test when the population parameters are established — quality control benchmarks, national averages, long-run process measurements — and you want to evaluate whether a new sample is consistent with them.