Testing your system

I have raised this specific topic 3 times this week alone, twice in a MySQL setting.

The fundamental philosophy of testing is NOT to verify features of your product that work, it is to BREAK your system.

One such discussion this week was with a service provider that deployed a new system into an existing ecosystem. The release had already been delayed due to development issues, and credibility with customers is now being further damaged because the system is reaching physical hardware limitations after just one month.

When this was described to me, my simple response was: you did not stress test your system to its breaking point. Knowing the limit of your capacity ahead of time is a proactive analysis, not a reactive one.

It's not that complicated to do, and it is easier at an early stage before you have a 50, 100, or 1000 server environment, but it's a best practice I do not see often enough.
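
For example, a first approximation of a server's breaking point can be obtained with the bundled mysqlslap tool. This is a minimal sketch only, with illustrative credentials and generated SQL; for meaningful results you would replay representative production queries against a dedicated test server, increasing concurrency until throughput degrades.

$ mysqlslap --user=root --password \
    --concurrency=50,100,200 --iterations=3 \
    --auto-generate-sql --auto-generate-sql-load-type=mixed \
    --number-of-queries=10000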

Configuration management concepts for database objects

Correctly managing your MySQL database objects such as schemas, tables, indexes, base data, etc. is critical to the success of a 24×7 online website. I rarely encounter a robust working solution as part of my consulting, so I would like to share my experience in identifying the best practices you should be adopting, whether you are an existing organization or just an individual with a simple website.

Much of the following concept actually predates my involvement with MySQL (since 1999), so this is not just applicable to a MySQL RDBMS. For the purposes of this discussion I'd like to focus on the theory successfully used with clients.

Under version control I have the following directory structure:

NOTE: If your first observation was "Arrh, Version Control?", you are in more trouble than you want to be right from day one. You need version control such as svn, cvs, bzr, git, etc. for any website, no matter how small.

/database
  /scripts
  /sql
    /schema
    /patch
    /revert
    /admin
  /data
  ....

/database is the top-level directory; when packaging your software for all database related operations, you simply include all contents of /database.

At its core, every database object change for configuration management is addressed in three (3) files.

  • A schema file
  • A patch file
  • A revert file

In fact, you can add version control rules, for example, to ensure that when a patch file is added, a corresponding revert file and updated schema file are also included, as sketched below.
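
As a sketch of such a rule, a pre-commit check could verify the pairing. The hook below is hypothetical and assumes the directory layout above; the exact hook mechanism depends on your version control system.

#!/bin/sh
# Hypothetical pre-commit check: every patch file must have a matching revert file.
status=0
for patch in database/sql/patch/patch.*.sql; do
    revert="database/sql/revert/revert.${patch##*/patch.}"
    if [ ! -f "$revert" ]; then
        echo "Missing revert file for $patch" >&2
        status=1
    fi
done
exit $status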

For a “current” working environment, there are two paths for database object management.

  • An upgrade path
  • A new version creation.

An upgrade path, which is the normal production operation, takes an existing database schema and 'patches' it to a new revision. As the name suggests, for each 'patch' file a corresponding 'revert' file can be used to revert the upgrade. For testing and development environments, a current version of the full schema can always be created without using the upgrade path, simply by creating the schema from the current schema file.
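
A minimal sketch of the upgrade path (the schema name and the YYYYMMDD.XX placeholders are illustrative; a full controlling script would also record the outcome):

# Upgrade path: apply the patch, and on failure apply the corresponding revert.
if ! mysql dbname < database/sql/patch/patch.YYYYMMDD.XX.sql ; then
    mysql dbname < database/sql/revert/revert.YYYYMMDD.XX.sql
fi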

For the purposes of understanding how this would work in a real environment, I’ll use the Sakila test database and I’ll step through a few examples.

Seeding your configuration management

Because we already have an existing schema, the first step is to seed our new configuration management with the existing schema information.

This first step actually involves some duplication; the reason will become more apparent in later examples.

We will be creating the following three (3) files:

  • /database/sql/schema/schema.sakila.sql
  • /database/sql/patch/patch.20090303.01.sql
  • /database/sql/revert/revert.20090303.01.sql

/database/sql/schema/schema.sakila.sql
This will be a copy of the sakila-db/sakila-schema.sql. You will need to edit this file to remove the following lines.

DROP SCHEMA IF EXISTS sakila;
CREATE SCHEMA sakila;
USE sakila;

None of the configuration files should contain any schema definitions. This will be discussed in more detail at a later time.
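
One way to strip these lines (a sketch; verify the resulting file before committing it):

$ grep -Ev '^(DROP SCHEMA|CREATE SCHEMA|USE) ' sakila-db/sakila-schema.sql \
    > database/sql/schema/schema.sakila.sql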

/database/sql/patch/patch.20090303.01.sql
This will be a copy of the above file.

/database/sql/revert/revert.20090303.01.sql

DROP PROCEDURE IF EXISTS rewards_report;
DROP FUNCTION IF EXISTS get_customer_balance;
DROP PROCEDURE IF EXISTS film_in_stock;
DROP PROCEDURE IF EXISTS film_not_in_stock;
DROP FUNCTION IF EXISTS inventory_held_by_customer;
DROP FUNCTION IF EXISTS inventory_in_stock;
DROP VIEW IF EXISTS customer_list;
DROP VIEW IF EXISTS film_list;
DROP VIEW IF EXISTS nicer_but_slower_film_list;
DROP VIEW IF EXISTS staff_list;
DROP VIEW IF EXISTS sales_by_store;
DROP VIEW IF EXISTS sales_by_film_category;
DROP VIEW IF EXISTS actor_info;
DROP TABLE IF EXISTS actor;
DROP TABLE IF EXISTS address;
DROP TABLE IF EXISTS category;
DROP TABLE IF EXISTS city;
DROP TABLE IF EXISTS country;
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS film;
DROP TABLE IF EXISTS film_actor;
DROP TABLE IF EXISTS film_category;
DROP TABLE IF EXISTS film_text;
DROP TABLE IF EXISTS inventory;
DROP TABLE IF EXISTS language;
DROP TABLE IF EXISTS payment;
DROP TABLE IF EXISTS rental;
DROP TABLE IF EXISTS staff;
DROP TABLE IF EXISTS store;

For the purposes of this theory, I will discuss schema creation and management at a later time. For this example, we assume the ‘sakila’ schema has been created and is empty.

The ‘two’ paths

The default path is to apply the patch file to the appropriate schema. In this case, applying the patch file would create the current 'sakila' schema objects.

If this fails, for example, you should automatically apply the revert script, which should restore your environment to its original state, in this case an empty schema.

If you wanted to create a new test environment, for example, then following creation of the schema you could simply apply the schema file, as shown below.
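
A sketch of this second path (credentials are illustrative):

$ mysql -uuser -p -e "CREATE SCHEMA sakila"
$ mysql -uuser -p sakila < database/sql/schema/schema.sakila.sql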

Let’s perform another iteration, to see the full working process.

Adding new objects

Let’s say we wanted to keep additional information such as famous quotes an actor has made. We want to create a new table ‘actor_quote’.

For this we would first create a patch and revert script to manage this new object.
/database/sql/patch/patch.20090303.02.sql

CREATE TABLE actor_quote (
  quote_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  actor_id SMALLINT UNSIGNED NOT NULL,
  quote   VARCHAR(200) NOT NULL,
  PRIMARY KEY  (quote_id),
  KEY idx_fk_actor_id (actor_id),
  CONSTRAINT fk_actor_quote_actor FOREIGN KEY (actor_id) REFERENCES actor (actor_id) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/database/sql/revert/revert.20090303.02.sql

DROP TABLE IF EXISTS actor_quote;

/database/sql/schema/schema.sakila.sql
The contents of the patch file should be appended to this file.
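
For example (a sketch; you may prefer to merge the DDL by hand to keep the schema file logically ordered):

$ cat database/sql/patch/patch.20090303.02.sql >> database/sql/schema/schema.sakila.sql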

The ‘two’ paths

Let's look at the two paths again.

The normal production operation, by using the patch file, would create the new database object.

If this fails, you should automatically apply the revert script, which should restore your environment to its original state, in this simplest example dropping the table if it exists.

If you wanted to create a new test environment, for example, then following creation of the schema you could simply apply the schema file.

Review

We have only touched on the entire process of configuration management for database objects. The implementation of this practice includes metadata and controlling scripts that manage the order of execution and record, for example, which operations completed successfully or unsuccessfully.
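
As a hedged illustration only, the core loop of such a controlling script might look like the following. The schema_version table and its columns are assumptions for this sketch, not part of the examples above.

#!/bin/sh
# Hypothetical controlling script: apply outstanding patches in chronological
# order, recording each outcome in an assumed schema_version metadata table.
DB=sakila
for patch in database/sql/patch/patch.*.sql; do
    name=$(basename "$patch")
    applied=$(mysql -N "$DB" -e "SELECT COUNT(*) FROM schema_version WHERE patch_name='$name'")
    [ "$applied" -gt 0 ] && continue
    if mysql "$DB" < "$patch"; then
        mysql "$DB" -e "INSERT INTO schema_version (patch_name, status) VALUES ('$name','SUCCESS')"
    else
        mysql "$DB" -e "INSERT INTO schema_version (patch_name, status) VALUES ('$name','FAILED')"
        exit 1
    fi
done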

About Standards

Within this overview a number of standards are in place. These include:

  • SQL scripts do not contain any CREATE/DROP DATABASE|SCHEMA commands
  • SQL scripts do not contain any schema/database specific references. This is important for being able to easily test and verify operations. In our above examples, the default Sakila DB contains such information and would be edited appropriately.
  • For Patch and Revert files a chronological date format for naming is used, e.g. YYYYMMDD.XX, where XX is a sequential number for multiple patch/revert scripts for any given day.
  • All SQL statements must be terminated with ‘;’. This is important for the management processes and automated scripts that take these fundamental schema/patch/revert scripts as source information.
  • Where possible, try to make revert scripts support either a successful or a failed patch process. For example, adding IF EXISTS to a DROP TABLE statement supports both cases.
  • It is reasonably obvious to have schema, patch and revert directories as a naming standard, but file names also include this as a prefix. This acts as a double check: if a file is seen in isolation, its type can be determined regardless of directory location. Also, for logging, only filenames are used.

More Information

Configuration Management in MySQL is one of the topics discussed in the “MySQL Essentials” training course. You can find more information regarding this and other training offerings including an upcoming schedule at 42SQL Education.

Planet MySQL at a new URL

Did anybody notice that http://planetmysql.org now redirects to http://planet.mysql.com?

I'm curious to know the reason why; perhaps an official MySQL person can give us some details.
Also, it's a 302 redirect, not a 301 redirect. Interesting.

$ wget http://planetmysql.org
--2009-02-26 14:40:09--  http://planetmysql.org/
Resolving planetmysql.org... 213.136.52.29
Connecting to planetmysql.org|213.136.52.29|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.planetmysql.org/ [following]
--2009-02-26 14:40:10--  http://www.planetmysql.org/
Resolving www.planetmysql.org... 213.136.52.29
Connecting to www.planetmysql.org|213.136.52.29|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://planet.mysql.com/ [following]
--2009-02-26 14:40:10--  http://planet.mysql.com/
Resolving planet.mysql.com... 213.136.52.29
Connecting to planet.mysql.com|213.136.52.29|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

Eliminating unnecessary internal temporary tables

I can't stress enough the need to look at the SQL statements being executed against your production MySQL database, and to optimize queries where you can.

Often it's the improvement to a large number of similar queries that optimizes resources. In this example, we take a very simple query and, by removing an unnecessary ORDER BY, eliminate MySQL internally creating a temporary (in memory) table.

So what's the big deal?

  • The query is simpler to read and understand
  • Memory required for the connection is not assigned
  • A number of internal steps are no longer required (4 of 21 profiling steps; not an ideal measurement, but an indication). In this case, it was easily a 10% performance improvement for each query.

This query is executed 10-100 times per second, so the improvement in performance is significant.

mysql> explain select max(mdate) as mdate from tbl  where original_account = '[email protected]' and id = '15847' order by mdate desc;
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+------------------------------+
| id | select_type | table                | type | possible_keys        | key              | key_len | ref         | rows | Extra                        |
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+------------------------------+
|  1 | SIMPLE      | tbl                  | ref  | ids,original_account | original_account | 388     | const,const |  146 | Using where; Using temporary |
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+------------------------------+
1 row in set (0.00 sec)
mysql> explain select max(mdate) as mdate from tbl  where original_account = '[email protected]' and id = '15847';
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+-------------+
| id | select_type | table                | type | possible_keys        | key              | key_len | ref         | rows | Extra       |
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | tbl                  | ref  | ids,original_account | original_account | 388     | const,const |  146 | Using where |
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
mysql> show profile cpu,memory,source for query 1;
+--------------------------------+----------+----------+------------+---------------------------+---------------+-------------+
| Status                         | Duration | CPU_user | CPU_system | Source_function           | Source_file   | Source_line |
+--------------------------------+----------+----------+------------+---------------------------+---------------+-------------+
| (initialization)               | 0.00001  | 0        | 0          | send_result_to_client     | sql_cache.cc  |        1143 |
| checking query cache for query | 0.000079 | 0        | 0          | open_tables               | sql_base.cc   |        2652 |
| Opening tables                 | 0.000024 | 0        | 0          | mysql_lock_tables         | lock.cc       |         153 |
| System lock                    | 0.000015 | 0        | 0          | mysql_lock_tables         | lock.cc       |         163 |
| Table lock                     | 0.000041 | 0        | 0          | mysql_select              | sql_select.cc |        2282 |
| init                           | 0.000046 | 0        | 0          | optimize                  | sql_select.cc |         765 |
| optimizing                     | 0.000027 | 0        | 0          | optimize                  | sql_select.cc |         924 |
| statistics                     | 0.000173 | 0        | 0          | optimize                  | sql_select.cc |         934 |
| preparing                      | 0.000028 | 0        | 0          | optimize                  | sql_select.cc |        1383 |
| Creating tmp table             | 0.000053 | 0        | 0          | exec                      | sql_select.cc |        1603 |
| executing                      | 0.000011 | 0        | 0          | exec                      | sql_select.cc |        1743 |
| Copying to tmp table           | 0.002226 | 0        | 0          | exec                      | sql_select.cc |        2123 |
| Sending data                   | 0.000148 | 0        | 0          | mysql_select              | sql_select.cc |        2327 |
| end                            | 0.000013 | 0        | 0          | free_tmp_table            | sql_select.cc |       10115 |
| removing tmp table             | 0.000064 | 0        | 0          | free_tmp_table            | sql_select.cc |       10143 |
| end                            | 0.000014 | 0        | 0          | mysql_execute_command     | sql_parse.cc  |        5154 |
| query end                      | 0.000012 | 0        | 0          | query_cache_end_of_result | sql_cache.cc  |         735 |
| storing result in query cache  | 0.000047 | 0        | 0          | mysql_parse               | sql_parse.cc  |        6155 |
| freeing items                  | 0.000021 | 0        | 0          | dispatch_command          | sql_parse.cc  |        2146 |
| closing tables                 | 0.000014 | 0        | 0          | log_slow_statement        | sql_parse.cc  |        2204 |
| logging slow query             | 0.000011 | 0        | 0          | dispatch_command          | sql_parse.cc  |        2169 |
+--------------------------------+----------+----------+------------+---------------------------+---------------+-------------+
21 rows in set (0.00 sec)


mysql> show profile cpu,memory,source for query 2;
+--------------------------------+-----------+----------+------------+---------------------------+---------------+-------------+
| Status                         | Duration  | CPU_user | CPU_system | Source_function           | Source_file   | Source_line |
+--------------------------------+-----------+----------+------------+---------------------------+---------------+-------------+
| (initialization)               | 0.000021  | 0        | 0          | send_result_to_client     | sql_cache.cc  |        1143 |
| checking query cache for query | 0.000090  | 0        | 0          | open_tables               | sql_base.cc   |        2652 |
| Opening tables                 | 0.000022  | 0        | 0          | mysql_lock_tables         | lock.cc       |         153 |
| System lock                    | 0.000014  | 0        | 0          | mysql_lock_tables         | lock.cc       |         163 |
| Table lock                     | 0.000044  | 0        | 0          | mysql_select              | sql_select.cc |        2282 |
| init                           | 0.000049  | 0        | 0          | optimize                  | sql_select.cc |         765 |
| optimizing                     | 0.000028  | 0        | 0          | optimize                  | sql_select.cc |         924 |
| statistics                     | 0.000179  | 0        | 0          | optimize                  | sql_select.cc |         934 |
| preparing                      | 0.000029  | 0        | 0          | exec                      | sql_select.cc |        1603 |
| executing                      | 0.000016  | 0        | 0          | exec                      | sql_select.cc |        2123 |
| Sending data                   | 0.00229   | 0        | 0          | mysql_select              | sql_select.cc |        2327 |
| end                            | 0.000039  | 0        | 0          | mysql_execute_command     | sql_parse.cc  |        5154 |
| query end                      | 0.000012  | 0        | 0          | query_cache_end_of_result | sql_cache.cc  |         735 |
| storing result in query cache  | 0.000011  | 0        | 0          | mysql_parse               | sql_parse.cc  |        6155 |
| freeing items                  | 0.00002   | 0        | 0          | dispatch_command          | sql_parse.cc  |        2146 |
| closing tables                 | 0.000014  | 0        | 0          | log_slow_statement        | sql_parse.cc  |        2204 |
| logging slow query             | 0.00001   | 0        | 0          | dispatch_command          | sql_parse.cc  |        2169 |
+--------------------------------+-----------+----------+------------+---------------------------+---------------+-------------+
17 rows in set (0.00 sec)

Announcing "MySQL Essentials" Training

Are you having problems getting up to speed on MySQL? Are you asking yourself “Is there a hands-on training course we can send a developer/system admin to learn MySQL?”. In response, at 42SQL we have put together two new training courses, MySQL Essentials and MySQL Operations.

MySQL Essentials Training Details

With MySQL Essentials we tackle the core essentials that a developer/system admin/junior DBA would require in order to support an initial development environment that uses MySQL. Essentials training teaches the following skills:

  • Which version of MySQL to use (including the various variants and patches available)
  • Backup, retention, and recovery strategies
  • Configuration and Monitoring of MySQL
  • Optimal schema and data objects configuration management
  • more information here

We are now accepting registrations for MySQL Essentials training being held on April 1st – 2nd in New York, and April 6th – 7th in Washington DC.

About the presenter

Ronald Bradford is a two-decade veteran with extensive database experience in MySQL, Oracle and Ingres. His expertise covers data architecture, software development, migration, performance analysis and production system implementations. With ten years' experience in MySQL, his involvement in the MySQL ecosystem has included working as a Senior Consultant with MySQL Inc., speaking at four MySQL Conferences, and creating the "MySQL for Oracle DBAs" one-day workshop. Ronald holds MySQL certifications including DBA 5.0, Developer 5.0 and MySQL Cluster 5.1.

The art of looking at the actual SQL statements

It's a shame that MySQL does not provide better granularity when you want to look at all SQL statements being executed in a MySQL server. I concede that you can do this with the general log, but it has inherent starting/stopping problems in 5.0, improved in 5.1; I would still like to see the option on a per connection basis, or even for a given time period. MySQL Proxy can provide a solution here, but also with some caveats.
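
In 5.1, for example, the general log can be toggled dynamically without a restart (a sketch; in 5.0 the log setting requires a server restart):

$ mysql -uroot -p -e "SET GLOBAL general_log_file='/tmp/general.log'"
$ mysql -uroot -p -e "SET GLOBAL general_log=ON"
$ # ... capture a representative period of activity ...
$ mysql -uroot -p -e "SET GLOBAL general_log=OFF"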

You should, however, in a NON production environment, take the time to enable the general log and look at the SQL statements. Prior to looking at the SQL, monitoring of the GLOBAL STATUS variables combined with Statpack revealed the following in a 1 minute interval.

====================================================================================================
                                         Statement Activity
====================================================================================================

                     SELECT:           16,042                   267.37                8,177,050 (46.03%)
                     INSERT:            5,838                    97.30                1,826,616 (10.28%)
                     UPDATE:            1,109                    18.48                  738,546 (4.16%)
                     DELETE:            2,018                    33.63                1,374,983 (7.74%)
                    REPLACE:                0                     0.00                        0 (0.00%)
          INSERT ... SELECT:                0                     0.00                       27 (0.00%)
         REPLACE ... SELECT:                0                     0.00                        0 (0.00%)
               Multi UPDATE:                0                     0.00                        0 (0.00%)
               Multi DELETE:                0                     0.00                        0 (0.00%)
                     COMMIT:            5,708                    95.13                2,161,232 (12.17%)
                   ROLLBACK:            5,746                    95.77                3,485,828 (19.62%)

If you notice the last 2 lines, some 19% of statements executed on the server are ROLLBACK. Further analysis of the schema shows mainly Innodb tables (good as COMMIT and ROLLBACK are supported), but also some MyISAM tables.

The following is a snippet from the general log.

                     23 Query       select 1
                     23 Query       INSERT INTO JMS_TRANSACTIONS (TXID) values(17719)
                     23 Query       UPDATE JMS_MESSAGES SET TXID=17719, TXOP='D' WHERE MESSAGEID=16248 AND DESTINATION='QUEUE.receivemail'
                     23 Query       commit
                     23 Query       rollback
                     23 Query       select 1
                     23 Query       DELETE FROM JMS_MESSAGES WHERE TXID=17719 AND TXOP='D'
                     23 Query       DELETE FROM JMS_TRANSACTIONS WHERE TXID = 17719
                     23 Query       commit
                     23 Query       rollback

This turns out to be most interesting. These tables are used by the Java Messaging Service (JMS), and I observed three points.

  • The 'select 1' is effectively a ping test to confirm the connection is still valid. MySQL provides a more lightweight COM_PING; it would be good to know if this environment using JBoss could support that (see the example after this list).
  • There is a 'ROLLBACK' after every command. This is totally redundant, and most likely part of a higher level framework.
  • The 'COMMIT' is used in conjunction with a number of statements; however, as I mentioned earlier, some tables were MyISAM. These were the JMS tables, so in this situation the COMMIT is useless, as MyISAM is not a transactional storage engine.
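
On the first point, COM_PING is, for example, what the mysqladmin client uses (a shell level illustration only; whether the JBoss connection pool can be configured to use it is a separate question):

$ mysqladmin -uuser -p ping
mysqld is alive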

A number of decisions are needed to correct this problem; however, the point of raising this is: always look at your SQL.

Watching a slave catchup

This neat one line command can be of interest when you are rebuilding a MySQL slave and replication is currently catching up.

$ watch --interval=1 --differences 'mysql -uuser -ppassword -e "SHOW SLAVE STATUS\G"'

You will see the standard SHOW SLAVE STATUS output, but the watch command presents an updated view every second, and highlights differences. This can be useful in a background window to keep an eye on those ‘Seconds Behind Master’.

*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 10.10.10.10
                Master_User: slave
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000626
        Read_Master_Log_Pos: 88159239
             Relay_Log_File: slave-relay.000005
              Relay_Log_Pos: 426677632
      Relay_Master_Log_File: mysql-bin.000621
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 426677495
            Relay_Log_Space: 2714497549
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 24131

Some Drupal observations

I had the opportunity to review a client's production Drupal installation recently. This is a new site and traffic is just starting to pick up. Drupal is a popular LAMP stack open source CMS that uses the MySQL database.

Unfortunately I don't always have the chance to focus on one product when consulting; sometimes the time available is minutes to a few hours. Here are some observations from looking at Drupal.

Disk footprint

Presently, traffic and content volume is low, but expected to ramp up. I did, however, find 90% of the disk footprint in one table called 'watchdog'.


+--------------+--------------+--------------+-------------+--------+
| table_schema | total_mb     | data_mb      | index_mb    | tables |
+--------------+--------------+--------------+-------------+--------+
| xxxxx        | 812.95555878 | 745.34520721 | 67.61035156 |    191 |
+--------------+--------------+--------------+-------------+--------+

+-------------------------------------------+--------+------------+------------+----------------+--------------+--------------+-------------+
| table_name                                | engine | row_format | table_rows | avg_row_length | total_mb     | data_mb      | index_mb    |
+-------------------------------------------+--------+------------+------------+----------------+--------------+--------------+-------------+
| watchdog                                  | MyISAM | Dynamic    |      63058 |            210 | 636.42242813 | 607.72516251 | 28.69726563 |
| cache_menu                                | MyISAM | Dynamic    |        145 |         124892 |  25.33553696 |  25.32577133 |  0.00976563 |
| search_index                              | MyISAM | Dynamic    |     472087 |             36 |  23.40134048 |  16.30759048 |  7.09375000 |
| comments                                  | MyISAM | Dynamic    |      98272 |            208 |  21.83272934 |  19.58272934 |  2.25000000 |

Investigating the content of the ‘watchdog’ table shows detailed logging. Drilling down just on the key ‘type’ records shows the following.

mysql> select message,count(*) from watchdog where type='page not found' group by message order by 2 desc limit 10;
+--------------------------------------+----------+
| message                              | count(*) |
+--------------------------------------+----------+
| content/images/loadingAnimation.gif  |    17198 |
| see/images/loadingAnimation.gif      |     6659 |
| images/loadingAnimation.gif          |     6068 |
| node/images/loadingAnimation.gif     |     2774 |
| favicon.ico                          |     1772 |
| sites/all/modules/coppa/coppa.js     |      564 |
| users/images/loadingAnimation.gif    |      365 |
| syndicate/google-analytics.com/ga.js |      295 |
| content/img_pos_funny_lowsrc.gif     |      230 |
| content/google-analytics.com/ga.js   |      208 |
+--------------------------------------+----------+
10 rows in set (2.42 sec)

Some 25% of rows are just reporting one missing file. Correcting this one file cuts out a pile of unnecessary logging, as sketched below.
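
As an interim measure you could also prune the noise already recorded. This is a sketch with a hypothetical schema name; check Drupal's own log retention settings before pruning by hand.

$ mysql -uuser -p drupaldb -e "DELETE FROM watchdog WHERE type='page not found'"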

Repeating Queries

Looking at just 1 random second of SQL logging shows 1200+ SELECT statements, 355 of which are SELECT changed FROM node queries.

$ grep would_you_rather drupal.1second.log
              7 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              5 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              3 Query       SELECT field_image_textarea_value AS value FROM content_type_would_you_rather WHERE vid = 24303 LIMIT 0, 1
              4 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              6 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
             10 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              9 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              8 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              9 Query       SELECT field_image_textarea_value AS value FROM content_type_would_you_rather WHERE vid = 24303 LIMIT 0, 1

There is plenty of information regarding monitoring slow queries in MySQL, but I have also argued that it's not the slow queries that ultimately slow a system down, but the thousands of repeating fast queries.

MySQL of course has the Query Cache to assist, but this is a coarse-grained solution, and in a high volume read/write environment it is meaningless.

There is a clear need for either application level caching, or a database redesign to pull rather than poll this information; however, without a more in depth review of Drupal I cannot make any judgment calls.

Best Practices in Migrating to MySQL

This week I was the invited speaker to give a 4 hr presentation to the Federal Government Sector in Washington DC on “Best Practices in Migrating to MySQL“. This was a followup to my day long “MySQL for the Oracle DBA Bootcamp” which I presented in Washington DC last year. It was good to see a number of attendees from my first DC presentation.

There was good attendance across various government departments and companies providing services to the government sector, as well as a variety of job descriptions.

Thanks to Carahsoft and Sun/MySQL for organizing and sponsoring the event. Thanks also to Phil Hildebrand who provided fantastic support during my preparation answering all my SQL Server questions.

Thanks also to Baron Schwartz, creator of Maatkit, who as my invited guest was nice enough to compile a list of attendee questions, which is always a good reference for revising slides and writing more blog posts.

You can find the first of seven sessions online in my presentations section.

Updated
Thanks to Baron Schwartz for his follow-up blog posts Migrating US Government applications from Oracle to MySQL and 50 things to know before migrating Oracle to MySQL.

Strict mode can still throw warnings

MySQL by default is very lax with data validation. Silent conversions are a concept not commonly found in other databases. In MySQL, instead of throwing an error, a warning is thrown, and many applications simply do not handle warnings. With the introduction of sql_mode=STRICT_ALL_TABLES (or TRADITIONAL) in MySQL 5, a better level of validation now exists.

My understanding was that warnings are now thrown as errors, therefore eliminating the need to do a SHOW WARNINGS to confirm any problems after every query (which is a performance overhead on a high volume system due to the round trip latency).

However, I found an instance where MySQL in STRICT mode still throws warnings, leading to the questions: are there any other areas, and does the earlier statement "warnings are now thrown as errors" hold true?

Here is my seeding process to show the problem.

mysql> create table i(i tinyint, unique key( i));
Query OK, 0 rows affected (0.01 sec)
mysql> insert into i values(999);
Query OK, 1 row affected (0.00 sec)

Using default settings, attempting to INSERT a duplicate row throws an error; using INSERT IGNORE does not.

mysql> insert into i values(999);
ERROR 1062 (23000): Duplicate entry '127' for key 'i'
mysql> insert ignore into i values(999);
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1264 | Out of range value for column 'i' at row 1 |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)

When using a strict mode, a recommendation for all new systems, it is generally accepted that warnings are translated into errors, which implies your code should never have to consider checking for warnings.

mysql> truncate table i;
mysql> set sql_mode=strict_all_tables;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into i values(999);
ERROR 1264 (22003): Out of range value for column 'i' at row 1
mysql> insert ignore into i values(999);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1264 | Out of range value for column 'i' at row 1 |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)
mysql> set sql_mode=traditional;
Query OK, 0 rows affected (0.00 sec)

mysql> insert ignore into i values(9990);
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1264 | Out of range value for column 'i' at row 1 |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)

I should caveat this post with a warning to use caution with INSERT IGNORE. It should only be used if you never care about errors, which I would never consider a best practice design approach.

Reducing the MySQL 5.1.30 disk footprint

The current size of a MySQL 5.1.30 installation is around 420M.

$ du -sh .
426M	.

A further breakdown.

$ du -sh *
213M	bin
20K	COPYING
9.8M	docs
8.0K	EXCEPTIONS-CLIENT
436K	include
12K	INSTALL-BINARY
121M	lib
504K	man
4.0K	my.cnf
77M	mysql-test
4.0K	README
20K	scripts
2.3M	share
2.9M	sql-bench
100K	support-files

A means to reduce the footprint by 25% is to delete some unused stuff.

$ rm -rf docs/ mysql-test/ sql-bench/
$ du -sh .
337M	.

It's no big deal; however, removing the mysql-test directory and files certainly does cut down on verbose output in the backup logs.

Best practices for migrating applications to MySQL

In just over 2 weeks I'll be the invited speaker in Washington DC presenting "Best practices for migrating applications to MySQL". This workshop is being held in conjunction with Carahsoft and Sun/MySQL and aims to provide the Federal sector valuable information for the continued usage and uptake of Open Source and specifically MySQL.

As part of my preparation I’m happy to hear from any organizations that have successfully migrated from Oracle/SQL Server/Informix/Sybase etc to MySQL and would like to be cited.

While I have been involved in the process, I am also happy to hear of reasons why a migration failed, was aborted or was postponed. This is all valuable information in determining which applications are the most ideal candidates.

Extending the MySQL Data Landscape

Learn how to extend your existing MySQL based website to leverage the power of MySQL variants, AWS cloud based MySQL deployments and RDBMS alternatives. Evaluate how to integrate and use various technologies such as the MySQL based variants KickFire, a column based optimization, and Infobright, a data warehousing solution. Understand the means of approaching data synchronization between the various database solutions in your business.

At the MySQL Meetup in New York this month, I spoke on “Extending the MySQL Data Landscape“. A MySQL centric view on an earlier work, “The Data Landscape” which I presented at a recent GoDaddy Tech Day.


Dependency error installing mylvmbackup on Ubuntu 8.04

I've started an investigation of MySQL backups using LVM. I'm working with Lenz's mylvmbackup, but I found it both uses Perl and needs a number of dependencies installed.

Installing dependencies failed on my test system, yet it actually worked when I went back to my dev system (which is not configured with LVM for full testing).

$ sudo cpan Config::IniFiles Sys::Syslog Date::Format Getopt::Long  DBI

Details of error:

....
 CPAN.pm: Going to build S/SA/SAPER/Sys-Syslog-0.27.tar.gz

WARNING: LICENSE is not a known parameter.
Checking if your kit is complete...
Looks good
'LICENSE' is not a known MakeMaker parameter name.
Writing Makefile for Sys::Syslog
cp Syslog.pm blib/lib/Sys/Syslog.pm
/usr/bin/perl /usr/share/perl/5.8/ExtUtils/xsubpp -noprototypes -typemap /usr/share/perl/5.8/ExtUtils/typemap  Syslog.xs > Syslog.xsc && mv Syslog.xsc Syslog.c
cc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2   -DVERSION="0.27" -DXS_VERSION="0.27" -fPIC "-I/usr/lib/perl/5.8/CORE"  -DUSE_PPPORT_H Syslog.c
In file included from Syslog.xs:6:
/usr/lib/perl/5.8/CORE/perl.h:420:24: error: sys/types.h: No such file or directory
/usr/lib/perl/5.8/CORE/perl.h:451:19: error: ctype.h: No such file or directory
/usr/lib/perl/5.8/CORE/perl.h:463:23: error: locale.h: No such file or directory
/usr/lib/perl/5.8/CORE/perl.h:480:20: error: setjmp.h: No such file or directory

Some searching was necessary to find this thread and confirm that my prod server did not have a correct dev package.

$ sudo apt-get install libc6-dev

NOTE: While the doc refers to the module File::Basename, trying to install this throws an error which when you investigate further is a false positive. The README does refer to this being normally part of the default perl installation.

The size of memory tables

I was doing some database sizing in MySQL 5.1.30 GA for memory tables. Generally I have used INFORMATION_SCHEMA.TABLES data_length,index_length as a reasonable guide.

However, working with a MEMORY table, after deleting rows the size did not decrease as expected. I deleted 10% of rows and saw a 0% reduction in size. This was confirmed by a subsequent ALTER TABLE, after which I saw the 10% reduction in memory size.

It requires more investigation, however I found these results unexpected and worthy of publishing.

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.30    |
+-----------+


+-----------------+--------+------------+------------+----------------+-------------+-------------+------------+
| table_name      | engine | row_format | table_rows | avg_row_length | total_mb    | data_mb     | index_mb   |
+-----------------+--------+------------+------------+----------------+-------------+-------------+------------+
| location_ex4    | MEMORY | Fixed      |    1111000 |             45 | 59.68744659 | 51.16348267 | 8.52396393 |


mysql> delete from location_ex4 limit 111000;
Query OK, 111000 rows affected (0.16 sec)


+-----------------+--------+------------+------------+----------------+-------------+-------------+------------+
| table_name      | engine | row_format | table_rows | avg_row_length | total_mb    | data_mb     | index_mb   |
+-----------------+--------+------------+------------+----------------+-------------+-------------+------------+
| location_ex4    | MEMORY | Fixed      |    1000000 |             45 | 59.68744659 | 51.16348267 | 8.52396393 |


mysql> alter table location_ex4 engine=memory;
Query OK, 1000000 rows affected (2.95 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

+-----------------+--------+------------+------------+----------------+-------------+-------------+------------+
| table_name      | engine | row_format | table_rows | avg_row_length | total_mb    | data_mb     | index_mb   |
+-----------------+--------+------------+------------+----------------+-------------+-------------+------------+
| location_ex4    | MEMORY | Fixed      |    1000000 |             45 | 53.75530243 | 45.97259521 | 7.78270721 |

Using Flipper to manage MySQL Pairs

As discussed previously in Options using MySQL pairs, I have started evaluating the strengths and weaknesses of various open source options. This is an evaluation of Flipper, a product from Proven Scaling, a MySQL consulting organization.

Overall

  • Pros When correctly configured with a working installation it just works; simple and functional, which is good design.
  • Cons The functionality is incomplete, especially when it comes to edge cases; additional manual scripting, especially for MySQL specifics, is necessary and could have been easily added.

The Flipper documentation is detailed, but I found the implementation could have been easier without reading most of the documentation first. The software comes in RPM packages, but as I’m using Ubuntu, installation is via source.

The documentation however assumes your Master/Fail Over master MySQL environment is already correctly configured, and running with Virtual IP’s and the correct read_only status. There is no information for configuration here, so you need to be comfortable with MySQL Replication before starting.

The default notification of IP addresses is managed by arping. Under Ubuntu 8.04 this actually throws an error for a virtual IP on the same host, and Flipper then fails to operate as designed. I spent some time diagnosing the problem before submitting it to the flipper-devel list. The response was prompt; the recommendation was to use Linux Heartbeat for the purposes of the address notification. The installation of this was easy via apt-get, and the configuration change for Flipper was a single row meta data change in one table, which showed good design in this flexibility.

Overall, however, Flipper is only a partial solution. It lacked some functionality I just expected would be included in the initial version. Take the ability to set read_only on a server: Flipper handles this for a controlled failover, but not for simply setting it against the read only host. There is also no means of starting a MySQL slave using the Flipper CLI; you need to do this manually with additional scripts.

Overall, while a level of information feedback is available, and controlled failover of a correctly working and configured environment works great, manual steps are necessary in the “not ideal” case, when the tool could offer more.

Some points in addition to the supplied documentation.

  • The 'flipper' user may only need SELECT privileges on the necessary meta data tables, but it requires the 'SUPER' privilege for SLAVE management.
  • Installation of arping is necessary with 'sudo apt-get install arping'
  • The arping command syntax needs to be updated for Ubuntu to '/usr/sbin/arping -I $sendarp_interface -c 5 $sendarp_ip'. The path and options change; see 2.5.2 ARP sending command. You also need to adjust the sudo privileges for the command.

Using Heartbeat

The solution to the 'arping' problem was to use a different command, send_arp, which is part of Heartbeat. It's ironic that Heartbeat is an entire product that could be used for managing pairs. However, the following did work.

sudo apt-get install heartbeat
# Install fails consistently on 8.04, following needed
sudo apt-get update
sudo apt-get install heartbeat
# Weird but necessary
INSERT INTO masterpair (masterpair, name, value) VALUES
  ('pairname', 'broadcast', '192.168.2.255');
UPDATE masterpair SET value="/usr/lib/heartbeat/send_arp -p /tmp/send_arp.pid -i 100 -r 5 $sendarp_interface $sendarp_ip auto $sendarp_broadcast $sendarp_netmask" WHERE masterpair="pairname" AND name="send_arp_command";

Bugs

As a result of this, I found at least one bug. With the send_arp_command you can specify $sendarp_broadcast as an argument in the value; however, when you do and the variable is not set, Flipper should raise a configuration error, rather than attempting to execute a remote SSH command with the variable undefined. Here that merely causes an error, but without this variable protection it could cause other issues, depending on the syntax used.

Annoyances

1. One annoying thing was unnecessary stderr output for SSH connections under Ubuntu; you can fix this by adding 2>/dev/null. It was, however, useful in debugging to see the number of SSH connections, and it helped find the 'arping' issue, but in general it's annoying. For example:

./flipper developer swap
Connection to 192.168.2.181 closed.
Connection to 192.168.2.181 closed.
Connection to 192.168.2.187 closed.
Connection to 192.168.2.187 closed.
Connection to 192.168.2.187 closed.
Connection to 192.168.2.187 closed.
Connection to 192.168.2.187 closed.
Connection to 192.168.2.187 closed.
Connection to 192.168.2.181 closed.
Connection to 192.168.2.181 closed.
Connection to 192.168.2.181 closed.
Connection to 192.168.2.181 closed.

2. The slave is listed first. You automatically think master/slave, so the output should be formatted in that order.

./flipper developer status 2>/dev/null
MASTERPAIR: developer
NODE: beta181 has read IP, is read-only, replication running, 0s delay
NODE: alpha187 has write IP, is writable, replication running, 0s delay

3. No information is shown after a swap. When you do a swap, it would be good for the status to be shown, since you are only going to run the status command anyway to confirm.

Enhancements

I started working on modifying the ‘flipper’ script to support a read_only command, but I only had 1 day and ran out of time to finish.

Some MySQL pairs terminology

In response to a number of comments, I thought I would clarify the scope of my discussion regarding Options using MySQL pairs before I begin. As mentioned, there is no one way or type of configuration for MySQL in a HA solution; however, the simplest progression from a single Master/Slave environment is the concept of a pair of servers, configured to support fail over and fail back via MySQL Replication.

The concept of a MySQL pair in this context is to have a "hot" MySQL standby ready for controlled and, hopefully, automated fail over. I say hopefully because with MySQL Replication being an asynchronous solution, there is no guarantee against loss of data.

I consider DRBD/Heartbeat, for example, a "cold" standby, as MySQL on the slave server is not actually running. DRBD does provide a guarantee of consistency in data (a synchronous solution) written at the disk level, which is a significant advantage over asynchronous replication. I consider Red Hat Cluster Suite simply a management process, and definitely "cold".

A Shared disk solution, for example a SAN, and a failover server that uses the shared storage, is also a “cold” standby.

There are advantages and disadvantages to each option. The relative strengths and weaknesses should be considered carefully when you are making a design decision.

Options using MySQL Pairs

Configuring a production environment using a pair of MySQL servers in a Master/Fail Over Master configuration is a common way to provide many benefits, including failover, backup/recovery, and higher availability for software and database upgrades. This is also a common method for database shards. One of the key hidden benefits is that by performing regular controlled failovers, for example with software upgrades, you are actively testing your disaster recovery procedures. Most organizations have a partial plan, some don't have any, but rarely do people test their disaster recovery.

There is no one way to configure and manage such an environment. There are a number of options including:

  • Develop your own home grown scripts
  • Flipper by Proven Scaling
  • MMM by Percona
  • Heartbeat by the Linux High Availability Project

I have started a detailed review of a number of these technologies and will be providing my findings for review.

This is not the only way to solve the problem, of course. Google for example have provided MySQL patches that include features such as semi-sync replication and mirrored binary logs. Red Hat Cluster Suite and MySQL/DRBD are other technologies, but less ideal for various reasons, specifically the "cold" nature of the failover environment.

Where is the innovation?

The 2009 MySQL Conference has closed its submissions for papers. This year the motto is "Innovation Everywhere".

At last weekend's Open SQL Camp in Charlottesville, Virginia, we had the chance to talk about the movements in the MySQL ecosystem. I was impressed to get the details of the Percona MySQL patches, but the focus is still on 5.0. (Welcome to the Percona team, Tom Basil.) OurDelta is now attempting to integrate patches into various MySQL branches. There was an opening keynote by Brian Aker on Drizzle, with Drizzle team members Jay Pipes and Stewart Smith on hand. It was also announced that MySQL 5.1.30 will be GA, available in early December.

But these are not innovations that are ground breaking. Last year, it was the announcement of KickFire that I found most intriguing regarding innovation.

What is there this year? The most interesting thing I read last week was Memcached as a L2 Cache for Innodb – The Waffle Grid Project. This is my kind of innovation. It's sufficiently MySQL, but adds another dimension with a companion technology. The patch seems relatively simple in concept and code size, and I'm almost prepared to fire up a few EC2 instances to take this one for a spin. I'm doubly impressed because the creators are two friends and colleagues who are not hard core kernel hackers, but professionals on the front line dealing with clients daily. Will it be successful, or viable? That is the question about innovation.

Unfortunately I spend more time these days not seeing innovation in MySQL, but in other alternative database solutions in general. Projects like Clustrix, Inc., LucidDB, and Mongo in the 10gen stack.

When mysqldump --no-set-names matters

I had a perplexing problem yesterday where a mysqldump and restore was producing different results when verified with Maatkit's mk-table-checksum.

mk-table-checksum --algorithm=BIT_XOR h=192.168.X.XX,u=user,p=password --databases=db1 --tables=c
DATABASE TABLE   CHUNK HOST         ENGINE      COUNT         CHECKSUM TIME WAIT STAT  LAG
db1      c           0 192.168.X.XX InnoDB     215169         d1d52a31    2    0 NULL NULL
mk-table-checksum --algorithm=BIT_XOR h=localhost,u=user,p=password --databases=db1 --tables=c
DATABASE TABLE   CHUNK HOST      ENGINE      COUNT         CHECKSUM TIME WAIT STAT  LAG
db1      c           0 localhost InnoDB     215169         91e7f182    0    0 NULL NULL

It was rather crazy until I reviewed the mysqldump settings I was using, and I realized I was using --no-set-names.

So just what does this option remove? Here is a diff of mysqldump output with and without it.

5a6,10
>
> /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
> /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
> /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
> /*!40101 SET NAMES utf8 */;
153a159,161
> /*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
> /*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
> /*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
156c164

As you can see, without the option mysqldump executes a SET NAMES utf8. The problem here is that I'm exporting a table that is DEFAULT CHARSET=latin1, and no columns are defined as utf8.

I'm no expert in character sets, but this strikes me as strange; the problem is resolved, but not to my comfort level.
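
One hedged workaround, which I have not verified for this case, is to be explicit about the character set when dumping, rather than suppressing SET NAMES entirely:

$ mysqldump --default-character-set=latin1 db1 c > c.sql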

ORDER BY (the lesser known way)

We all know that with MySQL you can use ORDER BY with a list of columns to return an ordered set, e.g. ORDER BY name, type, state.
I often use the positional syntax ORDER BY 1,2, which orders by the first and second columns of the select list; I'm surprised that some people do not know about it.

However, I needed to do some selective ordering of a type field, and I didn't want, for example, a lookup table just to join against for ordering. While contemplating a means of achieving this, I asked a work colleague, who I figured might have experienced this problem before. Lo and behold, I became the student, as I discovered there is a third syntax for ORDER BY, using expressions.

mysql> create table test(name varchar(10) not null, type varchar(10) not null);
Query OK, 0 rows affected (0.06 sec)

mysql> insert into test(name,type) values
('Apples','Fruit'),
('Bananas','Fruit'),
('Carrots','Veg'),
('Onions','Veg'),
('Beer','Liquid'),
('Water','Liquid'),
('Crackers','Food');
Query OK, 7 rows affected (0.00 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> select name from test
order by type='Veg' DESC,
         type='Fruit' DESC,
         type='Food' DESC,
         type='Liquid' DESC;
+----------+
| name     |
+----------+
| Carrots  |
| Onions   |
| Apples   |
| Bananas  |
| Crackers |
| Beer     |
| Water    |
+----------+
7 rows in set (0.00 sec)

Of course, reading the MySQL Manual confirms this under the SELECT command.
I've not read the MySQL manual from cover to cover since the 4.x days. Perhaps it's time.

Thanks to Nick Pisarro of Blog Revolution for this most valuable tip.

Selecting wise indexes

Indexes are a great way to improve performance in a MySQL database, when used appropriately.
When used inappropriately, the impact can be a degradation of performance.

The following example from Movable Type shows how, when reviewing the slow query log, I found numerous occurrences of INSERTs taking 3 or more seconds, with no reported lock contention time.

# Query_time: 3  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
SET insert_id=6281;
INSERT INTO mt_comment
(comment_author, comment_blog_id, comment_commenter_id, comment_created_by,
 comment_created_on, comment_email, comment_entry_id, comment_ip, comment_junk_log,
comment_junk_score, comment_junk_status, comment_last_moved_on, comment_modified_by,
comment_modified_on, comment_parent_id, comment_text, comment_url, comment_visible)
VALUES (...)

The impact here is that SELECT statements against the mt_comment table are also blocked, because this table is MyISAM. It was by reviewing the slow running SELECT statements that the cause of the slow inserts was easily determined.

mysql> explain SELECT comment_id
    -> FROM mt_comment
    -> WHERE (comment_visible = '1') AND (comment_blog_id = '3') AND (comment_entry_id = '276')
    -> ORDER BY comment_created_on DESC;


*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mt_comment
         type: ref
possible_keys: mt_comment_visible,mt_comment_entry_id,mt_comment_blog_id,mt_comment_blog_stat,mt_comment_visible_date,mt_comment_entry_visible,mt_comment_blog_visible,mt_comment_blog_ip_date,mt_comment_blog_url
          key: mt_comment_entry_visible
      key_len: 6
          ref: const,const
         rows: 99
        Extra: Using where
1 row in set (0.00 sec)


CREATE TABLE `mt_comment` (
  `comment_id` int(11) NOT NULL auto_increment,
  `comment_author` varchar(100) default NULL,
  `comment_blog_id` int(11) NOT NULL default '0',
  `comment_commenter_id` int(11) default NULL,
  `comment_created_by` int(11) default NULL,
  `comment_created_on` datetime default NULL,
  `comment_email` varchar(75) default NULL,
  `comment_entry_id` int(11) NOT NULL default '0',
  `comment_ip` varchar(16) default NULL,
  `comment_junk_log` mediumtext,
  `comment_junk_score` float default NULL,
  `comment_junk_status` smallint(6) default '0',
  `comment_last_moved_on` datetime NOT NULL default '2000-01-01 00:00:00',
  `comment_modified_by` int(11) default NULL,
  `comment_modified_on` datetime default NULL,
  `comment_parent_id` int(11) default NULL,
  `comment_text` mediumtext,
  `comment_url` varchar(255) default NULL,
  `comment_visible` tinyint(4) default NULL,
  PRIMARY KEY  (`comment_id`),
  KEY `mt_comment_commenter_id` (`comment_commenter_id`),
  KEY `mt_comment_visible` (`comment_visible`),
  KEY `mt_comment_junk_score` (`comment_junk_score`),
  KEY `mt_comment_ip` (`comment_ip`),
  KEY `mt_comment_parent_id` (`comment_parent_id`),
  KEY `mt_comment_entry_id` (`comment_entry_id`),
  KEY `mt_comment_email` (`comment_email`),
  KEY `mt_comment_last_moved_on` (`comment_last_moved_on`),
  KEY `mt_comment_created_on` (`comment_created_on`),
  KEY `mt_comment_junk_status` (`comment_junk_status`),
  KEY `mt_comment_blog_id` (`comment_blog_id`),
  KEY `mt_comment_blog_stat` (`comment_blog_id`,`comment_junk_status`,`comment_created_on`),
  KEY `mt_comment_visible_date` (`comment_visible`,`comment_created_on`),
  KEY `mt_comment_entry_visible` (`comment_entry_id`,`comment_visible`,`comment_created_on`),
  KEY `mt_comment_blog_visible` (`comment_blog_id`,`comment_visible`,`comment_created_on`,`comment_id`),
  KEY `mt_comment_blog_ip_date` (`comment_blog_id`,`comment_ip`,`comment_created_on`),
  KEY `mt_comment_junk_date` (`comment_junk_status`,`comment_created_on`),
  KEY `mt_comment_blog_url` (`comment_blog_id`,`comment_visible`,`comment_url`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

As you can see, the table has 18 secondary indexes in addition to the PRIMARY KEY. This means that for every row inserted, every one of those indexes must also be updated.

When adding an index to a table, first determine the usage patterns that will use the index, consolidating indexes where possible and removing obvious duplicates. In the above example, the single-column comment_blog_id index is a classic duplicate: comment_blog_id is already the leading column of four composite indexes. A quick way to spot these is shown below.
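
As a quick sketch for spotting such duplicates (assuming you can query information_schema), the following lists each index on the table with its columns in order; any index whose column list is a leading prefix of another index is a candidate for removal:

SELECT index_name,
       GROUP_CONCAT(column_name ORDER BY seq_in_index) AS index_columns
FROM information_schema.statistics
WHERE table_schema = DATABASE()
AND table_name = 'mt_comment'
GROUP BY index_name;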

Adding an index will generally help SELECT performance, depending on cardinality, but will always impact INSERT, UPDATE and DELETE performance.
Another downside of too many indexes is that the MySQL optimizer has much more work to do evaluating and discarding candidate indexes for every Query Execution Plan (QEP).

Indeed, I have seen worse: in one case, a table with ~120 columns had more than 20 single-column indexes AND a 3-part primary key summing 40 bytes, in InnoDB. Because InnoDB appends the primary key to every secondary index entry, a wide primary key multiplies this overhead, and the impact on performance was terrible, with the index size being 3x the data size.


About the Author

Ronald Bradford, Principal of 42SQL, provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres, his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His 10 years of specialized consulting across many industry sectors, technologies and countries have provided unique insight into solving client problems. For more information, Contact Ronald.

Why you should not use GRANT ALL ON *.*

I was with a client today, and after cleanly restarting a MySQL 5.0.22 instance with the /etc/init.d/mysqld service script, I observed the following error (because you always check the log file after starting MySQL).

080923 16:16:24  InnoDB: Started; log sequence number 0 406173600
080923 16:16:24 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.22-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
080923 16:16:24 [ERROR] /usr/libexec/mysqld: Table './schema_name/table_name' is marked as crashed and should be repaired
080923 16:16:24 [Warning] Checking table:   './schema_name/table_name'

Now, I’d just added a number of settings to /etc/my.cnf, including:

myisam-recover = FORCE,BACKUP

which explains the last two lines of the log file (with this option, FORCE runs recovery even if more than one row would be lost, and BACKUP keeps a copy of the original data file). When attempting to connect to the server via the mysql client, I got the error:

“Too many connections”

So now I’m in a world of hurt: I can’t connect to the database as the ‘root’ user to observe what’s going on. I know the table it has decided to repair is 1.4GB in size, and the server is madly reading from disk. Shutting down the Apache server that was connecting to the database is not expected to solve the problem, and it does not, because the existing connections must wait to time out.

MySQL reserves a single extra connection for a user with the SUPER privilege, i.e. ‘root’, specifically for this situation; that reservation is useless, however, if all connections are made by users holding this privilege. The problem, as often experienced with clients, is that the permissions granted to the application user are simply unwarranted.
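
As an illustration of why that reserved SUPER connection matters, with it an administrator can still connect at the limit and relieve the pressure (the thread id below is hypothetical):

mysql> SHOW PROCESSLIST;                 -- identify stuck or idle connections
mysql> KILL 12345;                       -- terminate an offending thread
mysql> SET GLOBAL max_connections = 200; -- temporarily raise the limit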

mysql> select host,user,password from mysql.user;
+-----------+-------------+------------------+
| host      | user        | password         |
+-----------+-------------+------------------+
| localhost | root        | 76bec9cc7dd32bc0 |
| xxxxxx    | root        |                  |
| xxxxxx    |             |                  |
| localhost |             |                  |
| %         | xxxxxxxxxxx | 0716d6776318d605 |
| localhost | xxxxxxxxxxx | 0716d6776318d605 |
| localhost | xxxxxxx     | 6885269c4a550a03 |
+-----------+-------------+------------------+
7 rows in set (0.00 sec)

mysql> show grants for xxxxxxx@localhost;
+---------------------------------------------------------------------------------------+
| Grants for xxxxxxx@localhost                                                          |
+---------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'xxxxxxx'@'localhost' IDENTIFIED BY PASSWORD '6885269c4a550a03' |
| GRANT ALL PRIVILEGES ON `xxxxxxx`.* TO 'xxxxxxx'@'localhost' WITH GRANT OPTION        |
+---------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

So the problem is that ALL PRIVILEGES has been granted to an application user. Never do this!

The solution is to remove all unused users, anonymous users, and create the application user with just the privileges needed.

DROP USER xxxxxxxxxxx@localhost;
DROP USER xxxxxxxxxxx@'%';

DELETE FROM mysql.user WHERE user='';
FLUSH PRIVILEGES;
DROP USER xxxxxxx@localhost;
CREATE USER xxxxxxx@localhost IDENTIFIED BY 'xxxxxxx';

GRANT SELECT,INSERT,UPDATE,DELETE ON xxxxxxx.* TO xxxxxxx@localhost;
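
To verify the result (using the client’s masked application user name):

SHOW GRANTS FOR xxxxxxx@localhost;

The output should now list only the USAGE line plus the four schema-level privileges.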

NY Tech 1995-2008. Opening Web 2.0 Expo NY Keynote

Web 2.0 Expo NY keynotes are happening today. Technology in use included CrowdVine, which I’d not heard of, and plenty of Twitter feeds such as w2e_NY08.

The opening keynote was Fred Wilson from Union Square Ventures with his presentation New York’s Web Industry From 1995 to 2008: From Nascent to Ascendent.

Some stats on seed and early-stage deals:

  • 1995 – 230 in the SF Bay area, 30 in NY
  • 2008 – 360 in the SF Bay area, 116 in NY

Fred first remarked: “New York is not an alley. Call it Broadway, or just New York.”

Here is a summary of his history of New York Web Industry.

  • 1991 – ZDNet
  • 1993 – New York Online Dialup services
  • 1993 – Jupiter Communications online conference
  • 1993 – Prodigy
  • 1994 – Startups such as Pseudo, Total New York, Razorfish.
  • 1994 – Time Warner Pathfinder
  • 1995 – NYIC, 55 Broad St. – technology-oriented building
  • 1995 – Seth Godin – Yoyodyne – Permission Marketing
  • 1995 – itraffic, agency.com, NY Times online
  • 1995 – Softbank, Double Click, 24×7, Real Media
  • 1996 – Silicon Alley Reporter
  • 1996 – ivillage, the knot
  • 1996 – Flatiron Partners – got sued for that
  • 1997 – The Silicon Alley Report Radio Show
  • 1997 – The Mining Co. (later About.com)
  • 1997 – Total NY sold to AOL
  • 1997 – Agency rollups razorfish buying 4 companies
  • 1997 – DoubleClick IPO
  • 1998 – Seth Godin moves to Yahoo
  • 1998 – Burn Rate
  • 1998 – Kozmo – We’ll be right over
  • 1998 – The last year of sanity in the Internet wave
  • 1999 – The start of the boom
  • 1999 – The big players came online and all hell broke loose; 200 startups were funded in 1999, 300 in 2000.
  • 2000 – The Crash & Burn
  • 2000 – f**kedcompany
  • 2000 – Google came to New York. – 86th St Starbucks
  • 2001 – Layoffs, Landlords and bankruptcies
  • 2002 – Rock bottom
  • 2003 – Renewal
  • 2003 – Blogging started gizmodo
  • 2003 – Web 2.0 coined
  • 2003 – del.icio.us was launched from a computer in an apartment
  • 2004 – NY Tech Meetup
  • 2004 – Union Square Ventures $120million raised
  • 2005 – about.com acquired by NY Times
  • 2005 – Etsy
  • 2006 – Google took over the Port Authority building, now with 750 engineers in NY
  • 2008 – Web 2.0 comes to New York City

Measured by funded Internet companies, New York is now 1/3 the size of Silicon Valley, compared to 1/8 in 1995.

One thing mentioned was a documentary called “We Live in Public”. Some of the footage from 1999 is very reminiscent of early Big Brother.

A neat trick for a row number in a MySQL recordset

While working for a client, I needed to produce canned results for several different criteria, recording the results in a table for later use, and keeping each row’s position within its result set.

Knowing no way to do this via a single INSERT INTO … SELECT statement, I reverted to using a MySQL Stored Procedure. For example, using a sample INFORMATION_SCHEMA (I_S) query and the following snippet:

  ...
  DECLARE list CURSOR FOR SELECT table_name FROM information_schema.tables WHERE table_schema='INFORMATION_SCHEMA';
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=TRUE;

  OPEN list;
  SET result_position = 1;
  SET done = FALSE;
  lab: LOOP
    FETCH list INTO table_name;
    IF done THEN
      CLOSE list;
      LEAVE lab;
    END IF;
    INSERT INTO  summary_table(val,pos) VALUES (table_name,result_position);
    SET result_position = result_position + 1;
  END LOOP;

However, when reviewing with another colleague after writing some 10+ different queries and SP loops, I realized that it is possible to record the position of each row in a result set using session variables, negating the need for all that code.

SET @rowcount = 0;
SELECT table_name, @rowcount := @rowcount + 1 FROM information_schema.tables WHERE table_schema = 'INFORMATION_SCHEMA';
+---------------------------------------+----------------------------+
| table_name                            | @rowcount := @rowcount + 1 |
+---------------------------------------+----------------------------+
| CHARACTER_SETS                        |                          1 |
| COLLATIONS                            |                          2 |
| COLLATION_CHARACTER_SET_APPLICABILITY |                          3 |
| COLUMNS                               |                          4 |
| COLUMN_PRIVILEGES                     |                          5 |
| ENGINES                               |                          6 |
| EVENTS                                |                          7 |
| FILES                                 |                          8 |
| GLOBAL_STATUS                         |                          9 |
| GLOBAL_VARIABLES                      |                         10 |
| KEY_COLUMN_USAGE                      |                         11 |
| PARTITIONS                            |                         12 |
| PLUGINS                               |                         13 |
| PROCESSLIST                           |                         14 |
| PROFILING                             |                         15 |
| REFERENTIAL_CONSTRAINTS               |                         16 |
| ROUTINES                              |                         17 |
| SCHEMATA                              |                         18 |
| SCHEMA_PRIVILEGES                     |                         19 |
| SESSION_STATUS                        |                         20 |
| SESSION_VARIABLES                     |                         21 |
| STATISTICS                            |                         22 |
| TABLES                                |                         23 |
| TABLE_CONSTRAINTS                     |                         24 |
| TABLE_PRIVILEGES                      |                         25 |
| TRIGGERS                              |                         26 |
| USER_PRIVILEGES                       |                         27 |
| VIEWS                                 |                         28 |
+---------------------------------------+----------------------------+
28 rows in set (0.01 sec)

Of course you need the all-important SET before each query. If it is not specified, the subsequent query does not produce an error; the counter column is simply NULL for every row, since NULL + 1 is NULL.
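
For example, with a fresh (hypothetical) variable name that is never initialized:

mysql> SELECT table_name, @notset := @notset + 1
FROM information_schema.tables
WHERE table_schema = 'INFORMATION_SCHEMA' LIMIT 1;
+----------------+------------------------+
| table_name     | @notset := @notset + 1 |
+----------------+------------------------+
| CHARACTER_SETS |                   NULL |
+----------------+------------------------+
1 row in set (0.00 sec)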

So all I needed was:

INSERT INTO summary_table(val,pos)
SELECT table_name, @rowcount := @rowcount + 1
FROM information_schema.tables
WHERE table_schema = 'INFORMATION_SCHEMA';

A simple and trivial solution.

DISCLAIMER:
How this performs under load, and how it is supported in different and future versions of MySQL, has not been determined.

Securing your OS for MySQL with JeOS

Do you have a full-time System Administrator? Do you have only a part-time SA, or none at all?

Packet General’s Data Security and PCI Compliance solutions run on a dedicated appliance, based on a “Just Enough Operating System” (JeOS) to minimize exposure.

This appliance improves not just the security of your data, but also ensures your Operating System is secure and up to date. With only 4 services and a footprint of less than 600MB, this is an ideal solution for running even a normal MySQL installation. Security upgrades can also be provided as an automated feature, eliminating the need to manage this internally.

Tomorrow we will be discussing this in more detail in the MySQL Webinar How to secure MySQL data and achieve PCI compliance, being held Thursday, September 11, 2008, 10:00 am PST, 1:00 pm EST, 18:00 GMT.

How to secure MySQL data and achieve PCI compliance

This week I will be the moderator for the MySQL Webinar How to secure MySQL data and achieve PCI compliance, being held Thursday, September 11, 2008, 10:00 am PST, 1:00 pm EST, 18:00 GMT.

Recently I wrote about Do you store credit cards in your MySQL Database?. If you do, then PCI Compliance is not something you can ignore.

This webinar will not only be discussing PCI Compliance, but also MySQL data security. Our panel includes Didier Godart from MasterCard Worldwide, one of three members who drafted the Payment Card Industry Data Security Standard 1.0.

For more information on the various PCI Compliance and Encryption options for MySQL, check out the Packet General website.