CommunityOne East – An open developer conference

With an opening video from thru-you.com – an individual taking random you-tube video and producing video mashup’s, the CommunityOne East conference in New York, NY beings.

The opening introduction was by Chief Sustainability Officer Dave Douglas. Interesting job title.

His initial discussion was around what is the relationship between technology and society. A plug for his upcoming book “Citizen Engineer” – The responsibilities of a 21st Century Engineer. He quotes “Crisis loves an Innovation” by Jonathan Schwartz, and extends with “Crisis loves a Community”.

He asks us to consider the wider community ecosystem such as schools, towns, governments, NGO’s etc with our usage and knowledge of technology.

Event: CommunityOne East in New York, NY.
Author: Ronald Bradford

Hurting the little guy?

Today I come back from the dentist, if that wasn’t bad enough news, I get an email from Google AdWords titled Your Google AdWords Approval Status.

In the email, all my AdWords campaigns are now disapproved, because of:

SUGGESTIONS:
-> Ad Content: Please remove the following trademark from your ad:
mysql.

Yeah right. I can’t put the word ‘MySQL’ in my ads. How are people to now find me? It would appear that many ads have been pulled not just mine. Is this a proactive measure by Google? is this a complaint from the MySQL trademark holder Sun Microsystems?

I’d like any comment, feedback or suggestions on how one can proceed here.

It reminds me of the days CentOS advertised itself as an “Open source provider of a popular North American Operating System”, or something of that nature.

Understanding the various MySQL Products & Variants

The MySQL marketplace today is far more complex then simply choosing between a particular version of MySQL that Sun/MySQL produces.
The MySQL server product in general is released under the GNU General Public License (GPL) v2, however you should carefully review the MySQL Legal Policies as a number of exceptions and different license agreements operate for companion tools such as MySQL Cluster, MySQL client libraries and documentation for example.

Looking into the MySQL ecosystem for products, I’ve produced the following categories:

  • Sun/MySQL Official Products
    • MySQL Versions
  • MySQL Variants
    • Community
    • Enterprise
  • MySQL Plugins
  • MySQL Patches
  • MySQL Alternatives

Why does such a diversification occur?  I attribute this to three primary causes:

  • The GPL license by nature allows for an organization to take the product, modify it and use it for their specific needs. They can also provide these patches under GPL for others to use and incorporate. While this has occurred for example  Google , FaceBook, eBay , Proven Scaling and Percona to name a few, Sun/MySQL has elected not to undertake any proactive process of incorporating these in any timely fashion.
  • The policy of Sun/MySQL to allow for contributions was so strict, and combined with a properietory Version Control System BitKeeper you had to purchase, there was little incentive for community contributions in relation to so many other open source projects
  • The Sun/MySQL management and decision makers didn’t listen to the community and paying customers, and over the past 3-5 years the product life cycle, features, release schedule and quality can be questioned.

Sun/MySQL Official Products

Sun/MySQL holds the license to the MySQL server products. They release official binaries and the source code (due to GPL).  Even within MySQL, there are several products that differ subtly and to the untrained eye it can be confusing to understand and determine what is best. Your can download from www.mysql.com the following versions:

  • MySQL Server 5.1  GA
  • MySQL Community Server 5.0 GA
  • MySQL Enterprise Server 5.0 GA
  • MySQL Cluster NDB 6.3

  • MySQL Server 4.1 (EOL)
  • MYSQL Server 6.0 (Alpha)

MySQL Versions

It is important that you understand the MySQL Versions, especially in evaluating any of the following referenced variants, patches etc.
The common path for MySQL Server versions is with a generally linear numbering systems including historical versions 3.23, 4.0 and 4.1.  These versions have now reached End Of Life (EOL) for support, however emergency security patches are applied where necessary.
Continuing from 4.1, you have the 5.0, 5.1 versions which are both Generally Available (GA), and then version 6.0 which is currently Alpha.

Further complexity happens when within the Sun/MySQL Official products, several forks/branches have occurred.  These include:

  • The MySQL 5.0 Community & Enterprise split occurred at MySQL Version 5.0.27
  • At this time, the Community version (free to download) continued with the intention of allowing for community contributions. Only one patch was ever accepted, and SHOW PROFILES was introduced in MySQL 5.0.37.  To date, 11 versions have been released to the current 5.0.77 version.
  • MySQL Enterprise (available under subscription) is itself comprised of three subtypes, these are Rapid Update Service Packs(monthly), Quarterly Service Packs (quarterly) and Hot-fix releases.  To date 37 versions have been released to the current 5.0.78 version.
  • MySQL Cluster, was part of the MySQL Server product until this was branched/forked at  MySQL Version 5.1.23. This enabled MySQL Cluster to be labeled as Production Ready for Cluster clients, and not be held back by continued delays in the 5.1 server release.   Starting with a new versioning scheme with 6.1, the MySQL Cluster NDB produces new versions far exceeding the volume of the server, with to date 23 versions in 6.1 , 18 in version 6.2 and 24 in version 6.3.      I am not advocating that features and quality are better or worse, simply that activity and interaction with community and users is far greater.
  • MySQL 5.1 Maria is a special branch starting at MySQL Version 5.1.24 that includes the Maria Storage Engine. This is the next generation of the MyISAM Storage Engine, both architected by the creator of MySQL, Monty Widenius.  It is undercertain this will continue as a product released officially by Sun/MySQL.

In Review

With just reading this introduction you can understand the confusion that exists when new customers/clients are beginning to evaluate the different MySQL Versions.

In my next post, I’ll talk more about:

  • MySQL Variants, those I consider variants use the MySQL Interface, protocol and support the standard connectors.  These include community versions (e.g. Solid, Infobright, Sphinx) and commercial versions (e.g. KickFire, InfoBright, Nitro).  
  • MySQL patches are improvements that have been released to the community and are now becoming part of common third party MySQL packages, such as Percona, Proven Scaling and Out Delta
  • MySQL Plugins are a feature of MySQL 5.1, and allow for pluggable storage engines into MySQL.  While several companies have had to produce custom binaries due to the API limitions (especially with the optimizer), a number of engines support the API including Innodb, PBXT and filesystem engine.
  • MySQL Alternatives include any MySQL related products that have now deviated from being supported under the MySQL protocol.  Most notably here is Drizzle.

More Information

Ronald Bradford is Principal at 42SQL. We provide consulting and advisory services for the MySQL ecosystem with a focus on MySQL database performance, architecture and scalability. 42SQL also provides education in MySQL including the “MySQL Essentials” training course. You can find more information regarding this offering and an upcoming schedule at 42SQL Education.

Beginner CSV Engine issues

I’ve just started using the CSV engine for a practical application and I’ve come across a few subtle and unexpected issues/limitations.

First, you can’t create any columns in your CSV table nullable.

mysql> create table t1(i INT) ENGINE=CSV;
ERROR 1178 (42000): The storage engine for the table doesn't support nullable columns

RTFM shows this was introduced in 5.1.23. See CSV Limitations

The second and more annoying was creating a CSV table, inserting a sample row (just to check the syntax), then replacing the file ([datadir]/[schema]/[table].CSV) with the same format, but with additional generated rows. This was to no avail when attempting to SELECT from the table.

The solution was to do a REPAIR TABLE [table] in order to see the newly refreshed data.
Futhermore, some more RTFM shows in Repairing and Checking CSV Tables an important caveat to using the CSV engine and loading data directly by file copy.

Warning: Note that during repair, only the rows from the CSV file up to the first damaged row are copied to the new table. All other rows from the first damaged row to the end of the table are removed, even valid rows.

Buyer beware.

Infobright Community Edition(ICE) – It's Free

The March NY MySQL Meetup featured a presentation from Infobright, a data warehousing solution built on the MySQL Product.

With a pitch of “Simplicity, Scalability and low TCO” I became more impressed with the capability to delivery on these as the presentation proceeded. Here are some highlights.

  • The company and product has been around for a few years. Infobright started as a compression engine to sit beside Teradata, providing a significant cost saving to clients, and allowing a two way data transfer between Teradata.
  • In September 2008, a open source community edition was released, called ICE. (Which I didn’t know)
  • The technology is based on a Rough Set theory, a mathematical approach
  • Using a column oriented approach, compression generally starts at 10:1, different applications can get 30:1 or better
  • There is basically no tuning, there are no indexes. Knowledge is gleaned at data loading and each data pack node holds key information per column, such as range of values (min,max).
  • Some interesting results are, there is a constant load time, it doesn’t degrade over time as the size of your data increases. Also, Query performance scales with data volume.
  • Depending on queries, the knowledge grid can retrieve results without having to uncompress the data, i.e. introspection of the meta data is all that is needed
  • Infobright is not a pluggable storage engine, rather a custom binary of MySQL. This is due to the restrictions of the API and the lack of optimizer push down conditions for example.

The product is not without some limitations, but you have to realize the product is for a data warehousing implementation, not an OLTP web app. It’s not great with SELECT *, and large text strings for example.

Functionality continues to be added, with a recent release adding many more MySQL Functions, but again, Infobright does not claim to be a solution to everybody, there is not UDF support or SP support at this time, however I’d warrant this is really not needed.

While the presentation went into some detail regarding the knowledge grid, data packs, data pack nodes, and pack to pack integration from a slide perspective, the presentation lacked the technical here is how you use the loader to get data out of MySQL and into Infobright. Here is the throughput, etc. As a marketing presentation it had the right content, but I’d like to now see the companion technical presentation.

Having previously been part of the MySQL Consulting team, and having worked also in the Storage Engine API with the Nitro Storage engine I have a distinct advantage of knowing the complexities of integration with MySQL. We can only hope this continues to improve with future releases of MySQL enabling Infobright and other products to integrate better and keep up to date with the MySQL Release cycle.

Identifying Bad Memory

I was having problems recently with a dedicated production server, that runs my MySQL Server and a number of websites. It’s most annoying when your system crashes without any reporting in /var/log/messages

The tool of choice from the host provider SoftLayer was PassMark BurnInTest Linux which is installed with every dedicated server.

I will need to investigate open source alternatives, as this is a commercial product, but for the purposes of my pain, this included tool was well worth the investment.

**************
RESULT SUMMARY
**************
Test Start time: Sun Feb 22 16:02:48 2009
Test Stop time: Sun Feb 22 16:07:49 2009
Test Duration: 000h 05m 01s

Test Name Cycles Operations Result Errors Last Error
CPU - Maths 261 488 Billion PASS 0 No errors
Memory (RAM) 2 3.081 Billion FAIL 1 Error verifying data in RAM
Network: 127.0.0.1 412995 4.295 Billion PASS 0 No errors
TEST RUN FAILED

*********************
SERIOUS ERROR SUMMARY
*********************
SERIOUS : 2009-02-22 16:07:31, RAM, SERIOUS: Error verifying data in RAM (x 1)

It was great to get a simple resolution to the problem, bad memory?
With a scheduled maintenance replacement I was operational again.

 **************
RESULT SUMMARY
**************
Test Start time: Sun Feb 22 20:34:37 2009
Test Stop time: Sun Feb 22 20:39:38 2009
Test Duration: 000h 05m 01s

Test Name Cycles Operations Result Errors Last Error
CPU - Maths 267 406 Billion PASS 0 No errors
Memory (RAM) 1 3.664 Billion PASS 0 No errors
Network: 127.0.0.1 334578 3.480 Billion PASS 0 No errors
TEST RUN PASSED

*********************
SERIOUS ERROR SUMMARY
*********************

Are you monitoring RSS & VSZ?

Monitoring MySQL Memory is a rather critical task because you can’t limit MySQL’s usage of physical memory resources. Improperly configured servers running MySQL can crash because you don’t understand memory usage.

MySQL uses memory in a number of different ways. Using the Oracle analogy, you can divide the mysqld memory usage into main areas of:

  • SGA – System Global Area
  • PGA – Process Global Area

The SGA is the footprint that MySQL uses for startup. This is attributed to the base footprint of the mysqld process and a number of buffers including:

NOTE: This is for a default MySQL 5.1 install. Other storage engines and/or other versions of MySQL may have additional buffers. Falcon for example in MySQL 6.x has additional buffers.

The PGA is more complex, and the cause of problems for the possible occurrence of your server running out of memory and needing to swap. The goal of monitoring memory usage is to of course avoid this.
This additional memory is a combination of a few areas including:

  • MEMORY tables
  • Connection management (such as thread_cache and table_cache)
  • Per Connection memory usage

The later is the cause of greatest concern, especially for environments that have persistent connections. Per Connection memory usage is a combination of many buffers including the thread_stack, 2 x net_buffer_length (to max_allowed_packet), read_buffer_size, read_rnd_buffer_size, join_buffer_size, sort_buffer_size, and up to min(max_heap_table_size,tmp_table_size). In fact, for example with temporary tables, a query may use multiple temporary tables.

Memory on a per connection basis is kept until the connection is closed. In some instances such as next_buffer_size, this is apparently reduced aftter each SQL Statement result. With a persistent connection model (e.g. Java), ensuring idle connections drop to a low watermark is a valuable task. The confusing part is MySQL instrumentation does not tell you exactly how much is used, and it’s impossible to calculate with available provided data.

As part of monitoring your server, you should monitor the size of the mysqld memory usage, because this will cause you to be proactive rather then reactive to scarce memory resources. You can easily get this using the ps command. For example:

$ps -eopid,fname,rss,vsz,user,command | grep -e "RSS" -e "mysql"
  PID COMMAND    RSS    VSZ USER     COMMAND
 5463 grep       764   5204 ronald   grep -e RSS -e mysql
13894 mysqld_s   596   3936 root     /bin/sh /usr/bin/mysqld_safe
13933 mysqld   4787812 5127208 mysql /usr/sbin/mysqld --basedir=/usr --datadir=/vol/mysql/mysqldata --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock
13934 logger     608   3840 root     logger -p daemon.err -t mysqld_safe -i -t mysqld
$ ps -eopid,fname,rss,vsz,user,command | grep " mysqld " | grep -v grep | awk '{print $3,$4}'
4787820 5127208


From man ps
rss RSS resident set size, the non-swapped physical memory that a task has used (in kiloBytes). (alias rssize, rsz).
vsz VSZ virtual memory size of the process in KiB (1024-byte units). Device mappings are currently excluded; this is subject to change.
(alias vsize).

The motto of the story, don’t just monitor the free memory of your system, for a database server, you need to closely monitor the primary process on the server, that is mysqld.

References

How MySQL Uses Memory

More Information

Join me for my MySQL User Conference talk on “Monitoring 101 – Simple stuff to save your bacon”.

I also cover monitoring MySQL in my “MySQL Essentials” training course. For more information visit MySQL Education.

Testing your system

I have raised this specific topic 3 times this week alone, twice in a MySQL setting.

The fundamental philosophy of testing is NOT to verify features of your product that work, it is to BREAK your system.

One such discussion this week was with a service provider that deployed a new system into an existing ecosystem. The release has been delayed due to development issue, and credibility with customers is now being further damaged because the system is reaching physical hardware limitations after just one month.

With this was described to me, my simple response was. You did not test you system to stress the system to breaking point. To know the limit of your capacity ahead of time is a proactive analysis, not a reactive one.

It’s not that complicated to do, easier in early stage before you have a 50-100-1000 server total environment, but it’s a best practice not see often enough.

Configuration management concepts for database objects

Correctly managing your MySQL database objects such as schemas, tables, indexes, base data etc, is critical to the success of a 24×7 online website. I rarely encounter a robust working solution as part of my consulting so I would like to share my experience in identifying the best practices you should be adopting whether your an existing organization or just an individual with a simple website.

Much of the following concept actually pre dates my involvement in MySQL (since 1999), so this is not just applicable for a MySQL RDBMS. For the purposes of this discussion I’d like to focus on the theory successfully used with clients.

Under version control I have the following directory structure:

NOTE If your first observation was “Arrh, Version Control?”, you are in more trouble then you want to be right from day one. You need Version Control such as svn, cvs, bzr, git etc for any website no matter how small.

/database
  /scripts
  /sql
    /schema
    /patch
    /revert
    /admin
  /data
  ....

The /database is a top level directory, and for software packaging for all database related operations, you simply include all contents from /database.

At it’s core, every database object change for configuration management will be addressed in three (3) files.

  • A schema file
  • A patch file
  • A revert file

In fact, you can add version control rules for example to ensure if you add a patch file, a corresponding revert and schema file is also specified.

For a “current” working environment, there are two paths for database object management.

  • An upgrade path
  • A new version creation.

An upgrade path which is the normal production operation, takes an existing database schema and ‘patches’ this to a new revision. As the name suggests, for each ‘patch’ file a corresponding ‘revert’ file can be used to revert the upgrade. For testing and development environments, a current version of the full schema can always be created without using the upgrade path simply by creating the schema with the current schema file.

For the purposes of understanding how this would work in a real environment, I’ll use the Sakila test database and I’ll step through a few examples.

Seeding your configuration management

Because we already have an existing schema, the first step is to seed our new configuration management with the existing schema information.

This would actually involve some duplication, however this will become more apparent in future examples.

We will be creating the following three (3) files:

  • /database/sql/schema/schema.sakila.sql
  • /database/sql/patch/patch.20090303.01.sql
  • /database/sql/revert/revert.20090303.01.sql

/database/sql/schema/schema.sakila.sql
This will be a copy of the sakila-db/sakila-schema.sql. You will need to edit this file to remove the following lines.

DROP SCHEMA IF EXISTS sakila;
CREATE SCHEMA sakila;
USE sakila;

All configuration files must not contain any schema definitions. This will be discussed in more detail at a later time.

/database/sql/schema/patch.20090303.01.sql
This will be a copy of the above file.

/database/sql/schema/revert.20090303.01.sql

DROP PROCEDURE IF EXISTS rewards_report ;
DROP FUNCTION IF EXISTS get_customer_balance;
DROP PROCEDURE IF EXISTS film_in_stock;
DROP PROCEDURE IF EXISTS film_not_in_stock;
DROP FUNCTION IF EXISTS inventory_held_by_customer;
DROP FUNCTION IF EXISTS inventory_in_stock;
DROP VIEW IF EXISTS customer_list;
DROP VIEW IF EXISTS film_list;
DROP VIEW IF EXISTS nicer_but_slower_film_list;
DROP VIEW IF EXISTS staff_list;
DROP VIEW IF EXISTS sales_by_store;
DROP VIEW IF EXISTS sales_by_film_category;
DROP VIEW IF EXISTS actor_info;
DROP TABLE IF EXISTS actor;
DROP TABLE IF EXISTS address;
DROP TABLE IF EXISTS category;
DROP TABLE IF EXISTS city;
DROP TABLE IF EXISTS country;
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS film;
DROP TABLE IF EXISTS film_actor;
DROP TABLE IF EXISTS film_category;
DROP TABLE IF EXISTS film_text;
DROP TABLE IF EXISTS inventory;
DROP TABLE IF EXISTS language;
DROP TABLE IF EXISTS payment;
DROP TABLE IF EXISTS rental;
DROP TABLE IF EXISTS staff;
DROP TABLE IF EXISTS store;

For the purposes of this theory, I will discuss schema creation and management at a later time. For this example, we assume the ‘sakila’ schema has been created and is empty.

The ‘two’ paths

The default path is to apply the patch file to the appropriate schema. In this case, by using the patch file, this would create the current ‘sakila’ schema.

If this fails for example, you should automatically apply the revert script which should restore your environment to it’s original state, in this case an empty schema.

If you wanted to create a new test environment for example, (following creation of the schema), you could simply apply the schema file.

Let’s perform another iteration, to see the full working process.

Adding new objects

Let’s say we wanted to keep additional information such as famous quotes an actor has made. We want to create a new table ‘actor_quote’.

For this we would first create a patch and revert script to manage this new object.
/database/sql/patch/patch.20090303.02.sql

CREATE TABLE actor_quote (
  quote_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  actor_id SMALLINT UNSIGNED NOT NULL,
  quote   VARCHAR(200) NOT NULL,
  PRIMARY KEY  (quote_id),
  KEY idx_fk_actor_id (actor_id),
  CONSTRAINT fk_actor_quote_actor FOREIGN KEY (actor_id) REFERENCES actor (actor_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

/database/sql/revert/revert.20090303.02.sql

DROP TABLE IF EXISTS actor_quote;

/database/sql/schema/schema.sakila.sql
The contents of the patch file should be appended to this file.

The ‘two’ paths

If we look at the two paths again.

The normal production operation, by using the patch file, would create the new database object.

If this fails for example, you should automatically apply the revert script which should restore your environment to it’s original state, in this case drop the table if it exists. In this simplest example,

If you wanted to create a new test environment for example, (following creation of the schema), you could simply apply the schema file.

Review

We have only touched on the entire process of configuration management for database objects. The implementation of this practice includes meta data and controlling scripts that manage the order of execution, recording operations performed successfully or unsuccessfully for example.

About Standards

Within this overview a number of standards are in place. These include:

  • SQL scripts do not contain any CREATE/DROP DATABASE|SCHEMA commands
  • SQL scripts do not contain any schema/database specific references. This is important for being able to easily test and verify operations. In our above examples, the default Sakila DB contains such information and would be edited appropriately.
  • For Patch and Revert files a chronological date format for naming is used, e.g. YYYYMMDD.XX, where XX is a sequential number for multiple patch/revert scripts for any given day.
  • All SQL statements must be terminated with ‘;’. This is important for the management processes and automated scripts that take these fundamental schema/patch/revert scripts as source information.
  • Where possible, try to make revert scripts, support either a successful or failed patch process. For example, adding IF EXISTS to a DROP TABLE statements supports both cases.
  • It is reasonably obvious to have schema, patch and revert directories as a naming standard, but file name also include this as a prefix. This is performed as a double check, if a file is seen in isolation it’s type can be determined regardless of directory location. Also for logging, only filenames are used.

More Information

Configuration Management in MySQL is one of the topics discussed in the “MySQL Essentials” training course. You can find more information regarding this and other training offerings including an upcoming schedule at 42SQL Education.

Planet MySQL at a new URL

Did anybody notice that http://planetmysql.org now redirects to http://planet.mysql.com?

Curious to know the reason why, perhaps an official MySQL person can give us some details.
Also it’s a 302 redirect, not a 301 redirect, interesting?

 wget http://planetmysql.org
--2009-02-26 14:40:09--  http://planetmysql.org/
Resolving planetmysql.org... 213.136.52.29
Connecting to planetmysql.org|213.136.52.29|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.planetmysql.org/ [following]
--2009-02-26 14:40:10--  http://www.planetmysql.org/
Resolving www.planetmysql.org... 213.136.52.29
Connecting to www.planetmysql.org|213.136.52.29|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://planet.mysql.com/ [following]
--2009-02-26 14:40:10--  http://planet.mysql.com/
Resolving planet.mysql.com... 213.136.52.29
Connecting to planet.mysql.com|213.136.52.29|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

Eliminating unnecessary internal temporary tables

I can’t stress enough that people look at SQL statements that are being executed against your production MySQL database, and you optimize queries when you can.

Often it’s the improvement to the large number of similar queries executed that can optimize resources. In this example, we take a very simple query, and by removing an unnecessary order by, we eliminate MySQL internally creating a temporary (in memory) table.

So what’s the big deal.

  • The query is simpler to read and understand
  • Memory required for the connection is not assigned
  • A number of internal steps are no longer required (4 of 21 logging messages, not an ideal measurement, but an indication). In this case, it was easily a 10% performance improvement for each query.

This query is executed 10-100 times per second, so the improvement in performance is significant.

mysql> explain select max(mdate) as mdate from tbl  where original_account = '[email protected]' and id = '15847' order by mdate desc;
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+------------------------------+
| id | select_type | table                | type | possible_keys        | key              | key_len | ref         | rows | Extra                        |
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+------------------------------+
|  1 | SIMPLE      | tbl                  | ref  | ids,original_account | original_account | 388     | const,const |  146 | Using where; Using temporary |
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+------------------------------+
1 row in set (0.00 sec)
mysql> explain select max(mdate) as mdate from tbl  where original_account = '[email protected]' and id = '15847';
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+-------------+
| id | select_type | table                | type | possible_keys        | key              | key_len | ref         | rows | Extra       |
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | tbl                  | ref  | ids,original_account | original_account | 388     | const,const |  146 | Using where |
+----+-------------+----------------------+------+----------------------+------------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
mysql> show profile cpu,memory,source for query 1;
+--------------------------------+----------+----------+------------+---------------------------+---------------+-------------+
| Status                         | Duration | CPU_user | CPU_system | Source_function           | Source_file   | Source_line |
+--------------------------------+----------+----------+------------+---------------------------+---------------+-------------+
| (initialization)               | 0.00001  | 0        | 0          | send_result_to_client     | sql_cache.cc  |        1143 |
| checking query cache for query | 0.000079 | 0        | 0          | open_tables               | sql_base.cc   |        2652 |
| Opening tables                 | 0.000024 | 0        | 0          | mysql_lock_tables         | lock.cc       |         153 |
| System lock                    | 0.000015 | 0        | 0          | mysql_lock_tables         | lock.cc       |         163 |
| Table lock                     | 0.000041 | 0        | 0          | mysql_select              | sql_select.cc |        2282 |
| init                           | 0.000046 | 0        | 0          | optimize                  | sql_select.cc |         765 |
| optimizing                     | 0.000027 | 0        | 0          | optimize                  | sql_select.cc |         924 |
| statistics                     | 0.000173 | 0        | 0          | optimize                  | sql_select.cc |         934 |
| preparing                      | 0.000028 | 0        | 0          | optimize                  | sql_select.cc |        1383 |
| Creating tmp table             | 0.000053 | 0        | 0          | exec                      | sql_select.cc |        1603 |
| executing                      | 0.000011 | 0        | 0          | exec                      | sql_select.cc |        1743 |
| Copying to tmp table           | 0.002226 | 0        | 0          | exec                      | sql_select.cc |        2123 |
| Sending data                   | 0.000148 | 0        | 0          | mysql_select              | sql_select.cc |        2327 |
| end                            | 0.000013 | 0        | 0          | free_tmp_table            | sql_select.cc |       10115 |
| removing tmp table             | 0.000064 | 0        | 0          | free_tmp_table            | sql_select.cc |       10143 |
| end                            | 0.000014 | 0        | 0          | mysql_execute_command     | sql_parse.cc  |        5154 |
| query end                      | 0.000012 | 0        | 0          | query_cache_end_of_result | sql_cache.cc  |         735 |
| storing result in query cache  | 0.000047 | 0        | 0          | mysql_parse               | sql_parse.cc  |        6155 |
| freeing items                  | 0.000021 | 0        | 0          | dispatch_command          | sql_parse.cc  |        2146 |
| closing tables                 | 0.000014 | 0        | 0          | log_slow_statement        | sql_parse.cc  |        2204 |
| logging slow query             | 0.000011 | 0        | 0          | dispatch_command          | sql_parse.cc  |        2169 |
+--------------------------------+----------+----------+------------+---------------------------+---------------+-------------+
21 rows in set (0.00 sec)


mysql> show profile cpu,memory,source for query 2;
+--------------------------------+-----------+----------+------------+---------------------------+---------------+-------------+
| Status                         | Duration  | CPU_user | CPU_system | Source_function           | Source_file   | Source_line |
+--------------------------------+-----------+----------+------------+---------------------------+---------------+-------------+
| (initialization)               | 0.000021  | 0        | 0          | send_result_to_client     | sql_cache.cc  |        1143 |
| checking query cache for query | 0.000090  | 0        | 0          | open_tables               | sql_base.cc   |        2652 |
| Opening tables                 | 0.000022  | 0        | 0          | mysql_lock_tables         | lock.cc       |         153 |
| System lock                    | 0.000014  | 0        | 0          | mysql_lock_tables         | lock.cc       |         163 |
| Table lock                     | 0.000044  | 0        | 0          | mysql_select              | sql_select.cc |        2282 |
| init                           | 0.000049  | 0        | 0          | optimize                  | sql_select.cc |         765 |
| optimizing                     | 0.000028  | 0        | 0          | optimize                  | sql_select.cc |         924 |
| statistics                     | 0.000179  | 0        | 0          | optimize                  | sql_select.cc |         934 |
| preparing                      | 0.000029  | 0        | 0          | exec                      | sql_select.cc |        1603 |
| executing                      | 0.000016  | 0        | 0          | exec                      | sql_select.cc |        2123 |
| Sending data                   | 0.00229   | 0        | 0          | mysql_select              | sql_select.cc |        2327 |
| end                            | 0.000039  | 0        | 0          | mysql_execute_command     | sql_parse.cc  |        5154 |
| query end                      | 0.000012  | 0        | 0          | query_cache_end_of_result | sql_cache.cc  |         735 |
| storing result in query cache  | 0.000011  | 0        | 0          | mysql_parse               | sql_parse.cc  |        6155 |
| freeing items                  | 0.00002   | 0        | 0          | dispatch_command          | sql_parse.cc  |        2146 |
| closing tables                 | 0.000014  | 0        | 0          | log_slow_statement        | sql_parse.cc  |        2204 |
| logging slow query             | 0.00001   | 0        | 0          | dispatch_command          | sql_parse.cc  |        2169 |
+--------------------------------+-----------+----------+------------+---------------------------+---------------+-------------+
17 rows in set (0.00 sec)

Announcing "MySQL Essentials" Training

Are you having problems getting up to speed on MySQL? Are you asking yourself “Is there a hands-on training course we can send a developer/system admin to learn MySQL?”. In response, at 42SQL we have put together two new training courses, MySQL Essentials and MySQL Operations.

MySQL Essentials Training Details

With MySQL Essentials we tackle the core essentials that a developer/system admin/junior DBA would require in order to support an initial development environment that uses MySQL. Essentials training teaches the following skills:

  • Which version of MySQL to use (including the various different variants and patches available)
  • Backup, retention, and recovery strategies
  • Configuration and Monitoring of MySQL
  • Optimal schema and data objects configuration management
  • more information here

We are now accepting registrations for MySQL Essentials training being held on April 1st – 2nd in New York, and April 6th – 7 th in Washington DC.

About the presenter

Ronald Bradford is a two-decade veteran with extensive database experience in MySQL, Oracle and Ingres. His expertise covers data architecture, software development, migration, performance analysis and production system implementations. With ten years experience in MySQL, his involvement in the MySQL ecosystem has included working as Senior Consultant with MySQL Inc, speaker at four MySQL Conferences, and creator of the “MySQL for Oracle DBA’s” one-day workshop. Ronald holds MySQL Certifications including DBA 5.0, Developer 5.0 and MySQL Cluster 5.1.

The art of looking at the actual SQL statements

It’s a shame that MySQL does not provide better granularity when you want to look at all SQL statements being executed in a MySQL server. I canvas that you can with the general log, but the inherit starting/stopping problems in 5.0, improved in 5.1, but I would still like to see the option on a per connection basis, or even a time period. MySQL Proxy can provide a solution here but also with some caveats.

You should however in a NON production environment, take the time to enable the general log and look the SQL Statements. Prior to looking at the SQL, monitoring of the GLOBAL STATUS variables combined with Statpack revealed the following in a 1 minute interval.

====================================================================================================
                                         Statement Activity
====================================================================================================

                     SELECT:           16,042                   267.37                8,177,050 (46.03%)
                     INSERT:            5,838                    97.30                1,826,616 (10.28%)
                     UPDATE:            1,109                    18.48                  738,546 (4.16%)
                     DELETE:            2,018                    33.63                1,374,983 (7.74%)
                    REPLACE:                0                     0.00                        0 (0.00%)
          INSERT ... SELECT:                0                     0.00                       27 (0.00%)
         REPLACE ... SELECT:                0                     0.00                        0 (0.00%)
               Multi UPDATE:                0                     0.00                        0 (0.00%)
               Multi DELETE:                0                     0.00                        0 (0.00%)
                     COMMIT:            5,708                    95.13                2,161,232 (12.17%)
                   ROLLBACK:            5,746                    95.77                3,485,828 (19.62%)

If you notice the last 2 lines, some 19% of statements executed on the server are ROLLBACK. Further analysis of the schema shows mainly Innodb tables (good as COMMIT and ROLLBACK are supported), but also some MyISAM tables.

The following is a snippet from the general log.

                     23 Query       select 1
                     23 Query       INSERT INTO JMS_TRANSACTIONS (TXID) values(17719)
                     23 Query       UPDATE JMS_MESSAGES SET TXID=17719, TXOP='D' WHERE MESSAGEID=16248 AND DESTINATION='QUEUE.receivemail'
                     23 Query       commit
                     23 Query       rollback
                     23 Query       select 1
                     23 Query       DELETE FROM JMS_MESSAGES WHERE TXID=17719 AND TXOP='D'
                     23 Query       DELETE FROM JMS_TRANSACTIONS WHERE TXID = 17719
                     23 Query       commit
                     23 Query       rollback

This turns out to be most interesting. These tables are use by Java Messaging Service but I observed three points.

  • the ‘select 1′ is effectively a ping test to confirm the connection is still valid. MySQL provides a more lightweight COM_PING. It would be good to know if this environment using JBoss could support that.
  • There is a ‘ROLLBACK’ after every command, totally redundant, and most likely part of higher level framework.
  • The ‘COMMIT’ is used in conjunction with a number of statements, however when I mentioned earlier some tables were MyISAM, these were the JMS tables, so in this situation the commit is useless as this is not a transactional storage engine.

A number of decisions are needed to correct this problem, however the point of raising this is, always look at the your SQL.

Watching a slave catchup

This neat one line command can be of interest when you are rebuilding a MySQL slave and replication is currently catching up.

$ watch --interval=1 --differences 'mysql -uuser -ppassword -e "SHOW SLAVE STATUS\G"'

You will see the standard SHOW SLAVE STATUS output, but the watch command presents an updated view every second, and highlights differences. This can be useful in a background window to keep an eye on those ‘Seconds Behind Master’.

*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 10.10.10.10
                Master_User: slave
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000626
        Read_Master_Log_Pos: 88159239
             Relay_Log_File: slave-relay.000005
              Relay_Log_Pos: 426677632
      Relay_Master_Log_File: mysql-bin.000621
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 426677495
            Relay_Log_Space: 2714497549
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 24131

Some Drupal observations

I had the opportunity to review a client’s production Drupal installation recently. This is a new site and traffic is just starting to pick up. Drupal is a popular LAMP stack open source CMS system using the MySQL Database.

Unfortunately I don’t always have the chance to focus on one product when consulting, sometimes the time can be minutes to a few hours. Some observations from looking at Drupal.

Disk footprint

Presently, volume and content is of a low volume, but expecting to ramp up. I do however find 90% of disk volume in one table called ‘watchdog';


+--------------+--------------+--------------+-------------+--------+
| table_schema | total_mb     | data_mb      | index_mb    | tables |
+--------------+--------------+--------------+-------------+--------+
| xxxxx        | 812.95555878 | 745.34520721 | 67.61035156 |    191 |
+--------------+--------------+--------------+-------------+--------+

+-------------------------------------------+--------+------------+------------+----------------+--------------+--------------+-------------+
| table_name                                | engine | row_format | table_rows | avg_row_length | total_mb     | data_mb      | index_mb    |
+-------------------------------------------+--------+------------+------------+----------------+--------------+--------------+-------------+
| watchdog                                  | MyISAM | Dynamic    |      63058 |            210 | 636.42242813 | 607.72516251 | 28.69726563 |
| cache_menu                                | MyISAM | Dynamic    |        145 |         124892 |  25.33553696 |  25.32577133 |  0.00976563 |
| search_index                              | MyISAM | Dynamic    |     472087 |             36 |  23.40134048 |  16.30759048 |  7.09375000 |
| comments                                  | MyISAM | Dynamic    |      98272 |            208 |  21.83272934 |  19.58272934 |  2.25000000 |

Investigating the content of the ‘watchdog’ table shows detailed logging. Drilling down just on the key ‘type’ records shows the following.

mysql> select message,count(*) from watchdog where type='page not found' group by message order by 2 desc limit 10;
+--------------------------------------+----------+
| message                              | count(*) |
+--------------------------------------+----------+
| content/images/loadingAnimation.gif  |    17198 |
| see/images/loadingAnimation.gif      |     6659 |
| images/loadingAnimation.gif          |     6068 |
| node/images/loadingAnimation.gif     |     2774 |
| favicon.ico                          |     1772 |
| sites/all/modules/coppa/coppa.js     |      564 |
| users/images/loadingAnimation.gif    |      365 |
| syndicate/google-analytics.com/ga.js |      295 |
| content/img_pos_funny_lowsrc.gif     |      230 |
| content/google-analytics.com/ga.js   |      208 |
+--------------------------------------+----------+
10 rows in set (2.42 sec)

Some 25% of rows is just the reporting one missing file. Correcting this one file cuts down a pile of unnecessary logging.

Repeating Queries

Looking at just 1 random second of SQL logging shows 1200+ SELECT statements.
355 are SELECT changed FROM node

$ grep would_you_rather drupal.1second.log
              7 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              5 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              3 Query       SELECT field_image_textarea_value AS value FROM content_type_would_you_rather WHERE vid = 24303 LIMIT 0, 1
              4 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              6 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
             10 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              9 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              8 Query       SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
              9 Query       SELECT field_image_textarea_value AS value FROM content_type_would_you_rather WHERE vid = 24303 LIMIT 0, 1

There is plenty of information regarding monitoring the Slow Queries in MySQL, but I have also promoted that’s it not the slow queries that ultimately slow a system down, but the 1000’s of repeating fast queries.

MySQL of course has the Query Cache to assist, but this is a course grade solution, and a high volume read/write environment this is meaningless.

There is a clear need for either a application level caching, or a database redesign to pull rather then poll this information, however without more in depth review of Drupal I can not make any judgment calls.

Best Practices in Migrating to MySQL

This week I was the invited speaker to give a 4 hr presentation to the Federal Government Sector in Washington DC on “Best Practices in Migrating to MySQL“. This was a followup to my day long “MySQL for the Oracle DBA Bootcamp” which I presented in Washington DC last year. It was good to see a number of attendees from my first DC presentation.

There was good attendance across various government departments and companies providing services to the government sector, as well a variety of job descriptions.

Thanks to Carahsoft and Sun/MySQL for organizing and sponsoring the event. Thanks also to Phil Hildebrand who provided fantastic support during my preparation answering all my SQL Server questions.

Thanks also to Baron Schwartz creator of Maatkit who as my invited guest was nice enough to table a list of attendee questions, which is always a good reference for revising slides and writing more blog posts.

You can find the first of seven sessions online in my presentations section.

Updated
Thanks to Baron Schwartz for his follow-up blog posts Migrating US Government applications from Oracle to MySQL and 50 things to know before migrating Oracle to MySQL.

Strict mode can still throw warnings

MySQL by default is vary lax with data validation. Silent conversions is a concept that is not a common practice in other databases. In MySQL, instead of throwing an error, a warning was thrown and many applications simply did not handle warnings. With the introduction of sql_mode=STRICT_ALL_TABLES (or TRADITIONAL), in MySQL 5, a better level of validation now exists.

My understanding was that Warnings are now thrown as Errors, therefore eliminating the need to do a SHOW WARNINGS to confirm any problems after every query (this is a performance overhead on a high volume system due to the round trip latency).

However I found an instance where MySQL in STRICT Mode still throws warnings, leading to the question, are there any other areas, and does the earlier statement “Warnings are now thrown as Errors” hold true.

Here is my seeding process to showing the problem.

mysql> create table i(i tinyint, unique key( i));
Query OK, 0 rows affected (0.01 sec)
mysql> insert into i values(999);
Query OK, 1 rows affected (0.00 sec)

Using default settings, attempting to INSERT a duplicate row throws an error, using INSERT IGNORE does not.

mysql> insert into i values(999);
ERROR 1062 (23000): Duplicate entry '127' for key 'i'
mysql> insert ignore into i values(999);
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1264 | Out of range value for column 'i' at row 1 |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)

When using a Strict Mode, a recommendation for all new systems, it is generally accepted that warnings are translated into errors, which implies your could should never have to consider checking for warnings.

mysql> truncate table i;
mysql> set sql_mode=strict_all_tables;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into i values(999);
ERROR 1264 (22003): Out of range value for column 'i' at row 1
mysql> insert ignore into i values(999);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1264 | Out of range value for column 'i' at row 1 |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)
mysql> set sql_mode=traditional;
Query OK, 0 rows affected (0.00 sec)

mysql> insert ignore into i values(9990);
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1264 | Out of range value for column 'i' at row 1 |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)

I should caveat this post also with using caution with INSERT IGNORE. This should only be used if you never care about errors which I would never consider as a best practice design approach.

Reducing the MySQL 5.1.30 disk footprint

The current size of a MySQL 5.1.30 installation is around 420M.

$ du -sh .
426M	.

A further breakdown.

$ du -sh *
213M	bin
20K	COPYING
9.8M	docs
8.0K	EXCEPTIONS-CLIENT
436K	include
12K	INSTALL-BINARY
121M	lib
504K	man
4.0K	my.cnf
77M	mysql-test
4.0K	README
20K	scripts
2.3M	share
2.9M	sql-bench
100K	support-files

A means to reduce the footprint by 25% is to delete some unused stuff.

$ rm -rf docs/ mysql-test/ sql-bench/
$ du -sh .
337M	.

It’s no big deal, however it certainly does cut down on verbose output in the backup logs removing the mysql-test directory and files.

Best practices for migrating applications to MySQL

In just over 2 weeks I’ll be the invited speaker in Washington DC to Best practices for migrating applications to MySQL. This workshop is being held in conjunction with Carahsoft and Sun/MySQL and aims to provide to the Federal sector valuable information for the continued usage and uptake of Open Source and specifically MySQL.

As part of my preparation I’m happy to hear from any organizations that have successfully migrated from Oracle/SQL Server/Informix/Sybase etc to MySQL and would like to be cited.

While I have been involved in the process I am also happy to hear of reasons why a migration failed, was aborted or postponed. This is all valuable information in determining what are the most ideal applications.

Extending the MySQL Data Landscape

Learn how to extend your existing MySQL based website to leverage the power of MySQL variants, AWS cloud based MySQL deployments and RDBMS alternatives. Evaluate how to integrate and use these different various technologies such as MySQL based variations KickFire, a column based optimization and InfoBright, a data warehousing solution. Understand the means of approach towards data synchronization between various database solutions in your business.

At the MySQL Meetup in New York this month, I spoke on “Extending the MySQL Data Landscape“. A MySQL centric view on an earlier work, “The Data Landscape” which I presented at a recent GoDaddy Tech Day.

,